[OSM-talk] Osmosis running forever with completeWays=yes?
brett at bretth.com
Mon Feb 2 04:18:19 GMT 2009
Karl Newman wrote:
> On Sun, Feb 1, 2009 at 1:55 PM, Frederik Ramm <frederik at remote.org> wrote:
> (f'up set to osmosis-dev)
> Karl Newman wrote:
> Anyway, the tee can choke things up with all the temporary files. It
> would be nice to be able to share the stored node and ways files
> between tee tasks, but I haven't created that infrastructure yet.
> It would be even better to have an extended --bp task that somehow
> takes a list of disjoint polygons and uses some kind of point
> location algorithm to determine which node belongs to which
> polygon. The rationale being of course that with the classic
> --bp/--tee approach, each node is duplicated n times and tested
> against each of the polygons which is a waste of time, especially
> with a large input file and many polygons (e.g. split up the US
> into counties or so).
> Does the task and stream model that osmosis uses theoretically
> support tasks where the number of output streams they create is
> not fixed, but dependent on their parameters? So that e.g. a "bp
> file=a.poly file=b.poly" (or "bp files=a.poly,b.poly") creates two
> entity streams and so on?
> What you're asking is possible. The number of input and output pipes
> has to be known at invocation because the pipes are connected before
> any tasks are run, but if it's a parameter passed to the task, then
> the task can report to the pipeline manager how many output pipes it
> has. The tricky part might be connecting the downstream tasks. It
> might be confusing because of the stack-based pipeline ordering.
If you want to see how this works, check out the SinkMultiSource
interface which defines a task with a single input and multiple
outputs. It is implemented by the EntityTee class which is the --tee
task. It is integrated into the pipeline by the SinkMultiSourceManager.
The SinkMultiSource interface defines a method called getSourceCount,
which allows tasks to tell the manager how many pipe outputs they have.
It is called by SinkMultiSourceManager during pipeline startup.
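As a rough illustration of the idea, here is a minimal, self-contained sketch of a task with one input whose number of outputs depends on its parameters. It does not use the real Osmosis classes: the Sink, Source and SinkMultiSource interfaces below are simplified stand-ins, entities are plain strings, and RoutingTask, MiniManager and the prefix-based routing rule are all hypothetical names invented for the example. Only the getSourceCount idea is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the Osmosis task interfaces (NOT the real API).
interface Sink {
    void process(String entity);
}

interface Source {
    void setSink(Sink sink);
}

interface SinkMultiSource extends Sink {
    int getSourceCount();          // how many output pipes this task has
    Source getSource(int index);   // the output pipe at the given index
}

// Hypothetical task: one output per name passed in as a parameter.
class RoutingTask implements SinkMultiSource {
    private final List<String> names;
    private final List<Sink> sinks = new ArrayList<>();

    RoutingTask(List<String> names) {
        this.names = new ArrayList<>(names);
        for (int i = 0; i < this.names.size(); i++) {
            sinks.add(null); // outputs are unconnected until setSink is called
        }
    }

    @Override
    public int getSourceCount() {
        // The output count is known as soon as the parameters are parsed.
        return names.size();
    }

    @Override
    public Source getSource(int index) {
        return sink -> sinks.set(index, sink);
    }

    @Override
    public void process(String entity) {
        // Toy routing rule (an assumption for this sketch): an entity goes to
        // the first output whose name prefixes it, e.g. "a:node1" goes to "a".
        for (int i = 0; i < names.size(); i++) {
            Sink sink = sinks.get(i);
            if (sink != null && entity.startsWith(names.get(i) + ":")) {
                sink.process(entity);
                return;
            }
        }
    }
}

// Hypothetical manager: connects every output before any data flows.
class MiniManager {
    static List<List<String>> connect(SinkMultiSource task) {
        List<List<String>> outputs = new ArrayList<>();
        for (int i = 0; i < task.getSourceCount(); i++) {
            List<String> collected = new ArrayList<>();
            task.getSource(i).setSink(collected::add);
            outputs.add(collected);
        }
        return outputs;
    }
}
```

The point of the sketch is the ordering: the manager asks for the output count and wires up every pipe first, and only then is process called, so each entity is handed to exactly one output instead of being duplicated to all of them.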