[OSM-talk] [osmosis-dev] Osmosis running forever with completeWays=yes?
brett at bretth.com
Mon Feb 2 04:14:11 GMT 2009
Frederik Ramm wrote:
> (f'up set to osmosis-dev)
> Karl Newman wrote:
>> Anyway, the tee can choke things up with all the temporary files. It would
>> be nice to be able to share the stored node and ways files between tee
>> tasks, but I haven't created that infrastructure yet.
> It would be even better to have an extended --bp task that somehow takes
> a list of disjoint polygons and uses some kind of point location
> algorithm to determine which node belongs to which polygon. The
> rationale being of course that with the classic --bp/--tee approach,
> each node is duplicated n times and tested against each of the polygons
> which is a waste of time, especially with a large input file and many
> polygons (e.g. split up the US into counties or so).
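A minimal sketch of the point-location idea, assuming each polygon is a simple ring of (lon, lat) vertices without holes and that the polygons are disjoint. It uses the simplest approach, a bounding-box rejection followed by an even-odd ray-casting test, rather than a dedicated point-location structure. `Polygon` and `locate` are hypothetical names, and parsing of .poly files is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


@dataclass
class Polygon:
    """Hypothetical simple polygon (no holes) given as a ring of (lon, lat) vertices."""
    name: str
    ring: Sequence[Point]
    bbox: Tuple[float, float, float, float] = field(init=False)

    def __post_init__(self) -> None:
        xs = [x for x, _ in self.ring]
        ys = [y for _, y in self.ring]
        self.bbox = (min(xs), min(ys), max(xs), max(ys))

    def contains(self, p: Point) -> bool:
        x, y = p
        minx, miny, maxx, maxy = self.bbox
        # Cheap rejection before the full test.
        if not (minx <= x <= maxx and miny <= y <= maxy):
            return False
        # Even-odd ray casting: toggle on each edge crossed by a ray going right from p.
        inside = False
        n = len(self.ring)
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            if (y1 > y) != (y2 > y):  # edge spans the ray's height, so y1 != y2
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


def locate(p: Point, polygons: List[Polygon]) -> Optional[int]:
    """Return the index of the polygon containing p, or None if there is none.

    The polygons are assumed disjoint, so the scan stops at the first hit."""
    for idx, poly in enumerate(polygons):
        if poly.contains(p):
            return idx
    return None
```

With many polygons (e.g. counties), a spatial index over the polygon bounding boxes would be the natural next step instead of the linear scan in `locate`.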
Just to be clear, the --tee task doesn't duplicate nodes. It simply
passes nodes to multiple downstream tasks. The problem lies with the --bp
task: when it is used downstream of a --tee task, each --bp instance
persists its own copy of the node information.
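To illustrate the distinction, here is a hypothetical fan-out in Python (`Tee` and `PersistingSink` are illustrative names, not Osmosis classes). The tee hands the same node object to every sink; the duplication only happens inside each sink that has to keep the nodes.

```python
from typing import Any, Dict, Iterable, List

Entity = Dict[str, Any]


class Tee:
    """Hypothetical fan-out stage: hands every entity to each downstream sink."""

    def __init__(self, sinks: Iterable[Any]) -> None:
        self.sinks = list(sinks)

    def process(self, entity: Entity) -> None:
        for sink in self.sinks:
            sink.process(entity)  # the same object is passed on; nothing is copied here


class PersistingSink:
    """Stand-in for a --bp-like stage that must keep every node it receives.

    It stores a copy in memory; the thread mentions temporary files instead."""

    def __init__(self) -> None:
        self.stored: List[Entity] = []

    def process(self, entity: Entity) -> None:
        self.stored.append(dict(entity))  # each sink ends up with its own copy


# Three --bp-like sinks behind one tee: 1000 nodes in, 3000 stored copies out.
sinks = [PersistingSink() for _ in range(3)]
tee = Tee(sinks)
for i in range(1000):
    tee.process({"id": i, "lon": 0.0, "lat": 0.0})
print(sum(len(s.stored) for s in sinks))  # 3000
```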
This exact problem is why I originally created the customdb tasks, which
aimed to create a single random access dataset (with appropriate
indexing) that could be built once and then queried many times for many
bounding boxes. When that provided dismal performance, I created the
pgsql tasks instead. There is a --dataset-bounding-box task which can
be used to read a bbox from a database.
osmosis --read-pgsql --dataset-bounding-box left=xx ..... --write-xml
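The build-once, query-many idea can be sketched with a uniform grid index. This is an assumption for illustration only, not how the customdb or pgsql tasks actually store their data; `GridIndex`, `cell_deg` and the `left`/`bottom`/`right`/`top` arguments are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


class GridIndex:
    """Hypothetical build-once, query-many spatial index over node coordinates."""

    def __init__(self, nodes: Dict[int, Tuple[float, float]], cell_deg: float = 1.0) -> None:
        self.nodes = nodes
        self.cell_deg = cell_deg
        # Built once: grid cell -> ids of the nodes that fall in it.
        self.cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        for node_id, (lon, lat) in nodes.items():
            self.cells[self._cell(lon, lat)].append(node_id)

    def _cell(self, lon: float, lat: float) -> Tuple[int, int]:
        return (math.floor(lon / self.cell_deg), math.floor(lat / self.cell_deg))

    def query(self, left: float, bottom: float, right: float, top: float) -> List[int]:
        """Visit only the cells overlapping the bbox, then filter nodes exactly."""
        cx0, cy0 = self._cell(left, bottom)
        cx1, cy1 = self._cell(right, top)
        result = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for node_id in self.cells.get((cx, cy), ()):  # .get does not insert
                    lon, lat = self.nodes[node_id]
                    if left <= lon <= right and bottom <= lat <= top:
                        result.append(node_id)
        return sorted(result)
```

Each query touches only the cells under the requested box, so cutting many boxes from one index avoids re-reading the whole dataset for every box.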
I've been distracted by other things recently, so I haven't spent much
time on the bounding box implementation for a while. I've been meaning
to load up a full pgsql db to see how it performs for tile cutting.
> Does the task and stream model that osmosis uses theoretically support
> tasks where the number of output streams they create is not fixed, but
> dependent on their parameters? So that e.g. a "bp file=a.poly
> file=b.poly" (or "bp files=a.poly,b.poly") creates two entity streams
> and so on?
Hmm, perhaps this is a better way to do it. I hadn't thought of keeping
a single copy of nodes with references to the polygons they reside in.
If somebody can come up with a faster implementation than pgsql I'll be
ecstatic. I've wasted a lot of time on this one.
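A rough sketch of the multi-polygon split discussed above: one output stream per region, created once from the parameters before any data flows, plus a single shared store recording which regions each node falls in. All names are hypothetical, and regions are passed in as containment predicates (rectangles in the example) so that polygon parsing and testing can be left out.

```python
from typing import Callable, Dict, List, Set, Tuple

Point = Tuple[float, float]  # (lon, lat)
Contains = Callable[[Point], bool]


class MultiRegionSplitter:
    """Hypothetical task with one output stream per region, fixed at startup."""

    def __init__(self, regions: Dict[str, Contains]) -> None:
        self.regions = regions
        # Output streams are created once, from the parameters, before any data flows.
        self.streams: Dict[str, List[tuple]] = {name: [] for name in regions}
        # Single shared node store: node id -> names of the regions it lies in.
        self.node_regions: Dict[int, Set[str]] = {}

    def process_node(self, node_id: int, p: Point) -> None:
        hits = {name for name, contains in self.regions.items() if contains(p)}
        self.node_regions[node_id] = hits
        for name in hits:
            self.streams[name].append(("node", node_id))

    def process_way(self, way_id: int, node_ids: List[int]) -> None:
        # Emit the way to every region that holds at least one of its nodes.
        hits: Set[str] = set()
        for nid in node_ids:
            hits |= self.node_regions.get(nid, set())
        for name in sorted(hits):
            self.streams[name].append(("way", way_id))


def rect(x0: float, y0: float, x1: float, y1: float) -> Contains:
    return lambda p: x0 <= p[0] <= x1 and y0 <= p[1] <= y1


split = MultiRegionSplitter({"a": rect(0, 0, 10, 10), "b": rect(20, 0, 30, 10)})
split.process_node(1, (5, 5))
split.process_node(2, (25, 5))
split.process_way(100, [1, 2])
print(split.streams["a"])  # [('node', 1), ('way', 100)]
print(split.streams["b"])  # [('node', 2), ('way', 100)]
```

This only routes a way to regions that contain at least one of its nodes, so a way that merely passes through a region is missed, and a way's nodes outside a region are not added to that region's stream.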
Note that the pgsql (and customdb) implementations solve the problem of
ways that cross a bounding box without having any nodes inside it, which
might be difficult to solve with an alternative approach.
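One way to catch such ways, sketched under the assumption that all of a way's node coordinates are available when the way is processed: test whether any node is inside the box, and otherwise whether any segment of the way crosses one of the box's four edges. `way_touches_bbox` and its helpers are hypothetical names; the intersection check is the standard orientation (cross-product) test.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): 1 = left turn, -1 = right turn, 0 = collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a: Point, b: Point, c: Point) -> bool:
    """For a point c already known to be collinear with a-b: does it lie between them?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Standard orientation test for closed segments p1-p2 and q1-q2."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True  # the segments straddle each other
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))


def way_touches_bbox(coords: Sequence[Point], left: float, bottom: float,
                     right: float, top: float) -> bool:
    """True if the way has a node in the box or a segment crossing the box boundary."""
    if any(left <= x <= right and bottom <= y <= top for x, y in coords):
        return True
    corners = [(left, bottom), (right, bottom), (right, top), (left, top)]
    edges = list(zip(corners, corners[1:] + corners[:1]))
    # All nodes are outside the convex box, so a segment can only meet it by crossing an edge.
    for a, b in zip(coords, coords[1:]):
        if any(segments_intersect(a, b, c, d) for c, d in edges):
            return True
    return False


# A way from west of the box to east of it, with no node inside the box:
print(way_touches_bbox([(-5, 5), (15, 5)], 0, 0, 10, 10))  # True
```

In a streaming pipeline this means keeping the coordinates of every node, not only those inside the box, which an indexed database already has on hand.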
Yes, osmosis can support variable numbers of output streams so long as
they are known at startup time. This is pretty much what the --tee task
does.