[OSM-talk] [osmosis-dev] Osmosis running forever with completeWays=yes?

Brett Henderson brett at bretth.com
Mon Feb 2 04:14:11 GMT 2009


Frederik Ramm wrote:
> Hi,
>
>     (f'up set to osmosis-dev)
>
> Karl Newman wrote:
>   
>> Anyway, the tee can choke things up with all the temporary files. It would
>> be nice to be able to share the stored node and ways files between tee
>> tasks, but I haven't created that infrastructure yet.
>>     
>
> It would be even better to have an extended --bp task that somehow takes 
> a list of disjoint polygons and uses some kind of point location 
> algorithm to determine which node belongs to which polygon. The 
> rationale being of course that with the classic --bp/--tee approach, 
> each node is duplicated n times and tested against each of the polygons 
> which is a waste of time, especially with a large input file and many 
> polygons (e.g. split up the US into counties or so).
>   
Just to be clear, the --tee task doesn't duplicate nodes.  It simply 
passes nodes to multiple downstream tasks.  The problem with the --bp 
task is that when it is used downstream of a --tee task each instance 
persists the node information.

This exact problem is why I originally created the customdb tasks which 
aimed to create a single random access dataset (with appropriate 
indexing) which could be built once then queried many times for many 
bounding boxes.  When that provided dismal performance I created the 
pgsql tasks instead.  There is a --dataset-bounding-box task which can 
be used to read a bbox from a database.

osmosis --read-pgsql --dataset-bounding-box left=xx ..... --write-xml myextract.osm

I've been distracted by other things recently so haven't spent much time 
on the bounding box implementation for a while.  I've been meaning to 
load up a full pgsql db to see how it performs for tile cutting.

> Does the task and stream model that osmosis uses theoretically support 
> tasks where the number of output streams they create is not fixed, but 
> dependent on their parameters? So that e.g. a "bp file=a.poly 
> file=b.poly" (or "bp files=a.poly,b.poly") creates two entity streams 
> and so on?
>   
Hmm, perhaps this is a better way to do it.  I hadn't thought of keeping 
a single copy of nodes with references to the polygons they reside in.  
If somebody can come up with a faster implementation than pgsql I'll be 
ecstatic.  I've wasted a lot of time on this one.
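
As a rough illustration of the "single copy of nodes" idea, here is a
minimal Python sketch that reads each node once and records which of
several disjoint polygons contains it. It is not Osmosis code: the names
point_in_polygon, bbox and assign_nodes are hypothetical, polygons are
assumed to be simple rings of (lon, lat) pairs, and the bounding-box
pre-check is only a stand-in for a real point location structure (the
inner loop still visits the polygons one by one).

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting test against a single ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt count.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(ring: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of a ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def assign_nodes(
    nodes: Dict[int, Point],
    polygons: Dict[str, Sequence[Point]],
) -> Dict[int, Optional[str]]:
    """Map each node id to the name of the polygon containing it, or None."""
    boxes = {name: bbox(ring) for name, ring in polygons.items()}
    result: Dict[int, Optional[str]] = {}
    for node_id, (x, y) in nodes.items():
        owner = None
        for name, ring in polygons.items():
            min_x, min_y, max_x, max_y = boxes[name]
            # Cheap rejection before the full polygon test.
            if not (min_x <= x <= max_x and min_y <= y <= max_y):
                continue
            if point_in_polygon((x, y), ring):
                owner = name
                break  # polygons are assumed disjoint, so stop here
        result[node_id] = owner
    return result
```

Each node is stored once together with a single polygon reference,
rather than being persisted separately by every downstream --bp
instance. The sketch does nothing about ways that cross a polygon
without having a node inside it.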

Note that the pgsql (and customdb) implementations handle ways that 
cross a bounding box without having any nodes inside it, which might be 
difficult to solve with an alternative approach.

Yes, osmosis can support variable numbers of output streams so long as 
they are known at startup time.  This is pretty much what the --tee task 
does.
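
To make the "known at startup time" constraint concrete, here is a small
Python sketch of a tee-like task. It is only a model of the idea, not
the Osmosis Java API: the Tee class, its output_count parameter and the
connect/process methods are hypothetical names. The number of outputs is
fixed when the task is constructed, and each entity is handed to every
sink as the same object rather than being copied.

```python
from typing import Callable, List

Sink = Callable[[dict], None]


class Tee:
    """Passes each entity to every sink; the sink count is fixed up front."""

    def __init__(self, output_count: int) -> None:
        self.output_count = output_count
        self._sinks: List[Sink] = []

    def connect(self, sink: Sink) -> None:
        # Outputs can only be wired up to the count declared at startup.
        if len(self._sinks) >= self.output_count:
            raise ValueError("all outputs already connected")
        self._sinks.append(sink)

    def process(self, entity: dict) -> None:
        # The same object goes to each sink; nothing is duplicated here.
        for sink in self._sinks:
            sink(entity)
```

A multi-polygon task along the lines of "bp files=a.poly,b.poly" would
fit the same shape, with the output count derived from the number of
files given in its arguments.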

Brett




