[OSM-dev] Patch for Osmosis--complete ways and relations

Brett Henderson brett at bretth.com
Tue Oct 23 11:03:16 BST 2007


I've applied the patch and committed to svn.  I haven't tested it but 
I'll take your word that it works :-)  I might revisit it over the next 
few days with a view to making it more configurable but what you've 
provided appears to be a significant improvement over the version I 
had.  The only issue is that the code is getting hard to follow so I'm 
wondering if there are ways of making this task more modular, but I'll 
sit on it for a few days and see if anything comes to me.

A couple of things you may be interested in:
1. I've introduced custom serialisation to improve temp file performance.
2. I've written a new store called IndexedObjectStore for random access 
to temporary data.

I'm doing some performance testing now, but you should notice some 
fairly major performance improvements to all temporary file handling.  
It should be approx 5 times faster, and SimpleObjectStore will benefit 
from that as well.

The new IndexedObjectStore allows you to write objects and randomly 
retrieve them based on their identifier.  It has two caveats:
1. Objects must be written in sorted order (FileBasedSort can be used 
first if this is an issue).
2. All ids must be unique.  This means that for OSM data a separate 
store instance will be required for nodes, ways and relations.
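
As a rough illustration of those two caveats (this is not the Osmosis
code, which is Java; SortedIdStore and its add/get methods are
hypothetical names made up for this sketch), an in-memory stand-in in
Python might look like this:

```python
import bisect


class SortedIdStore:
    """Toy in-memory stand-in for an id-indexed store (hypothetical, not Osmosis code)."""

    def __init__(self):
        self._ids = []      # ids, kept in strictly ascending order
        self._objects = []  # stored objects, parallel to self._ids

    def add(self, obj_id, obj):
        # Caveats 1 and 2 together: every id must be greater than the previous one,
        # which rules out both unsorted input and duplicate ids.
        if self._ids and obj_id <= self._ids[-1]:
            raise ValueError(
                f"id {obj_id} is not greater than previous id {self._ids[-1]}"
            )
        self._ids.append(obj_id)
        self._objects.append(obj)

    def get(self, obj_id):
        # Binary search over the sorted ids to find the matching object.
        pos = bisect.bisect_left(self._ids, obj_id)
        if pos == len(self._ids) or self._ids[pos] != obj_id:
            raise KeyError(obj_id)
        return self._objects[pos]


# One store per entity type, because a node and a way may share the same id.
nodes = SortedIdStore()
ways = SortedIdStore()
nodes.add(1, "node 1")
nodes.add(7, "node 7")
ways.add(7, "way 7")
```

Here nodes.get(7) and ways.get(7) return different objects, which is
why a single shared store would not work for mixed OSM data.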

IndexedObjectStore may provide a mechanism for optimising area filtering 
tasks but I haven't gotten that far yet.  There may be ways of 
optimising it by using memory mapped file io or similar.  For now it 
performs a number of file seeks on two index files and the data file to 
find the right location using a binary search algorithm.
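
The general idea of a seek-based binary search can be sketched as
below.  This is a simplified Python illustration under several
assumptions: it uses a single index file rather than two, and the
record layout (64-bit id plus 64-bit offset), the length prefix and
the function names are all invented for the example rather than taken
from IndexedObjectStore.

```python
import os
import struct
import tempfile

# Hypothetical index record: (object id, byte offset into the data file), both 64-bit.
INDEX_RECORD = struct.Struct(">qq")


def write_store(objects, data_path, index_path):
    """Write (id, bytes) pairs, already sorted by id, plus a fixed-width index."""
    with open(data_path, "wb") as data, open(index_path, "wb") as index:
        for obj_id, payload in objects:
            # Record where this object starts before writing it.
            index.write(INDEX_RECORD.pack(obj_id, data.tell()))
            data.write(struct.pack(">i", len(payload)))  # 4-byte length prefix
            data.write(payload)


def read_object(obj_id, data_path, index_path):
    """Binary search the index using seeks, then do one seek into the data file."""
    with open(index_path, "rb") as index, open(data_path, "rb") as data:
        low = 0
        high = os.path.getsize(index_path) // INDEX_RECORD.size - 1
        while low <= high:
            mid = (low + high) // 2
            index.seek(mid * INDEX_RECORD.size)
            found_id, offset = INDEX_RECORD.unpack(index.read(INDEX_RECORD.size))
            if found_id == obj_id:
                data.seek(offset)
                (length,) = struct.unpack(">i", data.read(4))
                return data.read(length)
            if found_id < obj_id:
                low = mid + 1
            else:
                high = mid - 1
    raise KeyError(obj_id)


tmp_dir = tempfile.mkdtemp()
data_file = os.path.join(tmp_dir, "objects.dat")
index_file = os.path.join(tmp_dir, "objects.idx")
write_store([(3, b"three"), (10, b"ten"), (42, b"forty-two")], data_file, index_file)
```

Because the index records are fixed width, the position of record n is
simply n times the record size, so each probe costs one seek and one
small read regardless of how large the stored objects are.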

Karl Newman wrote:
> Brett,
>
> I have modified the Osmosis area filter task to optionally include all
> available nodes which are part of a way which has at least one node in
> the filtered area (this is similar to how the API returns data in a
> bounding box). It does this by streaming all nodes, then all ways,
> then all relations into temporary files (using SimpleObjectStore),
> then reading them back and passing on the appropriate items to the
> sink. To enable this mode, just add the parameter "completeWays=yes"
> to the --bounding-box or --bounding-polygon task.
>
>   
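
For anyone following along, the completeWays behaviour described in
the quoted message can be sketched roughly as follows.  This is a
Python illustration only: plain lists stand in for the temporary
files, relations are left out for brevity, a rectangular box stands
in for the area test, and filter_complete_ways and its argument
shapes are hypothetical rather than part of Osmosis.

```python
def filter_complete_ways(nodes, ways, bbox):
    """Keep ways with at least one node in bbox, plus all available nodes of those ways.

    nodes: list of (node_id, lat, lon)
    ways:  list of (way_id, [node_id, ...])
    bbox:  (min_lat, min_lon, max_lat, max_lon)
    """
    min_lat, min_lon, max_lat, max_lon = bbox

    # Pass over the nodes: remember which ones fall inside the area.
    inside = {
        node_id
        for node_id, lat, lon in nodes
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    }

    # Pass over the ways: keep any way that touches the area at least once.
    kept_ways = [(way_id, refs) for way_id, refs in ways if any(r in inside for r in refs)]

    # Every node referenced by a kept way is now needed, even if it lies outside.
    needed = set(inside)
    for _, refs in kept_ways:
        needed.update(refs)

    # Read the buffered nodes back and pass on the ones that are needed.
    # Referenced nodes that were never in the input are simply absent.
    kept_nodes = [node for node in nodes if node[0] in needed]
    return kept_nodes, kept_ways


example_nodes = [(1, 0.5, 0.5), (2, 2.0, 2.0), (3, 5.0, 5.0)]
example_ways = [(100, [1, 2]), (101, [3, 4])]
kept_nodes, kept_ways = filter_complete_ways(example_nodes, example_ways, (0.0, 0.0, 1.0, 1.0))
```

In this example node 2 lies outside the box but is still kept because
way 100 also contains node 1, which is inside.  The nodes have to be
buffered because they arrive before the ways that decide whether they
are needed.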




