[OSM-dev] Osmosis: Bounding polygon does not support change data as input?

Brett Henderson brett at bretth.com
Thu Sep 18 13:33:30 BST 2008


Andreas Kalsch wrote:
> This is a clear fact, of course. But Osmosis could retrieve the nodes 
> from the database instead and try to work it out, as you described 
> below.
>>
>> All is not lost however.
>> The simple answer is just to import the diff for the entire world.  
>> It's approximately 10MB of compressed data per day so the database 
>> will only grow steadily.
>> The slightly more complicated answer is to import the entire world 
>> diffs but then run an additional query to delete data not inside the 
>> bounding box.  This would only have to be run occasionally.
> OK, so a solution could be:
> - load planet diff
> - apply it completely
> - every n days, make a copy of the current database containing only 
> the points within the polygon, and delete the old database
> - after that, trigger changes to my application database, which uses 
> just a subset of the whole data, so that I use no data which is 
> outside the polygon
>
> This is a compromise that would work for me.
I didn't understand your fourth point, but otherwise that sounds okay.  I 
would have thought it would be quicker to delete the data outside the 
polygon than to copy the database and delete the old one, though ...
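
As a rough illustration of the "delete the data outside the polygon"
step, here is a minimal Python sketch. It is not Osmosis code: the
function names and the dict of node id -> (lon, lat) are hypothetical,
and it uses a plain ray-casting test rather than anything the database
itself provides.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def nodes_to_delete(nodes, polygon):
    """Return the ids of nodes lying outside the polygon.

    nodes maps node id -> (lon, lat); this layout is an assumption
    made for the example only.
    """
    return [node_id for node_id, (lon, lat) in nodes.items()
            if not point_in_polygon(lon, lat, polygon)]
```

The returned ids would then feed whatever delete statement suits the
schema in use; ways and relations left without any surviving members
would need a similar clean-up pass.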
>>
>> A more complicated solution (involving some coding) could be as follows:
>> Before applying a changeset, re-order the changeset to preserve 
>> referential integrity.  Osmosis can do this already with the --sort 
>> task.
>> Only write nodes that are inside the bounding box and check to see if 
>> each node already exists in the database.
>> For every way write, check the nodes that should by this point 
>> already exist in the database to see if the way is inside the 
>> bounding box.
>> For every relation write, check to see if any members already exist 
>> in the database.
> I am sure this would take a very long time. Sorting a file is surely 
> not a very quick task ;)
Osmosis uses a file-based merge sort for all sorting operations.  It 
would take a huge amount of time for a file the size of a planet, but 
sorting small changesets is fast.  Each daily changeset is approximately 
10MB compressed.  I just tested it and it took approximately 30 seconds 
to sort on the dev server, which is plenty fast enough.
But thinking about it a bit more, it's probably not necessary to sort 
anyway.  Sorting is necessary if you wish to maintain referential 
integrity, but in this case we just need all new nodes to be written to 
the database before the ways.  It will result in incomplete ways at the 
boundary of the box, but that is typical for most bounding tasks anyway.
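
The idea above can be sketched roughly as follows. This is only an
illustration in Python under stated assumptions, not how Osmosis is
implemented: the dict layout of each change item, the `known` sets
standing in for database lookups, and the `inside` callable are all
hypothetical. It also assumes that a node already in the database is
kept even if its new position is outside the box, so that moves and
deletes are not lost.

```python
def filter_change(changes, known, inside):
    """Keep only the change items that touch the bounding area.

    changes: list of dicts with "type" and "id", plus "lon"/"lat" for
             nodes, "nodes" for ways, and "members" (a list of
             (type, id) pairs) for relations.
    known:   {"node": set, "way": set, "relation": set} of ids already
             in the database; updated as items are accepted.
    inside:  callable (lon, lat) -> bool for the bounding area.
    """
    kept = []
    # No full sort: one pass per entity type is enough to guarantee
    # that nodes are handled before ways, and ways before relations.
    for kind in ("node", "way", "relation"):
        for item in changes:
            if item["type"] != kind:
                continue
            if kind == "node":
                keep = (item["id"] in known["node"]
                        or inside(item["lon"], item["lat"]))
            elif kind == "way":
                # Keep the way if any of its nodes made it into the database.
                keep = any(ref in known["node"] for ref in item["nodes"])
            else:
                keep = any(m_id in known[m_type]
                           for m_type, m_id in item["members"])
            if keep:
                known[kind].add(item["id"])
                kept.append(item)
    return kept
```

A way that crosses the boundary is kept but refers to nodes that were
never written, which is the incomplete-way behaviour described above.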

Anyway, I'm sure this will come up more and more.  I'll take a look at 
it one of these days but if anybody urgently requires it they might have 
to code it themselves.

Brett



