[OSM-dev] Deleting TIGER node tags

Frederik Ramm frederik at remote.org
Thu Jul 30 11:01:17 BST 2009


Hi,

Frederik Ramm wrote:
> I just did a little test, prepared an .osc document that removed the 
> node tags from about 1000 nodes:
> http://www.openstreetmap.org/browse/changeset/1894387
> It came out at roughly 10 node changes per second. 
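
For reference, a minimal sketch of how such an .osc (osmChange) document
could be built. This is not the script used for the test; the function name,
the generator string and the input layout (dicts with id, version, lat, lon)
are assumptions made for illustration. A node listed under <modify> without
any <tag> children ends up with no tags, so a real run that has to keep
non-TIGER tags would need to write those back.

```python
import xml.etree.ElementTree as ET


def build_tag_removal_osc(nodes, changeset_id):
    """Return an osmChange document that re-uploads each node without tags.

    `nodes` is a hypothetical input: an iterable of dicts holding the
    current id, version, lat and lon of every node to be cleaned.
    """
    root = ET.Element(
        "osmChange",
        version="0.6",
        generator="tiger-node-tag-cleanup-sketch",  # assumed name
    )
    modify = ET.SubElement(root, "modify")
    for node in nodes:
        # No <tag> children are added, so the new node version has no tags.
        ET.SubElement(
            modify,
            "node",
            id=str(node["id"]),
            version=str(node["version"]),
            changeset=str(changeset_id),
            lat=str(node["lat"]),
            lon=str(node["lon"]),
        )
    return ET.tostring(root, encoding="unicode")
```

The resulting string would be the body of one diff upload against an open
changeset.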

Some more tests made directly from the dev server suggest that 
performance is around 20 changes per second, slightly deteriorating if 
you upload too many changes in one diff upload (the peak performance 
seems to be at around 1k-2k changes per diff upload). Anything larger 
than 10k changes per diff upload is not feasible (you get into a 
territory where you have to manually increase default timeouts and all 
that), and also takes performance down into the 10-15 changes per second 
range PLUS increases the probability of having edit conflicts.

If we wanted to do this cleanup through normal API requests, the best 
approach thus seems to be to divide the data into roughly 88k batches of 
2k edits each and upload them as diff uploads, possibly grouping them 
into changesets of up to 25 batches each (=50k edits), which would result 
in roughly 3500 changesets.
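
The batch arithmetic can be sketched as follows. The total of 176 million
edits is not a measured figure; it is simply what 88k batches of 2k edits
imply, and the function name is made up for this example.

```python
import math

# Implied by the figures above: 88k batches of 2k edits each.
TOTAL_EDITS = 88_000 * 2_000


def plan_uploads(total_edits, edits_per_batch=2_000, batches_per_changeset=25):
    """Return (number of diff uploads, number of changesets)."""
    batches = math.ceil(total_edits / edits_per_batch)
    changesets = math.ceil(batches / batches_per_changeset)
    return batches, changesets


batches, changesets = plan_uploads(TOTAL_EDITS)
print(batches, changesets)
```

With the defaults this gives 88000 diff uploads in 3520 changesets, which is
the "roughly 3500" quoted above.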

Each diff upload would take about 100 seconds, each changeset would take 
about 40 minutes, we'd be doing about 30-35 changesets per day and 
finish the thing after about 100 days (some time in November if we start 
soon).
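
The same estimate as a small calculation, assuming the 20 changes per second
from my tests hold throughout and uploads run back to back with no pauses.
The function name and the figure of 3520 changesets (88k batches divided by
25) are assumptions for this sketch.

```python
def estimate_schedule(
    changes_per_second=20.0,
    edits_per_batch=2_000,
    batches_per_changeset=25,
    total_changesets=3_520,
):
    """Return (seconds per upload, minutes per changeset,
    changesets per day, total days), assuming uninterrupted uploads."""
    seconds_per_upload = edits_per_batch / changes_per_second
    minutes_per_changeset = seconds_per_upload * batches_per_changeset / 60
    changesets_per_day = 24 * 60 / minutes_per_changeset
    days = total_changesets / changesets_per_day
    return seconds_per_upload, minutes_per_changeset, changesets_per_day, days


upload_s, changeset_min, per_day, days = estimate_schedule()
print(f"{upload_s:.0f} s per upload, {changeset_min:.1f} min per changeset")
print(f"{per_day:.1f} changesets per day, {days:.0f} days in total")
```

Passing changes_per_second=10 shows how quickly the schedule stretches if
performance drops to the lower figure from my first test.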

An average day in OSM currently has roughly 150k node modifications. For 
the 100 days of this operation, this would increase to 1.5m node 
modifications per day (factor 10).

An average daily OSM diff currently has roughly 200 MB uncompressed 
(some days it's 100 MB, some days it's 400 MB). For the 100 days of this 
operation, daily diffs would be approximately 150 MB larger, increasing 
the strain on downstream systems by roughly 75%.
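
How much that 150 MB weighs depends on the day. A quick sketch using only
the figures above (the helper name is made up):

```python
def relative_increase(extra, baseline):
    """Extra load as a fraction of the current baseline."""
    return extra / baseline


EXTRA_MB = 150  # estimated additional diff volume per day

# Quiet day, average day, busy day (uncompressed daily diff in MB).
for baseline_mb in (100, 200, 400):
    share = relative_increase(EXTRA_MB, baseline_mb)
    print(f"{baseline_mb} MB day: +{share:.1%}")

# Node modifications per day: 150k now, 1.5m during the operation.
node_factor = 1_500_000 / 150_000
print(f"node modifications: factor {node_factor:.0f}")
```

So the 75% applies to an average day; on a quiet 100 MB day the diff would
be 150% larger, on a busy 400 MB day 37.5% larger.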

I have not done any osm2pgsql testing. If it is clever then it will 
detect that no geometry change has been effected by the node 
modification and the additional cost would mainly result from having to 
parse 75% more node updates. If however it automatically re-calculates 
the geometry of every way that contains a modified node, then it is 
likely that any osm2pgsql based sites running incremental updates would 
take anywhere between 2 and 10 times as long to process updates during 
the 100 days of this operation.

Everything said here is of course highly speculative and based on the 
haphazard assumption that our systems always perform roughly as they did 
when I did my tests.

Bye
Frederik



