[OSM-dev] Deleting TIGER node tags
Frederik Ramm
frederik at remote.org
Thu Jul 30 11:01:17 BST 2009
Hi,
Frederik Ramm wrote:
> I just did a little test, prepared an .osc document that removed the
> node tags from about 1000 nodes:
> http://www.openstreetmap.org/browse/changeset/1894387
> It came out at roughly 10 node changes per second.
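For illustration, a minimal sketch of how such an .osc (osmChange) document could be generated. It assumes API 0.6 semantics, where a node inside a <modify> block that carries no <tag> children is stored without tags. The function name, the generator string and the input dictionaries are hypothetical, and each node's current version, lat and lon would have to be fetched beforehand.

```python
import xml.etree.ElementTree as ET


def build_untag_osc(nodes, changeset_id):
    """Return an osmChange document that rewrites each node without tags.

    `nodes` is an iterable of dicts with the keys id, version, lat, lon
    (hypothetical input format; the values must match the current state
    of each node on the server, otherwise the upload would conflict).
    """
    root = ET.Element("osmChange", version="0.6", generator="tiger-untag-sketch")
    modify = ET.SubElement(root, "modify")
    for n in nodes:
        # No <tag> children are added, so the new node version has no tags.
        ET.SubElement(
            modify,
            "node",
            id=str(n["id"]),
            version=str(n["version"]),
            changeset=str(changeset_id),
            lat=str(n["lat"]),
            lon=str(n["lon"]),
        )
    return ET.tostring(root, encoding="unicode")
```

The resulting string would be the body of one diff upload; the position attributes are repeated unchanged because a node modification replaces the whole element.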
Some more tests made directly from the dev server suggest that
performance is around 20 changes per second, slightly deteriorating if
you upload too many changes in one diff upload (the peak performance
seems to be at around 1k-2k changes per diff upload). Anything larger
than 10k changes per diff upload is not feasible (you get into a
territory where you have to manually increase default timeouts and all
that), and also takes performance down into the 10-15 changes per second
range PLUS increases the probability of having edit conflicts.
If we wanted to do this cleanup through normal API requests, the best
way thus seems to be dividing the data into roughly 88k batches of 2k
edits each and uploading them as diff uploads; possibly grouping them in
changesets of up to 25 batches each (=50k edits), which would result in
roughly 3500 changesets.
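The batching scheme described above can be sketched as follows. This only shows the grouping logic (2k edits per diff upload, 25 diff uploads per changeset); the helper names are hypothetical and no API calls are made.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` elements from `items`."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def plan_uploads(node_ids, batch_size=2000, batches_per_changeset=25):
    """Group node ids into changesets, each a list of diff-upload batches."""
    batches = chunked(node_ids, batch_size)
    # Each yielded changeset is a list of up to 25 batches of up to 2k ids.
    yield from chunked(batches, batches_per_changeset)
```

With the default parameters a full changeset holds 25 * 2000 = 50k edits, matching the figures above.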
Each diff upload would take about 100 seconds, each changeset would take
about 40 minutes, we'd be doing about 30-35 changesets per day and
finish the thing after about 100 days (some time in November if we start
soon).
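The arithmetic behind these estimates, as a small sketch. It assumes the uploads run strictly one after another, around the clock, at a constant 20 changes per second; the function and key names are made up for this example.

```python
def estimate_schedule(total_batches=88_000, batch_size=2_000,
                      batches_per_changeset=25, changes_per_second=20):
    """Rough timing estimate for a sequential, non-stop upload run."""
    seconds_per_batch = batch_size / changes_per_second
    seconds_per_changeset = seconds_per_batch * batches_per_changeset
    changesets = total_batches / batches_per_changeset
    changesets_per_day = 86_400 / seconds_per_changeset
    return {
        "total_edits": total_batches * batch_size,
        "seconds_per_batch": seconds_per_batch,
        "minutes_per_changeset": seconds_per_changeset / 60,
        "changesets": changesets,
        "changesets_per_day": changesets_per_day,
        "days": changesets / changesets_per_day,
    }


if __name__ == "__main__":
    for key, value in estimate_schedule().items():
        print(f"{key}: {value:,.1f}")
```

Any pause between uploads, or a drop to the 10-15 changes per second range, would stretch the total accordingly.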
An average day in OSM currently has roughly 150k node modifications. For
the 100 days of this operation, this would increase to 1.5m node
modifications (factor 10).
An average daily OSM diff currently has roughly 200 MB uncompressed
(some days it's 100 MB, some days it's 400 MB). For the 100 days of this
operation, daily diffs would be approximately 150 MB larger, increasing
the strain on downstream systems by roughly 75%.
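The load figures above can be cross-checked with a few lines. This is only a sketch: it takes the rough numbers from this mail at face value, treats 1.5m as the daily total during the operation, and assumes 1 MB = 10^6 bytes. The variable names are made up.

```python
# Figures quoted in this mail (all rough estimates).
normal_node_mods_per_day = 150_000
node_mods_per_day_during_run = 1_500_000
normal_diff_mb = 200
extra_diff_mb = 150

# Growth in daily node modifications.
node_mod_factor = node_mods_per_day_during_run / normal_node_mods_per_day

# Growth in daily diff size, in percent.
diff_increase_pct = extra_diff_mb / normal_diff_mb * 100

# Implied uncompressed size of one additional node modification in the diff.
extra_node_mods = node_mods_per_day_during_run - normal_node_mods_per_day
bytes_per_extra_mod = extra_diff_mb * 1_000_000 / extra_node_mods

print(f"node modifications: x{node_mod_factor:.0f}")
print(f"daily diff size: +{diff_increase_pct:.0f}%")
print(f"bytes per extra node modification: {bytes_per_extra_mod:.0f}")
```

The implied size per extra modification is in the range one would expect for a tagless node element, so the two estimates are at least consistent with each other.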
I have not done any osm2pgsql testing. If it is clever then it will
detect that no geometry change has been effected by the node
modification and the additional cost would mainly result from having to
parse roughly 75% more diff data. If, however, it automatically recalculates
the geometry of every way that contains a modified node, then it is
likely that any osm2pgsql based sites running incremental updates would
take anywhere between 2 and 10 times as long to process updates during
the 100 days of this operation.
Everything said here is of course highly speculative and based on the
haphazard assumption that our systems always perform roughly as they did
when I did my tests.
Bye
Frederik