[OSM-dev] Deleting TIGER node tags

Apollinaris Schoell aschoell at gmail.com
Thu Jul 30 17:24:54 BST 2009


Hi Frederik,

When a way is deleted in JOSM, its tagged nodes remain in the DB.
This causes tons of orphans. Is it possible to delete these nodes in the
same cleanup? After the TIGER tags are deleted, it's difficult to identify
these useless nodes without a full history lookup.

The check should be:
- no other tags remain on node
- no way or relation uses this node
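The two checks above can be expressed as a small predicate. This is a minimal sketch, not code from JOSM or the API: the `Node` class, the `referenced_ids` set (IDs of every node used by some way or relation), and the rule that TIGER tags are exactly the keys starting with "tiger:" are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical in-memory node record; not an OSM library type.
    node_id: int
    tags: dict[str, str] = field(default_factory=dict)


def is_tiger_key(key: str) -> bool:
    # Assumption: the TIGER import tags are the keys in the "tiger:" namespace.
    return key.startswith("tiger:")


def is_deletable_orphan(node: Node, referenced_ids: set[int]) -> bool:
    """True if the node is useless once its TIGER tags are gone."""
    remaining = {k: v for k, v in node.tags.items() if not is_tiger_key(k)}
    # Check 1: no other tags remain on the node.
    if remaining:
        return False
    # Check 2: no way or relation uses this node.
    return node.node_id not in referenced_ids
```

Running both checks in the same pass that strips the tags avoids the history lookup, because the decision is made while the TIGER tags are still visible.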

apo

On Jul 30, 2009, at 3:01 AM, Frederik Ramm wrote:

> Hi,
>
> Frederik Ramm wrote:
>> I just did a little test, prepared an .osc document that removed the
>> node tags from about 1000 nodes:
>> http://www.openstreetmap.org/browse/changeset/1894387
>> It came out at roughly 10 node changes per second.
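An .osc document of the kind used in that test could be generated roughly as follows. This is a sketch under stated assumptions: `NodeVersion` and `build_tag_removal_osc` are hypothetical names, the generator string is made up, and "TIGER tags" is again assumed to mean keys starting with "tiger:". It relies on the osmChange 0.6 rule that a modify replaces the whole node, so it must carry the current version and the open changeset ID, and any tag left out is removed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class NodeVersion:
    # Hypothetical record of a node's current state, e.g. as read back from the API.
    node_id: int
    version: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


def is_tiger_key(key: str) -> bool:
    # Assumption: the tags to strip are those in the "tiger:" namespace.
    return key.startswith("tiger:")


def build_tag_removal_osc(nodes: list[NodeVersion], changeset_id: int) -> str:
    """Return an osmChange 0.6 document that re-submits each node minus its TIGER tags."""
    root = ET.Element("osmChange", version="0.6", generator="tiger-cleanup-sketch")
    modify = ET.SubElement(root, "modify")
    for n in nodes:
        kept = {k: v for k, v in n.tags.items() if not is_tiger_key(k)}
        if kept == n.tags:
            continue  # nothing to remove, so don't create a new node version
        # The whole node is replaced: name the version being replaced and the
        # changeset, keep lat/lon unchanged, and write only the surviving tags.
        el = ET.SubElement(modify, "node", id=str(n.node_id), version=str(n.version),
                           changeset=str(changeset_id), lat=str(n.lat), lon=str(n.lon))
        for k, v in sorted(kept.items()):
            ET.SubElement(el, "tag", k=k, v=v)
    return ET.tostring(root, encoding="unicode")
```

Skipping nodes that carry no TIGER tags keeps the diff from creating pointless new versions.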
>
> Some more tests made directly from the dev server suggest that
> performance is around 20 changes per second, slightly deteriorating if
> you upload too many changes in one diff upload (the peak performance
> seems to be at around 1k-2k changes per diff upload). Anything larger
> than 10k changes per diff upload is not feasible (you get into a
> territory where you have to manually increase default timeouts and all
> that), and also takes performance down into the 10-15 changes per
> second range PLUS increases the probability of having edit conflicts.
>
> If we wanted to do this cleanup through normal API requests, the best
> way thus seems to be dividing the data into roughly 88k batches of 2k
> edits each and uploading them as diff uploads; possibly grouping them
> in changesets of up to 25 batches each (=50k edits), which would
> result in roughly 3500 changesets.
>
> Each diff upload would take about 100 seconds, each changeset would
> take about 40 minutes, we'd be doing about 30-35 changesets per day and
> finish the thing after about 100 days (some time in November if we
> start soon).
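The timing figures above follow directly from the measured rate. A small sketch of that arithmetic; the function name and defaults are illustrative, taken from the numbers in the paragraphs above, and it assumes uploads run back to back around the clock.

```python
def estimate_schedule(rate_per_s: float = 20.0, batch_size: int = 2_000,
                      batches_per_changeset: int = 25, changesets: int = 3_500):
    """Return (seconds per upload, seconds per changeset, changesets/day, days)."""
    upload_s = batch_size / rate_per_s             # 2000 / 20 = 100 s
    changeset_s = upload_s * batches_per_changeset  # 25 x 100 s = 2500 s, about 42 min
    changesets_per_day = 86_400 / changeset_s       # 34.56 if nothing ever stalls
    days = changesets / changesets_per_day          # about 101 days
    return upload_s, changeset_s, changesets_per_day, days
```

Continuous operation gives about 34.5 changesets per day, so the "30-35 per day" range leaves a little slack for pauses and retries, and the total stays close to 100 days.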
>
> An average day in OSM currently has roughly 150k node modifications.
> For the 100 days of this operation, this would increase to 1.5m node
> modifications (factor 10).
>
> An average daily OSM diff currently has roughly 200 MB uncompressed
> (some days it's 100 MB, some days it's 400 MB). For the 100 days of
> this operation, daily diffs would be approximately 150 MB larger,
> increasing the strain on downstream systems by roughly 75%.
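The two load estimates above can be checked against each other. A sketch assuming 30 changesets per day of 50k edits and MB meaning 10^6 bytes; the constants come from the paragraphs above and the function name is hypothetical. The implied size of about 100 bytes of uncompressed diff per extra node modification is derived here, not measured.

```python
BASELINE_NODE_MODS_PER_DAY = 150_000
BASELINE_DIFF_MB_PER_DAY = 200
EXTRA_DIFF_MB_PER_DAY = 150


def cleanup_load(changesets_per_day: int, edits_per_changeset: int = 50_000) -> dict:
    """Extra daily load caused by the cleanup, relative to today's baseline."""
    added_mods = changesets_per_day * edits_per_changeset
    return {
        "added_node_mods": added_mods,
        "node_mod_factor": added_mods / BASELINE_NODE_MODS_PER_DAY,
        "diff_size_increase": EXTRA_DIFF_MB_PER_DAY / BASELINE_DIFF_MB_PER_DAY,
        # Implied uncompressed diff bytes per extra node modification (MB = 10**6 bytes).
        "bytes_per_mod": EXTRA_DIFF_MB_PER_DAY * 1_000_000 / added_mods,
    }
```

For 30 changesets per day this yields 1.5M added modifications (ten times the baseline), a 75% larger daily diff, and roughly 100 bytes per modification.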
>
> I have not done any osm2pgsql testing. If it is clever then it will
> detect that no geometry change has been effected by the node
> modification and the additional cost would mainly result from having
> to parse 75% more node updates. If however it automatically
> re-calculates the geometry of every way that contains a modified node,
> then it is likely that any osm2pgsql-based sites running incremental
> updates would take anywhere between 2 and 10 times as long to process
> updates during the 100 days of this operation.
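The "clever" behaviour described above boils down to rebuilding a way only when one of its nodes actually moved. The sketch below illustrates that idea only; it is not how osm2pgsql is implemented. The coordinate maps and the `ways_by_node` reverse index are assumed inputs.

```python
from typing import Mapping

Coord = tuple[float, float]


def ways_needing_rebuild(old: Mapping[int, Coord], new: Mapping[int, Coord],
                         ways_by_node: Mapping[int, set[int]]) -> set[int]:
    """Return IDs of ways whose geometry must be recomputed after a diff."""
    # A node counts as moved if its coordinates differ (or it was not known before);
    # a tag-only change, like stripping TIGER tags, leaves lat/lon untouched.
    moved = {nid for nid, coord in new.items() if old.get(nid) != coord}
    dirty: set[int] = set()
    for nid in moved:
        dirty |= ways_by_node.get(nid, set())
    return dirty
```

Under this scheme the tag cleanup would trigger no way rebuilds at all, leaving only the parsing cost; a consumer that skips the coordinate comparison would rebuild every way touching each modified node.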
>
> Everything said here is of course highly speculative and based on the
> haphazard assumption that our systems always perform roughly as they
> did when I did my tests.
>
> Bye
> Frederik
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/listinfo/dev