[OSM-talk] Some measure to prevent duplicate uploads of same data ...

MP singularita at gmail.com
Fri Mar 5 23:16:00 GMT 2010


While API 0.6 has implemented object versioning, which prevents
accidentally overwriting someone else's changes, since the introduction
of atomic uploads I see many problems with duplicate data.

These often come with data imports, or more generally whenever someone
uploads new data without modifying any existing data (for example,
tracing hundreds of buildings from an orthophoto).

Since the atomic upload is the default method in JOSM (and possibly in
other tools), the user presses an "upload" button and within a few
seconds all the changes are sent to the server. The server then starts
processing them (this can take some time for larger changes) and, once
it is finished, sends the new node IDs back to the editor.

Unfortunately, the connection sometimes times out while the editor is
waiting for the server to process the uploaded data, so the user sees
an error message. Thinking the upload failed, he presses "upload"
again, pushing a new copy of all the objects to the server.
Later, the server wants to return the IDs from the first upload, but
nobody is listening on the other end anymore.

The end result is sometimes 2 to 4 identical copies of the same data,
which can amount to thousands of duplicate nodes and ways.

Suggestion for one possible countermeasure:
 after the server receives a complete, successful atomic upload from a
user, compute a SHA1, MD5, or some other checksum of the uploaded XML.
Store it, and if the user tries uploading exactly the same thing again
(because he thinks the upload has failed, which is not true), just send
him an error message instead, like: "You have already uploaded this
data".
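
As a rough sketch of that check (not how the OSM server actually works):
the function name check_upload and the in-memory dictionary are made up
for illustration, and a real server would keep the checksums in its
database instead.

```python
import hashlib

# Hypothetical in-memory store: user id -> checksum of that user's
# most recently received upload. Purely illustrative.
last_upload_checksum = {}


def check_upload(user_id, uploaded_xml):
    """Return an error message for a repeated upload, or None if it is new.

    uploaded_xml is the raw request body as bytes.
    """
    digest = hashlib.sha1(uploaded_xml).hexdigest()
    if last_upload_checksum.get(user_id) == digest:
        return "You have already uploaded this data"
    # Remember the checksum so that an identical retry is recognised.
    last_upload_checksum[user_id] = digest
    return None
```

The first call for a given body returns None and the upload proceeds as
usual; a byte-identical retry from the same user gets the error message.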

Alternatively, send the user whatever result the last upload produced
(either the new set of IDs, or an error message if the previous
upload failed for some reason).

I think perhaps the last 2 or 3 checksums could be stored, in case
someone has multiple parallel uploads in multiple editors.
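
Keeping a few checksums per user together with the stored result might
look roughly like this. The class UploadHistory and its methods are
hypothetical names, and the sketch only covers uploads that have already
finished; a retry that arrives while the first upload is still being
processed would need extra handling.

```python
import hashlib
from collections import OrderedDict


class UploadHistory:
    """Remember the last few upload checksums per user, with their results."""

    def __init__(self, limit=3):
        self.limit = limit   # "last 2 or 3 checksums"
        self._by_user = {}   # user id -> OrderedDict(checksum -> result)

    def lookup(self, user_id, uploaded_xml):
        """Return (checksum, stored result or None) for this upload body."""
        digest = hashlib.sha1(uploaded_xml).hexdigest()
        entries = self._by_user.get(user_id, {})
        return digest, entries.get(digest)

    def remember(self, user_id, digest, result):
        """Store the result of a finished upload and drop the oldest entries."""
        entries = self._by_user.setdefault(user_id, OrderedDict())
        entries[digest] = result
        entries.move_to_end(digest)
        while len(entries) > self.limit:
            entries.popitem(last=False)
```

On a retry, lookup() returns the stored result (new IDs or the earlier
error), which can be sent back instead of processing the data again.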

Martin



