[OSM-talk] RFC: what are empty nodes and how should we use them?

Lennard ldp at xs4all.nl
Mon Aug 16 11:45:01 BST 2010


**
Adding josm-dev to the list. Please post technical follow-ups there.
**

On 16-8-2010 12:15, Peter Körner wrote:

> The POST /api/0.6/changeset/#id/upload call is atomic in a transaction.
> Why not split your upload into multiple OSC parts and post them via
> this call? That way no incomplete data would be visible to other users
> at any time.

Uploading 30k+ objects in a single chunk with JOSM(1) is just too 
unreliable to make that workable. So either we have to split the data in 
smaller chunks by hand, or use JOSM's native chunked upload mode. If you 
have 40k nodes and 5k ways, and upload in 5k chunks, you will upload 8 
chunks with nodes, and 1 chunk with ways. Each chunk is atomic, and 
that's where atomicity ends, as far as the API is concerned.
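
To make that arithmetic concrete, here is a minimal sketch of fixed-size 
chunking over objects in plain type order. The tuple representation and 
the chunk() helper are my own illustration, not JOSM's internal model or 
code:

```python
def chunk(objects, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [objects[i:i + size] for i in range(0, len(objects), size)]


# Hypothetical upload: 40k nodes followed by 5k ways, in type order.
objects = [("node", i) for i in range(40000)] + [("way", i) for i in range(5000)]

chunks = chunk(objects, 5000)

# Which object types end up in each chunk.
kinds = [sorted({kind for kind, _ in c}) for c in chunks]
```

Under these assumptions the first 8 chunks contain only nodes, and the 
ways arrive in the 9th, so the nodes stay 'empty' on the server until 
the very last chunk has been accepted.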

JOSM makes no attempt to sort the data in a smart way, to keep all nodes 
and associated ways and relations close together, in the same chunk when 
possible. I asked about such a feature before(2), but nothing has come 
of it as of yet.

If such a sorting feature is added to JOSM, the chunk size should be a 
soft size, able to vary slightly if that means related objects end up in 
the same chunk. May I point out smarter-sort.py(3)(4) as an example?
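
As a rough illustration of what I mean by a soft chunk size, here is a 
minimal sketch. It is not the algorithm used by smarter-sort.py, and 
soft_chunks() and its input format (a dict of way id to node ids) are 
assumptions made up for this example. It keeps each way together with 
the nodes it introduces, and closes a chunk early rather than splitting 
such a group. Relations and standalone nodes are left out:

```python
def soft_chunks(ways, soft_size):
    """Group each way with its not-yet-uploaded nodes.

    ways: dict mapping way id -> list of node ids (hypothetical format).
    Returns a list of chunks, each a list of ("node", id) / ("way", id).
    A chunk is closed before it would exceed soft_size, but a single
    group larger than soft_size is still kept whole in its own chunk.
    """
    chunks, current, seen = [], [], set()
    for way_id, node_ids in ways.items():
        group = []
        for n in node_ids:
            if n not in seen:  # shared or repeated nodes are emitted once
                seen.add(n)
                group.append(("node", n))
        group.append(("way", way_id))
        if current and len(current) + len(group) > soft_size:
            chunks.append(current)
            current = []
        current.extend(group)
    if current:
        chunks.append(current)
    return chunks


example = soft_chunks({1: [1, 2, 3], 2: [3, 4, 5], 3: [6, 7]}, soft_size=5)
```

With this ordering, every way is uploaded in the same chunk as the nodes 
it introduces, and any node it shares with an earlier way is already on 
the server by then.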

Sorted uploads would mostly prevent these 'fields of empty nodes' that 
appear to other mappers during a chunked upload, limiting the 
opportunity they have to wreak havoc on an ongoing upload by 'helpfully' 
deleting the nodes.

JOSM's chunked upload mode is an answer to API timeout issues, but it 
does have its own issues to keep in mind.


(1) It's not exactly more reliable with dedicated bulk upload scripts 
either. If the API takes too long to check the uploaded osmChange for 
validity, the TCP session appears to time out. The script/JOSM never 
receives the OK from the API, including the new object IDs. The next 
time you hit upload to resume, it will reupload that failed chunk in its 
entirety, leading to (in my example) 5k duplicate objects on the server.
(2) http://josm.openstreetmap.de/ticket/4299
(3) http://wiki.openstreetmap.org/wiki/Upload.py
(4) http://svn.openstreetmap.org/applications/utils/import/bulkupload/smarter-sort.py

-- 
Lennard
