[OSM-dev] Preferred Method of Bulk Upload?

Matt Amos zerebubuth at gmail.com
Sat May 9 01:24:08 BST 2009


On Sat, May 9, 2009 at 12:25 AM, Ian Dees <ian.dees at gmail.com> wrote:
> What is the preferred method of bulk upload nowadays? By bulk, I mean
> thousands of features with millions of nodes.

Probably best to use the diff upload feature. I think Ivan has a script for it.

more GNIS stuff?

> Assuming the preferred pseudocode for upload looks like this:
>
> open changeset
>   while more uploads:
>     upload a diff file
> close changeset
>
> ... how many changes should we put inside of each diff file? How many
> uploads should we make in one changeset?

The present limit on changeset size is 50,000 edits (i.e. nodes, ways
and relations), so you'll probably need to use more than one
changeset, which complicates the code a little. You can upload a single
diff with all 50,000 changes in it, but that would be huge; it's
probably better to split it into a number of smaller diffs.
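The splitting described above can be sketched in Python. This is only an illustration of the chunking logic, not a complete uploader: `MAX_CHANGESET_EDITS` comes from the limit quoted here, while `DIFF_CHUNK` is an arbitrary choice, and the actual HTTP calls (changeset create, diff upload, changeset close) are left out.

```python
# Plan how a large edit list is split across changesets and diff uploads.
# The 50,000 figure is the server's changeset limit; the diff chunk size
# is a free choice -- smaller diffs keep each request manageable.

MAX_CHANGESET_EDITS = 50000   # server-side limit on edits per changeset
DIFF_CHUNK = 1000             # edits per diff upload (arbitrary, tunable)

def chunk(edits, size):
    """Split a list of edits into diff-sized chunks."""
    return [edits[i:i + size] for i in range(0, len(edits), size)]

def plan_changesets(edits, max_edits=MAX_CHANGESET_EDITS, diff_size=DIFF_CHUNK):
    """Group edits into changesets, each a list of diff-sized chunks.

    The caller would then open a changeset, upload each chunk as one
    diff, close the changeset, and move on to the next group.
    """
    changesets = []
    for start in range(0, len(edits), max_edits):
        changesets.append(chunk(edits[start:start + max_edits], diff_size))
    return changesets
```

For example, 120,000 edits would plan out as three changesets: two full ones of 50 diffs each and a final one of 20 diffs.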

> I assume that the capabilities API call will tell me some of this, but I'm
> not entirely sure which piece of the capabilities call lines up with which
> piece in my example.

The only relevant bits are:
<waynodes maximum="2000"/>
<changesets maximum_elements="50000"/>

The waynodes maximum is the maximum number of nodes per way; any more
and you'll need to split the way.
The changesets maximum is, well, the maximum number of edits in a changeset.
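Pulling those two limits out of the capabilities response is straightforward with the standard library. A small sketch, using the two elements quoted above embedded in a minimal document (the real response wraps them in more markup, but the element and attribute names are the same):

```python
# Extract the two upload-relevant limits from an API capabilities
# document. Only the <waynodes> and <changesets> elements matter here.
import xml.etree.ElementTree as ET

CAPS = """<osm version="0.6">
  <api>
    <waynodes maximum="2000"/>
    <changesets maximum_elements="50000"/>
  </api>
</osm>"""

def read_limits(xml_text):
    root = ET.fromstring(xml_text)
    return {
        # max nodes per way; longer ways must be split
        "max_way_nodes": int(root.find(".//waynodes").get("maximum")),
        # max edits per changeset; more edits need another changeset
        "max_changeset_edits": int(root.find(".//changesets").get("maximum_elements")),
    }
```

Reading the limits at startup, rather than hard-coding them, means the script keeps working if the server configuration changes.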

> Is the data written to the database after each diff upload or is it stored
> in memory, then written out at the close of a changeset?

It's written atomically at each diff upload, i.e. each diff upload
either succeeds or fails entirely.
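That atomicity simplifies error handling on the client side: a failed diff can be retried as a whole, with no partially applied state to reconcile. A minimal sketch, where `upload_fn` stands in for the real diff-upload request:

```python
# Retry a diff upload a few times. Safe precisely because the server
# applies each diff atomically -- a failure leaves nothing behind, so
# resending the same diff cannot duplicate edits from a partial apply.

def upload_with_retry(upload_fn, diff, attempts=3):
    last_error = None
    for _ in range(attempts):
        try:
            return upload_fn(diff)   # whole diff applied, or nothing
        except Exception as err:     # e.g. a transient network failure
            last_error = err
    raise last_error
</```

Note this only covers failures reported before the response arrives; if the connection drops after the server applied the diff, the client would still need to check whether the changes landed before resending.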

cheers,

matt



