[OSM-dev] Why are so many changesets so large?
Matt Amos
zerebubuth at gmail.com
Wed Oct 17 13:53:23 BST 2012
On Wed, 2012-10-17 at 00:28 +0100, Tom Hughes wrote:
> On 17/10/12 00:04, Alex Barth wrote:
> > - Are there technical reasons why changesets should tend to be
> > large? Are they expensive on some level?
>
> I believe it's entirely because we've got so many people doing
> mechanical or semi-mechanical edits.
>
> That includes bots but also things like people using xapi or overpass to
> download all objects matching some set of tags, then change those tags
> and reupload.
the historical answer to this is that when changesets were added to the
OSM API there were two different intentions for their use, which got
conflated: first, that changesets were structures for grouping edits
sharing common attributes; and second, that changesets were VCS-style
'commits' which would be uploaded in a single request and applied
atomically.
effectively, the first use-case was for users, and tried to make
changesets as open-ended as possible. from this, we get tags on
changesets for comments, editor, bot-ness, etc... and the ability to
keep uploading into an open changeset.
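to make that first use-case concrete, here's a minimal sketch (python,
stdlib only) of building the changeset-create body for the current 0.6
API. the tag keys shown (comment, created_by, bot) are conventions
rather than anything the API enforces, and the editor name is invented:

```python
# build the XML body for PUT /api/0.6/changeset/create -- a changeset
# is just an open-ended bag of key/value tags.
import xml.etree.ElementTree as ET

def changeset_create_payload(tags):
    """Return the <osm><changeset>...</changeset></osm> create body."""
    osm = ET.Element("osm")
    changeset = ET.SubElement(osm, "changeset")
    for k, v in tags.items():
        # each attribute of the edit session becomes one <tag> element
        ET.SubElement(changeset, "tag", k=k, v=v)
    return ET.tostring(osm, encoding="unicode")

payload = changeset_create_payload({
    "comment": "fix road names near the station",
    "created_by": "example-editor/1.0",  # hypothetical editor name
    "bot": "no",
})
```

the changeset stays open after creation, so any number of uploads can
follow before it is closed.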
the second use-case was a technical thing - the sheer number of API
calls to individual elements, even from normal-sized editing sessions,
could cause problems. and, for small calls, HTTP headers and round-trip
latencies would dominate the cost of an upload. further, editors had to
cope with the situation where an upload failed half-way through and to
re-try the failed calls. from this, we get a single changeset/#id/upload
call which applies atomically.
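and a similar sketch of the second use-case: one osmChange document
batching many element changes into a single request, which the server
applies atomically. the node data here is invented, and this only shows
building the body, not the HTTP POST to /api/0.6/changeset/#id/upload:

```python
# build an osmChange body for POST /api/0.6/changeset/<id>/upload,
# so a whole editing session goes over the wire in one round trip.
import xml.etree.ElementTree as ET

def diff_upload_payload(changeset_id, modified_nodes):
    """Return an <osmChange> body modifying several nodes at once."""
    osc = ET.Element("osmChange", version="0.6")
    modify = ET.SubElement(osc, "modify")
    for node in modified_nodes:
        # every changed element must carry the changeset id and the
        # version it was edited from (for conflict detection)
        n = ET.SubElement(modify, "node", {
            "id": str(node["id"]),
            "version": str(node["version"]),
            "changeset": str(changeset_id),
            "lat": str(node["lat"]),
            "lon": str(node["lon"]),
        })
        for k, v in node.get("tags", {}).items():
            ET.SubElement(n, "tag", k=k, v=v)
    return ET.tostring(osc, encoding="unicode")
```

if any element in the document fails (a version conflict, say), the
whole upload is rejected, so editors only ever re-try whole documents
rather than tracking which individual calls succeeded.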
at the time, this seemed like a good way to satisfy both use-cases. and,
while it does what it set out to, i think we should consider splitting
these in the next API version: explicitly reifying uploads, against
which bboxes / coverage sets and change counts can be stored. changesets
can then simply be collections of uploads.
getting to the point: this might to some extent mitigate the "large
changesets" issue, as it would allow bboxes to be collected at a smaller
granularity. however, it wouldn't be a full solution and we'd probably
still need something like OWL to break down the geographic footprint of
changesets further.
cheers,
matt