[OSM-dev] Why are so many changesets so large?

Alex Barth alex at mapbox.com
Wed Oct 17 17:20:49 BST 2012


Jochen Topf wrote:

> I think one reason people add bad changeset comments and organize their
> changesets in a bad way is that for most people those changesets and the
> comments just disappear into a black hole.

Agreed. This is why I'm curious to find out how we can make changesets more useful on osm.org. Right now the history tab doesn't quite cut the mustard. One issue is that large changesets show up whenever their bounding box covers your area, even when none of their data actually affects it.

Matt Amos wrote:

> from this, we get a single changeset/#id/upload
> call which applies atomically.

Is that so? I thought changesets were not applied atomically, which leads to issues where it is hard to find out what data got applied when a connection breaks down or an editor crashes.

Andy Allan wrote:

> However, if you want to know which changesets affect a given
> area, this reverse question is much less easily answered. Hence OWL,
> etc.

Paweł Paprota wrote:

> This has the desired effect you write about: that is, with a changeset that contains changes in Sydney and in Canada, you will only get it in the query result for those two places, not for anywhere in the world like it is right now in the History tab.
> 
> I am a bit concerned about scalability of this; Matt clearly stated in one of the earlier discussions that dumping every changeset to one table won't scale.

Right on. Most of the time, only changesets whose changes _actually_ affect a bbox are interesting. I'm curious what a fast and scalable solution for this looks like. It seems that OWL and Activity Streams have the exact same problem here...
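
To make the difference concrete, here is a minimal Python sketch of the two ways to match a changeset against a bbox query. It is only an illustration: the names (BBox, matches_by_extent, matches_by_geometry) are made up for this example, the coordinates are rough, and a changeset's geometry is reduced to a list of edited node positions. A real setup along the lines Paweł describes would use PostGIS ST_Intersects on full geometries.

```python
from typing import NamedTuple, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "BBox") -> bool:
        # Two axis-aligned boxes overlap if they overlap on both axes.
        return (self.min_lon <= other.max_lon and other.min_lon <= self.max_lon
                and self.min_lat <= other.max_lat and other.min_lat <= self.max_lat)

    def contains(self, point: Point) -> bool:
        lon, lat = point
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def extent(points: Sequence[Point]) -> BBox:
    # The single bounding box that covers every edited point.
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return BBox(min(lons), min(lats), max(lons), max(lats))


def matches_by_extent(points: Sequence[Point], query: BBox) -> bool:
    # Match on the changeset's overall bounding box only.
    return extent(points).intersects(query)


def matches_by_geometry(points: Sequence[Point], query: BBox) -> bool:
    # Match only if some edited point really lies inside the query bbox.
    return any(query.contains(p) for p in points)


# Hypothetical changeset with one edit in Sydney and one in Ottawa.
changeset = [(151.21, -33.87), (-75.70, 45.42)]
sydney = BBox(151.0, -34.0, 151.4, -33.7)
rome = BBox(12.3, 41.7, 12.7, 42.1)  # inside the extent, but nothing was edited here
```

With these inputs both functions match the Sydney query, but only matches_by_extent matches the Rome query, which is the false positive described above.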

BTW, I did some cursory digging in the changesets dump and found that only a relatively small percentage of changesets are geographically large. When using the history tab, they seem more numerous than that. I don't have numbers yet, but I hope to share some soon.
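
For anyone who wants to try the same kind of digging, a rough sketch of how such a count could be done in Python follows. It assumes the dump stores each changeset as a <changeset> element with min_lat, min_lon, max_lat and max_lon attributes, and it treats anything spanning more than one degree as "large", which is an arbitrary threshold picked for illustration. The three sample changesets are invented, not real data.

```python
import io
import xml.etree.ElementTree as ET

# Invented sample in the assumed dump layout; changeset 3 has no bbox (no edits).
SAMPLE = b"""<osm>
  <changeset id="1" min_lat="52.50" min_lon="13.30" max_lat="52.52" max_lon="13.40"/>
  <changeset id="2" min_lat="-33.90" min_lon="-75.70" max_lat="45.40" max_lon="151.20"/>
  <changeset id="3"/>
</osm>"""


def large_changeset_share(source, max_span_deg=1.0):
    """Return (large, total) over all changesets that have a bounding box."""
    total = 0
    large = 0
    # iterparse streams the file, so a multi-gigabyte dump is never fully loaded.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "changeset":
            continue
        try:
            lat_span = float(elem.attrib["max_lat"]) - float(elem.attrib["min_lat"])
            lon_span = float(elem.attrib["max_lon"]) - float(elem.attrib["min_lon"])
        except KeyError:
            elem.clear()  # changeset without a bbox: skip it
            continue
        total += 1
        if max(lat_span, lon_span) > max_span_deg:
            large += 1
        elem.clear()  # release the element's contents once it has been counted
    return large, total


large, total = large_changeset_share(io.BytesIO(SAMPLE))
print(f"{large} of {total} changesets with a bbox span more than 1 degree")
```

To run it on the real dump, pass an open (decompressed) file object instead of the in-memory sample.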

On Oct 17, 2012, at 9:46 AM, Paweł Paprota <ppawel at fastmail.fm> wrote:

> On 10/17/2012 03:30 PM, Andy Allan wrote:
>> 
>> Basically, I see no need to worry about the extent of bounding boxes,
>> and no need to move to having bboxes on uploads instead of changesets
>> or other complications. No matter what we do, if your interest in a
>> changeset extends beyond the details of its extent, you need a
>> mechanism (again, e.g. OWL) to detail the actual locations of the
>> edits to the entities, and different interests (and different
>> entities) will even have different buffers of interest around
>> them. Let's focus on things like that.
>> 
> 
> Exactly. What I do right now with the Activity Server is I store the whole geometry of a changeset. When a bounding box query comes, I use ST_Intersects between the bbox and geometries. This has the desired effect you write about: that is, with a changeset that contains changes in Sydney and in Canada, you will only get it in the query result for those two places, not for anywhere in the world like it is right now in the History tab.
> 
> I am a bit concerned about scalability of this; Matt clearly stated in one of the earlier discussions that dumping every changeset to one table won't scale.
> 
> I'm now looking to dig into OWL's code and see how my work relates to it - I think it potentially could make sense to somehow bring the two projects together or at least integrate them at some level (OWL publishing activities to the Activity Server?).
> 
> Paweł
> 
> 
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/listinfo/dev

Alex Barth
http://twitter.com/lxbarth
tel (+1) 202 250 3633






