[OSM-dev] Why are so many changeset so large?
Jochen Topf
jochen at remote.org
Wed Oct 17 08:15:17 BST 2012
On Tue, Oct 16, 2012 at 07:04:37PM -0400, Alex Barth wrote:
> I understand that there will always be cases where a large changeset makes sense (e. g. bot changes), but it seems that we have many unnecessarily large changesets that make changesets a not very useful granularity for looking at data history.
Changesets come in all shapes and sizes because edits come in all shapes and
sizes. When I am editing these days a typical changeset is often rather small,
just a few POIs added or so. But I also have huge changesets when I go to the
OSM Inspector and fix some problem worldwide, for instance broken coastlines.
I do break down the work into smaller changesets usually, but sometimes they
can get rather large and can affect widely separated places in the world. And
I think this does make sense, because they - in a way - all belong together
because I fix lots of small things here and there all turned up by one OSM
Inspector view. That's a typical pattern for me these days, but other people
have other patterns for other kinds of work. I think to a large extent this
is just the way things are and you can't do much about it. Look at commits in
source code, it is very similar there. Many small commits for small changes
but also large commits over huge parts of the code when you are refactoring.
One problem with changesets in OSM compared to source code commits is that you
can review source code changes easily before committing ("git diff" ...) while
this is not practically possible with our current editors. When writing code I
tend to work on a specific issue but also find other things to fix on the way,
such as typos etc. Once I am done, I'll review my changes and commit the typos
or whatever in a different commit than the main thing I was working on. The
same kind of thing happens in OSM all the time. I want to add something but
then get a validator message and fix that, too, or whatever. But I can't
usefully separate these things any more so they all end up in the same
changeset with a comment that's not as good as it could be.
I think one reason people add bad changeset comments and organize their
changesets in a bad way is that for most people those changesets and the
comments just disappear into a black hole. They never see the changesets or the
comments again. It is only a tiny minority of people who dig through old
changesets, for instance to revert bad edits. So why would you care to improve
changesets, if you don't see what they are used for? I think once we have
better tools to work with changesets, people will see how they are used and
why good comments can be useful, and they will learn to organize their changesets
better.
One thing I have wished for several times for instance is a way to look at
changesets and their changes in the editor. I come across a problem in the data
in JOSM, I want to see what triggered the change that led to the problem and
what other places might be affected. It would be nice if I could click on an
object and JOSM highlighted all other objects affected by the same changeset and
showed me who made this change, when, and with what comment. I can get this
information today of course, but it is rather cumbersome to go to the special
web page etc.
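
As a rough illustration of the "click an object, highlight its siblings" idea, here is a minimal Python sketch. It is not JOSM code and does not use any JOSM API; the OsmObject class and the functions index_by_changeset and siblings are hypothetical names made up for this example. It only assumes that each loaded object carries the id of the changeset that produced its current version, as OSM objects do.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OsmObject:
    """Hypothetical stand-in for an object loaded in an editor."""
    kind: str       # "node", "way" or "relation"
    osm_id: int
    changeset: int  # id of the changeset that produced this version
    user: str


def index_by_changeset(objects):
    """Group the loaded objects by the changeset that last touched them."""
    index = defaultdict(list)
    for obj in objects:
        index[obj.changeset].append(obj)
    return index


def siblings(obj, index):
    """Return the other loaded objects last touched by the same changeset."""
    return [o for o in index.get(obj.changeset, []) if o != obj]


loaded = [
    OsmObject("node", 1, 100, "alice"),
    OsmObject("way", 2, 100, "alice"),
    OsmObject("node", 3, 200, "bob"),
]
index = index_by_changeset(loaded)
to_highlight = siblings(loaded[0], index)  # the way with osm_id 2
```

This only sees objects already loaded in the editor, and only their latest versions; a real tool would also have to fetch the rest of the changeset from the server.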
The more tools we have like this, the more people will actually work with
changesets, the better they will become. That being said, OSM is just too big
and messy and there will always be changesets falling outside any rules we make
up. So every tool working with those changesets has to be very robust and work
with changesets that are problematic. I could, for instance, imagine that a
tool that shows the area affected by a changeset normally shows the bounding
box but if that gets larger than some threshold it tries to identify the
different areas where edits actually have been made and shows them. We'll
certainly have to do a lot of experimentation to hash all of this out.
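
One possible shape for the bounding-box idea, as a minimal Python sketch under stated assumptions: edits are given as (lon, lat) pairs in degrees, the threshold max_span and the grid size cell are arbitrary illustrative values, and the function names bbox and edit_areas are hypothetical. It groups edits by flood-filling over neighbouring occupied grid cells, which is just one simple clustering choice, and it ignores the antimeridian.

```python
from collections import defaultdict


def bbox(points):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of (lon, lat) points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


def edit_areas(points, max_span=1.0, cell=0.5):
    """Return one bbox for a compact changeset, else one bbox per cluster.

    max_span and cell are in degrees; both defaults are illustrative only.
    """
    whole = bbox(points)
    if whole[2] - whole[0] <= max_span and whole[3] - whole[1] <= max_span:
        return [whole]

    # Bucket the points into grid cells of size `cell` degrees.
    cells = defaultdict(list)
    for lon, lat in points:
        cells[(int(lon // cell), int(lat // cell))].append((lon, lat))

    # Flood fill over occupied cells that touch each other (8-neighbourhood).
    seen = set()
    areas = []
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            cx, cy = stack.pop()
            group.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (cx + dx, cy + dy)
                    if neighbour in cells and neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
        areas.append(bbox(group))
    return areas


# Two edits near Karlsruhe and two near New York in one changeset.
edits = [(8.4, 49.0), (8.41, 49.01), (-74.0, 40.7), (-73.99, 40.71)]
areas = edit_areas(edits)  # two small boxes instead of one transatlantic box
```

The right threshold and clustering method would need exactly the kind of experimentation mentioned above; the sketch only shows that the fallback is cheap to compute.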
Jochen
--
Jochen Topf jochen at remote.org http://www.remote.org/jochen/ +49-721-388298