[Rebuild] Do I win a prize if I am the first to post?
errt at gmx.de
Wed Jan 11 20:06:19 GMT 2012
So, I'll give this another try, hopefully this goes to the whole list now.
Hi everyone,
thanks Frederik for that initial impulse.
First of all, I'm not that deep into all those technical points, so
whatever I say might be total nonsense; I'll just write up what I think :p
That's quite a list of possible ways to handle objects edited by both
agreers and decliners, all with their benefits and problems. I numbered
them in the quote below for easier reference. My own idea would be
somewhere in between some of the ones you listed, much like what your
method [2] is about:
Let's start with a clean database and then go through all changesets,
just applying those done by agreers. If something isn't there to be
changed (e.g. a changeset removed a tag, but that tag hasn't been added
to the new database, as it was entered by a decliner), just ignore that
change.
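A minimal sketch of this replay idea, under loud assumptions: the
Change and Changeset classes, the three action names and the user name
"some_decliner" are all made up for illustration and are not the real
OSM data model or API. Every applied edit is counted as one new object
version, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """One edit to one object (hypothetical, heavily simplified model)."""
    action: str        # "create", "set_tag" or "remove_tag"
    object_id: int
    key: str = ""
    value: str = ""


@dataclass
class Changeset:
    user: str
    changes: list = field(default_factory=list)


def rebuild(changesets, agreers):
    """Replay only the agreers' changesets, in order, onto an empty database."""
    db = {}  # object_id -> {"version": int, "tags": dict}
    for cs in changesets:
        if cs.user not in agreers:
            continue  # decliner: the whole changeset is skipped
        for ch in cs.changes:
            if ch.action == "create":
                db[ch.object_id] = {"version": 1, "tags": {}}
                continue
            obj = db.get(ch.object_id)
            if obj is None:
                continue  # target object never entered the new database
            if ch.action == "set_tag":
                obj["tags"][ch.key] = ch.value
            elif ch.action == "remove_tag" and ch.key in obj["tags"]:
                del obj["tags"][ch.key]
            else:
                continue  # nothing there to change: ignore the edit
            obj["version"] += 1  # versions are renumbered without holes
    return db


# An agreer creates a way, a decliner names it, the agreer adds oneway=yes.
history = [
    Changeset("woodpeck", [Change("create", 1)]),
    Changeset("some_decliner", [Change("set_tag", 1, "name", "Blah Road")]),
    Changeset("woodpeck", [Change("set_tag", 1, "oneway", "yes")]),
]
print(rebuild(history, {"woodpeck"}))
```

With this input the rebuilt way ends up at version 2 carrying only
oneway=yes; the decliner's name tag never enters the new database. The
sketch does not attempt the hard cases (changed way geometry, edits
that build on a decliner's tag value).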
That's probably not technically trivial, especially in cases where
nodes are added to a way but the way no longer has the same structure it
had when the real changeset happened. Changing tags isn't easy either:
small changes should probably not be applied (e.g. a decliner entered a
street name and someone just added an accent or something like that),
but bigger ones should be (e.g. a decliner added a street name and
someone replaced it completely). But if we can figure out how to solve
such cases, I think this method would have some clear benefits.
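One conceivable way to tell "small" from "big" tag changes is a plain
string-similarity check. This is only a sketch: the function name and
the 0.5 threshold are arbitrary assumptions, and where the line really
lies is a legal question rather than a technical one.

```python
from difflib import SequenceMatcher


def is_substantial_rewrite(old_value, new_value, threshold=0.5):
    """Return True if new_value shares little text with old_value.

    The threshold is an arbitrary illustrative choice, not an agreed rule.
    """
    ratio = SequenceMatcher(None, old_value.lower(), new_value.lower()).ratio()
    return ratio < threshold


# An added accent keeps most of the decliner's text: not a rewrite.
print(is_substantial_rewrite("Cafe Street", "Café Street"))
# A completely different name shares almost nothing: a rewrite.
print(is_substantial_rewrite("Blah Road", "Main Street"))
```

Under such a rule the accent fix would be dropped together with the
decliner's name, while the complete replacement would be kept as the
agreer's own contribution.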
First of all, it would leave us with an unbroken history. No need to
change any consumers of the data (as in [1] and [6], possibly also in
[2]), as the history appears to be continuous (and is, somewhat). The
API also doesn't need any changes, as the database won't have any holes
or objects flagged as invisible. Of course, this would falsify
history, as changesets that happened aren't included anymore and later
changesets will turn into something they never really were. But I
think we should take that step. Some decliners probably wouldn't want to
appear in the history anymore after the changeover, and some might even
reconsider their decision if not only the objects they created and the
data they brought into the project are deleted, but their name is also
removed from the history (note that this should not be the reason for
going this way, but this side effect might be considered positive).
Secondly, the history would also be clean, in that no non-relicensed
data will be in there. Nobody will be tempted to recover data from the
history that's not clean (of course, it could be recovered from the last
CC planet dump, but that's a lot more difficult), especially not new
mappers in the future who might not know much about the license change
and just discover some information in the history and recover it. If
done this way, the missing versions in the history probably won't do
much harm, as no information from them lives on. But if we sort of
merge all versions of an object into one or two versions (as in [4] and
[5]), lots of information about the data (the changeset comments, time
and creators, the source information and more) is lost, and this could
lead to problems if anything has to be traced back.
So for your example, the result would look like this:
Version 1 of way created by woodpeck
Version 2: woodpeck adds "oneway=yes" (and just "oneway=yes", no
street name in this version, anywhere)
No data by a decliner lives on and the history is continuous and clear.
That was an easy one, I know, other changes will be much more difficult,
but perhaps we can find ways to deal with at least most of the problems.
Another problem is that it's not easy to recover data if a decliner
agrees after the change date, but I'm not sure we even want that option
(not the possibility to agree after the change date, just the ability to
recover their previously dropped data).
As I said, I'm not that deeply involved technically and all this might
be plain bullshit, but I still think this might be the best option: a
clean history, no changes to any programs consuming the data, as much
data kept as possible, and the possibility of fine-grained exceptions to
keep even more data where there are no legal problems, even if it's the
most technically challenging method.
And now some final thoughts on your list, just to classify:
There are in fact two possible ways to do the change (or does anyone see
more?). Either we drop decliners' changesets and object versions, or
flag them so they won't be delivered any more, but leave all others in
place; then information is retained, but programs will need to account
for the holes in the history. Or we really rebuild the database, so
version numbers change and information is effectively lost, not just
hidden, but the history is continuous and no changes are needed for the
consumers.
So if there are just these two models (or a small number of them),
perhaps we should first decide on the general way to go and do the
details of 'what will happen to this tag or that node in this specific
example' later. As stated above, I currently favor the latter option of
a real rebuild, but let's see what the discussion will lead to.
As for your mention of things like densifying the id space, I think
this could be a real option, especially in case of a real rebuild (as
defined above). The references to object versions will be broken
already, so we can just go ahead and break the references to the
objects too; this wouldn't do that much more harm. Or does anyone
know of external databases having references to our objects that would
be broken but should not be? I think this would have to be done in a
second process step after the actual rebuild, though, as the references
to old objects that will be dropped have to be removed first. Other
changes, like a world-wide deletion of created_by tags, could also be
done without much more effort; we could fix common typos or anything
like that on a world-wide scale if we already have to touch every
object.
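The second step could be sketched roughly as follows, assuming a
made-up in-memory layout (a set of surviving node ids and a dict from
way id to its list of node ids). Only nodes are renumbered here; the
function and variable names are hypothetical.

```python
def densify_ids(nodes, ways):
    """Renumber surviving nodes to 1..n and rewrite way references.

    nodes: iterable of old node ids that survived the rebuild
    ways:  dict mapping way id -> list of old node ids
    """
    # Old ids are mapped to new ones in ascending order of the old id.
    id_map = {old: new for new, old in enumerate(sorted(nodes), start=1)}
    new_ways = {}
    for way_id, refs in ways.items():
        # References to dropped nodes have to go before renumbering.
        new_ways[way_id] = [id_map[r] for r in refs if r in id_map]
    return id_map, new_ways


# Node 42 was dropped in the rebuild, so way 7 loses that reference.
id_map, new_ways = densify_ids({10, 57, 300}, {7: [10, 42, 300]})
print(id_map)
print(new_ways)
```

Anything outside the database that still refers to the old ids would
need the id_map to follow the renumbering, which is the breakage
discussed above.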
Well, just my two cents, and probably enough for my first post on this
list, too,
Regards,
Dominik
On 11.1.2012 01:05, Frederik Ramm wrote:
> [...]
> [1] One could think "let's just keep those versions done by agreers,
> and drop those by decliners, and let's make a new version of all
> objects that contains only the content not added by decliners."
>
> This would lead to a situation where some versions are missing. Parts
> of our Rails code might have to be hardened against that - it is
> possible that somewhere we have code that just counts versions from 1
> to n. Also it is possible that client software out in the wild has
> such problems, and if we decide to go this way it would be good to
> offer something like relicensing.dev.openstreetmap.org with such a
> "database with holes" so that clients can be tested against that.
>
> Then there is the issue that data by decliners might affect more than
> the current version, e.g.
> [example]
>
> We would now delete version 2 from our database, so only 1,3,4 are
> kept. But what happens to the "name=Blah Road" tag that is still
> present in version 3?
>
> [2] We can either remove that tag from version 3, thereby falsifying
> history (making it look like the tag was never there) - probably a bad
> idea.
>
> [3] Or we can remove all versions that contain any information
> contributed by non-agreers, which might be a lot, and we would lose a
> lot of history along the way.
>
> [4] Another option is dropping the whole history for everything now,
> and start with a clean database where version 1 (or version n) is the
> current version and no other versions exist. (We could keep a
> read-only version of the last CC-BY-SA database with full Rails port
> functions on a simple server somehow, doesn't matter if it's slow -
> just so that people can still access history if they want, but that
> would all be under CC-BY-SA.)
>
> [5] Or we could opt for a limited keeping of history whereby every
> object with more than one historic version is reduced to having
> exactly two versions - v1 is the very first, and v2 is the current
> one, and everything in between is removed.
> [...]
>
> [6] My final idea is a slightly outlandish variant of the above but
> even easier: Simply make the new API return *no* pre-changeover
> versions at all, and keep all the pre-changeover versions in a special
> CC-BY-SA-only API.
> [...]