[Rebuild] Idea for ODbL transition strategy

errt at gmx.de
Wed Feb 1 14:00:40 GMT 2012


On 1.2.2012 14:42, Frederik Ramm wrote:
>> I'll reexamine my proposed approach in the same style for better
>> comparison, but not in that detail for now:
>
> I'm not completely against that approach, as I said before, having a 
> "pure" database in the end is a nice thing. But I think that the side 
> effect of changing history without any accounting is not acceptable.
Your approach is no different if you want to preserve most of the
history. To keep the history as it is, you will have to flag many
historic versions as invisible because they contain data derived from
tainted data. That keeps the stored history intact, but it still
falsifies what the history view delivers: it will show that version Z
followed version X with certain changes, without the tainted version Y
that actually came in between. I think the effect of falsified history
in my approach is not that serious, because it only changes how a
historic version is interpreted: from 'on date X, user Y uploaded this
object with properties Z' to 'on date X, user Y would have uploaded this
object with properties Z if there had been no versions in between', or
to 'on date X, user Y uploaded changes Z to this object'. The latter
would be nice to have in the future, if the data structures can be
adjusted to support it: changesets would upload only the changes
relative to the previous version instead of complete new versions, with
those changes shown in the history view only. But that's a different
topic for the future.
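
To make the comparison concrete, here is a minimal sketch of both
readings of history. Everything in it is an assumption for illustration
only: the Version class, the tainted flag and both function names are
hypothetical and do not reflect the real OSM schema or any rebuild code.

```python
from dataclasses import dataclass


@dataclass
class Version:
    number: int
    user: str
    tags: dict
    tainted: bool = False  # hypothetical flag: version is not ODbL-clean


def visible_history(versions):
    """Hide approach: stored versions stay untouched, tainted ones are hidden."""
    return [v for v in versions if not v.tainted]


def diff(old_tags, new_tags):
    """Return (added or changed tags, removed keys) between two tag dicts."""
    changed = {k: v for k, v in new_tags.items() if old_tags.get(k) != v}
    removed = [k for k in old_tags if k not in new_tags]
    return changed, removed


def replay_clean_changes(versions):
    """Rewrite approach: re-apply only the changes made by clean versions."""
    rebuilt = []
    prev_original = {}  # tags of the previous version as originally uploaded
    prev_rebuilt = {}   # tags of the previous version in the rebuilt history
    for v in versions:
        changed, removed = diff(prev_original, v.tags)
        prev_original = v.tags
        if v.tainted:
            continue  # drop the version together with the changes it made
        tags = {k: val for k, val in prev_rebuilt.items() if k not in removed}
        tags.update(changed)
        rebuilt.append(Version(v.number, v.user, tags))
        prev_rebuilt = tags
    return rebuilt


history = [
    Version(1, "a", {"highway": "residential"}),
    Version(2, "b", {"highway": "residential", "name": "Main St"}, tainted=True),
    Version(3, "c", {"highway": "residential", "name": "Main St", "maxspeed": "30"}),
]

for v in visible_history(history):
    print("hide   ", v.number, v.tags)
for v in replay_clean_changes(history):
    print("rewrite", v.number, v.tags)
```

With hiding, version 3 is shown right after version 1 and still carries
the name tag that came from the hidden version 2. With the rewrite,
version 3 only carries what user c would have uploaded on top of
version 1.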
>
> The history database says clearly "on date X, user Y uploaded this 
> object with properties Z". Simply changing Z ex-post is falsifying 
> history. That's just wrong. We can choose to hide Z, but we must not 
> change Z to be something else and then claim the user uploaded that.
>
>> This approach would probably free quite a lot of database space, and
>> renumbering objects is possible, but not needed.
>
> I don't have a proper simulation either but let's assume that we lose 
> 5% of data in the relicensing process, then that would bring us down 
> from 2.4 TB to 2.15 TB; in about 4 months we'd be back at 2.4 TB 
> (http://munin.openstreetmap.org/openstreetmap/smaug.openstreetmap/postgres_size_openstreetmap.html) 
> so the saving is probably not that big.
I'm not saying it would cut the database size in half; still, 4 months
more before new hardware is needed is 4 months more. Also, that's not
the main benefit of this approach, just a nice side effect. The main
benefit clearly is the clean database and the preserved data
structures.
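
A rough sketch of the arithmetic, taking the quoted figures as given.
The growth rate is only what those figures imply (2.15 TB back to
2.4 TB in about 4 months), not a measured value, and the function name
is made up for this example.

```python
def months_gained(saved_tb, growth_tb_per_month):
    """Months until the database is back at its old size after a reduction."""
    return saved_tb / growth_tb_per_month


current_tb = 2.4       # quoted current size
after_tb = 2.15        # quoted size after relicensing
months_quoted = 4      # quoted time to grow back

# Growth rate implied by the quoted figures (an assumption, not a measurement).
growth = (current_tb - after_tb) / months_quoted

# The quoted drop of 0.25 TB is about 10% of 2.4 TB; a 5% loss is 0.12 TB.
five_percent_tb = 0.05 * current_tb

print("implied growth, TB/month:", round(growth, 4))
print("months gained, quoted drop:", round(months_gained(current_tb - after_tb, growth), 1))
print("months gained, 5% loss:", round(months_gained(five_percent_tb, growth), 1))
```

So depending on whether the 5% or the 2.15 TB figure is the right one,
the gain is somewhere between roughly 2 and 4 months at that growth
rate.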
>
>> P.S.: I still think it's a major question whether we want to keep but
>> hide non-ODbL versions or whether to drop them completely for the sake
>> of a smaller, clean database with no need for any patches for holes or
>> invisible versions. We should probably hold a vote on that fundamental
>> decision, ideally a community vote.
>
> I don't think this is a good idea.
Well, someone will have to make a decision. In a community-driven
project, that should be the community if possible.

Dominik


