[Rebuild] Idea for ODbL transition strategy
errt at gmx.de
Wed Feb 1 14:00:40 GMT 2012
On 1.2.2012 14:42, Frederik Ramm wrote:
>> I'll reexamine my proposed approach in the same style for better
>> comparison, but not in that detail for now:
> I'm not completely against that approach, as I said before, having a
> "pure" database in the end is a nice thing. But I think that the side
> effect of changing history without any accounting is not acceptable.
That's no different from your approach if you want to conserve most of
the history. If you want to keep the history as it is, you will have to
flag lots of historic versions as invisible because they contain data
derived from tainted data. This would keep the history intact, but it
would still falsify what is delivered as the history view: it will show
that version Z followed version X with the following changes, but without
the tainted version Y that had been there before. I think the effects of
falsified history in my approach aren't that great, as all they change
is the interpretation of a historic version from 'on date X, user Y
uploaded this object with properties Z' to 'on date X, user Y would have
uploaded this object with properties Z if there had been no versions in
between' or 'on date X, user Y uploaded changes to this object, resulting
in properties Z'. This is something that would be nice for the future, if
the data structures can be adjusted to make it work: having changesets
not upload new versions, but just the changes from the previous version,
and showing them in the history view only. But that's a different topic for the
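
To make the two readings of history concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Version` record, the `tainted` flag and the function names are hypothetical and do not reflect the real OSM schema or any rebuild tool. It treats each upload as a set of tag changes relative to the previous version, and compares simply hiding the tainted version with replaying only the clean changes. The taint rule is deliberately naive (it ignores geometry and clean edits that build on tainted values).

```python
from dataclasses import dataclass


@dataclass
class Version:
    number: int
    user: str
    tags: dict              # full tag set as uploaded
    tainted: bool = False   # hypothetical flag: contributor did not accept ODbL


def visible_history(versions):
    """Hide approach: leave stored versions untouched, just drop tainted ones from view."""
    return [v for v in versions if not v.tainted]


def delta(prev_tags, new_tags):
    """Changes made by one upload: tags set or modified, and tags removed."""
    changed = {k: v for k, v in new_tags.items() if prev_tags.get(k) != v}
    removed = [k for k in prev_tags if k not in new_tags]
    return changed, removed


def replay_clean(versions):
    """Rewrite approach: re-apply only the clean uploads' changes, in order."""
    result, prev, state = [], {}, {}
    for v in versions:
        changed, removed = delta(prev, v.tags)
        prev = v.tags
        if v.tainted:
            continue  # skip the tainted upload's changes entirely
        state = {k: val for k, val in state.items() if k not in removed}
        state.update(changed)
        result.append(Version(v.number, v.user, dict(state)))
    return result


history = [
    Version(1, "X", {"highway": "residential"}),
    Version(2, "Y", {"highway": "residential", "name": "Main St"}, tainted=True),
    Version(3, "Z", {"highway": "residential", "name": "Main St",
                     "surface": "asphalt"}),
]

hidden = visible_history(history)
clean = replay_clean(history)
```

In this toy history, hiding version 2 still leaves the tainted `name` tag inside version 3, whereas the replay yields a version 3 holding only what the clean users contributed, i.e. what the user "would have uploaded if there had been no versions in between".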
> The history database says clearly "on date X, user Y uploaded this
> object with properties Z". Simply changing Z ex-post is falsifying
> history. That's just wrong. We can choose to hide Z, but we must not
> change Z to be something else and then claim the user uploaded that.
>> This approach would probably free quite a lot of database space, and
>> renumbering objects is possible, but not needed.
> I don't have a proper simulation either but let's assume that we lose
> 5% of data in the relicensing process, then that would bring us down
> from 2.4 TB to 2.15 TB; in about 4 months we'd be back at 2.4 TB
> so the saving is probably not that big.
I'm not saying it would cut the database size in half; still, 4
months more before new hardware is needed is 4 months more. Also, that's
not the main benefit of this approach, it's just a nice side effect. The
main benefit clearly is the clean database and the conserved data
>> P.S.: I still think it's a major question whether we want to keep but
>> hide non-ODbL versions or whether to drop them completely for the sake
>> of a smaller, clean database and no need for any patches for holes or
>> invisible versions. We should probably hold a vote on that fundamental
>> decision, ideally a community vote.
> I don't think this is a good idea.
Well, someone will have to make a decision. In a community-driven
project, that should be the community if possible.