[Rebuild] Do I win a prize if I am the first to post?
errt at gmx.de
Sat Jan 14 17:49:13 GMT 2012
On 13.1.2012 23:38, Frederik Ramm wrote:
> Hi,
>
> On 01/11/2012 09:06 PM, errt at gmx.de wrote:
>> Let's start with a clean database and then go through all changesets,
>> just applying those done by agreers. If something isn't there to be
>> changed (e.g. a changeset removed a tag, but that tag hasn't been added
>> to the new database, as it was entered by a decliner), just ignore that
>> change.
>
> That sounds like a very elegant and simple approach and I would love
> to be able to use it, but I see a few problems.
>
> As you rightly say, it does have the disadvantage of totally changing
> history; you will look at a node history in disbelief that, for
> example, claims that you have at some time in 2010 changed a place of
> worship from denomination=protestant to denomination=sunni while
> leaving religion=christian in place. You will think "was I really
> *that* old in 2010?" until it dawns on you that when you changed the
> node in 2010, it *was* religion=muslim - just that purging non-agreers
> from history has now hidden this fact.
That's correct, but your suggestions [1] and [2] would do the same, and
[3] would set it back to denomination=protestant. That is even worse:
the incorrect religion=christian/denomination=sunni combination can be
found by QA tools and then fixed easily, whereas a consistent-looking
but outdated combination cannot. I'm not sure about [4] and [5], but as
I understand them, they are one of [1] to [3] with the history dropped,
so the same applies, except that even more information is lost, even
for objects that have no problems like the one mentioned. The history
would be a bit more consistent, of course, as there wouldn't be a
changeset that introduces the wrong combination and names you as its
author. [6] would indeed not have this problem, but it still
effectively means losing the history, as it would be marked as tainted
and not be accessible with the standard tools. So I guess none of them
offers a real benefit.
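To illustrate why a contradictory combination is the lesser evil: it can be detected mechanically, while a reverted but plausible value cannot. A minimal sketch, assuming a hand-made and deliberately incomplete table of denominations per religion; the table and the function name are hypothetical and not taken from any existing QA tool:

```python
# Hypothetical, deliberately incomplete table of which denominations
# belong to which religion; a real QA tool would need a far longer list.
DENOMINATIONS = {
    "christian": {"catholic", "orthodox", "protestant"},
    "muslim": {"shia", "sunni"},
}


def inconsistent_religion(tags: dict[str, str]) -> bool:
    """Return True if the religion and denomination tags contradict each other."""
    religion = tags.get("religion")
    denomination = tags.get("denomination")
    if religion is None or denomination is None:
        return False  # nothing to compare
    known = DENOMINATIONS.get(religion)
    if known is None:
        return False  # religion not in the table, so we cannot judge
    return denomination not in known


# The purged place of worship from the quoted example gets flagged:
print(inconsistent_religion({"religion": "christian", "denomination": "sunni"}))  # True
# Reverting to denomination=protestant leaves nothing for a QA tool to catch:
print(inconsistent_religion({"religion": "christian", "denomination": "protestant"}))  # False
```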
> It could happen that someone puts something in a note tag like
> "reverted to version 8 because the information from version 9 was not
> factual"; re-numbering the versions would break that.
[1] and [3] would do the same; [4] and [5] drop the history completely
and therefore break such a note too (a note in a changeset tag would
not be broken but removed completely together with its changeset;
whether that is better than a broken note is debatable). [6] would not
directly have this problem (if the delivered version gets a new, higher
version number), but it would still put those versions somewhat out of
reach. So [6] might have real benefits here, but at the cost of not
having good access to the interesting parts of the history.
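A small sketch of how renumbering breaks a note like "reverted to version 8", assuming that versions made by decliners are simply removed and the remaining ones are renumbered consecutively (the function and the data are hypothetical):

```python
def renumber(history: list[tuple[int, str]], decliners: set[str]) -> dict[int, int]:
    """Drop versions made by decliners and compact the remaining ones.

    Returns a mapping from each surviving old version number to its
    new version number.
    """
    mapping: dict[int, int] = {}
    new_version = 0
    for old_version, user in history:
        if user in decliners:
            continue  # this version disappears from the rebuilt history
        new_version += 1
        mapping[old_version] = new_version
    return mapping


# Hypothetical object with 9 versions; only version 3 was made by a decliner.
history = [(v, "decliner" if v == 3 else "agreer") for v in range(1, 10)]
mapping = renumber(history, decliners={"decliner"})
print(mapping[8])  # 7: old version 8 is now version 7
print(mapping[9])  # 8: "version 8" in the note now points at the old version 9
```

In this case the note ends up pointing at exactly the version it called non-factual.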
> These problems are not show-stoppers; one could live with them, and
> some of the options that I listed were even more radical.
The same is probably true of the problems with your suggestions; I just
wanted to add mine, so we have another one to discuss ;)
> Some restrictions would also have to be applied. For example, when a
> decliner adds "name=Johann Wolfgang von Goethe-Strasse" and an
> acceptor updates that, based on general spelling rules, to
> "name=Johann-Wolfgang-von-Goethe-Strasse", then we probably should not
> accept that tag into our database.
That's correct too. I did think about some implications of what I
suggested and have a slight change, or rather a clarification: instead
of simply not applying unclean changesets, they should be applied
silently, but with their data marked as tainted and the old data (if
any) kept until the end of the update process; then the tainted data
would be dropped. Here is an example ("user1 (a)" means user1 has
accepted, "user2 (d)" that user2 has declined, etc.; "//" marks
comments just for better comprehensibility):
v1: user1 (a): highway=road
v2: user2 (d): highway=tertary, name="Main St.", maxspeed=50
v3: user3 (a): highway=tertiary, name="Main St.", maxspeed=50
v4: user4 (a): highway=secondary, name="Main Street", maxspeed=50
v5: user5 (a): highway=secondary, name="King George Street", maxspeed=50
would become:
v1: user1: highway=road
  (temporary: highway=tertary (tainted, "road"), name="Main St." (tainted),
  maxspeed=50 (tainted)) //tainted because it came from a decliner, but
  the value "road" is kept internally as it's clean
v2: user3: highway=tertiary (tainted, "road"), name="Main St." (tainted),
  maxspeed=50 (tainted) //still tainted because of the short distance
  between "tertary" and "tertiary" (we might have a table that defines
  some special cases and use an edit distance such as the Levenshtein
  distance for all other changes)
v3: user4: highway=secondary, name="Main Street" (tainted), maxspeed=50
  (tainted) //highway no longer tainted, as it has been replaced, but name
  still tainted because of the short distance (no clean value to keep)
v4: user5: highway=secondary, name="King George Street", maxspeed=50
  (tainted) //name now clean, too
which would finally be delivered as
v1: user1: highway=road
v2: user3: highway=road
v3: user4: highway=secondary
v4: user5: highway=secondary, name="King George Street"
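The tainting rules in this example can be sketched in Python. This is a minimal sketch under several assumptions that are not part of the proposal: the history is given as per-version tag changes (only the changed tags, with None meaning deletion), "short distance" means a Levenshtein distance below half the length of the longer string (an arbitrary threshold chosen so this example comes out as described), the special-case table is left out, deletions by decliners are ignored, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagState:
    value: str
    tainted: bool = False
    clean_value: Optional[str] = None  # last clean value, kept while tainted


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def is_derived(old: str, new: str, max_ratio: float = 0.5) -> bool:
    """Hypothetical rule: a value close to the tainted one is derived from it."""
    return levenshtein(old, new) / max(len(old), len(new), 1) < max_ratio


def visible_tags(state: dict[str, TagState]) -> dict[str, str]:
    """Tags as delivered: tainted values fall back to their clean value or are dropped."""
    out = {}
    for key, s in state.items():
        if not s.tainted:
            out[key] = s.value
        elif s.clean_value is not None:
            out[key] = s.clean_value
    return out


def rebuild(history, decliners):
    """Replay all versions; only agreers' versions are delivered, renumbered."""
    state: dict[str, TagState] = {}
    delivered = []
    for user, changes in history:
        declined = user in decliners
        for key, value in changes.items():
            old = state.get(key)
            if value is None:  # tag deletion
                if not declined:  # assumption: decliners' deletions are ignored
                    state.pop(key, None)  # nothing there to delete: just ignore
                continue
            if declined:
                # Decliner's data is applied, but tainted; remember the clean value.
                clean = None
                if old is not None:
                    clean = old.clean_value if old.tainted else old.value
                state[key] = TagState(value, tainted=True, clean_value=clean)
            elif old is not None and old.tainted and is_derived(old.value, value):
                # Agreer only touched up tainted data: it stays tainted.
                state[key] = TagState(value, tainted=True, clean_value=old.clean_value)
            else:
                state[key] = TagState(value)  # genuinely new data from an agreer
        if not declined:
            delivered.append((len(delivered) + 1, user, visible_tags(state)))
    return delivered


history = [
    ("user1", {"highway": "road"}),
    ("user2", {"highway": "tertary", "name": "Main St.", "maxspeed": "50"}),
    ("user3", {"highway": "tertiary"}),
    ("user4", {"highway": "secondary", "name": "Main Street"}),
    ("user5", {"name": "King George Street"}),
]
for version, user, tags in rebuild(history, decliners={"user2"}):
    print(f"v{version}: {user}: {tags}")
# v1: user1: {'highway': 'road'}
# v2: user3: {'highway': 'road'}
# v3: user4: {'highway': 'secondary'}
# v4: user5: {'highway': 'secondary', 'name': 'King George Street'}
```

The distances involved are 1 for "tertary" to "tertiary" and 4 for "Main St." to "Main Street" (both below the threshold, so still tainted), but 5 of 9 for "tertiary" to "secondary" (above it, so clean). A plain Hamming distance would not work here, because it is only defined for strings of equal length.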
But as I already said, I think we should first decide whether we want
to renumber historic versions or whether it's important to keep them as
they are. Once we have decided on that, we can focus on how to get it
done instead of weighing the pros and cons of a hundred detailed plans.
We might also come up with a few more basic questions about what is
acceptable and what should be avoided, so we can discuss and vote on
them before going into too much detail. Remember there are only two and
a half months left until the planned changeover date, and we'll need
time to write and test the changeover tools, make any changes that
might be needed to other components of the OSM infrastructure (such as
the API), and tell other developers and users about anything that might
break their software.
Regards,
Dominik