[Rebuild] Extending Richard's "v0 for relations" suggestion

Ed Loach ed at loach.me.uk
Tue Feb 7 15:46:11 GMT 2012


I've been trying to get my head around how we could perhaps have a
v0 for any given object, be it node, way or relation. Each object
type has tags and members (for nodes the members are the lat/lon, for
ways the members are nodes, and for relations the members are ways,
nodes or other relations).

So in each case we could think of a v0 item with no tags and no
members for a given object type and id. The exception is when a way
is split: the v0 object for the new id would then be the pre-split
parent way. Merging ways effectively deletes a way, so it doesn't
need special consideration.
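To make the v0 idea a bit more concrete, here is a minimal Python sketch. The `Version` class and the `v0()` helper are hypothetical (not part of any OSM tool or API), and the split parent has to be supplied by whatever detects the split.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    """One version of a node, way or relation (hypothetical model)."""
    kind: str                  # "node", "way" or "relation"
    obj_id: int
    version: int
    tags: dict = field(default_factory=dict)
    # (lat, lon) for nodes, node ids for ways, (type, id, role) for relations
    members: tuple = ()


def v0(kind: str, obj_id: int, split_parent: Optional[Version] = None) -> Version:
    """Return the notional version 0 of an object.

    Normally this is the empty object: no tags and no members. For a way
    created by splitting another way, the pre-split parent version is used
    instead, relabelled with the new id and version 0.
    """
    if kind == "way" and split_parent is not None:
        return Version(kind, obj_id, 0, dict(split_parent.tags), split_parent.members)
    return Version(kind, obj_id, 0)


# Way 11 split off from version 3 of way 10
parent = Version("way", 10, 3, {"highway": "residential"}, (1, 2, 3, 4))
print(v0("way", 11, split_parent=parent))
print(v0("node", 5))
```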

What I can't easily see is how, at any stage, you can tell which node
versions were members of a given way version (for example), and so
whether they were clean at the time a way was split or dirtied
afterwards. I'm also trying to work out whether it matters; I'm
guessing it will for history calls, perhaps.

For current tables...

If we start by checking all the nodes:
* find the last clean lat and the last clean lon for that node; if
either of these is the v0 empty value, then delete the node.
* work out which tags are clean, as the deep diff tool does, and
only keep those (possibly none).
* remove any nodes with no tags that are not part of a way or
relation (e.g. a POI tagged by a non-accepter but positioned by an
accepter).
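A rough Python sketch of the node steps, assuming a hypothetical `NodeVersion` record per version with an `accepted` flag for its author. It applies one simple rule to lat, lon and each tag key: walking from the v0 value (None), the clean value is the one set by the most recent change made by an accepter. That is only my approximation of the deep diff behaviour, not the tool's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class NodeVersion:
    accepted: bool             # did this version's author accept the new terms?
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


def last_clean(values, accepted):
    """Walk one attribute's history, starting from the v0 value None.

    Only versions that changed the value count. The result is the value
    from the most recent such change made by an accepter, or None (the
    v0 empty value) if no accepter ever set it.
    """
    previous, clean = None, None
    for value, ok in zip(values, accepted):
        if value != previous and ok:
            clean = value
        previous = value
    return clean


def clean_node(history, referenced):
    """Return (lat, lon, tags) for the cleaned node, or None to delete it.

    `referenced` says whether the node is still used by a way or relation.
    """
    accepted = [v.accepted for v in history]
    lat = last_clean([v.lat for v in history], accepted)
    lon = last_clean([v.lon for v in history], accepted)
    if lat is None or lon is None:
        return None                        # position is still v0: delete
    tags = {}
    for key in set().union(*(v.tags for v in history)):
        value = last_clean([v.tags.get(key) for v in history], accepted)
        if value is not None:
            tags[key] = value
    if not tags and not referenced:
        return None                        # untagged and unused: delete
    return lat, lon, tags


# POI positioned by an accepter, tagged by a non-accepter
poi = [NodeVersion(True, 51.5, -0.1),
       NodeVersion(False, 51.5, -0.1, {"amenity": "pub"})]
print(clean_node(poi, referenced=False))
```

Note that with this rule a tag deleted by a non-accepter is rolled back to the accepter's value, while a deletion by an accepter is kept.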

Then for each way:
* keep only the nodes that survived the above step; if a way has
fewer than two nodes remaining, delete it.
* work out which tags are clean as the deep diff tool does and only
keep those tags, except perhaps also handle splits by treating the
pre-split way as the v0 version. This means knowing which tags were
clean at the time of the split, and identifying the split in the
first place, which appears to be non-trivial. Some thoughts on that
below.
* if no tags remain, perhaps add FIXME=untagged way, or just leave
it untagged.
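Similarly for ways, a sketch assuming the caller already knows which nodes survived and, for a split-off way, which of the parent's tags were clean at the time of the split (passed as the hypothetical `v0_tags`). The tag history here is just one dict per version, aligned with an `accepted` list; all names are illustrative.

```python
def last_clean(initial, values, accepted):
    """Walk one tag's history from its v0 value (None, or a split parent's value).

    The result is the value from the most recent change made by an
    accepter, or the v0 value if no accepter ever changed it.
    """
    previous = clean = initial
    for value, ok in zip(values, accepted):
        if value != previous and ok:
            clean = value
        previous = value
    return clean


def clean_way(node_ids, surviving_nodes, tag_history, accepted,
              v0_tags=None, add_fixme=False):
    """Return (nodes, tags) for the cleaned way, or None to delete it.

    node_ids:        node list of the current version
    surviving_nodes: ids of nodes kept by the node step
    tag_history:     one tag dict per version, oldest first
    accepted:        one flag per version, True if its author accepted
    v0_tags:         clean tags of the pre-split parent, if split off
    """
    nodes = [n for n in node_ids if n in surviving_nodes]
    if len(nodes) < 2:
        return None                         # one node or none left: delete
    v0_tags = v0_tags or {}
    tags = {}
    for key in set(v0_tags).union(*tag_history):
        value = last_clean(v0_tags.get(key),
                           [t.get(key) for t in tag_history], accepted)
        if value is not None:
            tags[key] = value
    if not tags and add_fixme:
        tags["FIXME"] = "untagged way"
    return nodes, tags


# Way split off by a non-accepter, later named by an accepter
history = [{"highway": "residential"},
           {"highway": "residential", "name": "High St"}]
print(clean_way([1, 2, 3], {1, 2, 3}, history, [False, True],
                v0_tags={"highway": "residential"}))
```

Without `v0_tags` the same way would lose its highway tag, since the non-accepter's v1 would count as the change that introduced it.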

Then for each relation:
* keep any node members that still exist if added by an accepter
* keep any way members that still exist if added by an accepter
* apply the same checks recursively to any sub-relations added by
an accepter; otherwise remove them from the parent relation.
* treat tags on relations in the same way as Richard suggested,
starting with the v0 empty set.
* delete any relations that have no remaining members
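The relation steps might look like this, assuming members are hypothetical `(type, ref, role, added_by_accepter)` tuples and that the surviving nodes and ways are already known. Relation tags are left out, and an empty result means the relation should be deleted. The function only decides membership; each sub-relation would still be cleaned on its own turn. The `seen` set just stops infinite recursion on relation loops, so a member that loops back to an ancestor is dropped.

```python
def clean_relation(rel_id, relations, surviving, seen=frozenset()):
    """Return the kept (type, ref, role) members; an empty list means delete.

    relations: relation id -> list of (type, ref, role, added_by_accepter)
    surviving: set of ("node", id) / ("way", id) pairs kept by earlier steps
    seen:      relations already on the current recursion path
    """
    if rel_id in seen:
        return []                            # loop back to an ancestor: drop it
    seen = seen | {rel_id}
    kept = []
    for mtype, ref, role, by_accepter in relations.get(rel_id, []):
        if not by_accepter:
            continue                         # added by a non-accepter: remove
        if mtype == "relation":
            # keep the sub-relation only if something survives inside it
            if clean_relation(ref, relations, surviving, seen):
                kept.append((mtype, ref, role))
        elif (mtype, ref) in surviving:
            kept.append((mtype, ref, role))
    return kept


relations = {
    1: [("way", 10, "outer", True), ("node", 5, "label", False),
        ("relation", 2, "", True)],
    2: [("node", 6, "", True)],
    3: [("way", 99, "", True)],
}
surviving = {("way", 10), ("node", 6)}
print(clean_relation(1, relations, surviving))
```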

This might mean that not all v1 items created by non-accepters are
automatically dirty (such as nodes added that were later moved, or
tags added that were later amended - either key or value). I think
these are listed as "Edge cases" on the wiki "What is clean?" page. 

My thoughts on identifying splits.
* If a split occurred in a given changeset, you will have a non-v1
way which shares an end node with a v1 way, and that v1 way will
also contain at least one other node that was in the non-v1 way
before the changeset but not after it.
* I have not found an example showing what gets saved if you split a
way into more than two pieces in the same changeset. It may be that
you get multiple v1 ways which link together at end nodes, each
containing at least one node from the original non-v1 way. Or you
may have multiple versions of the new way in the same changeset, in
which case there is no problem: it falls back to the case above,
where the new way appears in the changeset as a v1 split from the
non-v1 way, and a later version of it appears as a non-v1 way that
is the parent of another v1 way in the same changeset. I am guessing
this might vary by editor, or even between versions of the same
editor.
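The two-piece split heuristic above could be sketched as follows, assuming hypothetical per-changeset dictionaries of node lists before and after the changeset, plus the set of way ids created (v1) in it. None of these names come from an existing tool.

```python
def detect_splits(old_ways, new_ways, created):
    """Guess (parent, child) split pairs within one changeset.

    old_ways: way id -> node list before the changeset (modified ways only)
    new_ways: way id -> node list after the changeset
    created:  ids of ways that are v1 in this changeset
    """
    splits = []
    for parent, before in old_ways.items():
        after = new_ways.get(parent)
        if not after or parent in created:
            continue
        removed = set(before) - set(after)      # nodes the parent lost
        parent_ends = {after[0], after[-1]}
        for child in created:
            nodes = new_ways.get(child)
            if not nodes:
                continue
            shared = parent_ends & {nodes[0], nodes[-1]}
            # a shared end node, plus at least one other node the parent lost
            if shared and (set(nodes) - shared) & removed:
                splits.append((parent, child))
    return sorted(splits)


before = {1: [1, 2, 3, 4, 5]}
after = {1: [1, 2, 3], 2: [3, 4, 5], 3: [3, 9]}
print(detect_splits(before, after, {2, 3}))
```

This only catches a new piece that touches the parent directly. For a three-way split saved as a chain of v1 ways, the further pieces would have to be found by following shared end nodes from the ones already detected.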

Issues with splits and the v0 idea
* If the split is done by a non-accepter, the new way may roll back
to the pre-split v0 way; but if all the tags and nodes on the
original way id have since become clean, this may result in
duplicate ways over the section the original way still covers, if
you see what I mean.
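One way to spot this kind of duplication, assuming the node lists of the rolled-back way and of the original way id are available, is to look for segments (consecutive node pairs) present in both. The helper names are hypothetical.

```python
def segments(nodes):
    """Undirected segments (consecutive node pairs) of a way."""
    return {frozenset(pair) for pair in zip(nodes, nodes[1:])}


def duplicated_segments(way_a, way_b):
    """Segments present in both ways, i.e. where one duplicates the other."""
    return segments(way_a) & segments(way_b)


# Way 1 [1..5] split by a non-accepter into 1: [1, 2, 3] and 2: [3, 4, 5].
# Rolling way 2 back to its pre-split v0 gives [1..5] again; if way 1's
# later versions are all clean, the two ways now overlap.
rolled_back = [1, 2, 3, 4, 5]
original = [1, 2, 3]
print(duplicated_segments(original, rolled_back))
```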

Issues with merges
* If a way is merged by someone who hasn't accepted, do all later
clean tag amendments need applying to both unmerged ways?
* If so, then identifying merges becomes necessary, probably in a
similar manner to the thoughts on identifying splits: there would be
two ways in a changeset, one deleted, which shared an end node with
another way that, after the changeset, contains more than one node
from the deleted way's pre-deletion version.
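The merge check could follow the same pattern as the split one, again assuming hypothetical dictionaries of node lists for the deleted ways and for the surviving ways before and after the changeset.

```python
def detect_merges(deleted, before, after):
    """Guess (surviving, deleted) merge pairs within one changeset.

    deleted: way id -> node list of the deleted way's last version
    before:  surviving way id -> node list before the changeset
    after:   surviving way id -> node list after the changeset
    """
    merges = []
    for dead, dead_nodes in deleted.items():
        if not dead_nodes:
            continue
        dead_ends = {dead_nodes[0], dead_nodes[-1]}
        for way, old_nodes in before.items():
            new_nodes = after.get(way)
            if not old_nodes or not new_nodes:
                continue
            touches = dead_ends & {old_nodes[0], old_nodes[-1]}
            absorbed = set(new_nodes) & set(dead_nodes)
            # shared end node, and more than one of the deleted way's nodes taken over
            if touches and len(absorbed) > 1:
                merges.append((way, dead))
    return sorted(merges)


deleted = {2: [3, 4, 5]}
before = {1: [1, 2, 3], 7: [3, 9]}
after = {1: [1, 2, 3, 4, 5], 7: [3, 9]}
print(detect_merges(deleted, before, after))
```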

In terms of doing this practically, I like the idea discussed
previously of flags against database objects, only exposing clean
items to the API, with placeholder XML for anything that has been
removed due to the licence change. As was suggested here recently,
we will not be able to come up with algorithms that handle splits
and merges and correctly guess, in all instances, why a mapper did
such a thing. The flag suggestion that I read in the online archive
(it was before I subscribed) seems to allow relatively easy flagging
of any objects that mappers identify as mistakenly remaining after
the switch. I am aware of a well-known mapper's threats of legal
action should he identify anything that should have been removed;
having a mechanism to remove such items on request, provided the
case can be made to do so, would make such action less likely to
succeed, and I believe the flags idea provides this.
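A minimal sketch of the flag idea, assuming a hypothetical `clean` column on each stored object. Clean nodes are serialised normally; flagged ones get a placeholder element carrying only their identity. The `removed="licence"` attribute and the `flag_for_removal` helper are invented for illustration and are not part of the real OSM API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class StoredNode:
    id: int
    version: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)
    clean: bool = True          # hypothetical licence-status flag in the database


def to_api_xml(node):
    """Serialise a node, or a placeholder if it is flagged as not clean."""
    if not node.clean:
        # hypothetical placeholder: identity only, content withheld
        el = ET.Element("node", id=str(node.id), version=str(node.version),
                        removed="licence")
        return ET.tostring(el, encoding="unicode")
    el = ET.Element("node", id=str(node.id), version=str(node.version),
                    lat=str(node.lat), lon=str(node.lon))
    for k, v in sorted(node.tags.items()):
        ET.SubElement(el, "tag", k=k, v=v)
    return ET.tostring(el, encoding="unicode")


def flag_for_removal(node):
    """Hypothetical: an object reported as mistakenly kept is flagged, not deleted."""
    node.clean = False


pub = StoredNode(5, 3, 51.5, -0.1, {"amenity": "pub"})
print(to_api_xml(pub))
flag_for_removal(pub)
print(to_api_xml(pub))
```

Flagging rather than deleting keeps the data in the database, so a removal made on request could be reviewed or reversed.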

Ed