[Rebuild] We're better off detecting splits and merges

ant antofosm at gmail.com
Sat Feb 4 17:30:25 GMT 2012


Hi,

Detecting splits and merges will leave us with more clean data, and 
especially with better geometries. Here's why.

If splits and merges are detected, the result will differ in the 
following cases:

1. Dirty way is split by agreeing mapper
- Currently: The newly created way is marked as clean.
- If detected: The newly created way is marked as dirty.

2. Clean way is split by non-agreeing mapper
- Currently: The newly created way is marked as dirty.
- If detected: The newly created way is marked as clean.

3. Dirty ways are merged by agreeing mapper
- Currently: 1/2 of the resulting way is marked as clean.
- If detected: The resulting way is marked as dirty.

4. Clean ways are merged by non-agreeing mapper
- Currently: 1/2 of the resulting way is marked as dirty.
- If detected: The resulting way is marked as clean.

5. Dirty way and clean way are merged by agreeing mapper, and the clean 
way survives
- Currently: The resulting way is marked as clean.
- If detected: The resulting way is only partially marked as clean.

6. Dirty way and clean way are merged by non-agreeing mapper, and the 
dirty way survives
- Currently: The resulting way is marked as dirty.
- If detected: The resulting way is only partially marked as dirty.
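
The six cases above can be restated as a small sketch. This is only my 
reading of the list, not the actual rebuild code: the function names, 
the "clean"/"dirty" strings and the model of a merged way as two parts 
(the surviving way's part and the absorbed way's part) are assumptions 
made for illustration.

```python
CLEAN, DIRTY = "clean", "dirty"


def mapper_status(mapper_agrees):
    """Status that follows purely from who made the edit."""
    return CLEAN if mapper_agrees else DIRTY


def split_status(parent_status, mapper_agrees, detect):
    """Status of the way newly created by a split (cases 1 and 2)."""
    if detect:
        # With detection, the new way inherits the status of its parent.
        return parent_status
    # Without detection, the new way looks like the splitter's own work.
    return mapper_status(mapper_agrees)


def merge_status(survivor_status, absorbed_status, mapper_agrees, detect):
    """Statuses of the (surviving, absorbed) parts of a merged way (cases 3 to 6)."""
    if detect:
        # With detection, each part keeps the status of the way it came from.
        return (survivor_status, absorbed_status)
    # Without detection, the absorbed part looks like the merger's own work.
    return (survivor_status, mapper_status(mapper_agrees))


# Case 1: dirty way split by an agreeing mapper.
print(split_status(DIRTY, True, detect=False))   # clean
print(split_status(DIRTY, True, detect=True))    # dirty

# Case 5: clean way survives, dirty way absorbed, agreeing mapper.
print(merge_status(CLEAN, DIRTY, True, detect=False))  # ('clean', 'clean')
print(merge_status(CLEAN, DIRTY, True, detect=True))   # ('clean', 'dirty')
```

In this model "1/2 of the resulting way" corresponds to a pair with two 
different statuses, and "only partially marked" corresponds to the 
detected result of cases 5 and 6.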


It seems that cases 1 and 2 cancel each other out, as do cases 3 and 4 
and cases 5 and 6. We can indeed assume that the two cases in each pair 
have approximately the same probability, because p("object is dirty") is 
likely to approximate p("object has been split/merged by a non-agreer").

As far as the tags are concerned, I believe that it comes out even.

But for the nodes it is different. In cases 1, 3 and 5, node references 
that are currently marked as clean in the way's node list might still 
not survive with the current algorithm, because the nodes themselves 
might be dirty (unless they have been cleaned by moving), and all 
references to dirty nodes have to be deleted for the sake of referential 
integrity. Meanwhile, in cases 2, 4 and 6, references to clean nodes are 
deleted for no reason. Thus the benefits that can be gained from split 
and merge detection in cases 2, 4 and 6 outweigh the drawbacks coming 
from cases 1, 3 and 5.
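
To illustrate the asymmetry, here is a minimal sketch. It assumes that a 
node reference survives only if both the reference and the node itself 
are clean; the function name and the node IDs and statuses are made up 
for the example and are not taken from real data.

```python
def surviving_refs(refs, node_is_clean):
    """Return the node IDs whose reference and node are both clean.

    refs: list of (node_id, ref_is_clean) pairs from a way's node list.
    node_is_clean: dict mapping node_id to the node's own status.
    """
    return [n for n, ref_clean in refs if ref_clean and node_is_clean[n]]


# Hypothetical node statuses: the nodes of the dirty way are mostly dirty
# themselves, the nodes of the clean way are all clean.
node_is_clean = {1: False, 2: False, 3: True, 4: True, 5: True, 6: True}

# Case 1 without detection: refs of a dirty way are wrongly marked clean,
# but only the ref to the one clean node survives.
wrongly_kept = surviving_refs([(1, True), (2, True), (3, True)], node_is_clean)

# Case 2 without detection: refs of a clean way are wrongly marked dirty,
# so all of them are lost although the nodes are clean.
kept_now = surviving_refs([(4, False), (5, False), (6, False)], node_is_clean)
kept_if_detected = surviving_refs([(4, True), (5, True), (6, True)], node_is_clean)

print(wrongly_kept)      # [3]
print(kept_now)          # []
print(kept_if_detected)  # [4, 5, 6]
```

With these made-up numbers, the current algorithm wrongly keeps one 
reference in case 1 but wrongly drops three in case 2. How large the 
difference is in practice depends on how often the nodes of dirty ways 
are dirty themselves.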

(Disclaimer: In my opinion the core benefit of detecting splits and 
merges is the higher accuracy in terms of asserting people's copyright, 
not the percentage of clean objects we get in the end.)

A related idea: Shouldn't cleaning a node's position also clean all 
references to that node in ways?

cheers
ant


