# [Rebuild] We're better off detecting splits and merges

ant antofosm at gmail.com
Sat Feb 4 17:30:25 GMT 2012

Hi,

detecting splits and merges will leave us with more clean data,
especially better geometries. Here's why.

If splits and merges are detected, the result will differ in the
following cases:

1. Dirty way is split by agreeing mapper
- Currently: The newly created way is marked as clean.
- If detected: The newly created way is marked as dirty.

2. Clean way is split by non-agreeing mapper
- Currently: The newly created way is marked as dirty.
- If detected: The newly created way is marked as clean.

3. Dirty ways are merged by agreeing mapper
- Currently: 1/2 of the resulting way is marked as clean.
- If detected: The resulting way is marked as dirty.

4. Clean ways are merged by non-agreeing mapper
- Currently: 1/2 of the resulting way is marked as dirty.
- If detected: The resulting way is marked as clean.

5. Dirty way and clean way are merged by agreeing mapper, and the clean
way survives
- Currently: The resulting way is marked as clean.
- If detected: The resulting way is only partially marked as clean.

6. Dirty way and clean way are merged by non-agreeing mapper, and the
dirty way survives
- Currently: The resulting way is marked as dirty.
- If detected: The resulting way is only partially marked as dirty.
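
The six cases above can be written down as a small Python model. This is only a sketch under assumptions of mine, not the actual rebuild code: all names (`Status`, `split_current`, `merge_detected`, ...) are hypothetical. "Currently" is modelled as "whatever the splitting or merging mapper creates or appends takes the mapper's status", and "if detected" as "each part keeps the status of the way it came from".

```python
from enum import Enum


class Status(Enum):
    CLEAN = "clean"
    DIRTY = "dirty"


def mapper_status(agrees: bool) -> Status:
    """Status that a mapper's own contribution would get."""
    return Status.CLEAN if agrees else Status.DIRTY


def split_current(parent: Status, mapper_agrees: bool) -> Status:
    # Without detection the new way looks like a fresh object,
    # so it is judged by the mapper who performed the split.
    return mapper_status(mapper_agrees)


def split_detected(parent: Status, mapper_agrees: bool) -> Status:
    # With detection the new way inherits the status of the way it was cut from.
    return parent


def merge_current(survivor: Status, absorbed: Status, mapper_agrees: bool):
    # Returns (status of the survivor's old part, status of the appended part).
    # Without detection the appended part is attributed to the merging mapper.
    return (survivor, mapper_status(mapper_agrees))


def merge_detected(survivor: Status, absorbed: Status, mapper_agrees: bool):
    # With detection the appended part keeps the status of the absorbed way.
    return (survivor, absorbed)
```

In this model case 3, for example, is `merge_current(Status.DIRTY, Status.DIRTY, True)`, which gives one dirty and one clean part, while `merge_detected` with the same arguments gives two dirty parts.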

At first glance cases 1 and 2 cancel each other out, as do cases 3 and 4
and cases 5 and 6. We can indeed assume that the two cases in each pair
have approximately the same probability, because p("object is dirty") is
likely to approximate p("object has been split/merged by a non-agreer").

As far as the tags are concerned, I believe that it comes out even.

But for the nodes it is different. In cases 1, 3 and 5, node references
that are currently marked as clean in the way's node list might still
not survive under the current algorithm: the nodes themselves might be
dirty (unless they have been cleaned by moving), and all references to a
dirty node have to be deleted to preserve referential integrity.
Meanwhile, in cases 2, 4 and 6, references to clean nodes are deleted
for no reason. Thus the benefits that can be gained from split and merge
detection in cases 2, 4 and 6 outweigh the drawbacks coming from cases
1, 3 and 5.
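
To illustrate the node argument, here is a minimal Python sketch. It assumes that a node reference survives only if both the reference in the way and the node itself are clean; the function name and the data layout are hypothetical and not taken from the real algorithm.

```python
def surviving_refs(refs, node_is_clean):
    """Return the node ids whose references would survive.

    refs          -- list of (node_id, ref_marked_clean) pairs of one way
    node_is_clean -- dict mapping node_id to True if the node itself is clean
    """
    return [
        node_id
        for node_id, ref_marked_clean in refs
        if ref_marked_clean and node_is_clean[node_id]
    ]


# Case 1, current behaviour: the references are marked clean, but the
# nodes of the dirty way are still dirty, so nothing survives anyway.
case1_current = surviving_refs([(1, True), (2, True)], {1: False, 2: False})

# Case 2, current behaviour: the nodes are clean, but the references are
# marked dirty, so they are deleted although the nodes could have stayed.
case2_current = surviving_refs([(3, False), (4, False)], {3: True, 4: True})

# Case 2, with detection: references and nodes are clean, both survive.
case2_detected = surviving_refs([(3, True), (4, True)], {3: True, 4: True})
```

Under these assumptions the clean marking in case 1 gains little in practice, while detection in case 2 keeps references that would otherwise be lost.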

(Disclaimer: In my opinion the core benefit of detecting splits and
merges is the higher accuracy in terms of asserting people's copyright,
not the percentage of clean objects we get in the end.)

A related idea: Shouldn't cleaning a node's position also clean all
references to it in ways?

cheers
ant