[OSM-talk] How to start to remove non-CT compliant data..

Ian Sergeant isergean at hih.com.au
Wed Aug 31 01:19:56 BST 2011


I think the strategy of removing all non-CT compliant data in one big bang 
is flawed.  The best result for OSM will be obtained if the core data is 
nearly clean by the day of the relicencing, so that the removal of the 
remainder has the least possible impact.  Some incremental deletions or 
revision hiding now could help us get to that point with substantially 
less effort.

The category I think we should address now is where the v1 is created by a 
person who has agreed to the CT, and subsequent revisions are by a person 
who has explicitly declined the CT. 
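
As a rough sketch of how objects in this category might be picked out: 
the snippet below assumes each object's history is available as a list of 
(version, user id) pairs, and that we hold sets of user ids who have 
agreed to or declined the CT.  The function name and data shapes are 
hypothetical and not part of any existing OSM tool.  I have read 
"subsequent revisions" as meaning every revision after v1; that reading is 
an assumption too.

```python
def is_candidate(history, agreed, declined):
    """Return True if v1 is by a CT-agreeing user and every later
    revision is by a user who has declined the CT.

    history  -- iterable of (version, uid) tuples, in any order
    agreed   -- set of uids who have agreed to the CT (assumed input)
    declined -- set of uids who have declined the CT (assumed input)
    """
    ordered = sorted(history)
    # The object must have a known v1.
    if not ordered or ordered[0][0] != 1:
        return False
    # v1 must be CT-compliant.
    if ordered[0][1] not in agreed:
        return False
    # There must be at least one later revision, all by decliners.
    later_uids = [uid for _, uid in ordered[1:]]
    return bool(later_uids) and all(uid in declined for uid in later_uids)
```

For example, is_candidate([(1, 10), (2, 20)], {10}, {20}) is true, while 
an object with only a v1 is not a candidate.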

Specific sub-categories where an automated process could possibly remove 
or hide non-CT latest revisions are:

1. Where additional nodes have been added to or removed from a way that 
are members of only that way and do not extend the way.

2. Where a node has been moved from its previous location by less than, 
say, 1m.

3. (With more complexity) Where a change to a way results in no part of 
the way moving by more than, say, 1m, and no additional connections have 
been added.

4. Where tags outside a set of defined core tags have been added or 
removed.
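
Sub-categories 2 and 4 are the simplest to test mechanically.  The sketch 
below is only illustrative: the 1m threshold is the "say, 1m" figure 
above, the CORE_TAGS set is a made-up placeholder (no core tag list has 
been agreed), and the function names are hypothetical.  Distances use the 
haversine formula on a spherical Earth, which should be adequate at this 
scale.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

# Placeholder only: the real set of core tags would need to be agreed.
CORE_TAGS = {"highway", "name", "ref"}


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def node_moved_less_than(old, new, limit_m=1.0):
    """Sub-category 2: old and new are (lat, lon) pairs in degrees."""
    return distance_m(old[0], old[1], new[0], new[1]) < limit_m


def only_non_core_tags_changed(old_tags, new_tags, core=CORE_TAGS):
    """Sub-category 4: every added or removed key is outside the core
    set, and tags present in both versions keep the same value."""
    added_or_removed = old_tags.keys() ^ new_tags.keys()
    if not added_or_removed.isdisjoint(core):
        return False
    shared = old_tags.keys() & new_tags.keys()
    return all(old_tags[k] == new_tags[k] for k in shared)
```

Sub-categories 1 and 3 need the way's node list and the membership of 
each node in other ways, so they are not covered here.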

If we don't like the idea of this automated deletion/revision hiding, an 
alternative (or a complementary strategy) that would make the 
corresponding manual task easier would be for the API to permit hiding of 
the last version of an object if it is non-CT compliant.

The only effective way I can think of to deal with this data manually 
now is to delete the object, load an earlier version and copy it, and 
re-upload it to the database.  The current reversion strategies all keep 
the non-CT data in the version chain, leaving the object vulnerable to 
removal at relicencing.  The manual process, on the other hand, loses 
valuable CT-compliant history information.

In some areas, objects with a CT-compliant v1 but a non-CT-compliant later 
revision can make up over 50% of all objects.  In my limited experience 
examining these areas, it appears that many of these changes are quite 
small: often small node movements, a couple of nodes added to smooth a 
way, or single tags added to a large number of objects.  The manual 
effort to ascertain what has actually changed is currently large, and the 
risk of wasting the effort of future editors who modify a non-CT 
compliant object is real.

Ian.