[OSM-talk] How to start to remove non-CT compliant data..
isergean at hih.com.au
Wed Aug 31 01:19:56 BST 2011
I think the strategy to remove all non-CT compliant data in one big bang
is flawed. The best result for OSM is going to be obtained if the core
data is nearly clean by the day of the relicencing, so that the removal
of the remainder has the least possible impact. However, to accomplish
that, some incremental deletions or revision hiding could help us get to
that point with substantially less effort.
The category I think we should address now is where the v1 is created by a
person who has agreed to the CT, and subsequent revisions are by a person
who has explicitly declined the CT.
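
As a rough sketch of how this category could be picked out, assuming we
already have an object's version history and lists of who has agreed to
and who has declined the CT, something like the following would do. The
names here (Revision, is_target_category, the two user sets) are made up
for illustration and are not part of any existing OSM tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """One version of an object: hypothetical, minimal representation."""
    version: int
    user: str


def is_target_category(history, agreed, declined):
    """Return True if v1 is by a CT agreer and every later revision
    is by someone who has explicitly declined the CT.

    `agreed` and `declined` are assumed to be sets of user names.
    """
    if len(history) < 2:
        # No subsequent revisions, so nothing to hide or remove.
        return False
    revs = sorted(history, key=lambda r: r.version)
    first, later = revs[0], revs[1:]
    if first.version != 1 or first.user not in agreed:
        return False
    return all(r.user in declined for r in later)


agreed = {"alice"}
declined = {"bob"}
history = [Revision(1, "alice"), Revision(2, "bob")]
print(is_target_category(history, agreed, declined))
```

Users who have neither agreed nor declined fall outside both sets, so
objects they have touched are deliberately not matched.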
Specific sub-categories where an automated process could possibly remove
or hide the latest non-CT revisions are:
1. Where additional nodes have been added to or removed from a way that
are members of only that way and do not extend the way.
2. Where a node has been moved from its previous location by less than,
3. (With more complexity) Where change to a way results in no part of the
way moving by more than, say, 1m, and no additional connections have been
4. Where tags outside a set of defined core tags have been added or
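
For the node-movement test in the list above, a minimal sketch of the
distance check might look like this. The threshold is left as a
parameter; the 1m used in the example is borrowed from the "say, 1m"
figure given for ways and is only an assumption for nodes. The function
names are hypothetical, and the haversine formula on a spherical Earth is
my choice of distance measure, not something prescribed anywhere.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def moved_less_than(old, new, threshold_m):
    """True if a node moved by less than threshold_m metres.

    `old` and `new` are (lat, lon) tuples in degrees.
    """
    return haversine_m(old[0], old[1], new[0], new[1]) < threshold_m


# Assumed threshold of 1 m, for illustration only.
old_pos = (-33.868800, 151.209300)
new_pos = (-33.868801, 151.209300)
print(moved_less_than(old_pos, new_pos, 1.0))
```

At these scales the spherical approximation is more than accurate enough;
the error is far smaller than typical GPS or tracing noise.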
If we don't like the idea of this automated deletion/revision hiding, an
alternative (or a complementary strategy) that would make the
corresponding manual task easier would be for the API to permit hiding of
the last version of an object if it is non-CT compliant.
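
To illustrate what repeated use of such a "hide the last version"
operation would do to a version chain, here is a small in-memory sketch.
It does not call the OSM API (no such hiding call is assumed to exist);
the history is just a list of (version, user) tuples and the function
name is hypothetical.

```python
def hide_trailing_non_ct(history, agreed):
    """Split a version chain into (visible, hidden).

    Repeatedly hides the last version while its author is not in
    `agreed`, mimicking the proposed API operation applied until the
    newest visible version is CT-compliant. Earlier non-CT versions
    that are followed by a CT-compliant one are left untouched.
    """
    visible = sorted(history)  # sort by version number
    hidden = []
    while visible and visible[-1][1] not in agreed:
        hidden.append(visible.pop())
    hidden.reverse()  # restore ascending version order
    return visible, hidden


history = [(1, "alice"), (2, "bob"), (3, "bob")]
visible, hidden = hide_trailing_non_ct(history, agreed={"alice"})
print(visible)
print(hidden)
```

The CT-compliant history stays in place, which is the point of hiding
rather than deleting and re-uploading.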
The only effective way I can think of to deal with this data manually
right now is to delete the object, load an earlier version and copy it,
and re-upload it to the database. The current reversion strategies all
keep the non-CT data in the version chain, which leaves the object
vulnerable. The manual process avoids that, but valuable CT-compliant
history information is lost in doing it.
In some areas, objects that have a CT-compliant v1 but a non-CT-compliant
later revision can account for over 50% of all objects. In my
limited experience examining these areas it appears that many of these
changes are quite small, often small node movements, or a couple of nodes
added to smooth a way, or single tags added to a large number of objects.
The manual effort to ascertain what has actually changed is currently
large, and the risk of wasting the effort of future editors who modify a
non-CT compliant object is real.