[Imports] Harmful elements in taginfo tag cleanup process

Ilpo Järvinen ilpo.jarvinen at helsinki.fi
Tue May 5 21:36:48 UTC 2015


On Tue, 5 May 2015, Bryce Nesbitt wrote:

> Eventually, I feel the project will need to move to either
> a reputation system (where others rate the quality of edits) or 
> a double approval system (where a second mapper must endorse an edit before
> it goes live -- perhaps only
> for edits that remove information).

Right, I'd mostly want to limit the power of a single person doing a 
removal. However, even double approval might not filter much if such
a process is more likely to attract similarly minded, cleanup-focused
people than other types of mappers.

> If you have a tag that's commonly deleted, then try adding a wiki page for
> it.  That might help.

I've thought that too. However, I think this puts the burden on the wrong 
end of the loop. Given that many of the deletes seem to occur for words 
which are well known in the local context, it would be duplicate effort to 
document them rather than find the "global" alternative, which is harder 
than some might realize (especially for non-natives).

...If some tag gets deleted very often I might consider that though.

> If you list key names that have been improperly deleted, perhaps that will
> give a clue also.  
> Do they all have funny characters?  Are they all seemingly non-English?  Is
> there a pattern?

Not really (I'm listing them below with lots of thoughts too), but I think 
these details are irrelevant as the proper action would be to replace them 
with a tag that is intended for "global use" (if one is applicable). The 
only thing in common seems to be that there are quite few of each in the 
DB, so any tag which is not used very frequently might get targeted, I 
suppose.

My main point is that these "fixers" did not replace but deleted 
instead, which is what alarms me, as then nobody else has the 
opportunity to replace either (I admit that one seemed to try very hard 
to figure out the local context but most seemingly didn't that much :-(). 
Sadly, judging by the changeset comments, they also seem to claim/think
they "fix" stuff by removing it.


Below is a list of the cases I remember or have had changeset comment 
exchanges about (this is a rather verbose list with my own comments on each 
case, so please don't read it if you have better things to do :-). And BTW, 
I've also listed those few good deletes at the end):

arava=yes
Name of a local state-sponsored subsidy system. The naming comes from 
Finnish law and is well known to almost any local who is past childhood 
already. This has been removed twice (no odd characters, used just a 
few times, and an incomprehensible word which neither of the deleters 
understood in the first place). However, I think that perhaps fi:arava=yes 
might be a better tag for this, or some form of "global" 
building:subsidied:system=arava or the like (if the wiki has something along 
those lines, I don't know of it), but it's noteworthy that none of the 
deleters changed it to any of these alternatives (so this kind of "fixing" 
is useless without local input, which should IMHO be asked for or waited 
for patiently rather than forced by deleting keys). The information is 
tricky to acquire after the building is completed. The few that are 
currently in the DB are mostly based on information that is available 
during construction (otherwise very local knowledge is essential).

vuokra-asuntoja=yes
Means rented apartments (in contrast to different forms of ownership
of the apartments in the building). Useful at least for statistical 
purposes and not easily available in many cases (as such, it's very 
valuable to acquire from some legal source). An incomplete "fix" was 
tried which changed this to rental=yes (some of them were changed and some 
deleted in the same changeset that deleted arava=yes), but I don't know if 
that's correct or not as there's no wiki page for that either. This one 
has also been deleted twice already.

välkky=yes
Name of a blinker device that is being tested for highlighting
highway=crossing nodes in sensitive places such as near a school. A few
were installed for testing purposes. It might be the same as
flashing_lights=yes, but given the lack of clarity about what exactly that
means (does it cover traffic-light type flashers only?) it's unclear to
me. However, flashing_lights was not discovered by the fixer him/herself
but by me, so the "fix" effort should not be credited over locals here
either! Also, flashing_lights is a rather recent addition compared with
the addition of välkky, and I don't think I've seen it e.g. on tagging@,
so it's unreasonable to assume that locals would learn everything from
the wiki so quickly for tags they're already familiar with.
In addition, the user who put välkky into the DB explained that multiple
device types are/were being tested, this was just one of them, and it
is/was unclear whether one would eventually be selected over another, if
any (which might mean dismantling the other types, so removing the type
information would make further updates more complicated).

rocks=yes
roots=yes
Were tagged on a highway=path. Claimed to provide no added value by the 
fixer. I somewhat agree but again, information that was acquired through 
a local survey was removed by the decision of a single individual, which 
I find a dangerous practice even if I somewhat agree with the fixer's
reasoning.

molok=yes
Underground waste collector brand well known to locals. This delete was 
updated to something better in a follow-up change by the fixer, which I 
find a very positive experience (he even looked the "global tag" up all by 
himself rather than burdening locals with work that the fixer should 
follow through on, IMHO)!

name:francais (or some form of that with a fancy non-Finnish letter)
Probably some editor auto-completion issue. Should have been changed
to name:fi rather than deleted, but as the fixer didn't understand the 
local context he/she thought that it was simply a duplicate, as name was
already equal to the value. Not a big issue since the name still
persisted, but again it highlights how important it is to understand 
the local context when deleting something. Based on the follow-up 
discussion it seems that the particular fixer even lacked an
understanding of how naming works in bilingual countries, as the 
information was claimed to be "redundant".

was:*=* (we use these to prevent remapping from imagery, not for
history preservation like the "fixer" kept claiming multiple times).
It has been useful for me personally and I've even encountered one 
incorrect "redraw" of a feature (which is not that likely, given that you 
need to do both the survey and the detection of the "redraw" yourself, as 
anybody else would just correct the redraw damage without noticing that a
"redraw" had taken place).

last_robbed_date=*
I asked the user adding these, and he envisioned some statistics/correlation
related use case. Obviously the data is in no way complete
but it's not incorrect either. I understand that some here probably 
disagree that this should be kept in the DB in the first place (but 
remember that you were not asked by the "fixer"; it could be tags you'd 
like to keep next time).


I remember seeing two clearly good removals so far (there might have
been one or two others but I fail to remember more as I had no formal
exchanges with the fixer about them):

building=residential or building=apartments information duplicated
into some other key that was deleted by the fixer.

mapper:accident
Details about mapping-related accidents, which I used two times. The irony 
is that I added those myself (if you want to know why, this is only my 
hobby, so I just had some fun back then).


...and plenty of those fixes that really fixed typos, updated to better 
"global tags" and such but that's out of scope w.r.t. deleting tags.


-- 
 i.

