[OSM-talk] [Talk-us] Redacting 75,000 street names contributed by user chdr
frederik at remote.org
Mon Aug 28 12:43:44 UTC 2017
On 08/27/2017 08:51 PM, Mikel Maron wrote:
> Also, Frederik, I think your script picked up false positives. Spot
> checked in DC, and these are expansions of both the street and the
> quadrant ("St NW" -> "Street Northwest"). Can we fix the script and
> regen the list?
I have modified my "name equality rule" to consider "N" equal to "North"
etc.; it will also ignore case and whitespace and, as before, the usual
street type expansions (St->Street etc.).
This brings the number of problematic objects down by around 5500, and
practically all of them are in the US. However, I noticed that I forgot
to account for "Saint"->"St", and will re-do the numbers yet again
before publishing an updated list.
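A minimal sketch of what such a name equality rule could look like, for readers who want to spot-check names themselves. This is not the actual script; the abbreviation table, the function names normalize() and names_equal(), and the choice to map both "Street" and "Saint" to "st" are assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table. The list used by the real script
# is not shown in this thread, so these entries are assumptions.
CANONICAL = {
    "street": "st", "saint": "st",
    "avenue": "ave", "road": "rd", "drive": "dr", "boulevard": "blvd",
    "north": "n", "south": "s", "east": "e", "west": "w",
    "northeast": "ne", "northwest": "nw",
    "southeast": "se", "southwest": "sw",
}

def normalize(name: str) -> str:
    """Lowercase, drop periods and commas, collapse whitespace,
    and reduce known long forms to one canonical short form."""
    tokens = re.sub(r"[.,]", " ", name.lower()).split()
    return " ".join(CANONICAL.get(token, token) for token in tokens)

def names_equal(a: str, b: str) -> bool:
    """True if two names differ only by case, whitespace, or expansion."""
    return normalize(a) == normalize(b)
```

Under these assumptions, names_equal("K St NW", "K Street Northwest") holds, while "Main St" and "Main Ave" still count as different names. Mapping "Saint" and "Street" to the same token is deliberately lenient: it can only make two names compare as equal more often, which errs on the side of not flagging an object.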
I think the best course of action would be:
1. Wait a while, until various communities (potentially pointed to this
conversation via the widely-read weekly news roundup) have had the time
to check whether my automated assessment of which names count as
"contributed" by chdr is correct. Mikel has found the issue above and I
fixed it; it is quite possible that there are others.
2. Run the redaction, and remove all names contributed by chdr. At
present it looks as if fewer than 10% of these objects had a different
name before; more than 90% had no name at all. Perhaps it is indeed
best to remove the name in these cases as well instead of reverting to
the old name.
3. Load the IDs of all affected objects in a MapRoulette task or
similar, so people can check the names by survey, or from different
sources. (I assume that, as Simon pointed out, open data will not be
available for all countries affected. I fear that, with MapRoulette
geared towards armchair mapping, there might be a temptation for people
to yet again fill in the blanks from inadmissible sources. Maybe we
should limit the use of MapRoulette to countries where we know that open
sources exist, and use fixme tags or notes for other countries?)
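The distinction in step 2 between objects that had a different name before and objects that had no name at all can be sketched as follows. This is only a guess at one workable definition of "contributed", not the logic of the actual script; the (user, name) history format, the labels, and the simplified same_name() comparison are all assumptions.

```python
from typing import List, Optional, Tuple

Version = Tuple[str, Optional[str]]  # (user, name), oldest version first

def same_name(a: Optional[str], b: Optional[str]) -> bool:
    """Simplified stand-in for the name equality rule:
    compares names ignoring case and whitespace only."""
    if a is None or b is None:
        return a is b
    return a.casefold().split() == b.casefold().split()

def classify(history: List[Version], user: str = "chdr") -> str:
    """Return 'added_name', 'changed_name', or 'not_contributed'."""
    if not history or history[-1][1] is None:
        return "not_contributed"
    current = history[-1][1]
    # Walk back to the version that first introduced the current name.
    i = len(history) - 1
    while i > 0 and same_name(history[i - 1][1], current):
        i -= 1
    if history[i][0] != user:
        return "not_contributed"
    previous = history[i - 1][1] if i > 0 else None
    return "changed_name" if previous is not None else "added_name"
```

With this definition, a history of [("alice", None), ("chdr", "Main Street")] is classified as "added_name", and one where chdr replaced "Main Rd" with "Main Street" is classified as "changed_name", even if someone later only changed the capitalization.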
I think that would be cleaner than verifying the names ahead of time.
Also it would create an audit trail - from the object history, you could
then see that the name was removed for copyright reasons, and you could
then see that user XYZ has added a new name. If it should later turn out
that this name was also copied from an inadmissible source, we know
that user XYZ is at fault, whereas people creating lists with
independently verified names is not something that would give us such a
trail.
I must apologize for not having given a time frame in my initial email;
there's absolutely no reason to panic. This matter has been sitting idle
for years, and a few more weeks won't kill us. We can sort this out
calmly and then do the right thing.
Frederik Ramm ## eMail frederik at remote.org ## N49°00'09" E008°23'33"