[OSM-talk] Redacting 75,000 street names contributed by user chdr

Frederik Ramm frederik at remote.org
Sun Aug 27 13:49:59 UTC 2017


Hi,

   in 2010 I was privately contacted by another OSM user with the
suspicion that user "chdr" might be copying names from Google maps
(there were a few "easter eggs" in Oman that were only on Google and not
in the real world, and they suddenly popped up on OSM). "chdr" was
contacted at the time, but continued unfazed. In 2013 another mapper
lodged a complaint with DWG about edits by chdr, and I emailed chdr
asking him about his sources. At that point chdr stopped mapping. He
never replied about his sources though, even when I set an ultimatum (of
31st August 2013) threatening to remove all names he contributed if he
couldn't tell us his source. We do have to assume that all names
contributed by chdr are copyright violations.

(chdr has added names all around the world, making a harmless survey
unlikely.)

For various reasons I neglected to act on this, and was only reminded
now, 4 years later, when DWG received a complaint from a user in Brazil
where chdr has even used "source=google" occasionally. (But as I said,
the suspicion is that Google was used throughout.)

I have now compiled a list of all street names that were contributed by
chdr and are still visible today; we're talking about almost 75,000
street names world wide. The most affected countries are:

  18023 "United States of America"
  16345 "Mexico"
  15109 "Brazil"
   6791 "South Africa"
   2802 "Spain"
   2614 "Australia"
   1923 "Argentina"
   1673 "Nigeria"
   1569 "India"
   1441 "Canada"
    954 "Malaysia"
    744 "Botswana"
    717 "Philippines"
    619 "Indonesia"
    553 "Italy"
    414 "Turkey"
    290 "Hungary"
    284 "Chile"
    250 "Kenya"
    127 "Saudi Arabia"
    107 "Paraguay"
    106 "Panama"
    100 "Morocco"

I've left out those countries with fewer than 100 affected ways.

For the US, I can break it down by state:

   5696 "Arizona"
   5116 "Texas"
   2294 "New York"
   1164 "District of Columbia"
    740 "Iowa"
    494 "Colorado"
    416 "New Jersey"
    339 "Illinois"
    268 "Michigan"
    239 "Pennsylvania"
    181 "Missouri"
    147 "Georgia"
    129 "New Mexico"
    123 "North Carolina"
    115 "California"
    106 "Virginia"

The breakdown for Mexico:

   7749 "Baja California"
   2084 "Puebla"
   1964 "Chihuahua"
   1539 "Coahuila"
   1161 "Mexico"
   1040 "Chiapas"
    342 "Tamaulipas"
    241 "Sonora"
    185 "San Luis Potosi"

and Brazil:

  10904 "São Paulo"
   2605 "Paraná"
    945 "Rio de Janeiro"
    270 "Rio Grande do Sul"
    154 "Goiás"

and South Africa:

   4422 "Gauteng"
    750 "KwaZulu-Natal"
    600 "Eastern Cape"
    439 "Western Cape"
    400 "Northern Cape"
    179 "Mpumalanga"

- each time leaving out a couple others under 100.

We believe that only names, not geometries have been taken from other
maps so we'll remove and redact the names only. In identifying "names
contributed by chdr" I took care to really only pick up the names that
were introduced by them, not names that were there before, and also when
chdr split a way that had a name I will make sure that the newly created
way doesn't count as "named by chdr". Additionally, I have ignored those
cases where chdr simply performed a TIGER expansion (St->Street etc) of
a name that was there before.
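
As a rough illustration of the rule just described, here is a minimal
sketch in Python. It is not the script actually used; the WayVersion
record, the tiny abbreviation table and the function names are all
hypothetical, and way splits would need separate handling that is not
shown here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, deliberately tiny table; a real TIGER expansion list is longer.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road",
                 "dr": "drive", "blvd": "boulevard"}


@dataclass
class WayVersion:
    user: str            # who uploaded this version of the way
    name: Optional[str]  # value of the name tag, None if the tag is absent


def normalise(name: str) -> str:
    """Lower-case the name and expand known abbreviations word by word."""
    words = name.lower().replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def same_name(old: Optional[str], new: Optional[str]) -> bool:
    """True if both names exist and differ at most by abbreviation."""
    return old is not None and new is not None and normalise(old) == normalise(new)


def name_introduced_by(history: List[WayVersion], suspect: str) -> bool:
    """history is ordered oldest first. Returns True if the name visible
    in the latest version first appeared in a version by `suspect`."""
    if not history or history[-1].name is None:
        return False
    current = history[-1].name
    idx = len(history) - 1
    # Walk back while the previous version already carried an equivalent name.
    while idx > 0 and same_name(history[idx - 1].name, current):
        idx -= 1
    return history[idx].user == suspect
```

With this sketch, a way where someone else entered "Main St" and chdr
later changed it to "Main Street" is not attributed to chdr, while a
way that had no name before chdr's edit is.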

My process has two weak points (that I am aware of):

1. It doesn't properly "follow" a chdr-contributed name through way
splits performed by other users; if someone has split a way created by
chdr, then the name will remain on the bit that was created by that
other user. This is somewhat unsatisfying but after having manually checked a
random sample I think the problem is small enough to be ignored.

2. It is possible that, like with a recent case in Switzerland where I
had to do a similar redaction, some of these chdr-contributed names will
have been confirmed by others in a survey, i.e. someone else surveyed
the area and checked the name, but saw no need to change it in any way
since it was already correct. Sadly my process will now remove the name
even though, had the name not been there in the first place, that person
could have added the name. This is not nice but I don't see how it could
be avoided.

Here's a list of way IDs affected, with country and state:

http://www.remote.org/frederik/tmp/chdr.details
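
For anyone who wants to produce per-country or per-state counts like
the ones above from such a list, a minimal Python sketch follows. The
record layout (way ID, country, state) as already-parsed tuples and
the function name tally are assumptions made for illustration; they do
not describe the actual format of the file linked above.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Record = Tuple[int, str, str]  # assumed layout: (way_id, country, state)


def tally(records: Iterable[Record], key_index: int,
          minimum: int = 100) -> List[Tuple[str, int]]:
    """Count records per country (key_index=1) or state (key_index=2),
    largest first, dropping entries with fewer than `minimum` ways."""
    counts = Counter(rec[key_index] for rec in records)
    return [(name, n) for name, n in counts.most_common() if n >= minimum]


def format_rows(rows: List[Tuple[str, int]]) -> str:
    """Render rows in the same count-then-quoted-name layout as above."""
    return "\n".join(f'{n:7d} "{name}"' for name, n in rows)
```

Filtering one country's records first and then calling tally with
key_index=2 gives a state breakdown for that country.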

I am trying to keep the damage to OSM to a minimum while at the same
time respecting copyright. If anyone wants to spot check a few names in
their area and can suggest a refinement of the process that would leave
more names in place because there's reason to assume they are legit, I'm
all ears.

It has been suggested to me that even if names in the US were taken
from Google, Google would in turn have had them from TIGER and hence we
could simply leave them be. I am not convinced of this reasoning but
willing to hear that case argued.

It is sad that chdr isn't available for comment but I must take their
silence as an admission of wrongdoing. I will fire off another message
to them pointing to this thread.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"



