[Talk-us] SEO Damage to OSM

Frederik Ramm frederik at remote.org
Wed Jul 5 21:05:29 UTC 2017


Hi,

> These spam changes do not need that complexity to detect.

I've done some numbers, maybe it helps.

I counted all users who have only ever committed a single changeset
containing a single edit. There are 140352 of them.

Then I discarded those where the changeset comment was shorter than 50
characters or where the content had been redacted a long time ago,
leaving me with 12173.

Then I looked at the objects modified/created, and discarded all where
the object had neither website, nor description, nor note tag. This left
me with 3323 objects.
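
The two filter steps above could be sketched roughly as follows. This is
only an illustration, not the script actually used: it assumes each
single-edit user has already been loaded into a dict, and the keys
"comment", "redacted" and "tags" are hypothetical names chosen for the
example.

```python
# Thresholds and tag names taken from the description above.
MIN_COMMENT_LENGTH = 50
TAGS_OF_INTEREST = ("website", "description", "note")


def is_candidate(edit):
    """Return True if a single-edit record survives both filter steps.

    `edit` is a dict with hypothetical keys:
      "comment"  - the changeset comment (str)
      "redacted" - True if the content was redacted (bool)
      "tags"     - the tags of the created/modified object (dict)
    """
    # Step 1: drop redacted content and short changeset comments.
    if edit.get("redacted"):
        return False
    if len(edit.get("comment", "")) < MIN_COMMENT_LENGTH:
        return False
    # Step 2: keep only objects with a website, description or note tag.
    tags = edit.get("tags", {})
    return any(key in tags for key in TAGS_OF_INTEREST)


example = {
    "comment": "Our doors are always open.  Come and visit, taste our "
               "coffee, see what we do",
    "redacted": False,
    "tags": {"amenity": "cafe", "website": "passenger-coffee.de"},
}
print(is_candidate(example))
```

The cafe example further down passes because its comment is longer than
50 characters and the object carries a website tag.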

Then I looked at the list and found a broad range of edits. Some, while
having an advertising slant, seem to be a legit addition of someone's
own business:

user=Martin Merkur
changeset=38362589
comment=Our doors are always open.  Come and visit, taste our coffee,
see what we do
object=node 4103514010
addr:city=Berlin;addr:housenumber=38;addr:postcode=12435;addr:street=Elsenstraße;amenity=cafe;cuisine=coffee_shop;internet_access=no;name=passenger
coffee;note=https://www.facebook.com/PassengerEspresso/;opening_hours=7:30-15:00
Uhr;smoking=outside;website=passenger-coffee.de

or

user=otheryan
changeset=13150739
comment=Added in West Town Bikes as it is at the same address and has
enough of its own activity that it needs to be recognized on the map.
object=node 1585399965
addr:housenumber=2459;addr:postcode=60622;addr:street=W
Division;name=Ciclo Urbano/West Town
Bikes;shop=bicycle;website=http://ciclourbanochicago.com/

some look more SEO-y

user=northcarolinahealth
changeset=43324244
comment=Updated Osborne Insurance Services at Raleigh, NC
object=node 4474950186
addr:city=Raleigh;addr:housenumber=5316;addr:postcode=27609;addr:state=NC;addr:street=Six
Forks Road;hours=Mon-Fri
:8.00AM-6.00PM;name=Osborne Insurance
Services;phone=919-845-9955;suite=110;website=http://northcarolinahealth.org

or

user=blakemanhart
changeset=43027180
comment=Updated State Farm - Blake Manhart at Springfield, VA
object=node 4456153164
addr:city=Springfield;addr:housenumber=8322;addr:postcode=22152;addr:state=VA;addr:street=Traford
Ln #B;name=State Farm -
Blake Manhart;Owner=Blake
Manhart;phone=703-992-9664;website=http://blakemanhart.com

I had a look at trying to automatically match website and user name; 457
of them actually contain the user name in the website URL. But that
check is too coarse. I fear that it might be necessary to look through
the rest manually to detect the dodgy ones.
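
For illustration, a user-name-in-URL check of this kind might look like
the sketch below. The normalisation (lower-casing and dropping everything
that is not a letter or digit) is an assumption made for the example and
not necessarily what was done to arrive at the 457.

```python
import re


def normalize(text):
    """Lower-case the text and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def username_in_website(username, website):
    """Return True if the normalized user name occurs in the website value.

    This is a plain substring test, so it is coarse in both directions:
    a short user name can match by accident, and a spammer whose user
    name differs from the domain is missed entirely.
    """
    name = normalize(username)
    return bool(name) and name in normalize(website)


print(username_in_website("northcarolinahealth",
                          "http://northcarolinahealth.org"))
print(username_in_website("Martin Merkur", "passenger-coffee.de"))
```

With the examples quoted above, northcarolinahealth and blakemanhart
match their website tags, while Martin Merkur and otheryan do not.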

Of the 3323, 208 have a highway tag. But here it bites me that I took
everything that had either note or description or website, because some
of the edits with highway=* are legit and have a description/note where
the newbie mapper explained what they did. 170 of the 208 do have a
website tag, and finally, they *all* seem dodgy. (Interestingly it was
not all ways - some highway=traffic_signals too!)
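
The highway check can be written down as a simple test on the tag set.
This is a sketch only; `tags` is assumed to be a plain dict of the
object's tags, and the function name is made up for the example.

```python
def is_highway_with_website(tags):
    """Return True if an object has both a highway tag and a website tag.

    Requiring the website tag excludes highway edits that only carry a
    note or description in which a new mapper explains what they did.
    """
    return "highway" in tags and "website" in tags


print(is_highway_with_website(
    {"highway": "traffic_signals", "website": "http://example.com"}))
print(is_highway_with_website(
    {"highway": "residential", "note": "added missing street"}))
```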

I've run a revert on these 170 but the majority had already been fixed
by others!

That leaves us with a good 3115 objects to investigate. Many do clearly
violate our "no advertising" rules but then again we don't want to be
too harsh with the cycle shop owner who maybe oversteps the line.

I've put my interim results here

http://www.remote.org/frederik/tmp/username-in-url.csv

(for those where the username is in the URL) - do you think we should
revert them all automatically? (Keep in mind many may have been reverted
already - we'd only work on those where the spam version is still current.)

and

http://www.remote.org/frederik/tmp/other.csv

for those where the username is not (fully) in the URL.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"
