[Talk-us] SEO Damage to OSM
frederik at remote.org
Wed Jul 5 21:05:29 UTC 2017
> These spam changes do not need that complexity to detect.
I've done some numbers, maybe it helps.
I counted all users that only ever committed one changeset with one edit
inside. This number is 140352.
Then I discarded those where the changeset comment was shorter than 50
characters or where the content had been redacted a long time ago, leaving
me with 12173.
Then I looked at the objects modified/created, and discarded all where
the object had neither website, nor description, nor note tag. This left
me with 3323 objects.
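
The three filtering steps above can be sketched roughly as follows. This is
only an illustration, not the script actually used: the record layout (one
dict per edit with "user", "comment", "redacted" and "tags" keys) and all
function names are assumptions, and the "redacted a long time ago" condition
is simplified to a plain boolean flag.

```python
from collections import Counter

# Assumed record layout (hypothetical): one dict per edit, with keys
# "user", "comment", "redacted" (bool) and "tags" (dict of OSM tags).
SUSPECT_TAGS = ("website", "description", "note")
MIN_COMMENT_LEN = 50


def single_edit_users(edits):
    """Keep edits whose author has exactly one edit in the whole data set."""
    per_user = Counter(e["user"] for e in edits)
    return [e for e in edits if per_user[e["user"]] == 1]


def long_unredacted_comment(edits):
    """Drop edits with a comment under 50 characters or redacted content."""
    return [
        e for e in edits
        if len(e["comment"]) >= MIN_COMMENT_LEN and not e["redacted"]
    ]


def has_suspect_tag(edits):
    """Keep edits whose object carries a website, description or note tag."""
    return [e for e in edits if any(k in e["tags"] for k in SUSPECT_TAGS)]


def candidates(edits):
    """Apply the three filters in the order described in the text."""
    return has_suspect_tag(long_unredacted_comment(single_edit_users(edits)))
```

Each stage only narrows the previous result, so the counts at each step
(140352, then 12173, then 3323) would correspond to the lengths of the three
intermediate lists.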
Then I looked at the list and found a broad range of edits. Some, while
having an advertising slant, seem a legit addition of someone's own
comment=Our doors are always open. Come and visit, taste our coffee,
see what we do
comment=Added in West Town Bikes as it is at the same address and has
enough of its own activity that it needs to be recognized on the map.
Division;name=Ciclo Urbano/West Town
Some look more SEO-y:
comment=Updated Osborne Insurance Services at Raleigh, NC
comment=Updated State Farm - Blake Manhart at Springfield, VA
Ln #B;name=State Farm -
I had a look at trying to automatically match website and user name; 457
of them actually contain the user name in the website. But that is too
coarse a check. I fear that it might be necessary to look through the
rest manually to detect the dodgy ones.
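
For illustration, one way such a user name/website match could be done is
sketched below. The exact matching rule is not stated here, so the
normalisation (lowercase, strip everything that is not a letter or digit)
and the function names are assumptions.

```python
import re


def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def username_in_website(user, website):
    """True if the normalised user name occurs inside the normalised URL."""
    name = normalize(user)
    # An empty name would match every URL, so treat it as no match.
    return bool(name) and name in normalize(website)
```

A check like this flags "Acme Plumbing" against
"https://acme-plumbing.example/" but says nothing about whether the edit is
legitimate, which is why it is too coarse on its own.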
Of the 3323, 208 have a highway tag. But here it bites me that I took
everything that had either note or description or website, because some
of the edits with highway=* are legit and have a description/note where
the newbie mapper explained what they did. 170 of the 208 do have a
website tag, and finally, they *all* seem dodgy. (Interestingly it was
not all ways - some highway=traffic_signals too!)
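
The highway split described above could look something like this sketch.
It assumes each edit is a dict with a "tags" dict (a hypothetical layout),
and the function name is made up for the example.

```python
def split_highway_edits(edits):
    """Return (highway_with_website, highway_without_website, other)."""
    with_site, without_site, other = [], [], []
    for e in edits:
        tags = e["tags"]
        if "highway" not in tags:
            other.append(e)          # not a highway object at all
        elif "website" in tags:
            with_site.append(e)      # the group that looked dodgy
        else:
            without_site.append(e)   # often a newbie's note/description
    return with_site, without_site, other
```

With the numbers given, the first two lists would hold 170 and 38 edits, and
the third the 3323 - 208 = 3115 remaining ones.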
I've run a revert on these 170 but the majority had already been fixed.
That leaves us with a good 3115 objects to investigate. Many do clearly
violate our "no advertising" rules but then again we don't want to be
too harsh with the cycle shop owner who maybe oversteps the line.
I've put my interim results here
(for those where the username is in the URL) - do you think we should
revert them all automatically? (Keep in mind many may have been reverted
already - we'd only work on those where the spam version is still current.)
for those where the username is not (fully) in the URL.
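
The "spam version is still current" condition could be checked along these
lines. This is a sketch under assumptions: each spam edit is taken to record
the object type, id and the version it created, and `current_versions` is a
hypothetical lookup of the latest version per object, however it is obtained.

```python
def still_current(spam_edits, current_versions):
    """Keep spam edits whose version is still the object's latest version.

    spam_edits: list of dicts with "type", "id" and "version" keys (assumed).
    current_versions: dict mapping (type, id) to the latest version number.
    Objects missing from the lookup (e.g. deleted) are skipped.
    """
    return [
        e for e in spam_edits
        if current_versions.get((e["type"], e["id"])) == e["version"]
    ]
```

Anything that someone has touched since the spam edit drops out of the list,
so an automatic revert would not overwrite later fixes.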
Frederik Ramm ## eMail frederik at remote.org ## N49°00'09" E008°23'33"