[OSM-talk] OSM SPAM detector

Pierre Béland pierzenh at yahoo.fr
Mon Mar 5 19:56:45 UTC 2018


It would help to add a comment column explaining why a changeset or object was flagged as spam (SEO marketing descriptions or info, etc.).
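A minimal sketch of what such a list could look like, assuming a hypothetical CSV layout with changeset_id, label and comment columns. The file format, column names and example IDs are invented here and are not taken from the repository. The loader refuses a spam flag that has no justification:

```python
import csv
import io

# Hypothetical curated-list layout (assumption, not the repo's format):
# one row per changeset, with a free-text "comment" column explaining the flag.
# The changeset IDs below are made-up examples.
SAMPLE = """changeset_id,label,comment
12345678,spam,SEO marketing text in shop description
23456789,normal,
"""


def load_labels(text):
    """Parse the curated list into (changeset_id, label, comment) tuples."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        cid = int(row["changeset_id"])
        label = row["label"].strip().lower()
        comment = (row.get("comment") or "").strip()
        # Require a justification for every spam flag.
        if label == "spam" and not comment:
            raise ValueError(f"changeset {cid} flagged as spam without a comment")
        rows.append((cid, label, comment))
    return rows


if __name__ == "__main__":
    for entry in load_labels(SAMPLE):
        print(entry)
```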
 
Pierre 
 

    On Monday, March 5, 2018, 13:05:41 EST, Jason Remillard <remillard.jason at gmail.com> wrote:  
 
 Hi Dave,
The detector needs to be "trained" on what a spam changeset looks like versus what a normal changeset looks like. Training really means programming the detector by example. 
Once we have a good set of example changesets, going forward, it will find them on its own. 
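To illustrate what "programming by example" means, here is a toy classifier. It is a swapped-in technique (multinomial naive Bayes over the words in a changeset's text), not necessarily what osm-changeset-classification does, and the four training strings are made up:

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case ASCII word tokens from a changeset's comment/tag text."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus summed log likelihoods, smoothed with add-one.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokens(text):
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical labelled examples standing in for the curated changeset list.
train = [
    ("best cheap seo services call now website", "spam"),
    ("buy cheap loans online best price", "spam"),
    ("added building footprints from imagery", "normal"),
    ("fixed road classification and names", "normal"),
]
model = NaiveBayes().fit([t for t, _ in train], [lab for _, lab in train])
print(model.predict("cheap seo website"))  # spam
print(model.predict("added road names"))   # normal
```

The model has no hand-written rules; everything it "knows" comes from the labelled examples, which is why a larger and more diverse curated list matters.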
Rather than having me or Fredrick decide what is SPAM or not, getting a diverse set of changesets from many people will ensure that the algorithm is not biased away from the consensus in the project. That is why I posted this to talk, not dev. People who map are needed for this task.
Finally, this is just a software component. It will still need to be integrated into end-user tools. By writing the specialized machine learning code first, I am hoping to attract collaborators who are interested in integrating it into tools that everybody can use. But without the curated changeset list, it is going nowhere. Long term, hopefully it will get integrated into several tools... 
Jason
On Mon, Mar 5, 2018 at 12:42 PM, Dave F <davefoxfac63 at btinternet.com> wrote:

  Struggling to understand this
 If users are expected to send you changeset ids, how does it "detect spam"?
 In what way are users informed of spammy changesets?
 
 DaveF
 
 On 05/03/2018 14:06, Jason Remillard wrote:
  
  Hi, 
 
  This weekend I put together a SPAM detector for OSM changesets. 
 
 https://github.com/jremillard/osm-changeset-classification
 
  You don't need to be a developer to contribute: send over any SPAM'y changesets you come across via a GitHub issue, a pull request, or even an email to me. I just need the changeset id. 
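Since a changeset id is all that is needed, the rest can be looked up. A sketch, assuming the public OSM API 0.6 endpoint GET /api/0.6/changeset/<id>, which returns the changeset's metadata and tags (such as comment and created_by) as XML. The User-Agent string and the sample reply are made up, and which fields the detector actually uses is not stated here:

```python
import urllib.request
import xml.etree.ElementTree as ET

API = "https://api.openstreetmap.org/api/0.6/changeset/{}"

# Hand-written example of the reply shape (not real data).
SAMPLE_REPLY = (
    '<osm version="0.6"><changeset id="1" user="example" uid="2" open="false">'
    '<tag k="comment" v="Added shop"/><tag k="created_by" v="iD"/>'
    "</changeset></osm>"
)


def parse_changeset(xml_text):
    """Return (attributes, tags) for the <changeset> element in an API reply."""
    root = ET.fromstring(xml_text)
    cs = root.find("changeset")
    if cs is None:
        raise ValueError("no <changeset> element in reply")
    tags = {t.get("k"): t.get("v") for t in cs.findall("tag")}
    return dict(cs.attrib), tags


def fetch_changeset(changeset_id, timeout=30):
    """Download a changeset's metadata (not its edits) from the OSM API."""
    url = API.format(int(changeset_id))
    # Hypothetical User-Agent; the OSM API expects clients to identify themselves.
    req = urllib.request.Request(url, headers={"User-Agent": "changeset-spam-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_changeset(resp.read())


if __name__ == "__main__":
    print(parse_changeset(SAMPLE_REPLY))
```

The edits themselves (for example, a spammy description or website tag on a node) are available separately from /api/0.6/changeset/<id>/download as an osmChange document.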
 
  The code is currently hitting 99+% accuracy distinguishing between 1500 random normal edits and 1500 sketchy changesets that Fredrick shared on the talk-us list last week. This is with zero tuning, so it looks like it will work well.
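For scale: 99% of 3,000 changesets leaves at most 30 misclassified. The sketch below computes accuracy, precision and recall from confusion-matrix counts (spam as the positive class). The counts are hypothetical, chosen only to add up to the 1500 + 1500 split, and are not the detector's actual results:

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall from a 2x2 confusion matrix (spam = positive)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical split: 1500 spam + 1500 normal, 20 spam missed, 10 normal flagged.
acc, prec, rec = metrics(tp=1480, fn=20, fp=10, tn=1490)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Because the evaluation set is balanced, accuracy alone does not show how many false alarms to expect when spam is only a small fraction of all changesets; precision on real traffic would need to be checked separately.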
 
  Jason
   
  
_______________________________________________
talk mailing list
talk at openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk
 
 
 



  

