[OSM-talk] OSM SPAM detector

Jason Remillard remillard.jason at gmail.com
Mon Mar 5 18:03:16 UTC 2018


Hi Dave,

The detector needs to be "trained" on what a spam changeset looks like
versus what a normal changeset looks like. Training really means
programming the detector by example.

Once we have a good set of example changesets, the detector will find spam
changesets on its own going forward.
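
As a rough illustration of what "programming by example" means, here is a
minimal sketch of a classifier that learns from labeled changeset comments.
It is not the code in the repository: this thread does not say which features
or algorithm the detector uses, so the sketch assumes a simple naive Bayes
model over the words of the changeset comment, and the example comments and
labels below are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a changeset comment into lowercase words."""
    return text.lower().split()


def train(examples):
    """Count words per label from (comment, label) pairs."""
    word_counts = {"spam": Counter(), "normal": Counter()}
    label_counts = Counter()
    for comment, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(comment))
    return word_counts, label_counts


def classify(model, comment):
    """Return the label with the highest naive Bayes log score."""
    word_counts, label_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["normal"])
    total = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        n_words = sum(word_counts[label].values())
        # Start from the prior: how common this label is in the examples.
        score = math.log(label_counts[label] / total)
        for word in tokenize(comment):
            # Add-one (Laplace) smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training data: invented comments, not real changesets.
EXAMPLES = [
    ("added missing footpath in park", "normal"),
    ("fixed street name typo", "normal"),
    ("updated building outline from survey", "normal"),
    ("best cheap loans call now", "spam"),
    ("buy cheap watches online best price", "spam"),
    ("visit our website for best deals", "spam"),
]

model = train(EXAMPLES)
```

Calling `classify(model, "cheap loans best price")` on this toy data picks the
spam label, while `classify(model, "added street name")` picks normal. The
only "programming" is the list of examples, which is why a larger and more
diverse list matters more than the code itself.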

Rather than having me or Fredrick decide what is SPAM or not, getting a
diverse set of changesets from many people will ensure that the algorithm is
not biased relative to where the consensus is in the project. That is why I
posted this to talk, not dev. People who map are needed for this task.

Finally, this is just a software component. It will still need to be
integrated into final end-user tools. By doing the specialized machine
learning code first, I am hoping to get some collaborators that are
interested in integrating this into tools that everybody can use. But
without the curated changeset list, it is going nowhere. Long term,
hopefully it will get integrated into several tools...

Jason

On Mon, Mar 5, 2018 at 12:42 PM, Dave F <davefoxfac63 at btinternet.com> wrote:

> Struggling to understand this
> If users are expected to send you changeset ids, how does it "detect spam"?
> In what way are users informed of spammy changesets?
>
> DaveF
>
>
> On 05/03/2018 14:06, Jason Remillard wrote:
>
> Hi,
>
> This weekend I put together a SPAM detector for OSM changesets.
>
> https://github.com/jremillard/osm-changeset-classification
>
> You don't need to be a developer to contribute: send over any SPAM'y
> changesets you come across via a github issue, a pull request, or even an
> email to me. I just need the changeset id.
>
> The code is currently hitting 99+% accuracy detecting the difference
> between 1500 random normal edits and 1500 sketchy changesets that Fredrick
> shared with talk-us last week. This is with zero tuning, so it
> looks like it will work well.
>
> Jason
>
>
> _______________________________________________
> talk mailing list
> talk at openstreetmap.org
> https://lists.openstreetmap.org/listinfo/talk
>
>