[Talk-us] Help fight advertising

Jason Remillard remillard.jason at gmail.com
Fri Mar 2 15:13:56 UTC 2018


Hi Frederik


> * Should we have some MapRoulette task or OSMCha automatism or OSMI view
> to detect potential advertising?
>
>
>
Detecting these changesets should be quite straightforward. Here is a
Keras sample that could be modified to process changesets. The model in
this example is tiny and could be run over all of the changesets every
day on a normal laptop, with no GPU.

https://github.com/keras-team/keras/blob/master/examples/pretrained_word_embeddings.py

The machine learning people are always hungry for more curated datasets.

You have done the hard work already by curating a list of spammy changesets.
Make a central place where we could keep a list of changesets that are
spam, so that if people are interested in writing a changeset spam
detector, the time-consuming part is already done.

A GitHub repository with a two-column CSV or JSON file that has the
changeset id and classification (for now just spam or good), plus a Python
script that downloads the changeset dumps into a local directory and looks
up the age/changeset count of the user, would be enough.

12455662,spam
12555662,spam
1245155,good

Then a download.py file that downloads the changesets and writes them out
as data/spam/xxxx.xml and data/good/xxxx.xml, etc.
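
A minimal sketch of what such a download.py could look like, under stated
assumptions: the labels live in a file called changesets.csv (a hypothetical
name), each changeset is fetched from the public OSM API 0.6
changeset/<id>/download endpoint rather than from the bulk changeset dumps,
and the User-Agent string is made up for illustration. It does not look up
the user's account age or changeset count.

```python
import csv
import io
import urllib.request
from pathlib import Path

# Assumption: fetch each changeset's osmChange XML from the OSM API 0.6.
API_URL = "https://api.openstreetmap.org/api/0.6/changeset/{id}/download"
LABELS = {"spam", "good"}


def read_labels(text):
    """Parse 'changeset_id,label' rows into a list of (int, str) pairs."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        changeset_id, label = int(record[0]), record[1].strip()
        if label not in LABELS:
            raise ValueError(f"unknown label {label!r} for {changeset_id}")
        rows.append((changeset_id, label))
    return rows


def target_path(root, changeset_id, label):
    """Return the output path, e.g. data/spam/12455662.xml."""
    return Path(root) / label / f"{changeset_id}.xml"


def download(rows, root="data"):
    """Fetch every labelled changeset that is not already on disk."""
    for changeset_id, label in rows:
        path = target_path(root, changeset_id, label)
        if path.exists():
            continue  # already downloaded on an earlier run
        path.parent.mkdir(parents=True, exist_ok=True)
        request = urllib.request.Request(
            API_URL.format(id=changeset_id),
            # Hypothetical User-Agent, chosen only for this sketch.
            headers={"User-Agent": "changeset-spam-dataset-sketch"},
        )
        with urllib.request.urlopen(request) as response:
            path.write_bytes(response.read())


if __name__ == "__main__":
    # Hypothetical file name for the labelled list described above.
    download(read_labels(Path("changesets.csv").read_text()))
```

Keeping the label in the directory name means a training script can read
data/spam and data/good directly without consulting the CSV again.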

Once we have one or more bots screening all of the changesets for spam,
many things become possible.

Jason