[OSM-dev] Google Summer of Code
kabum
uu.kabum at gmail.com
Tue Apr 3 10:20:23 BST 2012
Hi,
On 2 April 2012 22:20, Paul Norman <penorman at mac.com> wrote:
> A tool that operates on the changeset level is
> https://github.com/pnorman/osm-weirdness
>
> It detects changesets that have a high probability of being an import or
> mechanical edit. The detection is pretty crude but it does find a fair
> number of undocumented imports, mechanical edits, and other weirdness. If
> you point it at an old state.txt file it will start in the past and work up to
> the present.
>
I'll have a look at your script later today.
>
> When working with the minutely diffs there are some limitations:
>
> Limited knowledge of changesets. In practice, if you start your detection
> an hour in the past you can have a list of all open changesets, but it is
> not possible to know the tags of the changesets.
>
> No knowledge of the previous state of objects. You know where deleted
> objects were, but you can’t tell how far an object is moved or what its
> tags were before. To tell this you need to query a service with a full
> history DB, and handling full history files is difficult.
>
> No knowledge of way geometry if using existing nodes. Iandees’
> https://github.com/pnorman/osm-weirdness/tree/way_check solves this by
> fetching nodes in a way that aren’t also in the changeset from jxapi and it
> can then detect bad geometry (e.g. ways that trace over themselves).
>
> If you were to code a vandalism detection tool I think it should work on
> the minutely replication diffs (
> http://wiki.openstreetmap.org/wiki/Planet.osm/diffs)
>
I had thought about analysing the data after a changeset is closed, but
these diffs also sound good. I will look into this approach :) Thanks!
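
To make the diff-based approach concrete, here is a minimal sketch of how one minutely diff could be summarised per changeset. It assumes the diff has already been downloaded and decompressed into an osmChange document; the sample data, the function name summarise_changesets and the choice of counting creates, modifies and deletes are illustrative assumptions, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, hand-written osmChange snippet standing in for one minutely diff.
SAMPLE_OSC = """<osmChange version="0.6" generator="example">
  <create>
    <node id="1" version="1" changeset="100" lat="51.0" lon="0.0"/>
    <node id="2" version="1" changeset="100" lat="51.1" lon="0.1"/>
  </create>
  <modify>
    <way id="10" version="3" changeset="101">
      <nd ref="1"/>
      <nd ref="2"/>
    </way>
  </modify>
  <delete>
    <node id="3" version="2" changeset="100" lat="51.2" lon="0.2"/>
  </delete>
</osmChange>
"""


def summarise_changesets(osc_xml):
    """Count created, modified and deleted objects per changeset id."""
    counts = defaultdict(lambda: {"create": 0, "modify": 0, "delete": 0})
    root = ET.fromstring(osc_xml)
    for action in root:
        # The direct children of <osmChange> are the action blocks.
        if action.tag not in ("create", "modify", "delete"):
            continue
        for element in action:
            # Every node, way and relation carries its changeset id.
            counts[element.get("changeset")][action.tag] += 1
    return dict(counts)


if __name__ == "__main__":
    for changeset_id, summary in summarise_changesets(SAMPLE_OSC).items():
        print(changeset_id, summary)
```

As Paul notes, a diff alone says nothing about changeset tags or the previous state of an object, so counts like these could only feed a first, rough rating.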
On 3 April 2012 09:38, Derick Rethans <osm at derickrethans.nl> wrote:
> On Mon, 2 Apr 2012, kabum wrote:
>
> > Result:
> > - each changeset has a total rating -> use a threshold value to divide
> > them into suspicious and not suspicious
>
> Instead of just using static thresholds, I think that something like SVM
> (http://en.wikipedia.org/wiki/Support_vector_machine) might be highly
> beneficial here; and it's another cool technology to play with. There is
> a cool library for this (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) and
> I know there is at least an extension to use it from PHP:
> http://phpir.com/support-vector-machines-in-php
Thanks for suggesting this method; it seems very suitable for our use case.
I already have some years of experience with PHP, but I wouldn't choose it for
this part of the project. I was thinking of Python (libsvm has native Python
bindings ;))
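
To illustrate the difference from a static threshold, here is a toy linear SVM in plain Python, trained by subgradient descent on the hinge loss. It is only a sketch: a real implementation would more likely go through libsvm, and the two features (share of deleted objects, scaled number of edits), the training data and all function names are hypothetical.

```python
# Hypothetical training data: (share of deleted objects, scaled edit count).
FEATURES = [
    (0.05, 0.00), (0.10, 0.10), (0.02, 0.05), (0.08, 0.00),  # normal
    (0.90, 0.80), (0.70, 0.90), (0.95, 0.60), (0.80, 1.00),  # suspicious
]
LABELS = [-1, -1, -1, -1, 1, 1, 1, 1]  # -1 = not suspicious, +1 = suspicious


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=200):
    """Fit weights w and bias b by minimising L2-regularised hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            # The regularisation term shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Sample is inside the margin or misclassified: push it out.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (suspicious) or -1 (not suspicious) for a feature vector."""
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    w, b = train_linear_svm(FEATURES, LABELS)
    print(predict(w, b, (0.9, 0.9)), predict(w, b, (0.03, 0.02)))
```

Instead of a hand-picked cut-off on a single total rating, the weights and the boundary are learned from labelled changesets, which is the main appeal of the SVM suggestion.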
>
> > Some questions came up within this preparation:
> > - Is there a preferred language? Has this to be specified within the
> > proposal? (language skill has to be rated, so I would decide this during
> > the project phase)
>
> Not really any preferred language. What did you have in mind? For the
> front end I was thinking PHP, but the engine, I wouldn't know. I think
> something highly performant (so C or C++) might be beneficial.
>
My thinking was that Python is easy to set up and can easily be called
from a terminal or included in other Python scripts (e.g. a web
frontend).
If C++ is necessary because of its speed, then I think I could manage
that. In the past semester I took part in a practical software engineering
course at university (in a team of five fellow students), where
we made extensive use of C++ (https://github.com/brainafk/Empire).
>
> > - I also would like to discuss used libraries and framework within the
> > project phase, or should I decide this also in my proposal?
> > - Should the frontend integrate in the current website (ruby on rails
> > project) or should this just be an optional feature?
>
> I think it can easily live as its own website.
>
Ok :)
>
> > - How detailed should the proposal be? Is it enough to formulate this
> > draft?
>
> That's a tricky one, the more information you provide the better I
> think, as it shows you have thought about it :-)
>
I think it is growing a lot through this discussion, and I will try to be
as detailed as possible. :)
Thanks for the response :)
Regards,
Morris