Hi,<div><br><div class="gmail_quote">On 2 April 2012 at 22:20, Paul Norman <span dir="ltr"><<a href="mailto:penorman@mac.com">penorman@mac.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple"><p class="MsoNormal"><span style="font-family:Calibri,sans-serif">A tool that operates on the changeset level is <a href="https://github.com/pnorman/osm-weirdness" target="_blank">https://github.com/pnorman/osm-weirdness</a><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Calibri,sans-serif">It detects changesets that have a high probability of being an import or mechanical edit. The detection is pretty crude but it does find a fair number of undocumented imports, mechanical edits, and other weirdness. If you point it at an old state.txt file it will start in the past and work up to the present.</span></p>
</div></blockquote><div><br></div><div>I'll take a look at your script later today.</div><div> </div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple"><p class="MsoNormal"><span style="font-family:Calibri,sans-serif"><u></u></span></p><p class="MsoNormal"><span style="font-family:Calibri,sans-serif">When working with the minutely diffs there are some limitations:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Calibri,sans-serif">Limited knowledge of changesets. In practice, if you start your detection an hour in the past you can have a list of all open changesets, but it is not possible to know the tags of the changesets.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Calibri,sans-serif">No knowledge of the previous state of objects. You know where deleted objects were, but you can’t tell how far an object was moved or what its tags were before. To tell this you need to query a service with a full history DB, and handling full history files is difficult.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Calibri,sans-serif">No knowledge of way geometry if using existing nodes. Iandees’ <a href="https://github.com/pnorman/osm-weirdness/tree/way_check" target="_blank">https://github.com/pnorman/osm-weirdness/tree/way_check</a> solves this by fetching nodes in a way that aren’t also in the changeset from jxapi and it can then detect bad geometry (e.g. ways that trace over themselves)<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-family:Calibri,sans-serif"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-family:Calibri,sans-serif">If you were to code a vandalism detection tool I think it should work on the minutely replication diffs (<a href="http://wiki.openstreetmap.org/wiki/Planet.osm/diffs" target="_blank">http://wiki.openstreetmap.org/wiki/Planet.osm/diffs</a>)</span></p>
</div></blockquote><div><br></div><div>I had thought about analysing the data after the changeset is closed, but the diffs sound good too. I will look into this approach :) Thanks!</div><div> </div><div> </div></div><div class="gmail_quote">
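<div>A minimal sketch of what working on the replication diffs could look like: it groups the objects in one osmChange document by changeset and counts creates, modifies and deletes. The sample document is built in memory with made-up ids, and the function names are hypothetical. Real minutely diffs are gzip-compressed .osc.gz files that would have to be downloaded and decompressed first.</div>

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def build_sample_diff():
    """Build a tiny osmChange document in memory (made-up ids, illustration only)."""
    root = ET.Element("osmChange", version="0.6")
    create = ET.SubElement(root, "create")
    ET.SubElement(create, "node", id="1", changeset="100", lat="51.0", lon="7.0")
    ET.SubElement(create, "node", id="2", changeset="100", lat="51.1", lon="7.1")
    modify = ET.SubElement(root, "modify")
    ET.SubElement(modify, "way", id="10", changeset="100")
    delete = ET.SubElement(root, "delete")
    ET.SubElement(delete, "node", id="3", changeset="200", lat="52.0", lon="8.0")
    return ET.tostring(root, encoding="unicode")


def count_actions(osmchange_xml):
    """Return {changeset id: Counter of create/modify/delete} for one diff."""
    counts = defaultdict(Counter)
    root = ET.fromstring(osmchange_xml)
    for action in root:          # the create / modify / delete blocks
        for element in action:   # the node / way / relation entries inside them
            counts[element.get("changeset")][action.tag] += 1
    return dict(counts)


if __name__ == "__main__":
    for changeset_id, actions in count_actions(build_sample_diff()).items():
        print(changeset_id, dict(actions))
```

<div>The diff only carries the new version of each object, so counts like these say nothing about how far a node moved or which tags it had before, which matches the limitations listed above.</div>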
On 3 April 2012 at 09:38, Derick Rethans <span dir="ltr"><<a href="mailto:osm@derickrethans.nl">osm@derickrethans.nl</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Mon, 2 Apr 2012, kabum wrote:<br>
<br>
> Result:<br>
> - each changeset has a total rating -> use a threshold value to divide them<br>
> into suspicious and not suspicious<br>
<br>
</div>Instead of just using static thresholds, I think that something like SVM<br>
(<a href="http://en.wikipedia.org/wiki/Support_vector_machine" target="_blank">http://en.wikipedia.org/wiki/Support_vector_machine</a>) might be highly<br>
beneficial here; and it's another cool technology to play with. There is<br>
a cool library for this (<a href="http://www.csie.ntu.edu.tw/~cjlin/libsvm/" target="_blank">http://www.csie.ntu.edu.tw/~cjlin/libsvm/</a>) and<br>
I know there is at least an extension to use it from PHP:<br>
<a href="http://phpir.com/support-vector-machines-in-php" target="_blank">http://phpir.com/support-vector-machines-in-php</a></blockquote><div><br></div><div>Thanks for this method ... seems to be very suitable for our use case.</div>
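<div><br></div><div>For comparison, a minimal sketch of the static-threshold rating quoted above. The signal names, weights and threshold are hypothetical placeholders chosen for illustration and are not part of the proposal.</div>

```python
from dataclasses import dataclass

# Hypothetical per-changeset signals and weights, for illustration only.
WEIGHTS = {"deleted_objects": 0.5, "new_user": 20.0, "tags_removed": 1.0}


@dataclass
class Changeset:
    changeset_id: int
    signals: dict  # signal name -> numeric value


def total_rating(changeset, weights=WEIGHTS):
    """Sum the weighted signals into one rating for the changeset."""
    return sum(weights.get(name, 0.0) * value
               for name, value in changeset.signals.items())


def split_by_threshold(changesets, threshold):
    """Divide changesets into (suspicious, not_suspicious) by a fixed threshold."""
    suspicious, not_suspicious = [], []
    for changeset in changesets:
        if total_rating(changeset) >= threshold:
            suspicious.append(changeset)
        else:
            not_suspicious.append(changeset)
    return suspicious, not_suspicious


if __name__ == "__main__":
    sample = [
        Changeset(1, {"deleted_objects": 100, "new_user": 1, "tags_removed": 10}),
        Changeset(2, {"deleted_objects": 2, "new_user": 0, "tags_removed": 1}),
    ]
    flagged, fine = split_by_threshold(sample, threshold=50.0)
    print([c.changeset_id for c in flagged], [c.changeset_id for c in fine])
```

<div>An SVM would replace the hand-picked weights and threshold with a decision boundary learned from changesets that have already been labelled as vandalism or not.</div>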
<div><br></div><div>I already have some years of experience with PHP, but I would prefer not to use it for this part of the project. I was thinking of Python (libsvm has native Python bindings ;)) </div><div><br></div><div><br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<div class="im"><br>
> Some questions came up within this preparation:<br>
> - Is there a preferred language? Does this have to be specified in the<br>
> proposal? (language skill has to be rated, so I would decide this during<br>
> the project phase)<br>
<br>
</div>Not really any preferred language. What did you have in mind? For the<br>
front end I was thinking PHP, but the engine, I wouldn't know. I think<br>
something high-performance (so C or C++) might be beneficial.<br></blockquote><div><br></div><div><div><br class="Apple-interchange-newline">My thinking was that it is easy to set up, and that it can easily be called from a terminal or included in other Python scripts (e.g. a web frontend).</div>
<div><br></div><div>If C++ is necessary because of its speed, then I think I could manage that. Last semester I took part in a software engineering practical course at university (in a team of five fellow students), where we made extensive use of C++ (<a href="https://github.com/brainafk/Empire">https://github.com/brainafk/Empire</a>).</div>
</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> - I would also like to discuss the libraries and frameworks to use during the<br>
> project phase, or should I decide this in my proposal as well?<br>
> - Should the frontend integrate in the current website (ruby on rails<br>
> project) or should this just be an optional feature?<br>
<br>
</div>I think it can easily live as its own website.<br></blockquote><div><br></div><div>Ok :)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> - How detailed should the proposal be? Is it enough to write up this draft?<br>
<br>
</div>That's a tricky one, the more information you provide the better I<br>
think, as it shows you have thought about it :-)<br></blockquote><div><br></div><div>I think it is growing a lot through this discussion, and I will try to be as detailed as possible. :)</div><div><br></div><div>Thanks for the response :)</div>
<div><br></div><div>Regards,</div><div>Morris</div></div></div>