[OSM-dev] Google Summer of Code

kabum uu.kabum at gmail.com
Thu Apr 5 11:34:58 BST 2012


On 3 April 2012 at 20:02, Paul Norman <penorman at mac.com> wrote:

> The problem with detecting when changesets are closed is that there is no
> way to determine exactly when they are closed short of an API query. You
> can fake it by assuming changesets are closed an hour after the last change
> to them and 24 hours after the first change to them.
>

Open: (http://www.openstreetmap.org/api/0.6/changeset/11187430)
<osm version="0.6" generator="OpenStreetMap server">
  <changeset id="11187430" user="regedi" uid="645826"
             created_at="2012-04-05T10:28:21Z" open="true"
             min_lat="50.0106489" min_lon="36.3515771"
             max_lat="50.0112144" max_lon="36.3586195">
    <tag k="created_by" v="Potlatch 2"/>
    <tag k="build" v="2.3-375-g9f05171"/>
    <tag k="version" v="2.3"/>
  </changeset>
</osm>

Closed: (http://www.openstreetmap.org/api/0.6/changeset/11167430)
<osm version="0.6" generator="OpenStreetMap server">
  <changeset id="11167430" user="bergfrei" uid="327035"
             created_at="2012-03-31T15:11:30Z" closed_at="2012-03-31T15:16:55Z"
             open="false" min_lat="47.9912789" min_lon="9.7206276"
             max_lat="48.0492344" max_lon="9.8521079">
    <tag k="comment" v="Hochdorf Ausgleich Luftbildversatz"/>
    <tag k="created_by" v="JOSM/1.5 (5047 de)"/>
  </changeset>
</osm>

Or have I missed something?
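
A minimal Python sketch of both options, in case it helps the discussion. The first function reads the open and closed_at attributes from a changeset response like the ones above; the second is the diff-only guess from the quoted rule. The function names are made up for illustration, the sample is a trimmed copy of the closed changeset, and reading the rule as "one hour idle or 24 hours old, whichever comes first" is my assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Trimmed copy of the closed changeset response (bbox and tags omitted).
SAMPLE = """<osm version="0.6" generator="OpenStreetMap server">
<changeset id="11167430" user="bergfrei" uid="327035"
 created_at="2012-03-31T15:11:30Z" closed_at="2012-03-31T15:16:55Z"
 open="false"/>
</osm>"""


def parse_ts(value):
    # Timestamps in the response look like 2012-03-31T15:11:30Z (UTC).
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def changeset_state(xml_text):
    """Return (is_open, closed_at) read from a changeset response."""
    changeset = ET.fromstring(xml_text).find("changeset")
    is_open = changeset.get("open") == "true"
    closed_at = changeset.get("closed_at")  # absent while still open
    return is_open, parse_ts(closed_at) if closed_at else None


def probably_closed(first_change, last_change, now):
    """Diff-only guess (assumed reading): idle for 1 h, or 24 h old."""
    idle = now - last_change >= timedelta(hours=1)
    expired = now - first_change >= timedelta(hours=24)
    return idle or expired


if __name__ == "__main__":
    print(changeset_state(SAMPLE))
```

The API query gives the exact closing time, while the heuristic needs no extra request per changeset, so a tool could use the guess first and only query the API for changesets it wants to be sure about.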



>  It is better to detect problems when they occur, not up to 24 hours after
> they’ve occurred.
>

That's correct. A good practice would be to keep the code as abstract as
possible, so that it only parses the create/modify/delete sets and ignores
where they came from (minutely diff, hourly diff, or changeset).
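
As a rough sketch of that idea in Python: the parser below walks the create/modify/delete blocks of an osmChange document and does not care whether the text came from a replication diff or a changeset download. The sample data and the iter_changes name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical osmChange snippet; the ids and coordinates are invented.
SAMPLE = """<osmChange version="0.6">
<create><node id="1" version="1" changeset="10" lat="50.0" lon="36.3"/></create>
<modify><way id="2" version="3" changeset="10"><nd ref="1"/></way></modify>
<delete><node id="3" version="2" changeset="11"/></delete>
</osmChange>"""

ACTIONS = ("create", "modify", "delete")


def iter_changes(xml_text):
    """Yield (action, element_type, attributes) for every changed object."""
    root = ET.fromstring(xml_text)
    for block in root:
        if block.tag not in ACTIONS:
            continue  # skip anything that is not an action block
        for obj in block:
            yield block.tag, obj.tag, dict(obj.attrib)


if __name__ == "__main__":
    for action, kind, attrs in iter_changes(SAMPLE):
        print(action, kind, attrs["id"])
```

The rating code would then only see (action, type, attributes) tuples, so the input source could be swapped later without touching it.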

I will try to take this into account in my proposal.

Thanks for all of your ideas! It's time to finish my proposal :)

Regards,
Morris



> From: kabum [mailto:uu.kabum at gmail.com]
> Sent: Tuesday, April 03, 2012 2:20 AM
> To: Derick Rethans
> Cc: OpenStreetMap dev list
> Subject: Re: [OSM-dev] Google Summer of Code
>
> Hi,
>
> On 2 April 2012 at 22:20, Paul Norman <penorman at mac.com> wrote:
>
> A tool that operates on the changeset level is
> https://github.com/pnorman/osm-weirdness
>
> It detects changesets that have a high probability of being an import or
> mechanical edit. The detection is pretty crude but it does find a fair
> number of undocumented imports, mechanical edits, and other weirdness. If
> you point it at an old state.txt file it will start in the past and work up
> to the present.
>
> I'll have a look at your script later today.
>
> When working with the minutely diffs there are some limitations:
>
> Limited knowledge of changesets. In practice, if you start your detection
> an hour in the past you can have a list of all open changesets, but it is
> not possible to know the tags of the changesets.
>
> No knowledge of the previous state of objects. You know where deleted
> objects were, but you can’t tell how far an object was moved or what its
> tags were before. To tell this you need to query a service with a full
> history DB, and handling full history files is difficult.
>
> No knowledge of way geometry if using existing nodes. Iandees’
> https://github.com/pnorman/osm-weirdness/tree/way_check solves this by
> fetching nodes in a way that aren’t also in the changeset from jxapi and it
> can then detect bad geometry (e.g. ways that trace over themselves).
>
> If you were to code a vandalism detection tool I think it should work on
> the minutely replication diffs (
> http://wiki.openstreetmap.org/wiki/Planet.osm/diffs)
>
> I thought about analysing the data after the changeset is closed, but
> these diffs also sound good. I will look into this approach :) Thanks!
>
> On 3 April 2012 at 09:38, Derick Rethans <osm at derickrethans.nl> wrote:
>
> On Mon, 2 Apr 2012, kabum wrote:
>
> > Result:
> > - each changeset has a total rating -> use a threshold value to divide
> > them into suspicious and not suspicious
>
> Instead of just using static thresholds, I think that something like SVM
> (http://en.wikipedia.org/wiki/Support_vector_machine) might be highly
> beneficial here; and it's another cool technology to play with. There is
> a cool library for this (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) and
> I know there is at least an extension to use it from PHP:
> http://phpir.com/support-vector-machines-in-php
>
> Thanks for this method ... seems to be very suitable for our use case.
>
> I already have some years of experience with PHP, but I wouldn't prefer it
> for this part of the project. I thought about Python (libsvm has native
> Python bindings ;))
>
> > Some questions came up within this preparation:
> > - Is there a preferred language? Does this have to be specified within
> > the proposal? (language skill has to be rated, so I would decide this
> > during the project phase)
>
> Not really any preferred language. What did you have in mind? For the
> front end I was thinking PHP, but the engine, I wouldn't know. I think
> something highly performant (so C or C++) might be beneficial.
>
> My thoughts were that it's easy to set up and easy to call from a
> terminal or to include in other Python scripts (e.g. a web frontend).
>
> If C++ is necessary because of its speed, then I think I could master
> it. In the past semester I took part in a practical software engineering
> course at university (in a team of five fellow students), where
> we made extensive use of C++ (https://github.com/brainafk/Empire).
>
> > - I would also like to discuss the libraries and frameworks to use during
> > the project phase, or should I decide this in my proposal as well?
> > - Should the frontend be integrated into the current website (Ruby on
> > Rails project) or should this just be an optional feature?
>
> I think it can easily live as its own website.
>
> Ok :)
>
> > - How detailed should the proposal be? Is it enough to formulate this
> > draft?
>
> That's a tricky one, the more information you provide the better I
> think, as it shows you have thought about it :-)
>
> I think it grows a lot through this discussion and I will try to be as
> detailed as possible. :)
>
> Thanks for the response :)
>
> Regards,
>
> Morris