[OSM-dev] Google Summer of Code

Paul Norman penorman at mac.com
Mon Apr 2 21:20:21 BST 2012


A tool that operates on the changeset level is https://github.com/pnorman/osm-weirdness

It detects changesets that have a high probability of being an import or mechanical edit. The detection is pretty crude but it does find a fair number of undocumented imports, mechanical edits, and other weirdness. If you point it at an old state.txt file it will start in the past and work up to the present.
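
As a rough illustration of starting from an old state.txt, here is a minimal sketch of reading the sequence number and turning it into the path of a replication diff. It is not taken from osm-weirdness; the function names and the sample values are made up for the example, and only the key=value layout of state.txt and the nine-digit AAA/BBB/CCC.osc.gz directory scheme are assumed.

```python
def parse_state(text):
    """Parse the key=value lines of a replication state.txt file."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the leading comment line
        key, _, value = line.partition("=")
        # The file uses Java properties style, so colons are escaped as "\:".
        state[key] = value.replace("\\:", ":")
    return state


def diff_path(sequence_number):
    """Map a sequence number to the AAA/BBB/CCC.osc.gz layout."""
    padded = f"{sequence_number:09d}"
    return f"{padded[0:3]}/{padded[3:6]}/{padded[6:9]}.osc.gz"


# Made-up sample content, only for illustration.
sample = (
    "#Mon Apr 02 20:21:05 UTC 2012\n"
    "sequenceNumber=1234567\n"
    "timestamp=2012-04-02T20\\:20\\:02Z\n"
)
state = parse_state(sample)
start = int(state["sequenceNumber"])
# Walk forward from the old state towards the present, one diff at a time.
paths = [diff_path(n) for n in range(start, start + 3)]
```

Each path would be appended to the base URL of the minutely replication directory; fetching and decompressing is left out here.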

 

When working with the minutely diffs there are some limitations:

Limited knowledge of changesets. In practice, if you start your detection an hour in the past you can have a list of all open changesets, but it is not possible to know the tags of the changesets. 

No knowledge of the previous state of objects. You know where deleted objects were, but you can’t tell how far an object was moved or what its tags were before. To tell this you need to query a service with a full history DB, and handling full history files is difficult.

No knowledge of way geometry if using existing nodes. Iandees’ https://github.com/pnorman/osm-weirdness/tree/way_check solves this by fetching from jxapi the nodes of a way that aren’t also in the changeset, so that it can then detect bad geometry (e.g. ways that trace over themselves).
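
To make the third limitation concrete, here is a small sketch that lists, for each way in an osmChange document, the node references that are not contained in the same diff. These are the nodes that would have to be fetched from somewhere else before the geometry can be checked. The sample XML and the function name are invented for the example, and this is not how the way_check branch is actually written.

```python
import xml.etree.ElementTree as ET

# Invented sample diff: way 20 uses node 10 (in the diff) and nodes 11, 12 (not in it).
SAMPLE_OSC = """<osmChange version="0.6">
  <create>
    <node id="10" version="1" changeset="500" lat="51.0" lon="0.0"/>
    <way id="20" version="1" changeset="500">
      <nd ref="10"/>
      <nd ref="11"/>
      <nd ref="12"/>
    </way>
  </create>
  <delete>
    <node id="30" version="3" changeset="501" lat="52.0" lon="1.0"/>
  </delete>
</osmChange>"""


def missing_node_refs(osc_xml):
    """Return {way_id: [node ids the way references but the diff lacks]}."""
    root = ET.fromstring(osc_xml)
    known_nodes = {node.get("id") for node in root.iter("node")}
    missing = {}
    for action in root:
        if action.tag == "delete":
            continue  # deleted ways are skipped, no geometry check is needed
        for way in action.findall("way"):
            refs = [nd.get("ref") for nd in way.findall("nd")]
            absent = [ref for ref in refs if ref not in known_nodes]
            if absent:
                missing[way.get("id")] = absent
    return missing


result = missing_node_refs(SAMPLE_OSC)
```

For the sample above, way 20 is reported with nodes 11 and 12 missing, so its shape cannot be reconstructed from the diff alone.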

 

If you were to code a vandalism detection tool, I think it should work on the minutely replication diffs (http://wiki.openstreetmap.org/wiki/Planet.osm/diffs).

 

 

From: kabum [mailto:uu.kabum at gmail.com] 
Sent: Monday, April 02, 2012 9:31 AM
To: OpenStreetMap dev list
Subject: Re: [OSM-dev] Google Summer of Code

 

Hi,

 

I thought about this proposal and this is the current state:

 

The processing part (called "engine") should be separated from the interface (website).

 

engine - this part processes specific changesets and puts the result into a database

website - frontend to display stored data (dashboard), mark false positives/negatives

 

extensibility of the engine:

- each criterion (for example see http://wiki.openstreetmap.org/wiki/Detect_Vandalism#Criteria - this seems to be a good base) is represented by a plugin

- plugins return a score (integer) which is stored in the database

- different types of plugins:

  ° single changeset scope (e.g. mass deletion/import, very far movement of nodes)

  ° multiple changeset scope (e.g. many changesets within a short time per user)

  ° user related score (e.g. date of registration, number of edits, blocked user?)

  ° area related score - mark a specific area as suspicious for some time (e.g. vandalism of an area by several users)

- these scores may be summed by type and then multiplied/weighted

- the engine has to create "fake changesets" containing the changes from several related changesets (same user, same time window) in order to detect changes that were split across changesets
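
The "fake changesets" idea could be sketched roughly as follows: changesets by the same user are merged into one group as long as each one starts within a time window of the previous one. The tuple layout, the function name and the 30 minute window are assumptions made for this example only.

```python
from datetime import datetime, timedelta
from itertools import groupby


def group_changesets(changesets, window=timedelta(minutes=30)):
    """Group changesets by user, splitting where the time gap exceeds `window`.

    `changesets` is an iterable of (changeset_id, user, created_at) tuples.
    Returns a list of (user, [changeset_id, ...]) groups.
    """
    groups = []
    ordered = sorted(changesets, key=lambda c: (c[1], c[2]))
    for user, items in groupby(ordered, key=lambda c: c[1]):
        current, last_time = [], None
        for changeset_id, _, created_at in items:
            # A gap larger than the window closes the current group.
            if last_time is not None and created_at - last_time > window:
                groups.append((user, current))
                current = []
            current.append(changeset_id)
            last_time = created_at
        groups.append((user, current))
    return groups


# Made-up sample data.
sample = [
    (1, "alice", datetime(2012, 4, 2, 9, 0)),
    (2, "alice", datetime(2012, 4, 2, 9, 10)),
    (3, "bob", datetime(2012, 4, 2, 9, 5)),
    (4, "alice", datetime(2012, 4, 2, 12, 0)),
]
merged = group_changesets(sample)
```

Here changesets 1 and 2 end up in one group, while 4 is kept separate because it starts hours later. Each group could then be passed to the single changeset plugins as if it were one changeset.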

 

Result:

- each changeset has a total rating -> use a threshold value to divide them into suspicious and not suspicious
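
One possible reading of the scoring described above, as a sketch: the integer scores are summed per plugin type, each type gets a weight, and the weighted sum is compared with a threshold. The type names, the weights and the threshold of 10 are placeholders, not tuned values.

```python
from collections import defaultdict

# Hypothetical weights per plugin type; real values would need tuning.
WEIGHTS = {"changeset": 1.0, "multi_changeset": 0.5, "user": 2.0, "area": 0.5}


def total_rating(plugin_scores, weights=WEIGHTS):
    """Sum integer plugin scores by type, then apply one weight per type.

    `plugin_scores` is a list of (plugin_type, score) pairs.
    """
    by_type = defaultdict(int)
    for plugin_type, score in plugin_scores:
        by_type[plugin_type] += score
    return sum(weights[t] * s for t, s in by_type.items())


def is_suspicious(rating, threshold=10.0):
    """Split changesets into suspicious and not suspicious."""
    return rating >= threshold


# Made-up scores from four plugins for one changeset.
scores = [("changeset", 6), ("changeset", 2), ("user", 3), ("area", 0)]
rating = total_rating(scores)  # 1.0 * 8 + 2.0 * 3 + 0.5 * 0
```

Keeping the weights in one table means they can be adjusted later without touching the plugins.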

 

Testing:

- previous incidents: http://www.openstreetmap.org/user_blocks
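
Testing against previous incidents could look roughly like this sketch. It assumes that somebody has collected the ids of changesets belonging to known incidents by hand (the user_blocks page itself lists blocks, not changesets), and it counts false positives and false negatives for a given threshold. All ids and ratings below are made up.

```python
def confusion_counts(ratings, known_bad, threshold):
    """Count hits and misses for one threshold.

    `ratings` maps changeset id -> total rating.
    `known_bad` is a set of changeset ids tied to past incidents.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for changeset_id, rating in ratings.items():
        flagged = rating >= threshold
        bad = changeset_id in known_bad
        if flagged and bad:
            counts["tp"] += 1
        elif flagged:
            counts["fp"] += 1  # flagged, but not part of a known incident
        elif bad:
            counts["fn"] += 1  # known incident that slipped through
        else:
            counts["tn"] += 1
    return counts


# Made-up ratings and labels.
ratings = {1: 14, 2: 3, 3: 11, 4: 8}
known_bad = {1, 4}
strict = confusion_counts(ratings, known_bad, threshold=10)
loose = confusion_counts(ratings, known_bad, threshold=5)
```

Comparing the counts for several thresholds shows the trade-off: the lower threshold catches changeset 4 as well, and in a larger data set it would usually also flag more harmless changesets.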

 

Some questions came up within this preparation:

- Is there a preferred language? Does this have to be specified in the proposal? (language skill has to be rated, so I would decide this during the project phase)

- I would also like to discuss which libraries and frameworks to use during the project phase, or should I decide this in my proposal as well?

- Should the frontend be integrated into the current website (Ruby on Rails project) or should this just be an optional feature?

- How detailed should the proposal be? Is it enough to formulate this draft?

 

Point out my mistakes and feel free to ask questions, criticize this draft and share your ideas and thoughts. :)

 

Best regards,

Morris Jobke

 

 

On 26 March 2012 12:14, kabum <uu.kabum at gmail.com> wrote:

Hi,

 

me again. Derick answered my PM and I realized that I had missed some features.

 

The interface should be a simple website listing the suspicious changesets. A way to mark false positives and false negatives would also be great.

 

Derick also suggested an integration with JOSM and mentioned its changeset reverting capabilities.

 

Best regards,

Morris Jobke

On 26 March 2012 00:36, kabum <uu.kabum at gmail.com> wrote:

 

Hi,

On 19 March 2012 22:45, Graham Jones <grahamjones139 at gmail.com> wrote:

 

Hi,
Thank you for your interest in applying for GSoC with OpenStreetMap. This list will be fine to ask questions.

Here are a few suggestions to get you started:

- It is important to understand the fundamentals of what OSM is, so if you have not done so before, please start by creating an account and making some improvements to the map in your local area.

I heard of OSM a long time ago, but was just too lazy to contribute. So I tried it these past days and was really surprised how quickly changes become visible in the rendered map. I've taken several notes about my surroundings that are waiting to be entered into the OSM database. :)

- It would also be good to look at the OSM data structure.  Details of the xml file format can be found on our wiki. 

Done :) 

- If you search for Nominatim on the OSM wiki you should find some information on the current service and links to the source code to see how it currently does searching to see how it could be improved. 

The project idea was suggested by 'sabas88' - could he/she provide some more information on the issues behind this project suggestion please?

I've asked him and the only answer was a link to the GSoC project site in the OSM wiki. :(

 

I read a lot about OSM, its mechanisms, supporting tools, etc., and also about Nominatim, and I realized that this isn't what I want to do. I've been looking for another way to contribute to OSM and GSoC and found the suggestion for a quality assurance tool specialized for edits/changesets (by Derick Rethans). There are many quality assurance tools but none like this, or have I missed one?

 

The idea is to have an engine that takes a set of changesets or edits and analyses them. It should detect things like logical mistakes, mass deletions without corresponding insertions, etc., and also take user metadata such as duration of membership or edit count into account. It would be great if it compared the changes with the current state of the data in the area and detected pointless checks, where the data is out of date and has already been corrected.

 

Some other things to keep in mind while planning:

- extensibility through "plugins": engine (calls)-> several detection plugins

- there could be searches for suspicious changesets/edits in specific area
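
The "engine calls several detection plugins" idea could be sketched like this, with one plugin for mass deletions without corresponding insertions and one for user metadata. The summary keys, the cut-off values and the 0 to 10 score range are all assumptions for the sake of the example.

```python
def mass_deletion_plugin(summary, min_deleted=100):
    """Score 0-10: share of deletions in a changeset with many deletions."""
    deleted = summary["deleted"]
    if deleted < min_deleted:
        return 0
    total = summary["created"] + summary["modified"] + deleted
    return (10 * deleted) // total


def new_user_plugin(summary):
    """Score 0-10: young accounts with few edits get a higher score."""
    score = 0
    if summary["account_age_days"] < 7:
        score += 5
    if summary["edit_count"] < 10:
        score += 5
    return score


# The engine only knows this list; adding a check means adding a function.
PLUGINS = [mass_deletion_plugin, new_user_plugin]


def run_engine(summary):
    """Call every registered plugin and collect its integer score."""
    return {plugin.__name__: plugin(summary) for plugin in PLUGINS}


# Made-up changeset summary.
summary = {
    "created": 0, "modified": 0, "deleted": 500,
    "account_age_days": 2, "edit_count": 3,
}
scores = run_engine(summary)
```

A changeset that deletes 500 objects and creates 500 new ones would get a lower deletion score, which matches the "without corresponding insertions" part of the idea.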

 

This was just a quick outline of the proposal. Are there some suggestions, wishes, questions or doubts?

 

In the next few days I plan to flesh out this proposal.

 

Best regards,

Morris Jobke

Hope that helps.   Please feel free to ask more questions as you develop your proposal.

Regards

Graham

 

On 19 March 2012 21:28, kabum <uu.kabum at gmail.com> wrote:

Hi,

 

I am interested in "Nominatim (or alternative)", but there isn't any mentor mentioned. Where could I discuss the idea?

 

Best regards,

Morris Jobke

 

_______________________________________________
dev mailing list
dev at openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev





 

-- 
Graham Jones

Hartlepool, UK.

 

 

 

 
