[OSM-dev] Server-side data validation

mick bareman at tpg.com.au
Sat Jul 14 00:21:00 BST 2012


On Fri, 13 Jul 2012 19:27:25 +0200
Paweł Paprota <ppawel at fastmail.fm> wrote:

> Hi all,
> 
> Today I have encountered a lot of bad data in my area - duplicated
> nodes/ways. These probably stem from an inexperienced user or faulty
> editor software when drawing buildings. I corrected a lot of this stuff,
> see changesets:
> 
> http://www.openstreetmap.org/browse/changeset/12208202
> http://www.openstreetmap.org/browse/changeset/12208389
> http://www.openstreetmap.org/browse/changeset/12208467
> http://www.openstreetmap.org/browse/changeset/12208498
> 
> As you can see, these changesets remove thousands of nodes/ways. I have
> done this using the JOSM validators and the "Fix it" option, which
> automatically merges/deletes nodes that are duplicated.
> 
> That is all fine of course but this sparked a thought... why is
> garbage data like this allowed into the database in the first place? Of
> course it can always be fixed client-side (JOSM, even some autobots) but
> why allow unconnected untagged nodes or duplicated nodes, duplicated
> ways etc.?
> 
> I understand (though don't wholly agree...) the concept of having a very
> generic data model where anyone can push anything into the database but
> it would be trivial to implement some server-side validations for these
> cases (so that the API throws errors and does not accept such data) and thus
> reduce client-side work by a very significant margin - i.e. I could have
> been working on something more useful in that time than removing garbage
> data.
> 
> Server-side validation could be of course taken even further - OSM
> server could reject meaningless tag combinations etc. - basically JOSM
> validators on the "error" level should be implemented as server-side
> validators, some "warning" level validators possibly as well.
> 
> This would ensure data consistency and integrity at least a little
> bit... (of course bad data would first have to be pruned from the
> existing database so that it is consistent with the validation logic,
> but that's for another discussion).
> 
> What is the current consensus within OSM dev community on this aspect of
> OSM architecture?
> 
> Paweł

As a user of OSM data I would give this suggestion 3 thumbs up. I have just spent 8 days working through the highway=motorway, trunk and primary lines for New South Wales, Australia (about 6,000 objects) and found approximately 40,000 corrections I could make: adding names, route numbers and oneway tags, and correcting lines at the ends of dual carriageways where the line starts near the end of one carriageway, turns around and continues up the other carriageway.
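
To make the duplicated-node case from the quoted message concrete, here is a rough sketch in Python of the kind of check being proposed. It is only an illustration: the function name, the (id, lat, lon) tuple layout and the rounding precision are assumptions made up for this example, and it is not how the OSM API or the JOSM validator actually works.

```python
from collections import defaultdict


def find_duplicate_nodes(nodes, precision=7):
    """Group node ids that share the same rounded (lat, lon) position.

    nodes: iterable of (node_id, lat, lon) tuples (hypothetical layout).
    precision: decimal places kept when comparing coordinates (assumed value).
    Returns a list of sorted id lists, one per position held by 2+ nodes.
    """
    by_position = defaultdict(list)
    for node_id, lat, lon in nodes:
        key = (round(lat, precision), round(lon, precision))
        by_position[key].append(node_id)
    return [sorted(ids) for ids in by_position.values() if len(ids) > 1]


# Made-up sample data: nodes 1 and 2 sit on exactly the same position.
nodes = [
    (1, -33.8688, 151.2093),
    (2, -33.8688, 151.2093),
    (3, -33.8700, 151.2100),
]
print(find_duplicate_nodes(nodes))
```

A server-side validator along these lines would presumably reject or flag an upload when the returned list is non-empty; whether nodes at the same position are always an error (rather than, say, intentionally stacked features) is exactly the sort of question that would need discussing first.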

As I'm as human as the rest of us, I would like to find a way to have my repairs 'peer reviewed', to avoid adding errors to the data, before I work out how to do a bulk edit.

mick


