[OSM-dev] Server-side data validation

Peter Wendorff wendorff at uni-paderborn.de
Fri Jul 13 19:04:10 BST 2012


Hi Pawel.
Some reasons - perhaps more exist:
1) The OSM API is a RESTful API that allows "live" editing: the editor 
software a) opens a changeset, b) creates a node, c) adds a tag - the 
same applies to ways.
Between b) and c) there is an untagged OSM element in the database (even 
if in most cases only for a very short time); see the sketch after this 
list.
2) Ways without tags may nevertheless be part of relations: the outer 
way of a multipolygon does not necessarily have tags, as tags that apply 
to the multipolygon should go on the relation.
3) The free tagging scheme would allow similar things for nodes, too 
(although I don't know of any case where that is currently used). A 
theoretical example would be a set of nodes defined as points inside a 
fuzzy area/region and others defined as points outside it (where no 
concrete, hard boundary is defined, e.g. for "the Alps").
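
To make 1) concrete, here is a minimal sketch of that a)-b)-c) sequence 
against the 0.6 API, e.g. in Python with the requests library. It assumes 
HTTP basic auth; the credentials, coordinates and tags are just 
placeholders, and real editors add error handling, created_by tags etc.:

import requests

API = "https://api.openstreetmap.org/api/0.6"
AUTH = ("user", "password")  # placeholder credentials

# a) open a changeset
r = requests.put(API + "/changeset/create", auth=AUTH,
                 data='<osm><changeset><tag k="comment" v="example"/></changeset></osm>')
changeset = r.text.strip()

# b) create a node - at this point it sits in the database without tags
r = requests.put(API + "/node/create", auth=AUTH,
                 data='<osm><node changeset="%s" lat="51.5" lon="7.4"/></osm>' % changeset)
node_id = r.text.strip()

# c) add a tag in a second request (version 1 is the version created in b)
requests.put(API + "/node/" + node_id, auth=AUTH,
             data='<osm><node id="%s" changeset="%s" version="1" lat="51.5" lon="7.4">'
                  '<tag k="amenity" v="bench"/></node></osm>' % (node_id, changeset))

# finally close the changeset
requests.put(API + "/changeset/" + changeset + "/close", auth=AUTH)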

Pushing this validation to the server side has several drawbacks:
- usually server load is the bottleneck in OSM, not client load.
- a server-side check would fix (hard-code) the corresponding tagging 
scheme and would probably make other tagging schemes invalid - a 
contradiction to the free tagging scheme we have.
- the API would have to change to transaction-like semantics, which 
again means higher server load, but is the only way to make sure such 
invalid intermediate data is never created.
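
To illustrate that last point, here is a purely hypothetical sketch (this 
is not existing API code) of a check that could only run safely once the 
whole changeset is known, i.e. at close time - a per-request version of 
the same rule would reject the perfectly normal state between b) and c) 
above:

from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    tags: dict = field(default_factory=dict)
    way_memberships: int = 0   # how many ways reference this node

@dataclass
class Changeset:
    nodes: list = field(default_factory=list)

def validate_on_close(changeset):
    # hypothetical server-side rule: flag untagged nodes that no way uses
    problems = []
    for node in changeset.nodes:
        if not node.tags and node.way_memberships == 0:
            problems.append("node %d is untagged and not used by any way" % node.id)
    return problems

cs = Changeset(nodes=[Node(id=1)])
print(validate_on_close(cs))           # state right after b): would be rejected
cs.nodes[0].tags["amenity"] = "bench"
print(validate_on_close(cs))           # after c): no problems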

regards
Peter

On 13.07.2012 19:27, Paweł Paprota wrote:
> Hi all,
>
> Today I have encountered a lot of bad data in my area - duplicated
> nodes/ways. These probably stem from an inexperienced user or faulty
> editor software used when drawing buildings. I corrected a lot of this
> stuff; see these changesets:
> see changesets:
>
> http://www.openstreetmap.org/browse/changeset/12208202
> http://www.openstreetmap.org/browse/changeset/12208389
> http://www.openstreetmap.org/browse/changeset/12208467
> http://www.openstreetmap.org/browse/changeset/12208498
>
> As you can see, these changesets remove thousands of nodes/ways. I did
> this using the JOSM validators and the "Fix it" option, which
> automatically merges/deletes duplicated nodes.
>
> That is all fine of course, but it sparked a thought... why is garbage
> data like this allowed into the database in the first place? Of course
> it can always be fixed client-side (JOSM, even some autobots), but why
> allow unconnected untagged nodes, duplicated nodes, duplicated ways,
> etc.?
>
> I understand (though don't wholly agree with...) the concept of having
> a very generic data model where anyone can push anything into the
> database, but it would be trivial to implement some server-side
> validations for these cases (so that the API throws errors and does not
> accept such data) and thus reduce client-side work by a very
> significant margin - i.e. I could have spent that time working on
> something more useful than removing garbage data.
>
> Server-side validation could of course be taken even further - the OSM
> server could reject meaningless tag combinations etc. Basically, the
> JOSM validators on the "error" level should be implemented as
> server-side validators, and possibly some "warning" level validators
> as well.
>
> This would ensure at least a little bit of data consistency and
> integrity... (of course, the bad data would first have to be pruned
> from the existing database so that it is consistent with the validation
> logic, but that's for another discussion).
>
> What is the current consensus within OSM dev community on this aspect of
> OSM architecture?
>
> Paweł
>
