[OSM-dev] Server-side data validation

Frederik Ramm frederik at remote.org
Fri Jul 13 19:11:31 BST 2012


On 13.07.2012 19:27, Paweł Paprota wrote:
> What is the current consensus within OSM dev community on this aspect of
> OSM architecture?

Historically, the OSM API was meant to be (very) little more than an SQL 
database with a spatial index and history. It was placed at a very, very 
low level, without any understanding about the data - just taking it and 
storing it.

Later, some - very few - consistency checks were added, e.g. the 
database now makes sure that you do not reference an object that doesn't 
exist, or delete an object that is still "in use".
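These two checks could be sketched roughly like this (a toy illustration 
only, with hypothetical function names and in-memory data - the real 
server enforces this in its database layer):

```python
# Toy sketch of the two consistency checks the API performs.
# Data model: ways are represented as dicts mapping way ID -> list of
# node IDs; the set of existing node IDs stands in for the database.

def check_upload(new_way_node_refs, existing_node_ids):
    """Reject a way that references nodes which do not exist."""
    missing = set(new_way_node_refs) - set(existing_node_ids)
    if missing:
        raise ValueError(
            f"Precondition failed: nodes {sorted(missing)} do not exist")

def check_delete(node_id, ways):
    """Reject deletion of a node that is still 'in use' by some way."""
    for way_id, node_refs in ways.items():
        if node_id in node_refs:
            raise ValueError(
                f"Precondition failed: node {node_id} is still used "
                f"by way {way_id}")
```

Note that both checks only look at direct references; neither requires 
any understanding of what the data means.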

Until today, the database has no concept of what a multipolygon is, or 
whether one-node ways or consecutive identical nodes in a way make 
sense; and it doesn't even begin to look at tags.

There are good reasons for this, most notably the flexibility to do even 
"unexpected" things, but also the avoidance of complexity. The server 
processes thousands of updates per minute, and it has enough work to 
do as it is, making sure that all of the 879 members of your relation 
actually still exist. Some checks sound easy enough, but when you look 
closely they may actually require loading and inspecting lots of other 
objects (think of a route relation where you would like to make sure it 
is contiguous - that requires loading all member ways and comparing end 
nodes!). And some checks sound sensible enough until someone 
invents a use case where, say, a one-node way makes sense.
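To illustrate the cost: even after all member ways have been loaded, the 
contiguity check itself has to chain end nodes way by way, in either 
direction. A rough sketch (toy code, not server code; each way is just 
its ordered list of node IDs):

```python
# Toy contiguity check for an ordered route relation: consecutive
# member ways must connect end to end, but each way may be drawn in
# either direction, so we try both orientations of the first way and
# follow the chain from there.

def is_contiguous(ways):
    """ways: ordered list of member ways, each a list of node IDs."""
    if not ways:
        return True
    first = ways[0]
    # The chain may start from either endpoint of the first way.
    for free_end in (first[-1], first[0]):
        end = free_end
        ok = True
        for way in ways[1:]:
            if way[0] == end:      # way continues forward
                end = way[-1]
            elif way[-1] == end:   # way is drawn in reverse
                end = way[0]
            else:                  # gap: no shared end node
                ok = False
                break
        if ok:
            return True
    return False
```

And even this simple version only handles ordered, linear routes; real 
route relations may have unordered members, branches, or roles, all of 
which a server-side check would also have to understand.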

It is an issue that is worth debating - what aspects of data integrity 
should the server attempt to guarantee - but it is *certainly* not as 
easy as "let's just take JOSM's validator checks and implement them 
server-side". (Not least because these checks are too zealous for that.)


Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"
