[OSM-talk] Data Model, was: Server Slowness

Tue Jan 16 11:37:17 GMT 2007

Hi,

I'm rather new to OSM and this list, but I'd like to offer my  
two cents as well.

In my eyes, there's a low-level data model and a high-level data  
model. The low-level data model is about how many digits of precision  
we store for co-ordinates, what kinds of indexes we use on the  
database, whether we use an SQL database at all, and things like that.

It seems that the current low-level data model works quite well given  
the request load and hardware constraints, has been thoroughly tested  
and compares favourably (performance-wise) to products that come with  
spatial support. For external users, the only thing that should  
matter is how quickly the database operations complete. Whether,  
behind the scenes, there's a MySQL or PostGIS or flat file system  
should not be of any significance. I do not really understand the  
repeated calls for using PostGIS or whatever; the proponents of this  
seem to assume an automatic performance gain by using established  
systems instead of homemade structures, but this assumption has not  
been proven.

The high-level data model describes the objects and their  
relationships. We currently support nodes, segments, and ways, each  
of which may have any number of attributes, and we have a wiki-like  
version control. This "wiki" aspect is often overlooked by people who  
are just interested in the current dataset (and indeed it may, at  
some time in the future, make sense to divide the two, allowing write  
access only into the fully versioned wiki database but granting read  
access from a much faster "only current" system).
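
As a rough illustration of that split, here is a minimal Python sketch,
assuming an in-memory store. The class and method names (VersionedStore,
write, read_current, read_version) are made up for this example and do not
describe the actual OSM server code.

```python
class VersionedStore:
    """Toy model: every write is kept, reads can use a 'current only' copy."""

    def __init__(self):
        self.history = {}  # object id -> list of tag dicts, oldest first
        self.current = {}  # object id -> latest tag dict (the fast read copy)

    def write(self, obj_id, tags):
        """Append a new version and refresh the current-only copy."""
        version = dict(tags)  # copy so later caller changes do not leak in
        self.history.setdefault(obj_id, []).append(version)
        self.current[obj_id] = version
        return len(self.history[obj_id])  # 1-based version number

    def read_current(self, obj_id):
        """Fast path: no history involved."""
        return self.current[obj_id]

    def read_version(self, obj_id, version):
        """Wiki path: fetch any earlier version (1-based)."""
        return self.history[obj_id][version - 1]
```

Writes only ever go through write(), so the history stays complete, while
anyone interested in just the current dataset never touches it.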

Above the data model there's the API and application layers defining  
what the users see and what they can do. Many perceived shortcomings  
of the current model could be fixed on either the high-level data  
model or the API layer, some even on the application layer.

For example, take dual carriageways, which are currently often  
modelled as two distinct sets of segments. This makes sense for  
high-detail rendering but becomes ugly if you want to paint a motorway  
map for the whole country. You could simply solve that on the  
application layer by - and this is just one possible example - tagging  
the segments making up one half of the way with a sort of  
high-detail-render attribute and omitting them when rendering  
low-detail. Your favourite editor could do that automatically whenever  
you tag something as a "motorway". You could, on the other hand,  
change the high-level data model to allow a level-of-detail sensitive  
classification and grouping of objects in general.
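
The application-layer variant could look roughly like the Python sketch
below. The tag name "render_detail" and its value "high" are assumptions
invented for this example, as are the Segment class and the function name;
no such convention exists in the current tagging.

```python
from dataclasses import dataclass, field

# Hypothetical tag; stands in for "a sort of high-detail-render attribute".
HIGH_DETAIL_TAG = "render_detail"


@dataclass
class Segment:
    id: int
    from_node: int
    to_node: int
    tags: dict = field(default_factory=dict)


def segments_to_render(segments, low_detail):
    """Return the segments a renderer should draw at the given detail level."""
    if not low_detail:
        return list(segments)
    # At low detail, skip everything marked as high-detail only.
    return [s for s in segments if s.tags.get(HIGH_DETAIL_TAG) != "high"]


# Two carriageways of one motorway; the second is marked high-detail only.
carriageways = [
    Segment(1, 10, 11, {"highway": "motorway"}),
    Segment(2, 21, 20, {"highway": "motorway", HIGH_DETAIL_TAG: "high"}),
]
country_map = segments_to_render(carriageways, low_detail=True)
```

Here country_map contains only segment 1, while a high-detail rendering
gets both. Nothing in the core data model changes, which is exactly why
every application has to know about the tag.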

Another example is routing information. Say you have an intersection  
of two roads, each with two lanes for every direction; 100 metres  
before the intersection, a "turn left" lane is added on the left of  
every road leading towards the intersection, and we need to model the  
fact that turning left is only allowed from the left lane, and we  
need to alert the user in due time to move onto the left lane if he  
is to make the turn. The fact that certain turns are possible and  
limited to certain conditions is often contained in "manoeuvre" data  
objects in routing data models. A manoeuvre consists of a segment  
leading to a node, the node, and another segment leading away from  
it. All this could be modelled on the API or application layer with  
our current data model, e.g. by having an attribute tacked to a  
segment that reads  
"manoeuvre=left_turn_at_node_321345633_requires_left_lane_leads_to_segment_54223453"  
or something (probably more computer-readable). It could  
also be included in the data model by introducing a concept of  
qualified relationships between objects; in that case, one could  
retain better control over the structure because the integrity  
constraints could be safeguarded by the core instead of relying on  
applications to do it right.
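
To make the second option a bit more concrete, here is a small Python
sketch of a manoeuvre as a qualified relationship whose integrity is
checked by the core. All class, field and method names are hypothetical,
and it assumes directed segments with a from-node and a to-node; it is not
a proposal for the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    id: int
    from_node: int
    to_node: int


@dataclass(frozen=True)
class Manoeuvre:
    from_segment: int   # segment leading to the node
    via_node: int       # the node itself
    to_segment: int     # segment leading away from the node
    kind: str           # e.g. "left_turn"
    requires_lane: str  # e.g. "left"


class Core:
    """Toy 'core' that refuses manoeuvres which do not fit the geometry."""

    def __init__(self):
        self.segments = {}
        self.manoeuvres = []

    def add_segment(self, seg):
        self.segments[seg.id] = seg

    def add_manoeuvre(self, m):
        try:
            incoming = self.segments[m.from_segment]
            outgoing = self.segments[m.to_segment]
        except KeyError as exc:
            raise ValueError(f"unknown segment {exc.args[0]}") from None
        if incoming.to_node != m.via_node:
            raise ValueError("incoming segment does not end at the via node")
        if outgoing.from_node != m.via_node:
            raise ValueError("outgoing segment does not start at the via node")
        self.manoeuvres.append(m)
```

With the free-text attribute, a typo in the node or segment number goes
unnoticed until some router chokes on it; here add_manoeuvre raises a
ValueError at write time instead.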

It has always been wiki-style to just see what happens and put the  
structure there later. I believe we will see a similar development  
with OSM. At first, people will be happy to have their streets on the  
map at all. At some point in the future, a more precise specification  
will be required, which can and will first be achieved by creating a  
multitude of new attributes instead of overhauling the whole data  
model, and at a later time the data model may be changed to reflect  
these requirements.

The data model is very human-readable at the moment; you can just  
fire up your editor and see the attributes and understand them. This  
will probably degrade as more complex attributes with inter-object  
relationships are inserted, and specifically as they start being  
inserted automatically by editing software. (Come to think of it, one  
could even build a spline drawing mode into an editor and then add  
the spline handle positions as segment attributes. Cool, as long as  
you use that specific editor...)

People calling for a switch to GIS-capable systems often cite  
interoperability as a reason; I believe that can be achieved by  
writing converters and filters. It works in the graphics world, so  
why shouldn't it work here? I know that if I save my PNG as JPG and  
then re-convert it to PNG, it will not be the same PNG because  
information was lost. The same will probably apply to every sort of  
data transfer between systems. But using a GIS-capable backend will  
not change that.

My whole opinion in a nutshell: (1) no need to switch to a spatial  
whatever database if the current system works. (2) data model may  
need redesign some time in the future to cope with complex routing  
and rendering tasks, or else people will start stuffing everything  
into attributes whose well-formedness is not guaranteed and (worst  
case) may even be application specific. (3) interoperability to be  
created through converters and filters.

Just in case someone was interested.

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'