[OSM-talk] Data Model, was: Server Slowness
Frederik Ramm
frederik at remote.org
Tue Jan 16 11:37:17 GMT 2007
Hi,
I'm rather new to OSM and this list, but I'd like to offer my
two cents as well.
In my eyes, there's a low-level data model and a high-level data
model. The low-level data model is about how many digits of precision
we store for co-ordinates, what kinds of indexes we use on the
database, whether we use an SQL database at all, and things like that.
It seems that the current low-level data model works quite well given
the request load and hardware constraints, has been thoroughly tested
and compares favourably (performance-wise) to products that come with
spatial support. For external users, the only thing that should
matter is how quickly the database operations complete. Whether,
behind the scenes, there's a MySQL or PostGIS or flat file system
should not be of any significance. I do not really understand the
repeated calls for using PostGIS or whatever; the proponents of this
seem to assume an automatic performance gain by using established
systems instead of homemade structures, but this assumption has not
been proven.
The high-level data model describes the objects and their
relationships. We currently support nodes, segments, and ways, each
of which may have any number of attributes, and we have a wiki-like
version control. This "wiki" aspect is often overlooked by people who
are just interested in the current dataset (and indeed it may, at
some time in the future, make sense to divide the two, allowing write
access only into the fully versioned wiki database but granting read
access from a much faster "only current" system).
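A minimal sketch of how such a split might look, in Python and with
entirely hypothetical names: writes go into a store that keeps the
full history, while readers query a snapshot holding only the latest
version of each object.

    # Sketch of a versioned "wiki" store plus a current-only read store.
    # All class and method names here are invented for illustration.

    class VersionedStore:
        """Keeps the full edit history of every object (the wiki side)."""

        def __init__(self):
            self.history = {}  # object id -> list of (version, data)

        def write(self, object_id, data):
            versions = self.history.setdefault(object_id, [])
            versions.append((len(versions) + 1, data))

    class CurrentSnapshot:
        """Fast read-only view of just the latest version of each object."""

        def __init__(self, versioned):
            self.current = {oid: versions[-1][1]
                            for oid, versions in versioned.history.items()}

        def read(self, object_id):
            return self.current.get(object_id)

    store = VersionedStore()
    store.write("node/1", {"lat": 49.0015, "lon": 8.3888})
    snapshot = CurrentSnapshot(store)   # rebuilt or updated periodically
    print(snapshot.read("node/1"))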
Above the data model there's the API and application layers defining
what the users see and what they can do. Many perceived shortcomings
of the current model could be fixed on either the high-level data
model or the API layer, some even on the application layer.
For example, take dual carriageways which are currently often
modelled as two distinct sets of segments. This makes sense for high-
detail rendering but becomes ugly if you want to paint a motorway map
for the whole country. You could simply solve that on the application
layer by - and this is just one possible example - tagging the
segments making up one half of the way with a sort of high-detail-
render attribute and omitting them when rendering low-detail. Your
favourite editor could do that automatically whenever you tag
something a "motorway". You could, on the other hand, change the high-
level data model to allow a level-of-detail sensitive classification
and grouping of objects in general.
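To make the application-layer option concrete, here is a minimal
sketch in Python; the tag name "detail=high" is my own invention for
the example, not an agreed-upon attribute. A renderer would simply
skip segments carrying it when drawing a low-detail map:

    # Hypothetical level-of-detail filter for a renderer. The tag
    # "detail=high" marks the redundant half of a dual carriageway.

    def segments_to_render(segments, low_detail):
        """Return the segments to draw at the requested level of detail."""
        if not low_detail:
            return segments  # high-detail map: draw everything
        # Low-detail map: drop segments marked as high-detail only.
        return [s for s in segments
                if s.get("tags", {}).get("detail") != "high"]

    segments = [
        {"id": 1, "tags": {"highway": "motorway"}},
        {"id": 2, "tags": {"highway": "motorway", "detail": "high"}},
    ]
    print(segments_to_render(segments, low_detail=True))   # segment 1 only
    print(segments_to_render(segments, low_detail=False))  # both segments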
Another example is routing information. Say you have an intersection
of two roads, each with two lanes in each direction; 100 metres
before the intersection, a "turn left" lane is added on the left of
every road leading towards the intersection, and we need to model the
fact that turning left is only allowed from the left lane, and we
need to alert the user in due time to move onto the left lane if he
is to make the turn. The fact that certain turns are possible and
limited to certain conditions is often contained in "manoeuvre" data
objects in routing data models. A manoeuvre consists of a segment
leading to a node, the node, and another segment leading away from
it. All this could be modelled on the API or application layer with
our current data model, e.g. by having an attribute tacked to a
segment that reads
"manoeuvre=left_turn_at_node_321345633_requires_left_lane_leads_to_segment_54223453"
or something (probably more computer-readable). It could
also be included in the data model by introducing a concept of
qualified relationships between objects; in that case, one could
retain better control over the structure because the integrity
constraints could be safeguarded by the core instead of relying on
applications to do it right.
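As an illustration of the difference, here is a small sketch in
Python; both representations are hypothetical and the field names are
my own. The first crams the manoeuvre into a single attribute string
that every application must parse; the second expresses it as a
first-class relationship whose referential integrity the core could
actually check:

    # Two hypothetical encodings of "left turn at node 321345633 is only
    # allowed from the left lane, continuing onto segment 54223453".

    # (a) Application layer: everything packed into one attribute string.
    #     Only the editing applications know (and must agree on) the format.
    tag_encoding = {
        "manoeuvre": "left_turn;node=321345633;requires=left_lane;to_segment=54223453"
    }

    # (b) Data model layer: a qualified relationship between objects. The
    #     from-segment id 54223452 is invented for the example.
    relation_encoding = {
        "type": "manoeuvre",
        "from_segment": 54223452,   # segment leading to the node
        "via_node": 321345633,      # the intersection node
        "to_segment": 54223453,     # segment leading away from it
        "restriction": "requires_left_lane",
    }

    def check_integrity(relation, known_nodes, known_segments):
        """With (b), the core can verify all referenced objects exist."""
        return (relation["via_node"] in known_nodes
                and relation["from_segment"] in known_segments
                and relation["to_segment"] in known_segments)

    print(check_integrity(relation_encoding,
                          known_nodes={321345633},
                          known_segments={54223452, 54223453}))  # True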
It has always been wiki-style to just see what happens and put the
structure there later. I believe we will see a similar development
with OSM. At first, people will be happy to have their streets on the
map at all. At some point in the future, a more precise specification
will be required, which can and will first be achieved by creating a
multitude of new attributes instead of overhauling the whole data
model, and at a later time the data model may be changed to reflect
these requirements.
The data model is very human-readable at the moment; you can just
fire up your editor and see the attributes and understand them. This
will probably degrade as more complex attributes with inter-object
relationships are inserted, and specifically as they start being
inserted automatically by editing software. (Come to think of it, one
could even build a spline drawing mode into an editor and then add
the spline handle positions as segment attributes. Cool, as long as
you use that specific editor...)
People calling for a switch to GIS-capable systems often cite
interoperability as a reason; I believe that can be achieved by
writing converters and filters. It works in the graphics world, so
why shouldn't it work here? I know that if I save my PNG as JPG and
then re-convert it to PNG, it will not be the same PNG because
information was lost. The same will probably apply to every sort of
data transfer between systems. But using a GIS-capable backend will
not change that.
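Such a converter need not be elaborate. A minimal sketch in Python,
assuming the usual OSM XML with <node id="..." lat="..." lon="..."/>
elements and picking plain CSV as an arbitrary target format (a real
converter would also handle segments, ways and tags):

    # Minimal converter sketch: OSM XML nodes -> CSV of coordinates.
    import csv
    import sys
    import xml.etree.ElementTree as ET

    def osm_nodes_to_csv(osm_file, csv_out):
        writer = csv.writer(csv_out)
        writer.writerow(["id", "lat", "lon"])
        # iterparse keeps memory bounded even on very large files
        for _, elem in ET.iterparse(osm_file):
            if elem.tag == "node":
                writer.writerow([elem.get("id"),
                                 elem.get("lat"), elem.get("lon")])
            elem.clear()  # free the element once processed

    if __name__ == "__main__":
        osm_nodes_to_csv(sys.argv[1], sys.stdout)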
My whole opinion in a nutshell: (1) no need to switch to a spatial
whatever database if the current system works. (2) data model may
need redesign some time in the future to cope with complex routing
and rendering tasks, or else people will start stuffing everything
into attributes whose well-formedness is not guaranteed and (worst
case) may even be application specific. (3) interoperability to be
created through converters and filters.
Just in case someone was interested.
Bye
Frederik
--
Frederik Ramm ## eMail frederik at remote.org ## N49°00.09' E008°23.33'