[OSM-talk] Data primitives (was: The segments vs ways vs superways question again...)

Wed Jan 3 12:04:43 GMT 2007

We have been discussing the data primitives used for OSM on this list lately.
Currently the primitives are nodes, segments and ways. Other models with
nodes, ways and superways or with groups are discussed here.

I want to step back a bit and think about this more systematically:

There is the real world outside, with many complicated things in it. Some
of these things we want to put on our maps. Things in the real world always
exist in three dimensions but we abstract from that and store only the part
of the information we really need.

Small things like post boxes can be thought of as *points*. Long, narrow
things like roads can be thought of as *lines*. And even larger things like a
park need to be *areas*.

For the points we are done. We have the data primitive called "node" for
that, which just has a coordinate and that's it.

For the lines it is a bit more complicated, because they can wind around.
Think of a track somewhere in the countryside with nothing else around. It
is one track, but we have to represent it as several straight lines tacked
on each other. The track is represented by a list of lines (often called
a polyline). Note that each single segment of this track doesn't have any
life of its own. We only need those single segments, because the road is
windy and we can't magically know just from the endpoints where the bends
are. But logically it is only one thing. If you wanted a topological diagram
which only shows how the road network is connected, you would not need the
details of this polyline, just the endpoints and how they connect to other
polylines.
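
To make this a bit more concrete, here is a minimal sketch in Python of what
I mean. The names Node, Polyline and topology are made up for illustration and
are not part of any existing OSM code. The point is only that the inner nodes
describe the bends, while the topological view needs nothing but the endpoints.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """A point with a coordinate and nothing else (hypothetical name)."""
    lat: float
    lon: float


@dataclass
class Polyline:
    """An ordered list of nodes; consecutive nodes are joined by straight lines."""
    nodes: List[Node]

    def endpoints(self) -> Tuple[Node, Node]:
        # Only the first and last node matter for connectivity.
        return self.nodes[0], self.nodes[-1]


def topology(polylines: List[Polyline]) -> List[Tuple[Node, Node]]:
    """Reduce each polyline to one edge between its endpoints."""
    return [p.endpoints() for p in polylines]


# A windy track: four nodes, but logically one thing.
track = Polyline([
    Node(49.00, 8.40),
    Node(49.01, 8.41),
    Node(49.03, 8.41),
    Node(49.04, 8.43),
])
edges = topology([track])
```

The three straight pieces of the track never show up as objects of their own;
they are implied by the order of the nodes.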

Note that if you think about it this way, then the polyline *always* stops
at an intersection or other node that is somehow more than just a point
where the road goes through. So for every gate, crossroads or fork in the
road, you'll have to start a new polyline. There is no way a polyline can
fork. If you want to give driving instructions along a polyline, the only
thing you ever have to say is "follow the road".
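
The rule could be sketched like this, assuming paths are simply lists of node
ids and that "interesting" nodes such as gates are given as a set. The function
name split_at_junctions and the parameter special are my own inventions for
this example. A path is cut at every inner node that is used more than once
or that is marked as special.

```python
from collections import Counter
from typing import AbstractSet, List


def split_at_junctions(
    paths: List[List[int]],
    special: AbstractSet[int] = frozenset(),
) -> List[List[int]]:
    """Cut non-empty paths so that no polyline runs through a junction."""
    # How often is each node id used across all paths?
    use = Counter(node for path in paths for node in path)
    result = []
    for path in paths:
        current = [path[0]]
        for i in range(1, len(path)):
            node = path[i]
            current.append(node)
            is_inner = i < len(path) - 1
            if is_inner and (use[node] > 1 or node in special):
                # End the polyline here and start a new one at the same node.
                result.append(current)
                current = [node]
        result.append(current)
    return result


# Node 3 is a crossroads shared by both paths, node 4 is a gate.
paths = [[1, 2, 3, 4, 5], [6, 3, 7]]
pieces = split_at_junctions(paths, special={4})
```

In this example pieces comes out as [[1, 2, 3], [3, 4], [4, 5], [6, 3], [3, 7]],
so every polyline ends where something happens.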

Note that a polyline has a beginning and an end. We'll need that later for
one-way streets etc. But there is no need for each segment in the polyline to
have a direction of its own; they *must* all go the same way.
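
A small sketch of the difference, again with made-up names (segments_of,
is_chain) and node ids instead of real nodes: if the polyline is an ordered
list, the direction of every piece follows from the order. With loose directed
segments, somebody has to check that they actually chain up.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (from_node_id, to_node_id)


def segments_of(node_ids: List[int]) -> List[Segment]:
    """Derive the straight pieces from the node order; all point the same way."""
    return list(zip(node_ids, node_ids[1:]))


def is_chain(segs: List[Segment]) -> bool:
    """Check that each segment starts where the previous one ended."""
    return all(a[1] == b[0] for a, b in zip(segs, segs[1:]))


polyline = [1, 2, 3, 4]
forward = segments_of(polyline)         # [(1, 2), (2, 3), (3, 4)]
backward = segments_of(polyline[::-1])  # [(4, 3), (3, 2), (2, 1)]

# Loose segments as they can exist today: the middle one points the wrong way.
loose = [(1, 2), (3, 2), (3, 4)]
```

Both forward and backward pass is_chain by construction; loose does not.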

On to areas: They are a list of lines which enclose an area. So they are
sort of a polyline which loops back onto itself. But it gets more
complicated because areas can have holes. To model them you need more
polylines forming another loop. And it gets more complicated from there,
so I'll skip over the details here, we can look at them later.
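
Just for the simple case of one outer loop with holes, a sketch could look
like the following. The class name Area and its fields are hypothetical, and
I treat coordinates as plain x/y values on a flat plane, which is a
simplification and not how you would handle real lat/lon data.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


def is_closed(ring: List[Point]) -> bool:
    """A ring is a polyline that loops back onto its first point."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def ring_area(ring: List[Point]) -> float:
    """Unsigned area enclosed by a closed ring (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


@dataclass
class Area:
    outer: List[Point]
    holes: List[List[Point]] = field(default_factory=list)

    def size(self) -> float:
        # The holes are cut out of the outer loop.
        return ring_area(self.outer) - sum(ring_area(h) for h in self.holes)


park = Area(
    outer=[(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)],
    holes=[[(1, 1), (2, 1), (2, 2), (1, 2), (1, 1)]],
)
```

Here the outer loop encloses 16 square units and the hole 1, so park.size()
gives 15.0. Nested holes, islands in holes and so on are not covered.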

So, we have now taken the first step, which was translating the *physical*
features into a simple mapping of points, polylines and areas. The next thing
is to consider the *logical* characteristics of all these things.

For points this is, again, easy. So let's go straight to the polylines.

We already said that the polylines are only there to be able to draw wiggly
lines properly and that every time something interesting happens, like if
there is a gate, we need to start a new polyline. So we could tag all
polylines with their respective features and would be done. But in many
cases a residential road, for instance, would consist of several polylines,
because it intersects with other roads etc. So we probably need more than
that. But we only need more because we don't want to have 17 polylines, all
with "name=High Street", when we could somehow do this in one go.

Let's look at this in more detail with some examples:

We have a residential road which is partly one-way. We can tag all polylines
with "highway=residential" and "name=High Street" and some of the polylines
with "oneway=true". Or we could have some "higher object" which groups all
the polylines of this road and tag it with "highway=residential" and
"name=High Street" and another "higher object" which groups only the one-way
street parts and tag it "oneway=true".
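
The second variant might look roughly like this. Group and tags_for are names
I made up for the sketch, and polylines are referred to by made-up ids. The
tags are stored once per group, and the tags that apply to a single polyline
are collected from all groups it belongs to.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Group:
    """A "higher object": some tags plus the polylines they apply to."""
    tags: Dict[str, str]
    polyline_ids: List[int]


def tags_for(polyline_id: int, groups: List[Group]) -> Dict[str, str]:
    """Merge the tags of every group that contains the given polyline."""
    merged: Dict[str, str] = {}
    for group in groups:
        if polyline_id in group.polyline_ids:
            merged.update(group.tags)
    return merged


# High Street consists of polylines 1, 2 and 3; only 2 and 3 are one-way.
high_street = Group({"highway": "residential", "name": "High Street"}, [1, 2, 3])
one_way_part = Group({"oneway": "true"}, [2, 3])
groups = [high_street, one_way_part]
```

tags_for(1, groups) then yields only the highway and name tags, while
tags_for(2, groups) additionally contains "oneway": "true". The name is stored
exactly once, no matter how many polylines the street has.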

Now on to something more complicated: There is a primary road and a tram
line down the middle. They both follow the same path, so they should use the
same polyline, which is - if you remember from above - just the
representation of the physical way the road winds about. So the road and the
tram line might share a path, but logically they are very different and have
different features.

Why do they have to share the same path? Well, think about what happens if
you find out that the guy who entered the data was off a little bit with his
GPS coordinates and you have better ones. You correct the points that make
up the polyline. You want both the road and the tram line to move. And the
renderer can only draw this properly if both use the same path.

So, what we see from all this is that we don't want to tag the polyline
itself, because after all it is just a representation of the physical windy
path. Instead we want to tag some "higher object" which consists of one or
more polylines. These objects can share polylines between them. In database
parlance there is a many-to-many relation between polylines and these "higher
objects".
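
As a rough sketch of the many-to-many idea, using hypothetical classes Node,
Polyline and Feature and illustrative tag values that I am only assuming here:
the road and the tram line both refer to the very same polyline object, so
correcting a node moves both of them at once.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    lat: float
    lon: float


@dataclass
class Polyline:
    nodes: List[Node]


@dataclass
class Feature:
    """A "higher object": carries the tags, refers to one or more polylines."""
    tags: Dict[str, str]
    polylines: List[Polyline]


def features_using(polyline: Polyline, features: List[Feature]) -> List[Feature]:
    """The other direction of the relation: who uses this polyline?"""
    # Compare by identity, not by value: we want the same object.
    return [f for f in features if any(p is polyline for p in f.polylines)]


shared = Polyline([Node(49.000, 8.400), Node(49.010, 8.410)])
road = Feature({"highway": "primary"}, [shared])
tram = Feature({"railway": "tram"}, [shared])  # tag value is an assumption

# Somebody comes along with better GPS data and fixes the first node.
shared.nodes[0].lat = 49.0005
users = features_using(shared, [road, tram])
```

After the correction both road and tram see the new latitude, and users
contains both features. The polyline itself carries no tags at all.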

You thought this was complicated? It gets worse...

Let's consider a primary road that goes through a larger town. You'll probably
have sections with different speed limits and other characteristics. Some parts
may be single-lane, some multi-lane, some divided, some not. But it is still
only one road with one name. Or even worse: A motorway, crossing a whole
country. Currently we draw both parts of a divided motorway as different paths,
but that's not right. You'll never be able to render motorways properly at all
scales doing that. They follow, after all, the same polyline. Sure, one part is
a few meters to the left and one a few meters to the right, but that's the same
for any track (or a road with a cycle lane). If you look at any map, you'll see
that they draw a motorway or trunk road as one polyline, typically rendered as
a colored line with a black line on each side and a black line in the middle.
And if you are not confused yet, think about how to model exits and motorway
links in this case.

I leave the case of areas as an exercise for the reader. :-)

So, I am strongly in favour of ditching the "segment" as a primary object in
our database and introducing a "polyline" instead. I am not so sure about
ways, superways or groups. Thinking further about what I have written above,
we could probably come up with a way to represent all this in a coherent
data model, but that would probably be unusable for everyone but experts,
and that's not what we want. But we really need to address these issues.
Currently there is no way to model and draw motorways or streets with trams
on them properly. And I haven't even gotten into how we can do routing with
this kind of data.

Jochen
-- 
Jochen Topf  jochen at remote.org  http://www.remote.org/jochen/  +49-721-388298