[Openstreetmap-dev] Re: OSM Schema Design

Sun Jan 29 21:54:45 GMT 2006

Immanuel Scholz wrote:
> Hi,
> 
>> I have seen that. It's a good start and it's good to be able to validate
>> the XML data generated by the server with the XML Schema.
> 
> Currently there does not exist an XML Schema in form of a schema file.
I referred to this schema file on the svn server:
	http://www.openstreetmap.org/svn/schema/osm.xsd

I guess the osm.xsd schema above was validated against the XML examples
you refer to below as the 'verbal XML schema'.

Sadly, this schema format is hardly readable for humans... :-)

> Every time I say "XML Schema" I mean the roughly verbal definition at the
> wiki.

Are you referring to these pages?
	http://www.openstreetmap.org/wiki/index.php/XML_Schema
	http://www.openstreetmap.org/wiki/index.php/Talk:XML_Schema

This explains a lot.

> Since we already have performance problems within the XML code of the
> server, I would strongly prefer not to add an automatic XML validation
> until some profiling measurement is set up to show that this is not an
> impact to performance too (I don't trust Ruby's XML code anymore ;-)

I guess the XML Schema was created to model the XML structure and to
validate some XML samples generated by the server, in order to ensure
that it delivers valid XML.

I am uncertain whether good validating parsers would be faster or
slower than non-validating ones.

>> It is my understanding that in mid term OSM street map data needs to be
>> structured differently than today, as there are many things, which have
>> not been covered yet.
> 
> Maybe, maybe not. I believe the current data structure is powerful enough
> to handle all needs an open STREET map could have.

I think the central element of everything should be the node. Data
should be structured in such a way that the computer understands what
the objects are and can present them appropriately for the current view
('usage scenario').

The current data structure is kept simple, which is good. It seems to
support drawing a map, while other usage scenarios, such as navigation
with text or voice output, will require enhanced data structures. For
example, the computer must really be able to distinguish a roundabout
from a plain circular street, since a message like 'leave the
roundabout at the second exit' is expected.
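To make the roundabout point concrete, here is a minimal Python sketch (not OSM code). It assumes a roundabout is stored as a ring of node IDs in driving direction, tagged with a hypothetical property type=roundabout, plus the set of nodes where other streets leave it. With that information the exit count falls out directly, while a plain circular street with the same geometry gets no such instruction.

```python
ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth"}


def exit_number(ring, exits, entry, target):
    """Count exits passed when driving round `ring` from `entry` to `target`.

    ring:  node IDs of the roundabout in driving direction.
    exits: node IDs where another street leaves the roundabout.
    """
    if entry not in ring or target not in exits:
        raise ValueError("entry must lie on the ring and target must be an exit")
    start = ring.index(entry)
    count = 0
    # Walk once round the ring, counting every exit node we pass.
    for step in range(1, len(ring) + 1):
        node = ring[(start + step) % len(ring)]
        if node in exits:
            count += 1
            if node == target:
                return count
    raise ValueError("target exit is not on this ring")


def instruction(way, entry, target):
    # Only a way explicitly typed as a roundabout (hypothetical property)
    # gets this instruction; a plain circular street does not.
    if way["properties"].get("type") != "roundabout":
        return None
    n = exit_number(way["nodes"], way["exits"], entry, target)
    return f"Leave the roundabout at the {ORDINALS.get(n, f'{n}th')} exit"
```

For a ring [1, 2, 3, 4, 5, 6] with exits {1, 3, 5}, entering at node 1 and leaving at node 5 yields "Leave the roundabout at the second exit".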

>> rivers,
> Property "class=river" on a street could be a way to do this.
Looks quite weird to me. A property on a track would be better, and
'street' should itself be a property of a track.
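A minimal sketch of what I mean, assuming a hypothetical generic Track type (ordered node IDs plus free-form properties). Streets, rivers and tram lines are then all tracks that differ only in their 'class' property, and a tram running along a street can simply share the street's nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """Hypothetical generic linear object: ordered node IDs plus properties."""
    id: int
    node_ids: list
    properties: dict = field(default_factory=dict)

    def has_class(self, cls):
        return self.properties.get("class") == cls


def tracks_of_class(tracks, cls):
    """All tracks whose 'class' property equals `cls`."""
    return [t for t in tracks if t.has_class(cls)]


def shared_nodes(a, b):
    # A tram line along a street reuses the street's nodes instead of
    # being "a street in the middle of a street".
    return sorted(set(a.node_ids) & set(b.node_ids))
```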

>> lakes,
> Property "class=lake" on an area.
Looks better.
But still there may be islands inside ;-)
Maybe areas will need to contain other areas.
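Areas with holes could, for example, be stored as one outer ring plus a list of inner rings. The sketch below assumes that representation (it is not the current OSM structure) and uses the standard even-odd ray-casting test, treating coordinates as planar, which is only reasonable for small areas.

```python
def point_in_ring(pt, ring):
    """Even-odd ray casting: is `pt` inside the closed polygon `ring`?"""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_area(pt, outer, holes=()):
    """Inside the outer ring and not inside any hole (e.g. an island)."""
    return point_in_ring(pt, outer) and not any(point_in_ring(pt, h) for h in holes)


LAKE = [(0, 0), (10, 0), (10, 10), (0, 10)]
ISLAND = [(4, 4), (6, 4), (6, 6), (4, 6)]
```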

>> bridges,
> "class=bridge" on a line segment
OK.
But here additional information seems to be required. If bridge X of
street A has been built across street B, it would be useful to
associate that information with the map data.
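One possible way to record which street a bridge crosses, as a sketch: each line segment carries its properties, and a bridge segment gets a hypothetical 'crosses' property listing the IDs of the tracks it passes over. The Segment class and the property names are my own assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical line segment between two nodes, belonging to one track."""
    id: int
    from_node: int
    to_node: int
    track_id: int
    properties: dict = field(default_factory=dict)


def bridges_over(segments, crossed_track_id):
    """Return bridge segments that pass over the track `crossed_track_id`."""
    return [
        s for s in segments
        if s.properties.get("class") == "bridge"
        and crossed_track_id in s.properties.get("crosses", ())
    ]


# Bridge X: segment 7 of street A (track 1) crosses street B (track 2).
bridge_x = Segment(7, 100, 101, track_id=1,
                   properties={"class": "bridge", "crosses": [2]})
plain = Segment(8, 101, 102, track_id=1)
```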

>> house numbers
> "house_number=xx" on a node
But the node must somehow be part of a street or have a reference to a
street. Otherwise (e.g. at a crossing) it is impossible to tell which
street it belongs to. Further information must also be provided to tell
on which side of the street the house is located.
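If the house node carries a reference to its street, the side of the street can be derived from geometry: the sign of the 2D cross product tells whether a point lies left or right of a directed segment. The HouseNumber record and its street_id field below are hypothetical, and positions are assumed to be in a local planar frame (x east, y north).

```python
from dataclasses import dataclass


@dataclass
class HouseNumber:
    """Hypothetical house-number node with an explicit street reference."""
    number: str
    position: tuple   # (x, y), local planar frame: x east, y north
    street_id: int    # ID of the street track the house belongs to


def side_of_segment(a, b, p):
    """'left' or 'right' of the directed segment a -> b, or 'on' if collinear."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "on"


# Street 1 runs north from (0, 0) to (0, 100); house 7 is 5 m west of it.
house = HouseNumber("7", (-5.0, 20.0), street_id=1)
```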

>> forests
> "class=forest" on an area
OK. But forests may also have holes containing non-forest parts...
See above.
>> street types (e.g. motorways, country roads, city roads, bicycle roads)
OK.

> one way streets
Direction must be defined relative to the street.
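A sketch of direction relative to the street, assuming a hypothetical 'oneway' property whose values 'forward' and 'backward' refer to the stored node order of the track; nothing here is an existing OSM property. Closed tracks, where a node appears twice, are not handled.

```python
def may_traverse(track_nodes, oneway, frm, to):
    """May a vehicle drive from node `frm` to the adjacent node `to`?

    track_nodes: node IDs in the track's stored order.
    oneway: None (two-way), "forward" (only along the stored order)
            or "backward" (only against it).
    """
    i, j = track_nodes.index(frm), track_nodes.index(to)
    if abs(i - j) != 1:
        raise ValueError("frm and to must be adjacent nodes of the track")
    along_order = j == i + 1
    if oneway is None:
        return True
    if oneway == "forward":
        return along_order
    if oneway == "backward":
        return not along_order
    raise ValueError(f"unknown oneway value: {oneway!r}")
```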

> All these are properties on a street.

>> railways,
> Property on a street
Looks quite weird to me.

A tram driving in the middle of a street would be a street in the middle 
of a street?

>> restrictions (max vehicle speed,
> Either property on a street or on a line segment
A property of one or more directed line segments. Restrictions depend on
the driving direction.

>>max vehicle height,
> Either property on a street or on a line segment
A property of one or more directed line segments at a particular node?

>>max vehicle weight
> Either property on a street or on a line segment
A property of one or more directed line segments at a particular node?
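With restrictions attached to directed segments, keyed by (from_node, to_node), a router can drop the segments a given vehicle may not use before searching for a route. A sketch under these assumptions: the property names maxheight and maxweight are hypothetical, heights are in metres and weights in tonnes.

```python
def segment_allowed(limits, vehicle):
    """limits: restrictions on one directed segment; vehicle: its dimensions."""
    if "maxheight" in limits and vehicle.get("height", 0) > limits["maxheight"]:
        return False
    if "maxweight" in limits and vehicle.get("weight", 0) > limits["maxweight"]:
        return False
    return True


def usable_segments(restrictions, vehicle):
    """Directed segments (from_node, to_node) this vehicle may use."""
    return {seg for seg, limits in restrictions.items()
            if segment_allowed(limits, vehicle)}


# Hypothetical data: the limits differ per driving direction.
RESTRICTIONS = {
    (1, 2): {"maxheight": 3.5},
    (2, 1): {"maxheight": 4.2},
    (2, 3): {"maxweight": 7.5},
}
truck = {"height": 4.0, "weight": 12.0}
```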

>> roundabouts,
> Property on a node for small roundabouts or on several line segments for
> more complex ones. Maybe on a street which contains exactly all line
> segments which participate in the roundabout.
I'd prefer a property of type 'roundabout', and of type street too. ;-)
It would be connected to several tracks of type street. It might be
worth considering storing the angles at which the tracks arrive at the
roundabout, in order to have better maps for navigation.
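Storing these angles could be as simple as one bearing per arriving track. A minimal sketch, assuming planar coordinates (x east, y north) around the roundabout center; sorting by bearing gives the order of the arriving tracks around the circle, and whether a driver meets them in increasing or decreasing order depends on the driving side.

```python
import math


def bearing_deg(center, point):
    """Bearing from `center` to `point` in degrees clockwise from north."""
    dx = point[0] - center[0]   # east
    dy = point[1] - center[1]   # north
    return math.degrees(math.atan2(dx, dy)) % 360


def arrival_angles(center, arrivals):
    """arrivals: {track_id: point where that track meets the roundabout}.

    Returns (track_id, bearing) pairs sorted by bearing.
    """
    return sorted(((tid, bearing_deg(center, p)) for tid, p in arrivals.items()),
                  key=lambda pair: pair[1])
```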

>> motorway drive-up,
> Property on a line segment, node or street - depending how complex the
> drive up is.
There are usually many different types of motorway drive-up. For
navigation purposes it might become necessary to assign the type in
order to provide accurate directions.

>> country borders,
> Although I disbelieve this should be in a streetmap database, if you want
> to enter it, make it a property on a street surrounding the country.
To me it looks more like a huge area.

>> information to support routing,
> Property on the object you want to give hint for.
Something that can be used to calculate the cost (time, length) of
travelling between two points, and that is used by some kind of
least-cost routing algorithm to calculate the fastest or shortest path.
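As an illustration of such least-cost routing, here is Dijkstra's algorithm over directed segments, where each segment is assumed to carry its length in metres and a speed in km/h. The cost is either travel time (fastest path) or length (shortest path). The edge format is an assumption for this sketch, not an existing OSM API.

```python
import heapq


def best_route(edges, start, goal, metric="time"):
    """Least-cost route with Dijkstra's algorithm.

    edges: {node: [(neighbour, length_m, speed_kmh), ...]}, directed.
    metric: "time" (cost in seconds) or "length" (cost in metres).
    Returns (list of nodes, total cost) or (None, inf) if unreachable.
    """
    def cost(length_m, speed_kmh):
        if metric == "length":
            return length_m
        return length_m * 3.6 / speed_kmh  # metres / (metres per second)

    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        c, u = heapq.heappop(heap)
        if u == goal:
            break
        if c > best[u]:
            continue  # stale heap entry
        for v, length_m, speed_kmh in edges.get(u, ()):
            nc = c + cost(length_m, speed_kmh)
            if nc < best.get(v, float("inf")):
                best[v] = nc
                prev[v] = u
                heapq.heappush(heap, (nc, v))
    if goal not in best:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], best[goal]
```

With a short slow road A-B-D (2 km at 50 km/h) and a longer fast road A-C-D (3 km at 100 km/h), metric="time" picks A-C-D and metric="length" picks A-B-D.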

>> railway stations, etc.
> Property on a node.
See house numbers above.

> I hope you see my point. I strongly disagree to make the data structure
> more complex than necessary if not given a good argument.
> 
> Maybe there will be reasons for not expressing something as properties but
> include it into the data structure. Please argue why you think a change of
> the data structure is necessary for the examples above.

I fully agree with you that the data structure should be kept simple.
It is also better to think twice before introducing data elements that
are not well understood.

I see two main reasons why a more complex data structure might be worth
considering:

Reason 1. To reduce data volume.
E.g. a roundabout needs just a center and a radius. If the roundabout
is enhanced with exit IDs defined in degrees (e.g. from north), each
exit ID would replace an additional node.
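To illustrate the compact form: if a roundabout is stored as a center, a radius and a list of exit bearings in degrees from north, the exit points can be recomputed on demand instead of being stored as nodes. A sketch, assuming planar coordinates in metres (x east, y north); the function name is hypothetical.

```python
import math


def exit_positions(center, radius_m, exit_bearings_deg):
    """Recompute exit points of a roundabout stored as center + radius +
    exit bearings (degrees clockwise from north), in a planar frame."""
    cx, cy = center
    points = []
    for bearing in exit_bearings_deg:
        b = math.radians(bearing)
        points.append((cx + radius_m * math.sin(b), cy + radius_m * math.cos(b)))
    return points
```

A roundabout with four exits then takes one point, one radius and four angles, rather than a ring of nodes approximating the circle.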

Another example could be modeling streets using curves instead of line
segments.

Reason 2. To support use cases other than drawing a map.
E.g.
- to provide additional direction information for navigation (text,
voice)
- to enhance searching / indexing capabilities (like "show me all
roundabouts", etc.)
- to calculate routes (e.g. fastest, shortest, fastest for a vehicle
height of 4 m)

> Wiki pages ..
> Maybe helping there is what you want ?

Yes. The above discussion should be moved to the wiki pages.

> I see room for simplifying the API. For example I don't see the need to
> get single objects by id, when they already come fully described out of
> the map - request.

> Steve is currently simplifying the database and I am sure has some ideas
> for changes to 0.3 API too.. ;-)

> I think the "code after demand" approach is far better here.
No, I don't mean the waterfall model; it never works.
Rather, I propose capturing the causal relationships between the data
structure and possible usage scenarios as early as possible.

The better these relationships are commonly understood, the more
deliberately decisions are made, and the better they get.

>>> but it is the current plan to
>>> test implement a CSV output of the object schema (all XML stuff replaced
>>> with a simple CSV), because the server spent most of the time encoding
>>> the XML.
Forget CSV. It will make encoding and decoding very hard and error-prone.
Besides, it will fail badly as soon as the data structure becomes more
complex than it is today, so it would even become a risk in the future.
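A small Python illustration of the CSV problem, using a hypothetical track record whose name contains a comma and whose node list has variable length. The hand-rolled CSV round trip silently corrupts the record, while the XML round trip (element and attribute names made up for this sketch) preserves it. A proper CSV library would fix the quoting, but every new nested or optional element would still need an ad hoc column convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical track record: a name containing a comma, variable node list.
TRACK = {"id": "42", "name": "Main Street, North", "nodes": ["1", "2", "3"]}


def to_naive_csv(track):
    # Hand-rolled CSV: no quoting, node IDs appended as trailing columns.
    return ",".join([track["id"], track["name"], *track["nodes"]])


def from_naive_csv(line):
    track_id, name, *nodes = line.split(",")
    return {"id": track_id, "name": name, "nodes": nodes}


def to_xml(track):
    el = ET.Element("track", id=track["id"], name=track["name"])
    for ref in track["nodes"]:
        ET.SubElement(el, "nd", ref=ref)
    return ET.tostring(el, encoding="unicode")


def from_xml(text):
    el = ET.fromstring(text)
    return {"id": el.get("id"), "name": el.get("name"),
            "nodes": [nd.get("ref") for nd in el.findall("nd")]}


if __name__ == "__main__":
    print(from_naive_csv(to_naive_csv(TRACK)))  # name cut at the comma
    print(from_xml(to_xml(TRACK)) == TRACK)     # True
```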

> However, even the XML encoding is too slow.
Then I'd guess there is a problem in the XML implementation, either in
the way the data is stored or in the way the stored data is serialized.

Compare its speed to simply writing the output with 'print'.
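A rough way to run that comparison, sketched in Python rather than the server's Ruby: serialize the same nodes once through an XML library (ElementTree) and once with plain string formatting, and time both. The node tuples are made-up test data. Note that the hand-written variant does no escaping, which is only safe here because all attribute values are numbers.

```python
import timeit
import xml.etree.ElementTree as ET

# Made-up test data: (id, lat, lon) tuples.
NODES = [(i, 51.5 + i * 1e-5, -0.1 + i * 1e-5) for i in range(10_000)]


def via_xml_library(nodes):
    # Build an element tree, then serialize it.
    root = ET.Element("osm")
    for nid, lat, lon in nodes:
        ET.SubElement(root, "node", id=str(nid), lat=repr(lat), lon=repr(lon))
    return ET.tostring(root, encoding="unicode")


def via_print(nodes):
    # Write the same markup directly; no escaping, safe only for numbers.
    parts = ["<osm>"]
    for nid, lat, lon in nodes:
        parts.append(f'<node id="{nid}" lat="{lat!r}" lon="{lon!r}" />')
    parts.append("</osm>")
    return "".join(parts)


if __name__ == "__main__":
    for serialize in (via_xml_library, via_print):
        seconds = timeit.timeit(lambda: serialize(NODES), number=5)
        print(f"{serialize.__name__}: {seconds:.3f} s for 5 runs")
```

If the library path is dramatically slower than the plain one, the bottleneck is in the XML layer rather than in the data access.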

>> By the way: unlike XML Schemes, which not only define the data
>> structure, but also how the data is encoded, ASN.1 schemes just define
>> how the data is structured and keep the data encoding separated to an
>> appropriate encoder.
> 
> I think XML was not chosen because XML Schemes has to be used. XML was
> chosen because it is the first and simplest idea that worked. Evidence to
> this is, that no XML scheme validation is present anywhere in the code
> now.
> 
> If you know ASN.1 well and if you point some ruby coders to ASN.1
> libraries and define an ASN.1 scheme on how data are transferred
Libraries with an open source license are currently only available in C.
I expect the effort to build support for another language to be high.

Here is an excellent open source ASN.1 compiler:
http://lionet.info/asn1c/

Designing an OSM ASN.1 schema would be no problem for me.

If HTTP is capable of transferring binary data, the existing BER encoder
could be used. Otherwise a new type of ASN.1 encoder would have to be
developed too.
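To show what BER output looks like, here is a hand-written Python encoder for the two BER constructs such a schema would need, INTEGER and SEQUENCE, following the X.690 rules (tag, definite length, minimal two's-complement content). The Node type and the 1e-7 degree scaling of lat/lon are assumptions for illustration only, not an agreed OSM schema; in practice asn1c would generate the encoder from the ASN.1 module.

```python
def ber_length(n):
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def ber_integer(value):
    """INTEGER (universal tag 2) with minimal two's-complement content."""
    size = (value if value >= 0 else ~value).bit_length() // 8 + 1
    content = value.to_bytes(size, "big", signed=True)
    return b"\x02" + ber_length(len(content)) + content


def ber_sequence(*encoded_fields):
    """SEQUENCE (constructed, tag 0x30) wrapping already-encoded fields."""
    body = b"".join(encoded_fields)
    return b"\x30" + ber_length(len(body)) + body


# Hypothetical ASN.1 type, not an agreed OSM schema:
#   Node ::= SEQUENCE { id INTEGER, lat INTEGER, lon INTEGER }
# with lat/lon stored as integers in units of 1e-7 degrees.
def encode_node(node_id, lat_e7, lon_e7):
    return ber_sequence(ber_integer(node_id), ber_integer(lat_e7),
                        ber_integer(lon_e7))


if __name__ == "__main__":
    ber = encode_node(12345, 515074000, -1278000)
    xml = '<node id="12345" lat="51.5074" lon="-0.1278"/>'
    print(len(ber), "bytes as BER vs", len(xml.encode()), "bytes as XML")
```

For this example node the BER encoding is 17 bytes, against 46 bytes for the equivalent XML element.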

> and if you convince that coder that ASN.1 is better than, say, CSV or XML, then maybe
> it gets implemented and used as transport mechanism.

My guess is that for encoding, the benefit of ASN.1 over XML is not so
dramatic (a factor of 2) if you have an efficient XML implementation,
while for decoding the speed gain may be significant (a factor of 20).
These numbers apply to the ASN.1 BER encoding.

Here are some ASN.1-XML encoding speed comparisons:
	http://www.obj-sys.com/docs/ASN1forBinXML.pdf

These are from one ASN.1 tool vendor, but I have seen similar documents 
from other sources.

> I never looked at ASN.1 more than I was forced to during study. 
> To me it looks weird, bloated and complex.

It takes some time to get used to the syntax. It was originally
designed to specify protocols; later on it was extended to cover the
encoding of protocols.
It's good for:
- protocol specification
- design of backward-compatible protocols
- efficient encoding (information per byte), as needed for low-data-rate
bearers
- low memory and processor requirements, as needed for embedded systems

> I prefer simple solutions.
I'd recommend optimizing the XML first. Switching to ASN.1 for encoding
would only be useful if there is a significant benefit from reduced
message size or increased decoding speed.

Using an ASN.1 schema instead of an XML DTD to specify the data
structure might still be an option worth checking.

Br,
Michael