[Openstreetmap-dev] CSV transport encoding scheme

Immanuel Scholz immanuel.scholz at gmx.de
Tue Jan 24 15:07:22 GMT 2006


After giving you weeks of chances to come up with an encoding scheme
that (hopefully) eats fewer resources than the REXML-Ruby
implementation, I now present my suggestion for a CSV encoding scheme.

0. Why CSV?

I hope CSV is the simplest solution that does not suffer from boring
problems like integer endianness or character encoding, which come with
something like "pure ASCII" or "pure binary-ish" formats.

1. UTF-8

Everything is UTF-8, no exceptions, no declaration, no alternative.
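Since there is no declaration and no alternative, a reader can simply decode strictly and reject the stream on the first invalid byte sequence. A minimal Python sketch; the name `decode_stream` is hypothetical:

```python
def decode_stream(data: bytes) -> str:
    """Decode a transport stream; the scheme allows UTF-8 and nothing else."""
    # errors="strict" (the default) raises UnicodeDecodeError on any byte
    # sequence that is not valid UTF-8, so there is no fallback charset.
    return data.decode("utf-8", errors="strict")
```

When reading from a file instead, `open(path, encoding="utf-8", newline="")` gives the same strict behaviour and is the form the Python `csv` module expects.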

2. Common definitions

- One dataset (= line) is one data entry, so one line is one node, line
segment, street or area.

- Properties are no longer data entries of their own; they are stored on
the line of the object they belong to. This greatly simplifies human
interpretation. Transferring properties later in a stream can be achieved
by other techniques (as needed).

- Every object has a one-word identifier: "node", "segment", "street" and
"area", which is always the first data entry of every line.

3. Global structure

- The first line contains at least one data element: a string version
identifier (e.g. "0.3"). There is no such thing as backward
compatibility. Either the string matches or the file is incompatible.
- All other data in the first line can be ignored; it is for debugging only.
- After the first line, all other objects follow in any order (with the
limitations described in the XML schema, e.g. nothing may reference
something that has not been transferred yet).
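Both rules of this section, the exact version match and the "no forward references" ordering, can be checked in a single pass. The following is a Python sketch under assumptions: `SUPPORTED_VERSION`, `REFERENCED_CLASS` and `check_stream` are hypothetical names, and ids are assumed to be unique only within their object class.

```python
import csv
import io

# Hypothetical: the one version string this reader accepts. Per the scheme,
# anything else is incompatible; there is no backward compatibility.
SUPPORTED_VERSION = "0.3"

# Which class each object's references point to (layout from section 4).
REFERENCED_CLASS = {"segment": "node", "street": "segment", "area": "node"}


def check_stream(text: str) -> None:
    """Raise ValueError on a version mismatch or a forward reference."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows, None)
    # Only the first header field is checked; the rest is debug information.
    if not header or header[0] != SUPPORTED_VERSION:
        raise ValueError(f"incompatible version header: {header!r}")
    # Assumption: ids are unique per object class, so track them separately.
    seen = {"node": set(), "segment": set(), "street": set(), "area": set()}
    for row in rows:
        if not row:
            continue  # assumption: blank lines are tolerated
        obj_class, obj_id = row[0], int(row[1])
        if obj_class not in seen:
            raise ValueError(f"unknown object class: {obj_class!r}")
        if obj_class == "segment":
            refs = [int(row[2]), int(row[3])]
        elif obj_class in ("street", "area"):
            count = int(row[2])
            refs = [int(x) for x in row[3:3 + count]]
        else:
            refs = []  # nodes reference nothing
        for ref in refs:
            target = REFERENCED_CLASS[obj_class]
            if ref not in seen[target]:
                raise ValueError(
                    f"{obj_class} {obj_id} references {target} {ref} "
                    "before it was transferred"
                )
        seen[obj_class].add(obj_id)
```

The example further down is only an excerpt (node 1234 is not in it), so this check would reject it if it were treated as a complete stream.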

4. The objects

Start part: Every line starts with the following values:

obj-class, id

"obj-class" is "node", "segment", "street" or "area".
"id" is the integral id of the object.

Then the object specific part is added:

for nodes:
lat, lon

for line segments:
first_node_id, second_node_id

for streets:
no-segments, segment_id1, segment_id2, ...

for areas:
no-nodes, node_id1, node_id2, node_id3, ...

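Finally, all key/value properties are added:
no-properties, key1, value1, key2, value2, ...

Here and above, "no-" means "number of": e.g. no-properties is the number
of key/value pairs that follow.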
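Because every variable-length part is preceded by its count, a reader can walk a line left to right without lookahead. A Python sketch; `Record` and `parse_record` are hypothetical names, and the line is assumed to be split into fields by a standard CSV reader first (so quoted values containing commas are handled).

```python
import csv
from dataclasses import dataclass, field


@dataclass
class Record:
    """One decoded line (hypothetical container, not part of the scheme)."""
    obj_class: str
    obj_id: int
    payload: list  # [lat, lon] for nodes, referenced ids otherwise
    properties: dict = field(default_factory=dict)


def parse_record(fields: list) -> Record:
    """Decode one line that a CSV reader has already split into fields."""
    obj_class, obj_id = fields[0], int(fields[1])
    if obj_class == "node":
        payload = [float(fields[2]), float(fields[3])]  # lat, lon
        pos = 4
    elif obj_class == "segment":
        payload = [int(fields[2]), int(fields[3])]  # first/second node id
        pos = 4
    elif obj_class in ("street", "area"):
        count = int(fields[2])  # no-segments or no-nodes
        payload = [int(x) for x in fields[3:3 + count]]
        if len(payload) != count:
            raise ValueError("truncated id list")
        pos = 3 + count
    else:
        raise ValueError(f"unknown object class: {obj_class!r}")
    # Trailing part: no-properties, then alternating keys and values.
    n_props = int(fields[pos])
    pairs = fields[pos + 1:pos + 1 + 2 * n_props]
    if len(pairs) != 2 * n_props:
        raise ValueError("truncated property list")
    properties = dict(zip(pairs[0::2], pairs[1::2]))
    return Record(obj_class, obj_id, payload, properties)


# Usage: the csv module handles the quoted "Baker Street Corner" value.
line = 'node,1235,51.2121,10.9996,2,oneway,,name,"Baker Street Corner"'
rec = parse_record(next(csv.reader([line])))
print(rec.properties)  # {'oneway': '', 'name': 'Baker Street Corner'}
```

Keeping the counts explicit means a C reader could also allocate each list once it has read the count, rather than growing it while scanning.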

5. Example

0.3,"created on Friday, 13th",JOSM V1.1
node,1235,51.2121,10.9996,2,oneway,,name,"Baker Street Corner"
segment,1236,1234,1235,1,name,Baker Street
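Writing is the mirror image of reading: emit the start part, the class-specific part (with a count for streets and areas) and the counted property list. A Python sketch; `encode_line` is a hypothetical helper, and leaving quoting to the standard `csv` writer is a design choice, not something the scheme prescribes.

```python
import csv
import io


def encode_line(obj_class, obj_id, payload, properties):
    """Serialize one object as a single CSV line (hypothetical helper).

    payload holds lat/lon for nodes or the referenced ids otherwise;
    streets and areas get their element count prepended (section 4).
    """
    fields = [obj_class, obj_id]
    if obj_class in ("street", "area"):
        fields.append(len(payload))
    fields.extend(payload)
    fields.append(len(properties))
    for key, value in properties.items():
        fields.extend([key, value])
    out = io.StringIO()
    # Default QUOTE_MINIMAL: quote only fields containing a comma, a quote
    # character or a line break.
    csv.writer(out, lineterminator="\n").writerow(fields)
    return out.getvalue()


print(encode_line("segment", 1236, [1234, 1235], {"name": "Baker Street"}), end="")
# segment,1236,1234,1235,1,name,Baker Street
```

With minimal quoting the node line comes out as `...,name,Baker Street Corner` without quotes; both forms parse to the same value, so the example's lines are reproduced up to spacing and optional quoting.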

6. Discussion

This is a discussion of this specific CSV scheme only. For discussions
about whether to use CSV at all, please refer to the mailing list history
(or start another thread).

Pros:
- highly understandable for humans (the most I could get out of it)
- close to the current XML scheme (quicker implementation change)
- keeps the almost linear, table-like structure, which is much easier to
handle than an object tree

Cons:
- the dynamic number of entries per line makes it harder to implement in
static languages like C; dynamic buffer allocation is required. However,
since the property strings can have any length anyway, I don't think this
adds much complexity.
- having no overall size counter makes it hard or impossible to implement
a meaningful progress bar. However, calculating the number of data entries
in advance is a significant performance hit (an extra SQL query or a
complete SQL buffer would be required!)

Ciao, Imi.
