[Openstreetmap-dev] CSV transport encoding scheme

Immanuel Scholz immanuel.scholz at gmx.de
Tue Jan 24 19:02:03 GMT 2006


Hi,

> I think it's better to use a tab instead of a comma as the separator
> character. Then we can use commas in property values without escaping,

I am not sure, but we may lose some applications such as Excel if we choose
something other than a comma as the separator.


> and commas can be used as the separator for element lists.

This means you suggest nesting two different CSV systems inside each other.
The first uses tab as the separator, and embedded in one of its data fields
is a second CSV line with comma as the separator.


In theory, we could do that, but this is not simple CSV anymore.
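
To illustrate the nesting, here is a minimal Python sketch. It assumes the
layout of the quoted example (type, id, then one comma-separated list field);
the function name parse_nested is made up for illustration and is not part of
any proposal.

```python
import csv
import io

def parse_nested(text):
    """Outer level: tab-separated fields. Inner level: a comma-separated
    list embedded in the third field (assumed layout)."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        kind, ident, *rest = fields
        # The second CSV "system": re-parse one field with comma as separator.
        inner = next(csv.reader([rest[0]])) if rest else []
        rows.append((kind, ident, inner, rest[1:]))
    return rows

sample = "segment\t1236\t1234,1235\tname:Baker Street\n"
print(parse_nested(sample))
```

A plain CSV reader only gets as far as the outer level; the second pass over
the list field is the part that is no longer simple CSV.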

(*sigh* Another "self-invented text-based format". Maybe fixing XML would
be an attractive alternative to this.)


> Next, it's a bad idea to have a variable number of arguments in the
> middle of a line (segments list for streets in your example).

I agree that it is not too clever to have a variable number of entries in
a CSV line, regardless of where in the line the list sits.

CSV is made for static data structures where the meaning and number of
elements per line are the same for the whole document. My scheme sacrifices
this big advantage (in performance and simplicity). So does yours.

OSM has several different data structures, and I don't see a simple way to
fit them all into one fixed-length list without much hassle.


> The parser doesn't have to guess when to stop reading segment ids and
> start reading properties.

The parser does not have to guess anything. Every list is preceded by its
number of elements, so the parser knows exactly where the variable-sized
list ends.

However, the parser cannot know the total number of elements when it starts
a line (which is true for both my suggestion and yours).
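
A minimal Python sketch of such a count-prefixed list follows. The line
layout (type, id, a counted list of segments, then a second counted list) is
assumed for illustration only, and read_counted_list is a made-up helper
name.

```python
def read_counted_list(fields, pos):
    """Read a count at fields[pos], then that many items.
    Returns (items, index of the first field after the list)."""
    count = int(fields[pos])
    start = pos + 1
    end = start + count
    if end > len(fields):
        raise ValueError("list is longer than the remaining fields")
    return fields[start:end], end

# Hypothetical line: type, id, 3 segment ids, then 2 further entries.
line = "street,2345,3,1236,1237,1238,2,name,Baker Street"
fields = line.split(",")

segments, pos = read_counted_list(fields, 2)
extras, pos = read_counted_list(fields, pos)
print(segments, extras)
```

The end of each list is known as soon as its count has been read, but the
length of the whole line only becomes clear once every count has been
consumed.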



> 0.3	"created on Friday, 13th"	JOSM V1.1
> node	1234	51.2232,11.4232
> node	1235	51.2121,10.9996	oneway	name:Baker Street Corner
> segment	1236	1234,1235	name:Baker Street
> street  2345	1236,1237,1238	arg2	arg3:value3	arg4:value4

Now you introduce a third separator, the ":", which separates keys from
values. This means you have:
- tab-separated fields
- comma-separated lists
- colon-separated properties (each with either one or two elements)
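
The third level could be handled roughly as in the following Python sketch.
It assumes that everything after the third tab-separated field is a
property, as in the quoted example; parse_props is a made-up name.

```python
def parse_props(fields):
    """Turn property fields into a dict. A field without ':' is a bare
    key (one element) and is stored with the value None."""
    props = {}
    for field in fields:
        key, sep, value = field.partition(":")  # split at the first ':' only
        props[key] = value if sep else None
    return props

line = "node\t1235\t51.2121,10.9996\toneway\tname:Baker Street Corner"
fields = line.split("\t")          # level 1: tab-separated fields
coords = fields[2].split(",")      # level 2: comma-separated list
props = parse_props(fields[3:])    # level 3: colon-separated properties
print(coords, props)
```

Each level needs its own rule for what happens when its separator shows up
inside a value: a key containing ":" or a value containing a tab would still
need some form of escaping.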



To make my point clear: I don't want to completely reinvent yet another
text-based transport scheme. That would have been fitting 30 years ago.
There are plenty of well-defined, established, debuggable transport formats
out there with editor and library support.


Ciao, Imi.





