[Openstreetmap-dev] CSV transport encoding scheme
immanuel.scholz at gmx.de
Wed Jan 25 15:18:26 GMT 2006
> Tabs are widely used as a separator. They have the advantage that nobody uses
> them outside the computer world. (If we use commas, we have to escape/unescape
> commas in property values.)
This way we have to escape tabs in keys/values. Or else we have to define
that the tab character is illegal in keys or values. This means the
transport layer is interfering with the application layer (sometimes also
called a "violation of abstraction").
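To make the escaping cost concrete, here is a minimal sketch of what a tab-separated transport would need once tabs are allowed in keys/values. The backslash escape scheme and the function names are assumptions for illustration only, not anything proposed in this thread.

```python
# Hypothetical escape scheme: backslash-escape the characters that the
# transport layer reserves (tab as field separator, CR/LF as record separator).
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def escape_field(text):
    # Handle one character at a time so the backslash is never escaped twice.
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def unescape_field(text):
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, "")  # a trailing lone backslash is dropped
        out.append(_UNESCAPES.get(nxt, nxt))
    return "".join(out)


def encode_line(fields):
    return "\t".join(escape_field(f) for f in fields)


def decode_line(line):
    return [unescape_field(f) for f in line.split("\t")]
```

Every reader and writer of the format has to agree on exactly this table, which is the kind of detail that tends to go wrong in home-brewed formats.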
> BTW, with variable-length lines I don't think you could do anything
> useful in spreadsheets.
Yeah.. probably. So either do not use variable-length stuff at all or
don't bother to support a common delimiter for spreadsheet access ;)
> tokenize each line at tabs.
If you use these as a parser, it also implies that keys/values do not
contain any character incompatible with StringTokenizer/split. These are
line feeds, carriage returns, tabs (since they are used as the delimiter in
the outer structure), \0, EOF...
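A small sketch of how "tokenize each line at tabs" goes wrong as soon as a value contains one of those characters. The sample keys and values are made up for illustration, and `naive_parse` is a hypothetical stand-in for a split-based parser.

```python
def naive_parse(text):
    # One record per line, one field per tab, with no escaping at all.
    return [line.split("\t") for line in text.split("\n")]


# Made-up sample records: one value contains a tab, another a line feed.
records = [
    ["highway", "residential"],
    ["name", "Main\tStreet"],
    ["note", "first\nsecond"],
]

# A writer that simply joins fields with tabs and records with line feeds.
text = "\n".join("\t".join(record) for record in records)

# The tab splits one value into two fields; the line feed splits one
# record into two, so the parsed result no longer matches what was written.
parsed = naive_parse(text)
```

Three records go in, four come out, and one of them has a field too many.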
All these were possible with XML as transport scheme. So it is when using
I much prefer NOT to define a transport mechanism where we AGAIN have to
bother about encoding issues. HUNDREDS of other coders have done that
before, and I don't see the point why we should just reinvent the wheel.
Home-brewed text parsers are vulnerable to encoding problems. CSV is not.
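As one illustration of an existing CSV implementation taking care of this, here is a sketch using Python's standard `csv` module with a tab delimiter. The sample rows are made up, and other CSV libraries may quote differently; this only shows that the quoting is handled by the library rather than by us.

```python
import csv
import io

# Made-up rows with a tab, a line feed, a quote character, and an
# empty key with an empty value.
rows = [
    ["name", "Main\tStreet"],
    ["note", "first\nsecond"],
    ["quote", 'say "hi"'],
    ["", ""],
]

# Write with tab as delimiter; the writer quotes any field that contains
# the delimiter, the quote character, or a line break.
buffer = io.StringIO(newline="")
csv.writer(buffer, delimiter="\t").writerows(rows)
encoded = buffer.getvalue()

# Read it back; newline="" keeps line breaks inside quoted fields intact.
decoded = list(csv.reader(io.StringIO(encoded, newline=""), delimiter="\t"))
```

The rows read back are the same as the rows written, without any escaping code of our own.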
> I think it's much much more _simple_ to use than a flat list of elements.
And I think it only looks simple at first glance. The problems come later.
As an example, the current scheme is not pure XML. There is the tags-string
that needs to be parsed manually (it is in a home-brewed text-based
format). And this has already led to a bug in JOSM where users cannot
download objects which contain a property with an empty key and an empty
value. OK, maybe I am too stupid to implement the tags-structure parser
correctly, but I am very sure I would not have this bug if properties
were transferred in XML encoding.
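To show the kind of trap meant here, a purely hypothetical sketch follows. The `key=value;key=value` layout and both functions are assumptions for illustration; they are not the real tags-string format, and this is not a reconstruction of the actual JOSM bug.

```python
def encode_tags(tags):
    # Hypothetical format, NOT the real tags-string: "key=value;key=value".
    return ";".join(f"{key}={value}" for key, value in tags.items())


def parse_tags(text):
    tags = {}
    for part in text.split(";"):
        # Tokenizer-style parsing that silently drops empty tokens.
        tokens = [token for token in part.split("=") if token]
        if len(tokens) != 2:
            raise ValueError(f"malformed tag: {part!r}")
        tags[tokens[0]] = tokens[1]
    return tags
```

With this sketch, `{"highway": "residential"}` survives a round trip, but `{"": ""}` encodes to `=` and the parser rejects it, because dropping empty tokens leaves nothing to build a key/value pair from.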
> In fact my opinion is that OpenStreetMap should stick with XML for data
> transfer, but it's not a subject for this thread ;)
> Or, at least, keep a structured data format even if it's not purely CSV.
There are several solutions to the current performance profile of the
server. Using a different encoding is one. Feel free to suggest another :)
(As an example, profile for yourself and prove that my assumption is wrong!