[Openstreetmap-dev] CSV transport encoding scheme

Immanuel Scholz immanuel.scholz at gmx.de
Thu Jan 26 11:44:21 GMT 2006


> Ah, it looks to me like you stopped using a library to write XML.  That's
> a bad idea, since it's really easy to introduce bugs and invalid code.
> A good XML library won't let you emit invalid XML - that's a really good
> thing!

I fully agree. 100%. REALLY! ;-)

> http://www.openstreetmap.org/trac/browser/ruby/api/map.rb?rev=812 line 58
> seems to me to be missing a '>'.

Yeah, an excellent example of why it is a bad idea to write your own XML.
I think this is the problem Steve experienced.

In my defense, I want to say that the <osm> tag was the only "own" XML
output I wrote. I was under the (wrong) belief that the little <osm> tag
could not do any harm, so I put it in there by hand.
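
For illustration, here is a minimal sketch of emitting the <osm> wrapper
through a library rather than by hand. It assumes REXML is available; the
helper name osm_document and the node attributes (id, lat, lon) are
hypothetical and not taken from map.rb.

```ruby
require 'rexml/document'

# Build the <osm> wrapper with a library instead of string concatenation,
# so an unclosed tag cannot slip through.
# osm_document and the attribute names are illustrative assumptions.
def osm_document(nodes)
  doc = REXML::Document.new
  root = doc.add_element('osm')
  nodes.each do |n|
    root.add_element('node',
                     'id' => n[:id].to_s,
                     'lat' => n[:lat].to_s,
                     'lon' => n[:lon].to_s)
  end
  out = +''
  doc.write(out) # REXML appends the serialized document to the string
  out
end

xml = osm_document([{ id: 1, lat: 52.5, lon: 13.4 }])
puts xml
```

Because the library closes every element itself, a missing '>' like the
one on line 58 cannot occur in this path.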

Since I switched many times between the old code and the new one, with
lots of profiling code mixed in between, I got the spelling wrong in the
end (and I was not able to run a final test, since I had to apply the
changes to a version which mismatched the SQL database script.) ;-(

> Also, if you're worried about performance, why put in '\n' after every
> node?  Pretty printing XML isn't hard, all good XML editors will format it
> for you if you want to edit/read it yourself.

Streaming usually does not work well without carriage returns in the
output. I hoped this would get the ruby-apache framework to flush its
buffers, so that the data transfer starts before the script ends (and
maybe some resources get freed earlier, too).

The carriage returns had no impact on performance anyway.
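
The line-by-line streaming idea can be sketched as follows. This is only
an illustration: stream_lines and flush_every are hypothetical names, and
whether the ruby-apache framework actually passes an explicit flush on to
the client is an assumption that was not verified here.

```ruby
require 'stringio'

# Write one record per line and flush every `flush_every` lines, so that
# the transfer can start before the whole response has been built.
# stream_lines and flush_every are hypothetical names.
def stream_lines(io, lines, flush_every: 100)
  written = 0
  lines.each do |line|
    io << line << "\n"
    written += 1
    io.flush if (written % flush_every).zero?
  end
  io.flush # push out whatever is left in the buffer
  written
end

# A StringIO stands in for the real response stream.
buffer = StringIO.new
count = stream_lines(buffer, ['<node id="1"/>', '<node id="2"/>'], flush_every: 1)
```

An explicit flush makes the intent visible instead of relying on the
newline alone to trigger it.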

>> Someone (sorry, don't remember) suggested CSV as a very performant
>> replacement for XML some month before, so my first idea was using this
>> encoding scheme as replacement for XML.
> Imi, I think that suggestions to use CSV may have been sarcastic, or at
> least someone playing devil's advocate.

I still have not seen any better alternative. I don't think ASN.1 would
improve the situation, nor do I like the idea of a home-brewed solution.
I didn't find an alternative XML implementation that runs under Ruby and
(as you correctly pointed out) I don't want to write my own XML output
code.

And there is a built-in CSV library in Ruby.
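
A minimal sketch of what node output through that library could look
like, written against the API of Ruby's current csv library. The column
layout (id, lat, lon, tags) and the helper name nodes_to_csv are
assumptions for illustration, not a proposed schema.

```ruby
require 'csv'

# Serialize nodes as one CSV row each. The library handles quoting, so a
# comma inside a field does not break the row.
# The column layout (id, lat, lon, tags) is an illustrative assumption.
def nodes_to_csv(nodes)
  CSV.generate do |csv|
    nodes.each { |n| csv << [n[:id], n[:lat], n[:lon], n[:tags]] }
  end
end

csv_text = nodes_to_csv([{ id: 1, lat: 52.5, lon: 13.4, tags: 'name=Main St, north' }])
rows = CSV.parse(csv_text) # round trip: all fields come back as strings
```

As with an XML library, escaping is left to the library and not done by
hand.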

> Your profiling efforts are good ones, but I would be more interested in
> seeing profiles for real regions of OSM (Birmingham, Oslo, etc), and for
> sizes of regions we'll actually be requesting - take a look at the logs
> and use a typical bounding box, for example.

I have no access to the main server to do this.
Nor do I think you will get very different results (but I hope ;)
Nor do I think it is good to profile only the usual case. I suspect that
with the current implementation, a single request for an oversized
bounding box may hurt the server's performance for other requests, so we
have to address all requests, especially worst-case requests.
And I think my data is quite representative. You can generate the data
yourself using sql/make-test-data.rb (or something similarly named, I
forgot the exact name).

Of course, mirroring the actual server data and profiling real requests
can be done. If you get a very different result, I will be the one who
is happiest to stay with XML ;-)
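
A rough way to compare the serialization cost of both encodings on
generated data might look like the sketch below. It assumes the csv,
rexml and benchmark libraries are available; sample_nodes, to_csv and
to_xml are hypothetical helpers, the node fields are invented for the
test, and it measures output generation only, not database access or
transfer time.

```ruby
require 'benchmark'
require 'csv'
require 'rexml/document'

# Synthetic nodes; the fields (id, lat, lon) are illustrative assumptions.
def sample_nodes(count)
  (1..count).map { |i| { id: i, lat: 50.0 + i * 1e-5, lon: 8.0 + i * 1e-5 } }
end

def to_csv(nodes)
  CSV.generate { |csv| nodes.each { |n| csv << n.values_at(:id, :lat, :lon) } }
end

def to_xml(nodes)
  doc = REXML::Document.new
  root = doc.add_element('osm')
  nodes.each do |n|
    # REXML expects string attribute names and values
    root.add_element('node', n.transform_keys(&:to_s).transform_values(&:to_s))
  end
  out = +''
  doc.write(out)
  out
end

nodes = sample_nodes(1_000)
timings = {
  csv: Benchmark.realtime { to_csv(nodes) },
  xml: Benchmark.realtime { to_xml(nodes) }
}
timings.each { |name, secs| puts format('%-4s %.4f s', name, secs) }
```

The numbers depend on the machine, the Ruby version and the data, so they
are only meaningful relative to each other and should be repeated with
worst-case bounding boxes.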

> I don't think we will see significant performance improvements by using
> CSV,

By gut feeling or by profiling?

(Well, to be honest, I haven't profiled CSV yet ;)

> and in the short term it will be a huge waste of developer time to
> rewrite 4 clients to a new schema.

The server is currently unusable for most of the evening. This is already
a huge waste of time (not for developers, but for users).

If we don't do something about this, OSM cannot grow significantly any
further. I don't like to replace something like XML without a reason.
Unfortunately, we have a reason.

Ciao, Imi.
