[OSM-dev] Reducing osm2pgsql memory usage using a database method
frederik at remote.org
Mon Mar 12 11:18:24 GMT 2007
> I don't understand your argument here. If there are problems with
> planet XML, let's fix them; parsing XML with regex won't get you far.
In that case, you can surely show me the DTD or XML schema for the
planet file ;-) and if you made it up yourself, then I must be
permitted to ask in what way such a guesswork XML schema is any
better than (guesswork) regular expressions.
In a situation where there's only one single provider of the XML
data, I don't feel I have to be able to handle all sorts of strange
XML formats that might describe an equivalent parse tree. Of course,
anyone is free to write code that processes hundreds of hypothetical
variations of the planet file in addition to the real one if he so
Anyway, I was only saying that for my specific applications - which
include selecting polygonal areas out of a planet file, finding last
modification timestamps for t@h tiles, and occasional experiments -
nothing beats running through the file with Perl regular expressions
(I tried a non-validating SAX parser). Since most of these tasks are
only done once per planet file, I hazard the guess that I'm even
faster that way than importing everything into a database and running
analyses from there.
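
For illustration, here is a minimal sketch of that kind of single pass
over the file, written in Python rather than Perl. It is not the script
described above. It assumes one element per line and node tags that
carry double-quoted lat, lon and timestamp attributes, with timestamps
in one uniform ISO 8601 form so that they compare correctly as strings.
The function name and its parameters are made up for the example.

```python
import re

# Assumption: each node sits on one line and uses double-quoted attributes.
NODE_RE = re.compile(r'<node\b[^>]*>')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def latest_timestamp_in_bbox(lines, min_lat, min_lon, max_lat, max_lon):
    """Return the newest node timestamp inside the bounding box, or None."""
    latest = None
    for line in lines:
        match = NODE_RE.search(line)
        if match is None:
            continue  # not a node line, skip it without parsing anything
        attrs = dict(ATTR_RE.findall(match.group(0)))
        try:
            lat = float(attrs["lat"])
            lon = float(attrs["lon"])
        except (KeyError, ValueError):
            continue  # node without usable coordinates
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue
        stamp = attrs.get("timestamp")
        # Uniform ISO 8601 strings sort chronologically (assumption above).
        if stamp is not None and (latest is None or stamp > latest):
            latest = stamp
    return latest
```

Passing an open file object as `lines` keeps only one line in memory
at a time, which is the whole point of the approach.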
As for osm2pgsql, I don't even know what it does, I only know that it
is a C program that used to keep most of the planet file in memory
making it unusable on low-end machines, and that there's another
version that holds temporary data in a database table which is much
slower but consumes hardly any memory. From that, I derived my
suggestion that the planet file could probably be processed in chunks.
From your rather cynical reaction to that, I gather it must have
been a stupid idea. Ok, no problem, sorry to have intervened, I'll
stick to my own toys in the future.
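
To make the chunking suggestion concrete, here is a minimal Python
sketch of reading an input in fixed-size batches. It says nothing about
how osm2pgsql works internally; the helper name in_chunks and the batch
size are invented for the example. Elements that refer to each other by
id across batch boundaries would need extra handling that is not shown.

```python
from itertools import islice


def in_chunks(items, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # input exhausted
        yield chunk
```

Only one batch is held in memory at a time, so peak memory depends on
the batch size rather than on the size of the whole file.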
Frederik Ramm ## eMail frederik at remote.org ## N49°00.09' E008°23.33'