[OSM-dev] Reducing osm2pgsql memory usage using a database method

Frederik Ramm frederik at remote.org
Mon Mar 12 11:18:24 GMT 2007


Hi,

> I don't understand your argument here. If there are problems with
> planet XML, let's fix them; parsing XML with regex won't get you far.

In that case, you can surely show me the DTD or XML schema for the
planet file ;-) And if you made it up yourself, then one may fairly
ask how such a guesswork XML schema is any better than (guesswork)
regular expressions.

In a situation where there's only one single provider of the XML  
data, I don't feel I have to be able to handle all sorts of strange  
XML formats that might describe an equivalent parse tree. Of course,  
anyone is free to write code that processes hundreds of hypothetical  
variations of the planet file in addition to the real one if he so  
desires.

Anyway, I was only saying that for my specific applications - which  
include selecting polygonal areas out of a planet file, finding last  
modification timestamps for t@h tiles, and occasional experiments -
nothing beats running through the file with Perl regular expressions  
(I tried a non-validating SAX parser). Since most of these tasks are  
only done once per planet file, I hazard the guess that I'm even  
faster that way than importing everything into a database and running  
analyses from there.
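
A minimal sketch of the line-by-line regex approach described above, written in Python rather than Perl. It assumes each node sits on a single line with quoted id, lat and lon attributes, which is a guess about the file layout, not a documented schema. The function name nodes_in_bbox is made up for illustration, and a bounding box stands in for a real polygon test.

```python
import re

# Assumption: one <node ...> element per line, attributes in single or double quotes.
NODE_RE = re.compile(r"<node\b[^>]*>")
ATTR_RE = re.compile(r"""(\w+)=(["'])(.*?)\2""")

def nodes_in_bbox(lines, min_lat, min_lon, max_lat, max_lon):
    """Yield (id, lat, lon) for every node line that falls inside the box."""
    for line in lines:
        match = NODE_RE.search(line)
        if not match:
            continue  # not a node line, skip without parsing further
        attrs = {name: value for name, _quote, value in ATTR_RE.findall(match.group(0))}
        try:
            lat = float(attrs["lat"])
            lon = float(attrs["lon"])
        except (KeyError, ValueError):
            continue  # line does not look the way we guessed, ignore it
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            yield attrs.get("id"), lat, lon

# Made-up sample lines, not taken from a real planet file.
sample = [
    '<node id="1" lat="49.0" lon="8.39" timestamp="2007-03-01T12:00:00Z"/>',
    "<node id='2' lat='51.5' lon='-0.12'/>",
    '<way id="3">',
]
result = list(nodes_in_bbox(sample, 48.9, 8.3, 49.1, 8.5))
```

Because the input is any iterable of lines, an open file object can be passed in directly and only one line is held in memory at a time.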

As for osm2pgsql, I don't even know what it does. I only know that it
is a C program that used to keep most of the planet file in memory,
making it unusable on low-end machines, and that there's another
version that holds temporary data in a database table, which is much
slower but consumes hardly any memory. From that, I derived my
suggestion that the planet file could probably be processed in chunks.
From your rather cynical reaction to that, I gather it must have
been a stupid idea. OK, no problem, sorry to have intervened; I'll
stick to my own toys in the future.
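
The chunking suggestion can be illustrated with a short Python sketch. It says nothing about how osm2pgsql actually works; the helper name in_chunks and the chunk size are invented for illustration, and the sketch assumes each chunk can be handled on its own, which may not hold for elements that refer to others elsewhere in the file.

```python
from itertools import islice

def in_chunks(lines, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # input exhausted
        yield chunk

# Only one chunk is held in memory at a time; here each chunk is just counted.
chunk_sizes = [len(chunk) for chunk in in_chunks(range(5), 2)]
```

Memory use is then bounded by the chunk size rather than by the size of the whole file.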

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'





