[OSM-dev] Reducing osm2pgsql memory usage using a database method

Artem Pavlenko artem at mapnik.org
Sat Mar 10 14:56:01 GMT 2007


> I've just uploaded an experimental version of osm2pgsql which uses
> Postgresql database tables for the transient node and segment storage.
> This drops the memory usage from >1GB to ~60MB. On the downside, the
> import time has gone up from 20 to 100 minutes. I'm sure this can be
> improved though with some more database Mojo.
> For further details see SVN (utils/osm2pgsql/experimental/readme.txt)
> or
> http://trac.openstreetmap.org/browser/utils/osm2pgsql/experimental/ 
> readme.txt
Good stuff.

I'm working on new osm.xml and I have some ideas on how to improve  
osm2psql output:

1. We can re-write osm2pgsql in c++ and take advantage of dynamic  
structures e.g std::map, safe formatting and casting   
boost::lexiacal_cast,  boost::format and more.

2. At the moment there are a lot of redundant data in output tables.  
Everything apart from geometries are dumped as 'TEXT' .

We can have a more flexible design where table structure, attribute  
values are configurable (at compile time). Consider this for example:
To render highway features in correct order I want to have z_order  
field in planet_osm_table calculated as follow:

int  z_order ( osm_feature const& feat)
	int layer = 0; //default
	     layer = boost::lexical_cast<int>(feat['layer']);
	catch (boost::bad_lexical_cast & )
               // layer tag has got lots of junk!!!
	int highway_z = 0; // 0..9
	std::string highway = feat['highway']
	if ( highway == 'motorway' || highway == 'motorway_link')
              highway_z = 9;
	else if (...) {}

	bool bridge = false;
	try {
	    bridge = boost::lexical_cast<bool>(feat['bridge']);
         catch (...)  {}
         return   10 * ( layer + bridge?1:0)  + highway_z ;

Also I want to have consistent numeric feature_type calculated  
differently depending on tags/values. This will make rendering more  
efficient and will bring some (needed) sanity to styles in osm.xml.

3. Also we can abstract 'output writing' to have multiple back-ends :  
mysql, sqlite , shapefiles etc .

What do you think?


PS. I'm running mapnik , postgresql , osm2psql on mac os x now and I  
get substantial performance improvements mainly from faster disk i/o.  
I wonder if we should look into finding optimal filesystem for  
postgresql. I'm using ext3 on linux.

> 	Jon
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev

More information about the dev mailing list