[OSM-dev] osm2pgsql -C vs --flat-nodes

Wed Sep 19 14:38:46 BST 2012

Greetings developers,

I'm in the process of attempting to load the newly licensed planet and 
have recently learned about --flat-nodes in osm2pgsql.  I'm trying to 
make use of this feature to reduce the disk space consumption of the 
--slim updatable database, but I'm having issues getting enough memory 
allocated to my VMware virtual machine to complete the import.  osm2pgsql 
runs out of memory while querying the pending_ways.

I've looked through the code, and it appears that using -C 14000 in 
conjunction with --flat-nodes may be redundant, as both attempt to 
speed up access to a node's coordinates: -C by keeping the nodes 
entirely in RAM, and --flat-nodes by keeping a RAM-based cache of 10,000 
disk-based blocks of 1024 nodes each.  Granted, -C 14000 manages to 
hold all 1,569,263k nodes in RAM (at 98.9% full) while --flat-nodes 
will only hold 10,240k nodes in RAM, so I can expect a (significant?) 
slowdown, but...
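
To put a number on that gap, here is a quick back-of-the-envelope 
sketch in Python.  It uses only the figures quoted above and assumes 
"k" means thousands; the constant and function names are made up for 
illustration and are not taken from the osm2pgsql source.

```python
# Figures quoted in this message; names are illustrative only.
FLAT_NODES_CACHE_BLOCKS = 10_000   # blocks the --flat-nodes cache keeps in RAM
NODES_PER_BLOCK = 1024             # nodes stored in each disk-based block
PLANET_NODES = 1_569_263_000       # "1,569,263k" nodes held by -C 14000


def cached_nodes(blocks: int, nodes_per_block: int) -> int:
    """Number of nodes the block cache can hold at once."""
    return blocks * nodes_per_block


def coverage(cached: int, total: int) -> float:
    """Fraction of all planet nodes that fit in the cache."""
    return cached / total


flat_cached = cached_nodes(FLAT_NODES_CACHE_BLOCKS, NODES_PER_BLOCK)
flat_coverage = coverage(flat_cached, PLANET_NODES)

print(f"--flat-nodes cache: {flat_cached:,} nodes "
      f"({flat_coverage:.2%} of the planet)")
```

So the --flat-nodes cache on its own would cover well under 1% of the 
nodes; whether that matters depends on how localized the lookups are, 
which this arithmetic says nothing about.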

Can I dramatically reduce, or nearly eliminate, the -C node cache and 
let --flat-nodes pick up the slack for the planet import?  Will this 
work?  And will it be fast enough to be reasonable?

Lynn (D) - KJ4ERJ

PS.  I've got a 6-core VM, which I have tried with 24, 28, and then 
32 GB of RAM, hosted on an 8-core i7 with only 28 GB of physical RAM.  
I know I'm paging the VM.  Disk configuration is one virtual drive for 
the root and 3 virtual drives (each on a different physical spindle) 
combined into a RAID0 array for the gis DB.  I'm using the following 
import command:

osm2pgsql --slim -d gis -C 14000 --number-processes 6 \
  --flat-nodes /mnt/SSD/flatnodes/flatnodes.osm \
  /mnt/raid0/planet/planet-120912.osm.pbf

PPS.  I started with bunzip2 -c /mnt/raid0/planet/planet-120912.osm.bz2 
|  osm2pgsql ... /dev/stdin, but WOW is the PBF faster, especially on 
the node portion: 107k nodes/sec for the bz2 versus 807k nodes/sec for 
the PBF.
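
For a rough sense of what those two rates mean for the node phase, 
here is a small Python sketch.  It assumes the rate stays constant for 
the whole node phase and that the planet holds the 1,569,263k nodes 
mentioned earlier in this message; the names are illustrative only.

```python
# Rates and node count quoted in this message; "k" taken as thousands.
PLANET_NODES = 1_569_263_000
BZ2_NODES_PER_SEC = 107_000
PBF_NODES_PER_SEC = 807_000


def node_phase_seconds(nodes: int, nodes_per_sec: int) -> float:
    """Estimated duration of the node phase at a constant rate."""
    return nodes / nodes_per_sec


bz2_seconds = node_phase_seconds(PLANET_NODES, BZ2_NODES_PER_SEC)
pbf_seconds = node_phase_seconds(PLANET_NODES, PBF_NODES_PER_SEC)
speedup = bz2_seconds / pbf_seconds

print(f"bz2: {bz2_seconds / 3600:.1f} h, "
      f"pbf: {pbf_seconds / 60:.0f} min, "
      f"speedup: {speedup:.1f}x")
```

This covers the node portion only; ways and relations are not 
included in the estimate.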

PPPS.  I know I need SSD, but that's not in the $$$ picture at the 
moment.  After the planet import is complete, I move the rendering 
tables onto an SSD, but I'm not sure how to tell osm2pgsql that I'd like 
those tables created in the alternate tablespace (called, appropriately, 
SSD in PostgreSQL).