[osmosis-dev] planet import

Brian DeRocher brian at derocher.org
Sun Apr 10 03:27:31 BST 2011


Hey everyone,

I started a full planet import on March 29th, 11 days ago.  I'm trying to get an idea of how long this will take.  I just want to know whether it will take about 20 days or more like 40 days.

Here's my setup:

2 dual core Opterons, cpu is not the bottleneck

8 GB RAM, htop reports this RES memory usage:
	postgres 1082M UPDATE
	java osmosis 91928 (15 processes/threads?)

Areca RAID 5  1T  with 3 disks
	/var is 552 GB, 444 GB used (87%) 80GB available
		This usage has gone up and down from 84% to 91% a few times per day.
	The import added about 300GB.

Debian 6.0

PostgreSQL 8.4 is probably not tuned well for this hardware, and it's not tuned well for large imports.
	work_mem	1MB
	maintenance_work_mem	16MB
	checkpoint_segments	3
	fsync	on (i have a BBU and may set this to off in the future)
	shared_buffers	24MB
	The xlog is on the RAID 5 array too.

I've modified osmosis to connect to port 5433.  Did I miss something?  Can I specify the port on the command line?

I ran:  $ bzcat planet-110316.osm.bz2 | src/osmosis-0.34+ds1/bin/osmosis --read-xml file=- --write-pgsql host="localhost" user="osm" password="Shut up, Ted."

Here's the log so far.

Mar 29, 2011 11:11:43 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.34
log4j:WARN No appenders could be found for logger (org.java.plugin.ObjectFactory).
log4j:WARN Please initialize the log4j system properly.
Mar 29, 2011 11:11:44 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Mar 29, 2011 11:11:44 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
Mar 29, 2011 11:11:44 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.

Sadly, I did not configure logging correctly.

According to pg_stat_activity, it is currently running this statement, so it looks like it's mostly done.

UPDATE ways SET bbox = (SELECT Envelope(Collect(geom)) FROM nodes JOIN way_nodes ON way_nodes.node_id = nodes.id WHERE way_nodes.way_id = ways.id)

Looks like a correlated subquery to me.  Probably performing a nested loop.
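
To make the concern concrete, here is a minimal sketch in Python with an in-memory SQLite database.  It is not the osmosis schema and not PostGIS: the table layout is simplified, the bbox is a plain "min_lon,min_lat,max_lon,max_lat" string standing in for Envelope(Collect(geom)), and the function names and the way_bbox temp table are made up for illustration.  It only shows that the per-row correlated form and a single grouped pass over way_nodes produce the same result; whether PostgreSQL really runs the original as a nested loop is something only EXPLAIN on the real database would tell.

```python
import sqlite3

# String stand-in for a bounding box (assumption: no PostGIS here).
BBOX_EXPR = "min(n.lon) || ',' || min(n.lat) || ',' || max(n.lon) || ',' || max(n.lat)"


def build_db():
    """Create a tiny, simplified nodes/ways/way_nodes schema with sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, lon REAL, lat REAL);
        CREATE TABLE ways (id INTEGER PRIMARY KEY, bbox TEXT);
        CREATE TABLE way_nodes (way_id INTEGER, node_id INTEGER);
        CREATE INDEX idx_way_nodes_way ON way_nodes (way_id);
    """)
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [(1, 0.0, 0.0), (2, 1.0, 2.0), (3, -1.0, 5.0), (4, 3.0, 3.0)],
    )
    conn.executemany("INSERT INTO ways (id) VALUES (?)", [(10,), (20,)])
    conn.executemany(
        "INSERT INTO way_nodes VALUES (?, ?)",
        [(10, 1), (10, 2), (20, 2), (20, 3), (20, 4)],
    )
    return conn


def correlated_update(conn):
    """Same shape as the statement above: the subquery is evaluated per way."""
    conn.execute(f"""
        UPDATE ways SET bbox = (
            SELECT {BBOX_EXPR}
            FROM nodes n JOIN way_nodes wn ON wn.node_id = n.id
            WHERE wn.way_id = ways.id)
    """)


def grouped_update(conn):
    """Aggregate all ways in one grouped pass, then copy the results over."""
    conn.execute(f"""
        CREATE TEMP TABLE way_bbox AS
        SELECT wn.way_id AS way_id, {BBOX_EXPR} AS bbox
        FROM way_nodes wn JOIN nodes n ON n.id = wn.node_id
        GROUP BY wn.way_id
    """)
    conn.execute("""
        UPDATE ways SET bbox = (
            SELECT wb.bbox FROM way_bbox wb WHERE wb.way_id = ways.id)
    """)


def read_bboxes(conn):
    """Return {way_id: bbox_string} for comparison."""
    return dict(conn.execute("SELECT id, bbox FROM ways ORDER BY id"))


if __name__ == "__main__":
    a, b = build_db(), build_db()
    correlated_update(a)
    grouped_update(b)
    print(read_bboxes(a) == read_bboxes(b))
```

The second form still does one lookup per way at the end, but against a small precomputed table rather than a join of nodes and way_nodes.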

I've read in the mailing list that adding the bbox and linestring columns will make the import "much" longer.  So does that mean 10 days or 100 days?

I checked \d ways and I see "idx_ways_bbox" gist (bbox) and "idx_ways_linestring" gist (linestring).  So either those indexes were created after "UPDATE ways set bbox..." or I see the database before the transaction started.

I don't know if this is in a transaction or not.  I can't find the BEGIN in the code.  I do see setAutoCommit() and this appears to be called with false.
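
On the missing BEGIN: with JDBC, setAutoCommit(false) means the application never issues BEGIN itself; the driver opens the transaction implicitly and nothing is committed until commit() is called.  The sketch below is only an analogy, using Python's sqlite3 module (which, in its default mode, also opens a transaction implicitly before an INSERT) rather than osmosis or PostgreSQL.  The demo() and visible_rows() names and the one-column ways table are made up for illustration.  It shows a second connection not seeing the writer's rows until the commit.

```python
import os
import sqlite3
import tempfile


def visible_rows(path):
    """Count rows in 'ways' as seen by a separate, fresh connection."""
    reader = sqlite3.connect(path)
    try:
        return reader.execute("SELECT count(*) FROM ways").fetchone()[0]
    finally:
        reader.close()


def demo():
    """Return (writer_in_transaction, rows_seen_before_commit, rows_seen_after)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    writer = sqlite3.connect(path)
    try:
        writer.execute("CREATE TABLE ways (id INTEGER PRIMARY KEY)")
        writer.commit()

        # No explicit BEGIN here: the module starts a transaction implicitly
        # before the INSERT, much like a JDBC connection with autocommit off.
        writer.execute("INSERT INTO ways VALUES (1)")
        in_txn = writer.in_transaction

        before = visible_rows(path)  # other session: uncommitted row is hidden
        writer.commit()
        after = visible_rows(path)   # now the row is visible
        return in_txn, before, after
    finally:
        writer.close()
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```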

Any tips?

Thanks,
Brian

-- 
Brian DeRocher
http://brian.derocher.org


