[OSM-dev] osm2pgsql update

Thu Nov 24 09:53:56 GMT 2011

Hi,

    Kai has made a number of interesting improvements to osm2pgsql over
the last few weeks. I believe some bits are still work in progress, but on
the whole osm2pgsql has become a lot more efficient: it makes better
use of cache memory and can even use multiple processes for some tasks.
Anyone who regularly spends time waiting for osm2pgsql to complete is
encouraged to check out a recent version from svn and see whether it
improves things for them.

I think it would be great to share the results of osm2pgsql runs among
users: how long does it take to import X on infrastructure Y?

I've made a start here; please add or modify as you see fit:

http://wiki.openstreetmap.org/wiki/Osm2pgsql/Benchmarks

There's one particular use case that osm2pgsql did not cover well in
the past: the "I don't want to apply updates, but I need to use slim
mode nonetheless because I don't have enough memory for non-slim" use case.

osm2pgsql is not very well suited for this because it puts all its
temporary information into the database instead of a more efficient
random-access structure. This is something I'll leave for someone else
to fix, but I did one thing to make this use case a bit better: I
introduced a "--drop" flag that makes osm2pgsql drop the temporary
tables after import and also skips creating the indexes on way id and
relation id that a --slim import normally creates. So now, after
importing a data set with --drop and --slim, you should have a database
that looks almost the same as one imported without --slim. Because the
unnecessary tables and indexes are dropped, the database is usually only
25% of the size of a complete --slim import (but of course it is
unsuitable for updates).
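To make "a more efficient random-access structure" a little more concrete, here is one kind of structure that could hold node locations instead of a temporary database table. This is a hypothetical sketch, not osm2pgsql code: the class name, the fixed-point scale of 10^7 and the sentinel value are all assumptions. Coordinates live in one flat array indexed directly by node id, so looking up a node is an offset calculation rather than a query.

```python
from array import array
from typing import Optional, Tuple


class DenseNodeStore:
    """Hypothetical flat, random-access node cache: node id -> (lat, lon).

    Coordinates are stored as fixed-point integers (degrees * 10**7) in one
    flat array indexed directly by node id.
    """

    SCALE = 10_000_000
    MISSING = -(2**31)  # sentinel: smaller than any valid scaled coordinate

    def __init__(self) -> None:
        # "i" is a C int (32 bits on common platforms); lat/lon interleaved.
        self._coords = array("i")

    def set(self, node_id: int, lat: float, lon: float) -> None:
        if node_id < 0:
            raise ValueError("this sketch only handles non-negative node ids")
        needed = 2 * (node_id + 1)
        if len(self._coords) < needed:
            # Grow the array and mark the gap as "no node stored here".
            self._coords.extend([self.MISSING] * (needed - len(self._coords)))
        self._coords[2 * node_id] = round(lat * self.SCALE)
        self._coords[2 * node_id + 1] = round(lon * self.SCALE)

    def get(self, node_id: int) -> Optional[Tuple[float, float]]:
        i = 2 * node_id
        if node_id < 0 or i + 1 >= len(self._coords):
            return None
        if self._coords[i] == self.MISSING:
            return None
        return self._coords[i] / self.SCALE, self._coords[i + 1] / self.SCALE


store = DenseNodeStore()
store.set(5, 51.5, -0.1)
print(store.get(5))  # (51.5, -0.1)
print(store.get(3))  # None: never stored
```

The trade-off is memory: the array grows to the highest node id seen, so it only pays off when ids are dense, as they are in a full planet import.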

There's one strange thing I noticed. When I stopped creating the
indexes (more precisely, primary keys) on way id and polygon id,
osm2pgsql suddenly took ages to run, even though these indexes are
clearly not created in non-slim mode and therefore should not be required.

I found out that the culprit is the multipolygon code: after finding
that an outer ring formed by a single way is tagged the same as the
multipolygon relation itself, it issues a "delete_way_from_output",
presumably to remove the already-generated ring. This leads to a
"DELETE from <table> where osm_id=<id>", which requires a full table
scan because there is no primary key (or any other index) on osm_id.
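The effect of the missing index can be seen in a query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption: PostgreSQL's EXPLAIN output looks different, but the principle is the same). The table name mirrors osm2pgsql's default polygon table, but the two-column schema and the index name are made up for illustration. It prints the plan for the DELETE with and without an index on osm_id.

```python
import sqlite3


def delete_plan(with_index: bool) -> str:
    """Return SQLite's query plan for deleting one polygon by osm_id."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for an output table; osm_id is a plain column,
    # not a primary key.
    conn.execute("CREATE TABLE planet_osm_polygon (osm_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO planet_osm_polygon VALUES (?, ?)",
        [(i, f"area {i}") for i in range(1000)],
    )
    if with_index:
        # A primary key would also give the planner an index on osm_id;
        # a plain index is enough to show the effect on the lookup.
        conn.execute(
            "CREATE INDEX idx_polygon_osm_id ON planet_osm_polygon (osm_id)"
        )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN DELETE FROM planet_osm_polygon WHERE osm_id = ?",
        (42,),
    ).fetchall()
    conn.close()
    # Each plan row is (id, parent, notused, detail); keep the detail text.
    return "; ".join(row[-1] for row in rows)


print("without index:", delete_plan(False))
print("with index:   ", delete_plan(True))
```

Without the index the plan is a SCAN of the whole table; with it, a SEARCH using the index. Paying for a full scan once per deleted ring is what makes a large import crawl.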

I have now disabled this for --slim --drop mode (the change does not
affect normal --slim mode), but I still have to investigate further. This
will likely create some extra areas for outer rings; however, since
non-slim mode doesn't have these indexes either, it should exhibit the
same behaviour.

Is anyone aware of multipolygon handling not working right when not 
using --slim? We might have to (re)introduce the primary key for osm_id 
at least on the polygon table to allow this deletion of duplicate areas.