[OSM-dev] osm2pgsql update
frederik at remote.org
Thu Nov 24 09:53:56 GMT 2011
Kai has made a number of interesting improvements to osm2pgsql in
recent weeks. I believe some bits are still work in progress but on
the whole osm2pgsql has become a lot more efficient - it makes better
use of cache memory and can even use multiple processes for some tasks.
Anyone who regularly spends time waiting for osm2pgsql to complete is
encouraged to check out a recent version from svn and see whether that
improves things for them.
I think it would be great to share results of osm2pgsql runs among users
- how long does it take to import X on infrastructure Y?
I've made a start here, please add/modify as you see fit:
There's one particular use case that osm2pgsql did not cover so well in
the past - the "I don't want to apply updates but I need to use slim
mode nonetheless because I don't have enough memory for non-slim" use case.
osm2pgsql is not very well suited for this because it puts all its
temporary information into the database instead of a more efficient
random-access structure. This is something I'll leave for someone else
to fix, but I did one thing to make this use case a bit better; I
introduced a "--drop" flag that makes osm2pgsql drop the temporary
tables after import, and also does not create the indexes on way id and
relation id that a --slim import normally creates. So now, after
importing a data set with --drop and --slim, you should have a database
that looks almost the same as one imported without --slim. By dropping
the unnecessary tables and indexes, the database is usually only 25% of
the size of a complete --slim import (but of course it is unsuitable for
updates).
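For illustration, here is a minimal sketch of how such an import could be scripted. The database name "gis" and the input file "extract.osm" are placeholders I made up, and the helper function is hypothetical; only the --slim, --drop and --database options are osm2pgsql's own. The script only builds and prints the command, it does not run it.

```python
import shlex


def build_import_command(input_file, database, drop=True):
    """Build an osm2pgsql argument list for a slim-mode import.

    input_file and database are placeholders chosen by the caller.
    """
    cmd = ["osm2pgsql", "--slim"]
    if drop:
        # --drop: discard the slim-mode temporary tables after the import
        cmd.append("--drop")
    cmd += ["--database", database, input_file]
    return cmd


# Placeholder values; substitute your own database and extract.
cmd = build_import_command("extract.osm", "gis")
print(shlex.join(cmd))
```

To actually start the import, the list can be handed to subprocess.run(cmd, check=True).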
There's one strange thing I noticed. When I dropped the creation of
indexes (more precisely, primary keys) on way id and polygon id,
suddenly osm2pgsql took ages to run - even though these indexes are
clearly not created in non-slim mode and therefore should not be required.
I found out that the culprit is in the multipolygon code, where after
finding out that a one-way outer ring is tagged the same as the
multipolygon relation itself, a "delete_way_from_output" is issued,
presumably to remove that already-generated ring. This leads to a
"DELETE from <table> where osm_id=<id>" which requires a table scan
because of lack of primary keys.
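The effect of the missing index on such a DELETE can be reproduced in miniature. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL, so the planner output is SQLite's and not what osm2pgsql's database would print; the table and index names are made up for the example. It asks the planner how it would execute the same DELETE with and without an index on osm_id.

```python
import sqlite3


def delete_plan(conn, table):
    """Return the planner's detail strings for a DELETE by osm_id."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN DELETE FROM {table} WHERE osm_id = ?", (1,)
    ).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Hypothetical stand-ins for the polygon output table.
conn.execute("CREATE TABLE polygons_noindex (osm_id INTEGER, tags TEXT)")
conn.execute("CREATE TABLE polygons_indexed (osm_id INTEGER, tags TEXT)")
conn.execute("CREATE INDEX polygons_osm_id_idx ON polygons_indexed (osm_id)")

plan_noindex = delete_plan(conn, "polygons_noindex")
plan_indexed = delete_plan(conn, "polygons_indexed")

print("without index:", plan_noindex)  # planner reports a full table scan
print("with index:   ", plan_indexed)  # planner reports an index search
```

With one such DELETE per affected way, a full scan each time adds up quickly on a large polygon table, which would match the slowdown described above.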
I have now disabled this for --slim --drop mode (the change will not
affect normal --slim mode), but have to investigate further - this will
likely create some extra areas for outer rings, but since it doesn't
have these indexes, non-slim mode should exhibit the same behaviour.
Is anyone aware of multipolygon handling not working right when not
using --slim? We might have to (re)introduce the primary key for osm_id
at least on the polygon table to allow this deletion of duplicate areas.