[OSM-dev] speeding up loading an OSM dump into PostGIS?

Kai Krueger kakrueger at gmail.com
Fri Dec 16 17:01:56 GMT 2011


On 16.12.2011, Peter Körner wrote:
> On 16.12.2011 12:47, Hartmut Holzgraefe wrote:
>> On 16.12.2011 11:21, Peter Körner wrote:
>>
>>> At one of the Hack-Weekends someone played around with distributing the
>>> SQL-Commands issued by osm2pgsql via XMPP.
>>
>> With the SQL command execution, especially the index creation, being
>> the most expensive part of an osm2pgsql run, this would only save about
>> 1/4 to 1/3 of the total planet import execution time, I'm afraid ...
>>
> 
> It would help with keeping updated: you would not need the --slim tables
> anymore (on each server - only on the master)

The biggest --slim table is the planet_osm_nodes table, which is roughly
a 60 GB table plus a 25 GB index. Having to access the nodes table is
also where most of the slowness in a memory-constrained import and
during any diff application comes from. So the biggest gain in speed
would probably come from optimising the way osm2pgsql accesses the nodes.

I have thought about moving the planet_osm_nodes table out of the
PostGIS database and into a flat file. Really, the only information
needed about the nodes from the slim table is the lat / lon coordinates.
I don't know exactly how Postgres stores its data structures, but it
certainly uses more than 8 bytes per row. The flat file would simply be
a huge array, indexed by node id, storing each lat / lon pair as an
8-byte tuple.

The highest node id is currently about 1 600 000 000, which at 8 bytes
per node results in roughly a 13 GB file. As it is much smaller, a
bigger proportion should fit in the disk cache, and a node can be
retrieved in O(1) with a single seek to offset id * 8. So the idea is
that it would be faster than using the database for this purpose.
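To illustrate, a minimal sketch of what such a flat-file node store
could look like (the file name, the fixed-point encoding and the helper
functions are my assumptions for illustration, not actual osm2pgsql
code):

/* flat_nodes.c: node id N lives at byte offset N * 8, with lat and
 * lon stored as two 4-byte fixed-point integers (units of 1e-7
 * degrees, enough for the full coordinate range). */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

struct node_pos {
    int32_t lat;    /* latitude  * 10000000 */
    int32_t lon;    /* longitude * 10000000 */
};

/* O(1) write: one pwrite at offset id * 8, no index to maintain */
static int node_set(int fd, int64_t id, struct node_pos p)
{
    off_t off = id * (off_t)sizeof p;
    return pwrite(fd, &p, sizeof p, off) == (ssize_t)sizeof p ? 0 : -1;
}

/* O(1) read: one pread, the node id itself is the index */
static int node_get(int fd, int64_t id, struct node_pos *p)
{
    off_t off = id * (off_t)sizeof *p;
    return pread(fd, p, sizeof *p, off) == (ssize_t)sizeof *p ? 0 : -1;
}

int main(void)
{
    /* In a real import one would probably preallocate the full
     * ~13 GB up front (e.g. with posix_fallocate) rather than let
     * holes form, sidestepping the sparse-file issues mentioned
     * below. */
    int fd = open("nodes.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return 1;

    struct node_pos p = { 515074000, -1278000 };   /* roughly London */
    node_set(fd, 123456789, p);

    struct node_pos q;
    if (node_get(fd, 123456789, &q) == 0)
        printf("lat=%d lon=%d\n", (int)q.lat, (int)q.lon);

    close(fd);
    return 0;
}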

My initial tests weren't so promising, but I tried to use sparse files
to save disk space, which was probably not a good idea, as that loses
the O(1) access characteristics and also makes writing to the file much
more complicated.

I don't yet know when I will get around to trying these things again,
but hopefully at some point I can figure out whether this idea helps.
Of course, if anyone else wants to try it...


Kai

