[Geocoding] Slowdown of osm2pgsql import

Holger Schöner numenor at ancalime.de
Thu Jan 7 20:55:46 GMT 2010


Hi Frans,

> The nodes import started several times faster than on the small
>  machine. When it reached the ways, it became very slow again, and the
>  same now applies to relations.
> While it started with nearly 100% CPU usage, now there is a peak of 50%,
>  but most of the time it is below 10%.
> Does anybody know from experience whether this is Amazon-related, or
> whether much more projection has to be done on ways/relations, so that
> this is normal?

I have a little experience from a decent machine at home, an Intel Dual Core 
with 12 GB of memory and two consumer hard drives (one devoted entirely to the 
db), as well as from a remotely hosted server, a quad-core Intel i7 with 8 GB 
and similar hard drives. With both I have had similar experiences:

I can confirm what you observed, although in my case (in slim mode) the 
import is done in about 1.5 days (or was it 15 hours? The last time I did a 
complete import was in November).

Watching the output of osm2pgsql, the node counts flick by rather quickly 
(although there are, of course, orders of magnitude more nodes), while ways 
and relations are much slower. And that is not the end: after all ways and 
relations have been processed once, osm2pgsql has to process (some of) the 
ways again, and finally the database has to build indices etc., which from 
my memory takes a significant part of the total time (maybe a third or half 
of it?).

That processing ways and relations needs much more time than nodes is 
plausible: for creating the PostGIS Simple Features geometries, a node has 
no dependencies (it contains its own lat and lon values), while ways and 
relations have to look up the referenced nodes (and, for relations, also the 
referenced ways and relations), because the nodes' coordinates are 
incorporated directly into the converted LineStrings and Polygons. This 
usually (especially in slim mode) seems to cause a lot of db querying, even 
though the data might (more or less completely) fit into an in-memory cache.
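To make the dependency point concrete, here is a minimal Python sketch of the 
idea. It is a hypothetical, simplified model, not osm2pgsql's actual code: 
the class `CountingNodeStore` and the function `way_linestring` are made up 
for illustration, and a plain dict stands in for the node cache or the 
slim-mode nodes table. The only point is that a node arrives with its 
coordinates, while every node reference in a way costs one lookup.

```python
# Hypothetical, simplified model of why ways cost more than nodes in an
# osm2pgsql-style import. This is NOT osm2pgsql's real implementation.
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (lon, lat)


class CountingNodeStore:
    """Stands in for the node cache / slim-mode nodes table; counts lookups."""

    def __init__(self) -> None:
        self._coords: Dict[int, Coord] = {}
        self.lookups = 0

    def add_node(self, node_id: int, lon: float, lat: float) -> Coord:
        # A node carries its own coordinates, so no lookup is needed.
        self._coords[node_id] = (lon, lat)
        return (lon, lat)

    def get(self, node_id: int) -> Coord:
        # Assumption: in slim mode each of these could become a db query.
        self.lookups += 1
        return self._coords[node_id]


def way_linestring(store: CountingNodeStore, node_refs: List[int]) -> List[Coord]:
    # A way only stores node IDs, so each one must be resolved to
    # coordinates before a LineString (or Polygon ring) can be built.
    return [store.get(ref) for ref in node_refs]


store = CountingNodeStore()
for nid, lon, lat in [(1, 8.0, 50.0), (2, 8.1, 50.0), (3, 8.1, 50.1), (4, 8.0, 50.1)]:
    store.add_node(nid, lon, lat)

line = way_linestring(store, [1, 2, 3])
ring = way_linestring(store, [1, 2, 3, 4, 1])  # closed way -> Polygon candidate
print(len(line), len(ring), store.lookups)  # 3 5 8
```

Four nodes cost no lookups at all, while just two short ways already need 
eight; with millions of ways, each lookup that misses the cache and hits the 
disk adds up quickly.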

> Based on the speedup, I estimated that the import would be done in under
> 24 hrs; now I'm at two and a half days.

I guess this is a consequence of the slow I/O performance of Amazon EC2 
machines, which Andy mentioned. I have no experience with these, so 
unfortunately I cannot tell you more about what to expect.

Yours,
Holger Schöner
