[Geocoding] Slowdown of osm2pgsql import
Holger Schöner
numenor at ancalime.de
Thu Jan 7 20:55:46 GMT 2010
Hi Frans,
> The nodes import started several factors faster than on the small
> machine. When it reached the ways, it became very slow again, relations
> now is the same.
> While it started with nearly 100% CPU usage, now there is a peak of 50%,
> but most of the time it is below 10%.
> Has anybody experience if this is Amazon related, or will there be much
> more projection to be done on ways/rels, so it is normal?
I have a little experience with a decent machine at home (Intel dual core
with 12 GB of memory and two consumer hard drives, one devoted entirely to
the db), as well as with a remotely hosted server (quad-core Intel i7 with
8 GB and similar hard drives). On both I see the same pattern:
I can confirm your observations, although for me (in slim mode) the import
finishes in about 1.5 days (or was it 15 hours? The last time I did a
complete import was in November).
Following the output of osm2pgsql, the node numbers flicker by rather
quickly (although there are, of course, orders of magnitude more of those),
while ways and relations are much slower. And this is not the end; after all
ways and relations are processed once, osm2pgsql has to process (some of)
the ways again, and finally the database has to create indices etc., which,
as far as I remember, takes a significant part of the total time (maybe a
third or half of it?).
That processing ways and relations needs much more time than nodes is
plausible: For creating the PostGIS Simple Features geometries, a node has
no dependencies (it contains its lat and lon values), while ways and
relations have to look up the referenced nodes (and, for relations, the
referenced ways and relations as well), because the geometry of these
members is copied directly into the converted LineStrings and Polygons.
This usually (especially in slim mode) seems to induce a lot of db
querying, even though the data might (more or less completely) fit into a
cache in memory.
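To illustrate the dependency described above, here is a minimal sketch in
Python. It is not osm2pgsql's actual code: the NodeStore class and the
node_to_wkt and way_to_wkt functions are hypothetical names, and a plain
dict stands in for the node table that slim mode keeps in the database. It
only shows that a node can be converted from its own data, while a way costs
one lookup per referenced node.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (lon, lat)


class NodeStore:
    """Hypothetical stand-in for the slim-mode node table; counts lookups."""

    def __init__(self, nodes: Dict[int, Coord]) -> None:
        self._nodes = nodes
        self.lookups = 0

    def get(self, node_id: int) -> Coord:
        # In a real import each of these could become a database query.
        self.lookups += 1
        return self._nodes[node_id]


def node_to_wkt(lon: float, lat: float) -> str:
    # A node carries its own coordinates, so no lookup is needed.
    return f"POINT({lon} {lat})"


def way_to_wkt(node_refs: List[int], store: NodeStore) -> str:
    # A way only stores node ids; every id has to be resolved first.
    coords = [store.get(ref) for ref in node_refs]
    return "LINESTRING(" + ", ".join(f"{lon} {lat}" for lon, lat in coords) + ")"


store = NodeStore({1: (8.0, 50.0), 2: (8.1, 50.0), 3: (8.1, 50.1)})
print(node_to_wkt(8.0, 50.0))
print(way_to_wkt([1, 2, 3], store))
print("lookups for one way:", store.lookups)
```

A relation would add one more level of indirection, since its members are
themselves ways (or relations) whose nodes have to be resolved in turn.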
> While from the speedup I estimated that the import will be done below 24
> hrs, now I'm on 2 days and a half.
I guess this is a consequence of the slow I/O performance of Amazon EC2
machines, which Andy has mentioned. I have no experience with these, so
unfortunately I cannot tell you more about what to expect.
Yours,
Holger Schöner