[OSM-dev] osm2pgsql slow on update import

Kai Krueger kakrueger at gmail.com
Sat May 7 04:13:09 BST 2011

Ian Dees wrote:
> On subsequent updates osm2pgsql no longer has the node information in
> memory, so it must request it from PostgreSQL. This takes orders of
> magnitude longer than a hit to memory.
One possible additional problem is that osm2pgsql retrieves node information
one node at a time instead of fetching many nodes in a single SQL query. If a
way has 2000 nodes, osm2pgsql therefore issues 2000 SQL queries to retrieve
them. There is also no parallelism or asynchrony, so these queries run one
after another, each blocking until it returns.
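
As a rough illustration of the single-query alternative, the C sketch below
turns a way's node IDs into a PostgreSQL array literal. That string could be
bound as the only parameter of a query such as
SELECT id, lat, lon FROM planet_osm_nodes WHERE id = ANY($1::int8[])
so that all nodes arrive in one round-trip. The table and column names and the
helper build_id_array_literal are assumptions made for illustration. They are
not taken from osm2pgsql's code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: render node IDs as a PostgreSQL array literal such as
 * {10,11,12}, for use as the single parameter of an "id = ANY($1::int8[])"
 * query (assumed query shape, not osm2pgsql's actual SQL).
 * Returns a malloc'd string the caller must free, or NULL if malloc fails. */
char *build_id_array_literal(const int64_t *ids, size_t count)
{
    /* An int64 needs at most 20 characters, plus 1 for the comma;
     * the extra 3 bytes hold '{', '}' and the terminating NUL. */
    size_t cap = count * 21 + 3;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    size_t pos = 0;
    buf[pos++] = '{';
    for (size_t i = 0; i < count; i++) {
        /* Prefix every element except the first with a comma. */
        int n = snprintf(buf + pos, cap - pos, "%s%" PRId64,
                         i > 0 ? "," : "", ids[i]);
        pos += (size_t)n;
    }
    buf[pos++] = '}';
    buf[pos] = '\0';
    return buf;
}
```

Passing the IDs as one bound parameter keeps the SQL text fixed, so the same
prepared statement can serve ways of any length.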

The easy solution would probably be to implement the pgsql_nodes_get_list
function properly, so that it fetches all requested nodes in one query, rather
than as a simple for-loop wrapper around pgsql_nodes_get.
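
One detail such a batched pgsql_nodes_get_list would have to handle: rows from
a single "id = ANY(...)" query come back in no particular order and with each
ID only once. A way's node list, however, is ordered and can repeat IDs (the
first and last node of a closed way, for example). The C sketch below maps the
rows back onto the way order. The struct node_row type, the function
fill_way_nodes, and the choice to skip IDs that have no matching row are all
hypothetical and do not reproduce osm2pgsql's real interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical row type: one node as returned by the batched query. */
struct node_row {
    int64_t id;
    double lat;
    double lon;
};

/* Order rows by node ID, for use with qsort and bsearch. */
static int cmp_row_by_id(const void *a, const void *b)
{
    int64_t x = ((const struct node_row *)a)->id;
    int64_t y = ((const struct node_row *)b)->id;
    return (x > y) - (x < y);
}

/* Copy nodes into `out` in the order given by `way_ids`, looking each one up
 * in the (unordered, de-duplicated) query result `rows`, which is sorted in
 * place. IDs with no matching row are skipped (an assumption about how
 * missing nodes should be treated). Returns the number of nodes written. */
size_t fill_way_nodes(const int64_t *way_ids, size_t way_len,
                      struct node_row *rows, size_t row_count,
                      struct node_row *out)
{
    qsort(rows, row_count, sizeof *rows, cmp_row_by_id);

    size_t written = 0;
    for (size_t i = 0; i < way_len; i++) {
        struct node_row key = { .id = way_ids[i] };
        const struct node_row *hit =
            bsearch(&key, rows, row_count, sizeof *rows, cmp_row_by_id);
        if (hit != NULL)
            out[written++] = *hit;
    }
    return written;
}
```

Sorting once and then using binary search costs O((n + m) log n) for n rows
and m way references. That should be negligible next to the round-trips
saved, although this has not been measured.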

A more involved solution could prefetch all nodes referenced in a diff with a
single query before the normal processing starts. However, I have not
measured either solution to see how much it would improve performance or
whether it is worth the effort.
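
The prefetch idea starts by gathering every distinct node ID referenced by
the ways in a diff, so that they can be requested together. The C sketch below
does only that part: it concatenates the references, sorts them, and removes
duplicates. The struct way_refs type and the function collect_unique_node_ids
are hypothetical. A real implementation might also have to split a very large
ID list across several queries.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view of one way from a diff: its ordered node references. */
struct way_refs {
    const int64_t *ids;
    size_t count;
};

/* Order node IDs ascending, for use with qsort. */
static int cmp_id(const void *a, const void *b)
{
    int64_t x = *(const int64_t *)a;
    int64_t y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Gather the distinct node IDs referenced by all ways in a diff.
 * Returns a malloc'd, sorted, duplicate-free array and stores its length in
 * *out_count; returns NULL (with *out_count == 0) if there are no IDs or
 * malloc fails. */
int64_t *collect_unique_node_ids(const struct way_refs *ways, size_t way_count,
                                 size_t *out_count)
{
    size_t total = 0;
    for (size_t w = 0; w < way_count; w++)
        total += ways[w].count;

    *out_count = 0;
    if (total == 0)
        return NULL;

    int64_t *all = malloc(total * sizeof *all);
    if (all == NULL)
        return NULL;

    /* Concatenate every way's references into one array. */
    size_t pos = 0;
    for (size_t w = 0; w < way_count; w++) {
        if (ways[w].count > 0) {
            memcpy(all + pos, ways[w].ids, ways[w].count * sizeof *all);
            pos += ways[w].count;
        }
    }

    qsort(all, total, sizeof *all, cmp_id);

    /* Compact in place, keeping the first of each run of equal IDs. */
    size_t unique = 1;
    for (size_t i = 1; i < total; i++) {
        if (all[i] != all[unique - 1])
            all[unique++] = all[i];
    }

    *out_count = unique;
    return all;
}
```

Ways that share nodes then cost only one lookup per shared node for the whole
diff instead of one per way.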

However, watching Yevaud try to catch up on the two and a half days since the
full planet import, there also appear to be considerable periods when applying
diffs is not I/O bound, which suggests there are further bottlenecks.


View this message in context: http://gis.638310.n2.nabble.com/osm2pgsql-slow-on-update-import-tp6241594p6339421.html
Sent from the Developer Discussion mailing list archive at Nabble.com.
