[Tile-serving] Parallelizing more of osm2pgsql

Sat Jun 29 21:04:06 UTC 2013

Hello everyone,

as imports with osm2pgsql still take a long time and partly don't use
the resources of a modern multi-core server particularly well, I have
been trying to get more of osm2pgsql parallelized. In particular, I have
been working on parallelizing more of the way and relation parsing
stages.

I now have a proof-of-principle, (partially) threaded implementation of
these stages [1] for fresh imports (I haven't yet verified it for diff
processing).

The way it currently is implemented is that a single feeder thread does
the pbf / xml parsing and tag transform and then passes on the osm data
to multiple worker threads to turn that into postgis geometries and
commit it to the rendering tables. In the relation processing stage, the
feeder thread both parses the osm relation information from the input
file and retrieves the necessary way and node information from the db.
It then passes the full relation information on to the worker threads.
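
As a rough illustration of the feeder / worker split described above,
here is a minimal Python sketch. It is not the osm2pgsql code (which is
not written in Python); build_geometry, run_import and the WKT-style
output are hypothetical stand-ins, and the in-memory results list takes
the place of the rendering tables. It only shows the structure: one
feeder pushing parsed ways into a bounded queue, and several workers
pulling from it. In CPython the GIL would prevent a real speedup for
CPU-bound work, so treat it as a structural sketch only.

```python
import queue
import threading

SENTINEL = None  # tells a worker that the feeder has no more input


def build_geometry(way):
    # Hypothetical stand-in for the CPU-heavy geometry building step.
    way_id, node_coords = way
    points = ", ".join(f"{x} {y}" for x, y in node_coords)
    return way_id, f"LINESTRING({points})"


def worker(tasks, results, lock):
    # Each worker pulls parsed ways until it sees the sentinel.
    while True:
        way = tasks.get()
        if way is SENTINEL:
            break
        geom = build_geometry(way)
        with lock:  # stands in for the commit to the rendering tables
            results.append(geom)


def run_import(ways, num_workers=8):
    tasks = queue.Queue(maxsize=1024)  # bounded, so the feeder cannot run far ahead
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for way in ways:  # the single feeder: "parse" and hand over
        tasks.put(way)
    for _ in threads:  # one sentinel per worker
        tasks.put(SENTINEL)
    for t in threads:
        t.join()
    return results
```

Note that the results arrive in whatever order the workers finish, so
anything that depends on input order would have to be handled
separately.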

As the geometry building stages appear to be relatively CPU intensive,
this split works pretty well for smaller extracts. E.g. when importing a
UK extract on my laptop (4 cores with hyperthreading), I see about a 3
times speedup of the way importing stage. For the relation processing
stage, I sometimes see as much as a 5 times speedup, which is pretty
impressive on a 4 core CPU. This is using 8 worker threads.

Overall, this results in about 20 - 30% speed up of the complete import
time on a standard slim mode import and nearly halves the time needed to
import an extract with the --drop option (dependent on hardware).

Unfortunately this does not appear to hold up on larger extracts or the
planet import once the db no longer fully fits into RAM. Even when the
db is on a fast SSD, the added latency of fetching the way data from
disk appears to mostly shift the bottleneck from the worker threads to
the single feeder thread retrieving the way and node data, negating
much of the advantage of the parallelized implementation.

It should be possible to push the way and node retrieval into the worker
threads as well and only leave the pbf parsing in the feeder thread.
However:
a) This would be a lot more effort due to the current abstraction levels
within osm2pgsql and would also need substantial updating of the
gazetteer output backend, while the current solution is nearly entirely
encapsulated within the rendering output backend. So getting this done
will take quite a while, need a lot more testing, and be error prone
until all of the thread safety issues are resolved.
b) Will it actually help speed things up? Once the import is IO bound,
will making things multi-threaded still help? It is hard to say, but
possibly yes. It probably depends on how effective postgresql is at
parallelizing IO operations even for single "SELECT way WHERE id ="
style queries. SSDs, and probably also RAID arrays, really benefit from
high IO queue depths. On 4 KB random IO, SSDs don't reach their full
performance until queue depths of 32, and there is roughly a 10 times
IOPS difference between queue depths of 1 and 32. So there is potential
for the multi-threaded implementation to be of advantage, as it would
then also have multiple sql queries outstanding at the same time that
the db and the disk can service in parallel.
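
To make the queue depth argument in b) a bit more concrete, here is a
small Python sketch. It does not talk to postgresql at all: fetch_way is
a hypothetical stand-in that just sleeps for an assumed 10 ms to imitate
a latency-bound lookup, and fetch_all and the `outstanding` parameter
are made up for illustration. It only shows that when each lookup is
dominated by waiting, having several lookups outstanding at once can
reduce the total wall-clock time; whether a real db and disk behave this
way is exactly the open question above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.01  # assumed per-lookup latency, purely illustrative


def fetch_way(way_id):
    # Hypothetical stand-in for a single "fetch way by id" query:
    # the time is spent waiting, not computing.
    time.sleep(LATENCY_S)
    return way_id, [way_id * 10, way_id * 10 + 1]


def fetch_all(way_ids, outstanding):
    # `outstanding` is the number of lookups allowed in flight at once,
    # loosely analogous to the IO queue depth.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=outstanding) as pool:
        rows = list(pool.map(fetch_way, way_ids))  # map keeps input order
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    ids = range(64)
    for depth in (1, 8, 32):
        _, elapsed = fetch_all(ids, depth)
        print(f"outstanding={depth:2d}  elapsed={elapsed:.3f}s")
```

With one lookup in flight the waits simply add up, which is roughly the
situation of the single feeder thread once the way data has to come from
disk.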

So the question is how to proceed.
Does it make sense to get what I currently have "production ready" and
commit it to osm2pgsql, even though on full planet imports it might not
have as much of a performance advantage as hoped? (It still has some,
particularly on high memory machines with fast SSDs.) Or should I rather
concentrate on getting the way and node retrieval parallelized in the
hope that this improves things further?

Any thoughts?

Kai

[1] https://github.com/apmon/osm2pgsql/tree/threading