[Tile-serving] Parallelizing more of osm2pgsql

Sun Jul 14 23:13:42 UTC 2013

Hello everyone,

I have gone ahead and parallelized even more of osm2pgsql. The single
threaded feeder thread now only does the pbf parsing; everything else is
done in parallel.
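Here is a minimal Python sketch of this split. It is not the osm2pgsql code. The feeder only yields parsed ways, and the pool workers do both the node lookup and the geometry building. `NODE_STORE`, `parse_ways` and `process_way` are hypothetical stand-ins for the node table, the pbf parser and the output backend. Line length stands in for a postgis geometry.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the node table in the database.
NODE_STORE = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (6.0, 8.0)}


def parse_ways():
    """Feeder side (hypothetical parser): yields raw (way_id, node_ids), no DB access."""
    yield from [(10, [1, 2]), (11, [2, 3]), (12, [1, 2, 3])]


def process_way(way):
    """Worker side: fetch the node coordinates AND build the geometry."""
    way_id, node_ids = way
    coords = [NODE_STORE[n] for n in node_ids]  # lookup now runs in the workers
    length = sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))
    return way_id, length


def import_ways(num_workers=8):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # The feeder loop stays single threaded; lookups and building fan out.
        return dict(pool.map(process_way, parse_ways()))


if __name__ == "__main__":
    print(import_ways())
```

`Executor.map` submits every item up front. An importer working on a planet file would therefore want a bounded queue between the feeder and the workers instead.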

With this parallelization, I now see speed-ups of around 20% - 40%
compared to what is currently in trunk. However, those numbers still
depend on having a sufficient amount of RAM or fast SSDs. In the relation
processing stage, I now at times see read speeds of over 500 MByte/s,
maxing out the SSD and the SATA controller.

I haven't yet committed this work back into the main repositories, as
the diff processing likely doesn't work with the parallelization yet, and
I know that the gazetteer backend for Nominatim is still broken. However,
the initial import should hopefully now be stable and produce correct
results.

If anyone is brave enough to try this out, or has a spare machine they
could test it on, it would be great to get some additional results on
different hardware to see how much this does or doesn't help. Also, some
of the changes needed to make things thread safe appear to have reduced
single-threaded efficiency somewhat. My guess is that this is due either
to locking overhead or to the work no longer being done in a single
transaction, as a transaction can't be shared across threads.
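The following Python sketch illustrates the transaction point. It uses the standard-library `sqlite3` module purely as a stand-in for PostgreSQL. Each worker thread opens its own connection and commits its own transaction. By default, sqlite3 refuses to let a second thread use a connection it did not create. The table `rendered_ways` and the batch contents are hypothetical. SQLite also serializes writers, so this shows connection ownership rather than write throughput.

```python
import os
import sqlite3
import tempfile
import threading


def worker(db_path, rows):
    """Each worker owns a private connection and therefore its own transaction."""
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
        conn.executemany(
            "INSERT INTO rendered_ways (osm_id, name) VALUES (?, ?)", rows)
        conn.execute("COMMIT")
    finally:
        conn.close()


def shared_connection_is_rejected(db_path):
    """Using one connection from a second thread raises ProgrammingError."""
    shared = sqlite3.connect(db_path)  # check_same_thread=True by default
    errors = []

    def use_it():
        try:
            shared.execute("SELECT 1")
        except sqlite3.ProgrammingError as exc:
            errors.append(exc)

    t = threading.Thread(target=use_it)
    t.start()
    t.join()
    shared.close()
    return len(errors) == 1


def run(num_workers=4, rows_per_worker=100):
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE rendered_ways (osm_id INTEGER, name TEXT)")
        setup.commit()
        setup.close()

        batches = [[(w * rows_per_worker + i, f"way {w}-{i}")
                    for i in range(rows_per_worker)] for w in range(num_workers)]
        threads = [threading.Thread(target=worker, args=(db_path, batch))
                   for batch in batches]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        check = sqlite3.connect(db_path)
        (count,) = check.execute("SELECT COUNT(*) FROM rendered_ways").fetchone()
        check.close()
        return count, shared_connection_is_rejected(db_path)
    finally:
        os.remove(db_path)


if __name__ == "__main__":
    print(run())
```

One consequence of this design is that each worker's inserts become visible independently when it commits. The import as a whole is no longer atomic the way a single transaction would be.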

I will continue to clean up the code, test diff imports and fix up
Nominatim, after which I hope to eventually merge this into the main
osm2pgsql repository.

The current code can be found in my repository at
https://github.com/apmon/osm2pgsql/tree/threading if anyone wants to
try it out already.

Kai

On 06/29/2013 03:04 PM, Kai Krueger wrote:
> Hello everyone,
> 
> as imports with osm2pgsql still take a long time and partly don't use
> the resources of a modern multi-core server particularly well, I have
> been trying to get more of osm2pgsql parallelized. In particular, I have
> been working on parallelizing the way and relation parsing stages.
> 
> I now have a proof-of-principle, (partially) threaded implementation of
> these stages [1] for fresh imports (I haven't yet verified it for diff
> processing).
> 
> In the current implementation, a single feeder thread does the pbf /
> xml parsing and tag transform and then passes the osm data on to
> multiple worker threads, which turn it into postgis geometries and
> commit it to the rendering tables. In the relation processing stage, the
> feeder thread both parses the osm relation information from the input
> file and retrieves the necessary way and node information from the db.
> It then passes the full relation information on to the worker threads.
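This feeder/worker split is a bounded producer/consumer queue. The minimal Python sketch below shows the pattern with the feeder doing the node lookups itself, as in this version. `NODES`, `WAYS`, `feeder` and `worker` are hypothetical stand-ins, not osm2pgsql code. A dict plays the role of the database, and line length plays the role of a postgis geometry.

```python
import queue
import threading

# Hypothetical stand-ins: NODES for the node table, WAYS for parsed input.
NODES = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (6.0, 8.0)}
WAYS = [(10, [1, 2]), (11, [2, 3]), (12, [1, 2, 3])]

SENTINEL = None  # tells a worker that no more work is coming


def feeder(work_q, num_workers):
    """One thread: 'parse' each way and fetch its node coordinates itself."""
    for way_id, node_ids in WAYS:
        coords = [NODES[n] for n in node_ids]  # the db lookup happens here
        work_q.put((way_id, coords))
    for _ in range(num_workers):
        work_q.put(SENTINEL)


def worker(work_q, results, lock):
    """Many threads: build a 'geometry' (here: line length) and 'commit' it."""
    while True:
        item = work_q.get()
        if item is SENTINEL:
            break
        way_id, coords = item
        length = sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
                     for (x1, y1), (x2, y2) in zip(coords, coords[1:]))
        with lock:
            results[way_id] = length


def run(num_workers=4):
    work_q = queue.Queue(maxsize=100)  # bounded, so the feeder can't run far ahead
    results, lock = {}, threading.Lock()
    workers = [threading.Thread(target=worker, args=(work_q, results, lock))
               for _ in range(num_workers)]
    for t in workers:
        t.start()
    feeder(work_q, num_workers)  # the calling thread acts as the single feeder
    for t in workers:
        t.join()
    return results


if __name__ == "__main__":
    print(run())
```

One sentinel per worker lets each thread exit cleanly once the input is exhausted. Because every lookup happens in `feeder`, any lookup latency is paid serially by that one thread.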
> 
> As the geometry building stages appear to be relatively CPU intensive,
> this split works pretty well for smaller extracts. E.g. when importing a
> UK extract on my laptop (4 cores with hyperthreading), I see about a 3
> times speed-up of the way import stage. For the relation processing
> stage, I at times get as much as a 5 times speed-up, which is pretty
> impressive on a 4 core CPU. This is using 8 worker threads.
> 
> Overall, this results in about a 20 - 30% speed-up of the complete import
> time for a standard slim mode import and nearly halves the time needed to
> import an extract with the --drop option (dependent on hardware).
> 
> Unfortunately this does not appear to hold up for larger extracts or the
> planet import once the db no longer fully fits into RAM. Even when the
> db is on a fast SSD, the added latency of fetching the way data from
> disk appears to mostly shift the bottleneck from the worker threads to
> the single feeder thread that retrieves the way and node data, negating
> much of the advantage of the parallelized implementation.
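An idealized throughput model makes the bottleneck shift concrete. A one-feeder, N-worker pipeline runs no faster than its slower side, and the lookup cost lands on whichever side performs it. The per-item costs below are made-up illustrative numbers, not measurements. The model also ignores IO saturation, locking and queue overhead.

```python
def pipeline_rate(parse_s, lookup_s, build_s, workers, lookup_in_workers=False):
    """Items per second for one feeder thread feeding `workers` worker threads."""
    feeder_cost = parse_s + (0.0 if lookup_in_workers else lookup_s)
    worker_cost = build_s + (lookup_s if lookup_in_workers else 0.0)
    # The pipeline can go no faster than its slowest stage.
    return min(1.0 / feeder_cost, workers / worker_cost)


def single_thread_rate(parse_s, lookup_s, build_s):
    """Items per second when one thread does everything in sequence."""
    return 1.0 / (parse_s + lookup_s + build_s)


if __name__ == "__main__":
    # Hypothetical per-item costs in seconds; NOT measured osm2pgsql numbers.
    PARSE, BUILD, WORKERS = 10e-6, 100e-6, 8
    for label, lookup, in_workers in [
        ("db cached in RAM, lookup in feeder", 5e-6, False),
        ("db on SSD, lookup in feeder", 200e-6, False),
        ("db on SSD, lookup in workers", 200e-6, True),
    ]:
        speedup = (pipeline_rate(PARSE, lookup, BUILD, WORKERS, in_workers)
                   / single_thread_rate(PARSE, lookup, BUILD))
        print(f"{label}: {speedup:.1f}x")
```

With these invented costs, the modeled speed-up falls from about 7.7x to about 1.5x once the lookup becomes slow and stays in the feeder. Moving the lookup into the workers brings it back to about 8.3x, but only under the model's assumption that storage keeps up with the extra concurrent requests.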
> 
> It should be possible to push the way and node retrieval into the worker
> threads as well and leave only the pbf parsing in the feeder thread.
> However:
> a) This would be a lot more effort due to the current abstraction levels
> within osm2pgsql and would also need substantial updating of the
> gazetteer output backend, while the current solution is nearly entirely
> encapsulated within the rendering output backend. So getting this done
> will take quite a while, need a lot more testing and be error prone
> until all of the thread safety issues are resolved.
> b) Will it actually speed things up? Once the import is IO bound, will
> making things multi-threaded actually help? It is hard to say, but
> possibly yes. It probably depends on how effective postgresql is at
> parallelizing IO operations even for single "SELECT way WHERE id ="
> style queries. SSDs, and probably also RAID arrays, really benefit from
> high IO queue depths. On 4 KB random IO, SSDs don't reach their full
> performance until a queue depth of 32, and there is roughly a 10 times
> IOPS difference between queue depths of 1 and 32. So there is potential
> for the multi-threaded implementation to help, as it would then have
> multiple SQL queries outstanding in parallel.
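This small Python sketch illustrates the queue-depth argument. With a thread pool, up to `depth` lookups are in flight at once instead of one. `fetch_way` and its fixed `LATENCY_S` are hypothetical and only simulate waiting on storage with `time.sleep`. Whether real gains appear depends on postgresql and the device actually servicing concurrent requests in parallel, which is exactly the open question here.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.002  # hypothetical latency of one random-read query


def fetch_way(way_id):
    """Hypothetical stand-in for a single 'SELECT ... WHERE id =' round trip."""
    time.sleep(LATENCY_S)  # the thread blocks, as it would while waiting on IO
    return way_id


def fetch_serial(ids):
    start = time.perf_counter()
    results = [fetch_way(i) for i in ids]  # queue depth is always 1
    return results, time.perf_counter() - start


def fetch_parallel(ids, depth=32):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=depth) as pool:
        # Up to `depth` requests are outstanding at any moment.
        results = list(pool.map(fetch_way, ids))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(64))
    _, t_serial = fetch_serial(ids)
    _, t_parallel = fetch_parallel(ids)
    print(f"serial: {t_serial * 1000:.0f} ms, depth 32: {t_parallel * 1000:.0f} ms")
```

`pool.map` returns results in input order, so the caller still sees the ways in the sequence the feeder produced them even though the lookups complete out of order.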
> 
> So the question is how to proceed.
> Does it make sense to get what I currently have "production ready" and
> commit it to osm2pgsql, even though on full planet imports it might not
> have as much of a performance advantage as hoped (it still has some,
> particularly on high-memory machines with fast SSDs)? Or should I rather
> concentrate on getting the way and node retrieval parallelized, in the
> hope that this improves things further?
> 
> Any thoughts?
> 
> Kai
> 
> 
> [1] https://github.com/apmon/osm2pgsql/tree/threading
>