[Tile-serving] Parallelizing more of osm2pgsql

Paul Norman penorman at mac.com
Tue Jul 16 22:49:54 UTC 2013


> From: Kai Krueger [mailto:kakrueger at gmail.com]
> Sent: Sunday, July 14, 2013 4:14 PM
> Subject: Re: [Tile-serving] Parallelizing more of osm2pgsql
> 
> Hello everyone,
> 
> If anyone is brave enough to try this out, or has a spare machine they 
> could test it on, it would be great to get some additional results on 
> different hardware to see how much this does or doesn't help. Also, it
> seems some of the changes needed to make things thread-safe have
> reduced single-threaded efficiency somewhat. My guess would be that this
> is due either to lock operations or to things no longer being in a
> single transaction, as you can't share a transaction across threads.

I ran some tests on a cr1.8xlarge EC2 spot instance (0.35 USD/hr spot), and
the results are interesting.

The instance has 244GB of RAM, 240GB of SSD as two 120GB volumes, and two
E5-2670 CPUs with a total of 16 cores (32 threads) at 2.6GHz, 3.3GHz turbo.
Flags used were --slim --flat-nodes --drop -C 20000 --unlogged. PostgreSQL
ran with fsync off, all settings chosen for speed over anything resembling
integrity, and 4GB maintenance_work_mem. For single-core performance this
machine is actually *slower* than Kai's laptop, which has better IPC and a
higher turbo. The Geofabrik Europe extract was used for testing.

I tried both 32 and 8 processes. I had to raise PostgreSQL's max_connections
for the 32-process run.

For the first stage:

8:  Processing: Node(987516k 1256.4k/s) Way(119852k 39.08k/s)
    Relation(1498360 2578.93/s)  parse time: 4434s
32: Processing: Node(987516k 1231.3k/s) Way(119852k 41.82k/s)
    Relation(1498360 2522.49/s)  parse time: 4262s

Ways ran at about 400% CPU and relations peaked at 600%. Writes to the SSD
array reached 2.5k IOPS.

We could be limited by the single-threaded PBF reader here.
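As a cross-check on the first-stage figures, each phase's duration can be estimated as count divided by average rate. The Python sketch below does that; the names PHASES and phase_seconds are my own, not anything from osm2pgsql. The three phases add up to the reported parse time to within a second, ways account for roughly two thirds of it in both runs, and the node rate did not improve at all going from 8 to 32 processes. That would fit a single-threaded reader limiting the node phase, but it is an inference, not a measurement.

```python
# First-stage figures from the two runs, as (count, average rate) per phase.
# Node and way counts are in thousands with rates in thousands/s, so the
# units cancel; relation count and rate are in plain units.
PHASES = {
    8: {"node": (987_516, 1256.4), "way": (119_852, 39.08), "relation": (1_498_360, 2578.93)},
    32: {"node": (987_516, 1231.3), "way": (119_852, 41.82), "relation": (1_498_360, 2522.49)},
}
REPORTED_PARSE_TIME = {8: 4434, 32: 4262}  # seconds, from the "parse time" output


def phase_seconds(phases: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Estimate seconds spent per phase as count / average rate."""
    return {name: count / rate for name, (count, rate) in phases.items()}


for procs, phases in PHASES.items():
    secs = phase_seconds(phases)
    total = sum(secs.values())
    breakdown = ", ".join(f"{name} {s:.0f}s ({s / total:.0%})" for name, s in secs.items())
    print(f"{procs:>2} processes: {breakdown}; "
          f"sum {total:.0f}s vs reported {REPORTED_PARSE_TIME[procs]}s")
```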

For the second stage:

8: Pending ways: 17k/s
32: Pending ways: 24k/s
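For the pending-ways stage, the speedup from 8 to 32 processes and the matching parallel efficiency (measured speedup divided by the ideal 4x) can be worked out with a small helper; `scaling` is purely illustrative. With 17k/s and 24k/s it gives about a 1.41x speedup, or roughly 35% efficiency. Ideal linear scaling is an optimistic baseline here, since the machine has 16 physical cores and the 32-process run relies on hyperthreads.

```python
def scaling(rate_small: float, rate_large: float,
            procs_small: int, procs_large: int) -> tuple[float, float]:
    """Return (speedup, efficiency) when going from procs_small to procs_large.

    Efficiency is the measured speedup divided by the ideal linear speedup:
    1.0 means perfect scaling, and procs_small / procs_large means the extra
    processes gained nothing.
    """
    speedup = rate_large / rate_small
    ideal = procs_large / procs_small
    return speedup, speedup / ideal


# Pending ways, in thousands of ways per second, from the 8- and 32-process runs.
speedup, efficiency = scaling(17, 24, 8, 32)
print(f"pending ways: {speedup:.2f}x speedup, {efficiency:.0%} parallel efficiency")
```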

The indexing and clustering stages do benefit from higher core counts; they
exceeded 5k write IOPS while remaining CPU-bound.

Overall:
8: 4h30m
32: 4h
32 without USE_TREE: 3h30m

Conclusions: osm2pgsql effectively scales to only 4-8 cores, perhaps because
of a single-threaded bottleneck. That is fine for a desktop i7, but it doesn't
make full use of a server CPU. The fastest machine I have access to for
osm2pgsql would probably be my overclocked gaming desktop with a 4-core i5
(no HT).
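One rough way to read the overall times is to fit Amdahl's law, T(p) = serial + parallel/p, through the 8- and 32-process runs. This is a model I'm assuming, not anything osm2pgsql reports: it treats every process as a full core (the machine has 16 physical cores, so 32 processes lean on HT), ignores I/O contention, and leaves out the USE_TREE run since that changes the code being measured. With only two points it is a back-of-the-envelope estimate at best. Taken at face value, 270 and 240 minutes give about 230 minutes of work that doesn't speed up with more processes and 320 minutes of parallelizable work (as single-process time), so even unlimited processes would only bring the run down to about 3h50m.

```python
def amdahl_fit(p1: int, t1: float, p2: int, t2: float) -> tuple[float, float]:
    """Fit T(p) = serial + parallel / p through two (processes, time) points.

    Returns (serial, parallel) in the same time unit as t1 and t2, where
    `parallel` is what the parallelizable work would take on one process.
    """
    # Subtracting the two equations eliminates the serial term:
    #   t1 - t2 = parallel * (1/p1 - 1/p2)
    parallel = (t1 - t2) / (1 / p1 - 1 / p2)
    serial = t1 - parallel / p1
    return serial, parallel


# Overall wall-clock times in minutes: 8 processes 4h30m, 32 processes 4h.
serial, parallel = amdahl_fit(8, 270, 32, 240)
print(f"serial ~{serial:.0f} min, parallelizable ~{parallel:.0f} min on one process")
print(f"serial fraction ~{serial / (serial + parallel):.0%}; "
      f"lower bound with unlimited processes ~{serial:.0f} min")
```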

I then tried non-slim mode. This used slightly more RAM than before, but at
no point was all the RAM used. The total time was 6h27m12s.

Conclusions: if you have enough RAM for non-slim mode, you have enough RAM to
cache your database reads and writes, and --slim --drop is faster than
non-slim.
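The overall times are quoted in two formats (4h30m style and 6:27:12), so a small helper makes the slim vs non-slim comparison explicit. `to_minutes` is a hypothetical helper written only for this comparison. Which process count the non-slim run used isn't stated, so the sketch compares it against each slim run; non-slim took roughly 1.4x to 1.8x as long.

```python
import re


def to_minutes(text: str) -> float:
    """Parse durations written as '4h30m', '4h', or 'H:MM:SS' into minutes."""
    if ":" in text:
        hours, minutes, seconds = (int(part) for part in text.split(":"))
        return hours * 60 + minutes + seconds / 60
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text)
    if not text or match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


SLIM_RUNS = {"8 processes": "4h30m", "32 processes": "4h", "32 processes, no USE_TREE": "3h30m"}
NON_SLIM = "6:27:12"

for label, duration in SLIM_RUNS.items():
    ratio = to_minutes(NON_SLIM) / to_minutes(duration)
    print(f"non-slim took {ratio:.2f}x as long as slim ({label})")
```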

Something for discussion: Do we want to optimize non-slim mode or drop it?
Right now it's slower and has a very high RAM requirement, so it really
doesn't make sense to keep it as-is.
