[Tile-serving] Parallelizing more of osm2pgsql
Paul Norman
penorman at mac.com
Tue Jul 16 22:49:54 UTC 2013
> From: Kai Krueger [mailto:kakrueger at gmail.com]
> Sent: Sunday, July 14, 2013 4:14 PM
> Subject: Re: [Tile-serving] Parallelizing more of osm2pgsql
>
> Hello everyone,
>
> If anyone is brave enough to try this out, or has a spare machine they
> could test it on, it would be great to get some additional results on
> different hardware to see how much this does or doesn't help. Also it
> seems some of the changes needed to get things thread safe appear to
> have reduced single threaded efficiencies somewhat. My guess would be
> either due to lock operations, or as things are now no longer in a
> single transaction, as you can't share a transaction across threads.
I ran some tests on a cr1.8xlarge EC2 spot instance (0.35 USD/hr spot), and
the results are interesting.
The instance has 244GB of RAM, 240GB of SSD as two 120GB volumes, and two
E5-2670 CPUs, for a total of 16 cores (32 threads) at 2.6GHz, 3.3GHz turbo.
The osm2pgsql flags used were --slim --flat-nodes --drop -C 20000 --unlogged.
PostgreSQL ran with fsync off and all options set for speed over anything
resembling integrity, with 4GB maintenance_work_mem. This machine is actually
*slower* for single-core performance than Kai's laptop, which has better IPC
and a higher turbo. The Geofabrik Europe extract was used for testing.
I tried with both 32 and 8 processes. I had to raise the PostgreSQL
connection limit for the former.
For the first stage:
 8: Processing: Node(987516k 1256.4k/s) Way(119852k 39.08k/s)
    Relation(1498360 2578.93/s) parse time: 4434s
32: Processing: Node(987516k 1231.3k/s) Way(119852k 41.82k/s)
    Relation(1498360 2522.49/s) parse time: 4262s
Way processing ran at about 400% CPU, and relation processing peaked at
600%. Writes hit 2.5k IOPS on the SSD array.
We could be limited by the single-threaded PBF reader here.
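A rough cross-check of the first-stage numbers, not part of the original run:
dividing each count by its reported rate gives an approximate time per object
type. This sketch assumes the rates are averages over each phase and that the
three phases run one after another; the names `runs` and `stage_seconds` are
made up for illustration.

```python
# Counts and rates copied from the osm2pgsql output above.
# Node and way figures are in thousands (k) and k/s; relations are plain
# counts and objects/s, so count / rate is in seconds in every case.
runs = {
    8: {
        "node": (987516, 1256.4),
        "way": (119852, 39.08),
        "relation": (1498360, 2578.93),
    },
    32: {
        "node": (987516, 1231.3),
        "way": (119852, 41.82),
        "relation": (1498360, 2522.49),
    },
}


def stage_seconds(run):
    """Approximate seconds per object type (assumes average rates)."""
    return {kind: count / rate for kind, (count, rate) in run.items()}


for procs, run in runs.items():
    secs = stage_seconds(run)
    total = sum(secs.values())
    print(f"{procs:>2} processes: about {total:.0f}s in total")
    for kind, s in secs.items():
        print(f"    {kind:<8} {s:6.0f}s  ({s / total:.0%} of the stage)")
```

Under those assumptions the per-type times add up to the reported parse
times, and ways account for roughly two thirds of the first stage in both
runs.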
For the second stage:
8: Pending ways: 17k/s
32: Pending ways: 24k/s
The indexing and clustering stages do benefit from higher core counts. They
exceeded 5k write IOPS and were CPU bound.
Overall:
8: 4h30m
32: 4h
32 without USE_TREE: 3h30m
Conclusions: osm2pgsql is limited to effectively 4-8 cores, perhaps by a
single-threaded task. That is good for a desktop i7, but doesn't really make full
use of a server CPU. The fastest machine I have access to for osm2pgsql
would probably be my overclocked gaming desktop with a 4-core i5, no HT.
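One way to read the overall times is through Amdahl's law. This is an
illustrative model only, not something measured in these runs: it assumes the
total time behaves as T(n) = T1 * (s + (1 - s) / n) with a fixed serial
fraction s, and it treats the process count as n even though 32 processes ran
on 16 physical cores. The function name `serial_fraction` is made up for this
sketch.

```python
def serial_fraction(t_a, n_a, t_b, n_b):
    """Serial fraction s implied by two runs under Amdahl's law.

    Model (an assumption): T(n) = T1 * (s + (1 - s) / n).
    Taking the ratio r = T(n_a) / T(n_b) removes T1 and leaves a linear
    equation in s:
        s * (1 - 1/n_a - r + r/n_b) = r/n_b - 1/n_a
    """
    r = t_a / t_b
    numerator = r / n_b - 1 / n_a
    denominator = 1 - 1 / n_a - r + r / n_b
    return numerator / denominator


# Overall times from above, in minutes: 4h30m with 8 processes, 4h with 32.
s = serial_fraction(270, 8, 240, 32)
print(f"observed speedup from 8 to 32 processes: {270 / 240:.3f}x")
print(f"implied serial fraction under the model: {s:.2f}")
print(f"speedup ceiling under the model: {1 / s:.2f}x over one process")
```

With these two data points the model implies a serial fraction of about
0.42, which would be consistent with a large single-threaded component.
Two points are not enough to validate the model, so treat the figure as a
rough indication rather than a measurement.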
I then tried non-slim mode. This used slightly more RAM than before, but at
no point did all the RAM get used. The total time was 6:27:12.
Conclusions: If you have enough RAM for non-slim you have enough RAM to
cache your database reads and writes, and --slim --drop is faster than
non-slim.
Something for discussion: Do we want to optimize non-slim mode or drop it?
Right now it's slower and has a very high RAM requirement, so it really
doesn't make sense to keep it as-is.