[OSM-dev] Speeding up Osm2pgsql through parallelization?

Kai Krueger kakrueger at gmail.com
Sun Oct 9 10:33:34 BST 2011


Hi,

I think I now have a first version of a patch to speed up Osm2pgsql by
parallelizing certain parts of it. Specifically, it parallelizes the
stages "Going over pending ways" and "Going over pending relations".

Previously, osm2pgsql would fetch all ways / relations that are marked as
"pending" and process them one by one. With this patch, osm2pgsql spawns
a (configurable) number of workers that go through the list in
parallel. If osm2pgsql is given enough cache, these stages are mostly
CPU bound during initial import, so parallelization gives a near-linear
speedup. If not enough cache is available, a speedup is still possible,
although it then depends heavily on the disk subsystem.
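
To illustrate the idea of dividing the pending list among workers, here
is a rough sketch. It is not code from the patch: the helper name
worker_range and the choice of contiguous chunks are assumptions made
for this example, and the patch may split the work differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the patch: compute the half-open
 * range [*start, *end) of the pending list that worker `worker`
 * (0-based) handles when `n_items` entries are split as evenly as
 * possible among `n_workers` workers. */
static void worker_range(size_t n_items, size_t n_workers, size_t worker,
                         size_t *start, size_t *end)
{
    assert(n_workers > 0 && worker < n_workers);

    size_t base  = n_items / n_workers;  /* every worker gets at least this many */
    size_t extra = n_items % n_workers;  /* the first `extra` workers get one more */

    *start = worker * base + (worker < extra ? worker : extra);
    *end   = *start + base + (worker < extra ? 1 : 0);
}
```

With 10 pending entries and 3 workers, this yields the ranges [0, 4),
[4, 7) and [7, 10), so no entry is processed twice or skipped.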

As I only have a laptop available, I can't really do any extensive
(performance) testing on the patch, so I can't say how much speedup it
will give in reality.

Frederik Ramm wrote:
> 
> 
> Have you considered multiprocessing (i.e. fork) instead of
> multithreading - would this perhaps make these things go away elegantly?
> Personally I abhor multithreading for the complexity it brings at
> (usually) little gain compared to simply forking a few worker processes
> but of course YMMV especially if you want tight communication between
> workers.
> 

In this version of the patch, I did choose the route of forking, which
indeed was considerably easier than figuring out which parts are and
are not thread-safe.
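
For readers unfamiliar with the approach, the general fork-and-wait
pattern looks roughly like the sketch below. This is a minimal
illustration, not the code in the patch: run_workers and process_item
are made-up names, process_item is a placeholder for the real per-item
work, and handing items out by stride is just one possible choice. A
real worker would presumably also need its own database connection
rather than reusing the parent's.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the real per-item work (hypothetical); 0 means success. */
static int process_item(size_t index)
{
    (void)index;
    return 0;
}

/* Fork up to `n_workers` child processes. Child w handles items
 * w, w + n_workers, w + 2 * n_workers, ... of a list of `n_items`.
 * Returns the number of children that exited with status 0. */
static int run_workers(size_t n_items, int n_workers)
{
    assert(n_workers > 0);

    int forked = 0;
    for (int w = 0; w < n_workers; w++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* fork failed: stop spawning */
        if (pid == 0) {                 /* child process */
            int failed = 0;
            for (size_t i = (size_t)w; i < n_items; i += (size_t)n_workers)
                if (process_item(i) != 0)
                    failed = 1;
            _exit(failed);              /* skip the parent's atexit handlers */
        }
        forked++;                       /* parent: one more child to reap */
    }

    int ok = 0;
    for (int k = 0; k < forked; k++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Because each child has its own copy of the address space, no locking is
needed around shared data structures, which is what makes this simpler
than auditing the code for thread safety.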


I have done a bunch of testing with the patch and so far it looks like
everything is working. However, as I am not 100% sure, I didn't want to
just commit this patch to svn. Instead, it would be great if someone
could review the patch or test it, to make sure it works fully. I have
attached the patch to this email.

Any feedback or suggestions are welcome.

Kai
-------------- next part --------------
A non-text attachment was scrubbed...
Name: osm2pgsql-parallelization.patch
Type: text/x-patch
Size: 19141 bytes
Desc: not available
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20111009/1353ad9d/attachment-0001.bin>

