[OSM-dev] Speeding up Osm2pgsql through parallelization?
Kai Krueger
kakrueger at gmail.com
Sat Sep 17 17:43:47 BST 2011
Hartmut Holzgraefe wrote:
[...]
>
> For the sorting code to perform well work_mem is key though, while
> for the index recreation step maintenance_work_mem is needed.
> Currently the wiki guide on pgsql configuration suggests static
> values for both (and until recently did not mention work_mem at all).
> There's also an issue with osm2pgsql not really returning the
> cache memory to the operating system due to heap fragmentation.
>
> Better results for large imports (table sizes much larger than
> RAM size) could probably be achieved by:
>
> * making sure osm2pgsql properly returns the memory used for its
> cache to the operating system, for this i've got a working patch:
>
> https://github.com/hholzgra/osm2pgsql/tree/freeable_cache
OK, I have committed your patch and extended it to have a command line
option to fall back to the old behavior.
On Linux, allocating the node cache as one large chunk works well, as
the kernel overcommits memory and only allocates physical RAM for those
pages that are actually written to. So you can still specify a large
cache value and only use as much physical memory as is needed to cache
all the nodes. As it is hard to guess how much cache one needs,
particularly for diff imports, this is pretty helpful for not wasting
memory.
Other operating systems (Solaris? Mac OS X? Windows?) might behave
differently and actually reserve the full amount of memory, in which
case one might want to fall back to the old behavior, at least for diff
imports.
>
> * serializing the index creation and clustering steps
>
> running these in parallel makes sense where everything fits
> into caches in RAM so that things get CPU bound, but on large
> imports things will be IO bound anyway, and parallel index
> builds just lead to lower cache hit rates which causes even
> more IO load
I have committed this patch too, although I have changed it so that the
default remains to do the indexing in parallel, and the command line
switch changes the behavior to serial indexing.
In my preliminary benchmarks, doing the indexing in parallel was
slightly faster than doing it one table at a time. However, I only tried
it on small planet extracts (about 100 MB for the .osm.pbf file). I also
didn't experiment with the postgresql settings of work_mem and
maintenance_work_mem, which is potentially where the benefit of doing
things sequentially comes from, by making it possible to set those
values higher.
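For example, with serial index building a single backend could be given
most of the available RAM for each build; a rough sketch (the setting
value and the index definition are illustrative, not what osm2pgsql
actually issues):

  -- one index build at a time, each getting a large maintenance buffer
  SET maintenance_work_mem = '2GB';
  CREATE INDEX planet_osm_polygon_index
      ON planet_osm_polygon USING gist (way);
  RESET maintenance_work_mem;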
>
> * start with low default work_mem and maintenance_work_mem settings
> and raise them on a per statement level, so making the appropriate
> buffer for a given operation (work_mem for ORDER BY,
> maintenance_work_mem for CREATE INDEX) as large as possible and
> then shrinking it back to its default size afterwards
Is it possible to adjust work_mem and maintenance_work_mem at run time?
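Something like the following, perhaps, if they behave like ordinary
run-time settings? (The values and statements are only illustrative,
not what osm2pgsql issues.)

  -- raise the sort buffer for one expensive ORDER BY, then shrink it back
  SET work_mem = '512MB';                      -- affects only this session
  SELECT * FROM planet_osm_line ORDER BY way;  -- sort gets the larger buffer
  RESET work_mem;                              -- back to the config default

  -- or scoped to a single transaction
  BEGIN;
  SET LOCAL maintenance_work_mem = '1GB';      -- reverts at COMMIT/ROLLBACK
  REINDEX TABLE planet_osm_line;
  COMMIT;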
>
> Whether CLUSTER or the current approach is better/faster for our
> imports needs to be benchmarked, my personal bet would be that
> CLUSTER wins as our data distribution over time is not totally
> random, but this really needs to be tested out.
I haven't yet benchmarked the difference between CLUSTER and the current
sorting based method.
Has anyone else got numbers on this?
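For anyone who wants to benchmark it, the two strategies boil down to
roughly the following (hand-written sketches, with illustrative table
and index names, not the exact statements osm2pgsql issues):

  -- current approach: rewrite the table into sorted order via a copy;
  -- old table, sorted copy and sort spill coexist on disk (about 3x)
  CREATE TABLE planet_osm_line_tmp AS
      SELECT * FROM planet_osm_line ORDER BY way;
  DROP TABLE planet_osm_line;
  ALTER TABLE planet_osm_line_tmp RENAME TO planet_osm_line;

  -- CLUSTER: rewrite the table in the order of an existing index;
  -- only the old and new copies coexist (about 2x)
  CLUSTER planet_osm_line USING planet_osm_line_index;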
Kai
>
> (one additional advantage of CLUSTER would be that peak disk
> space usage during the operation would only be about two times
> the data size instead of three times with the current approach)
>