[OSM-dev] Speeding up Osm2pgsql through parallelization?

Kai Krueger kakrueger at gmail.com
Wed Sep 14 00:31:13 BST 2011


On 7/22/64 12:59 PM, Frederik Ramm wrote:
> Kai,
>
>   partial answer:
>
> On 09/13/2011 02:07 AM, Kai Krueger wrote:
>> 2) Currently all the (diff-) import is done in a single transaction.
>> Therefore other db users (e.g. renderers) don't see any change until the
>> full transaction is committed. In order to do things in parallel,
>> however, there needs to be intermediary commits
>
> [...]
>
>> The question though is this valid? For the initial import this is
>> probably not a problem as there won't be any db users concurrently until
>> the import is complete. However, diff imports with concurrent rendering
>> is a different matter. What will committing pending ways do to 
>> rendering?
>
> Renderers use the geometry tables; the "pending" way is in the data 
> table where it will not usually be touched by renderers. So I don't 
> see a problem here. I am however not familiar with internal Postgres 
> processing and I could imagine that there is a speed penalty in 
> committing pending ways as opposed to resetting the pending flag in the 
> same transaction where it was set.

Good point. Yes, the pending flag lives on the ways table and not on 
the geometry tables used for rendering, so committing it shouldn't 
directly break rendering. What could happen is a temporary 
inconsistency: a single tile might show some newer ways while some 
polygons have not been updated yet. That should hopefully not cause 
any real problems.
>
>
>> 3) Currently the string cache is not thread safe. It is possible to
>> disable it via a single preprocessor define and then parallelizing at
>> least doesn't lead to crashes, but I assume it is there for a good
>> reason. Presumably with a bit of work, it should be possible to get the
>> string cache thread safe though as well. So assuming the other two
>> points aren't show stoppers, this should be possible to fix.
>
> Have you considered multiprocessing (i.e. fork) instead of 
> multithreading - would this perhaps make these things go away 
> elegantly? Personally I abhor multithreading for the complexity it 
> brings at (usually) little gain compared to simply forking a few 
> worker processes but of course YMMV especially if you want tight 
> communication between workers.
No, I hadn't considered multiprocessing, but again, that is a good point 
worth exploring further. Currently, what I have done relies on tight 
integration to share the loop counter between threads, but you can 
probably just split the loop into independent sections, one per worker 
process.
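
A minimal sketch of what such a split could look like, assuming the pending
ways can be addressed by a simple index range. The names slice_for_worker,
process_slice and run_workers are hypothetical and are not taken from the
osm2pgsql source; the per-way work is left as a placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical half-open range [begin, end) of pending-way indices. */
struct slice { long begin; long end; };

/* Split [0, total) into nworkers contiguous slices. The first
 * (total % nworkers) slices each get one extra item. */
struct slice slice_for_worker(long total, int nworkers, int worker)
{
    assert(nworkers > 0 && worker >= 0 && worker < nworkers);
    long base = total / nworkers;
    long extra = total % nworkers;
    struct slice s;
    s.begin = worker * base + (worker < extra ? worker : extra);
    s.end = s.begin + base + (worker < extra ? 1 : 0);
    return s;
}

/* Placeholder for the real work; returns 0 on success. */
int process_slice(struct slice s)
{
    for (long i = s.begin; i < s.end; i++) {
        /* process pending way number i here */
    }
    return 0;
}

/* Fork one child per slice and wait for all of them.
 * Returns 0 if every child exited with status 0, -1 otherwise. */
int run_workers(long total, int nworkers)
{
    int started = 0, failed = 0;

    for (int w = 0; w < nworkers; w++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            failed = 1;
            break;
        }
        if (pid == 0) {
            /* Child: handle only its own slice, no shared counter. */
            int rc = process_slice(slice_for_worker(total, nworkers, w));
            _exit(rc == 0 ? 0 : 1);
        }
        started++;
    }

    /* Parent: reap every child that was started. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failed = 1;
    }
    return failed ? -1 : 0;
}
```

With 10 pending ways and 3 workers this gives the slices [0,4), [4,7) and
[7,10). Each child would presumably need to open its own database connection
after the fork rather than reuse the parent's, and since the children share
no memory, the string cache would simply be per-process.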

Overall, this hopefully means it is worth exploring this avenue 
further and trying to get a clean enough patch to consider 
applying it to osm2pgsql.

Kai
>
>
> Bye
> Frederik
>



