[OSM-dev] Speeding up Osm2pgsql through parallelization?
Frederik Ramm
frederik at remote.org
Tue Sep 13 01:20:00 BST 2011
Kai,
partial answer:
On 09/13/2011 02:07 AM, Kai Krueger wrote:
> 2) Currently all the (diff-) import is done in a single transaction.
> Therefore other db users (e.g. renderers) don't see any change until the
> full transaction is committed. In order to do things in parallel,
> however, there needs to be intermediary commits
[...]
> The question though is this valid? For the initial import this is
> probably not a problem as there won't be any db users concurrently until
> the import is complete. However, diff imports with concurrent rendering
> is a different matter. What will committing pending ways do to rendering?
Renderers use the geometry tables; the "pending" way is in the data
table where it will not usually be touched by renderers. So I don't see
a problem here. I am, however, not familiar with internal Postgres
processing, and I could imagine that there is a speed penalty in
committing pending ways as opposed to resetting the pending flag in the
same transaction where it was set.
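
To make the visibility point from item 2 concrete, here is a minimal sketch of how an uncommitted change to a pending flag stays invisible to a second connection until the commit. It uses SQLite from the Python standard library purely as a stand-in; Postgres locking and performance behave differently, and the `ways` table with its `pending` column is a hypothetical simplification, not the actual osm2pgsql schema.

```python
import os
import sqlite3
import tempfile


def demo_visibility():
    """Return what a second connection sees before and after the writer commits."""
    # A file-backed database so that two independent connections can share it.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = sqlite3.connect(path)  # plays the role of the import process
    reader = sqlite3.connect(path)  # plays the role of a renderer

    # Hypothetical, simplified table: one way with a pending flag.
    writer.execute("CREATE TABLE ways (id INTEGER PRIMARY KEY, pending INTEGER)")
    writer.execute("INSERT INTO ways VALUES (1, 0)")
    writer.commit()

    def pending_seen_by_reader():
        # fetchall() exhausts the cursor so the reader holds no lock afterwards.
        return reader.execute("SELECT pending FROM ways WHERE id = 1").fetchall()[0][0]

    seen = []
    # The UPDATE implicitly opens a transaction on the writer connection.
    writer.execute("UPDATE ways SET pending = 1 WHERE id = 1")
    seen.append(pending_seen_by_reader())  # still the old value
    writer.commit()  # the intermediary commit
    seen.append(pending_seen_by_reader())  # now the new value

    writer.close()
    reader.close()
    return seen


if __name__ == "__main__":
    print(demo_visibility())
```

The sketch says nothing about the possible speed penalty of extra commits; that would have to be measured against a real Postgres instance.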
> 3) Currently the string cache is not thread safe. It is possible to
> disable it via a single preprocessor define and then parallelizing at
> least doesn't lead to crashes, but I assume it is there for a good
> reason. Presumably with a bit of work, it should be possible to get the
> string cache thread safe though as well. So assuming the other two
> points aren't show stoppers, this should be possible to fix.
Have you considered multiprocessing (i.e. fork) instead of
multithreading - would this perhaps make these things go away elegantly?
Personally I abhor multithreading for the complexity it brings at
(usually) little gain compared to simply forking a few worker processes,
but of course YMMV, especially if you want tight communication between
workers.
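
As a rough illustration of the fork approach, here is a minimal Python sketch (Unix only) rather than anything taken from osm2pgsql. `process_ways` and `run_forked` are hypothetical names, and the dictionary stands in for the string cache. After `fork`, each worker has its own copy of the process memory, so the cache needs no locking; in a real import each worker would also open its own database connection and commit its own transaction.

```python
import os
import pickle


def process_ways(way_ids):
    """Hypothetical stand-in for per-way work, using a process-private cache."""
    cache = {}  # private to this process after fork, so no locking is needed
    return [cache.setdefault(w % 3, f"tag-{w % 3}") for w in way_ids]


def run_forked(way_ids, n_workers):
    """Split way_ids across n_workers forked processes and collect the results."""
    children = []
    for i in range(n_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: process every n_workers-th way and send the result back.
            status = 1
            try:
                os.close(read_fd)
                result = process_ways(way_ids[i::n_workers])
                with os.fdopen(write_fd, "wb") as out:
                    pickle.dump(result, out)
                status = 0
            finally:
                # Leave immediately so the child never runs the parent's code.
                os._exit(status)
        # Parent: keep only the read end of this worker's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as inp:
            results.append(pickle.load(inp))
        os.waitpid(pid, 0)  # reap the worker
    return results


if __name__ == "__main__":
    print(run_forked(list(range(10)), 3))
```

The pipes here only carry results back at the end. If the workers had to talk to each other while running, this is where the tight-communication caveat would start to bite.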
Bye
Frederik
--
Frederik Ramm ## eMail frederik at remote.org ## N49°00'09" E008°23'33"