[OSM-dev] osmosis bz2 performance

Stefan Baebler stefan.baebler at gmail.com
Wed Feb 6 13:21:13 GMT 2008


Hi!

Today I was pleasantly surprised to see osmosis finish processing
today's planet.osm.gz in only 2 hours. Normally it takes approx. 6 hours
on the same machine, with the same bbox and the same osmosis version
(0.24), but reading the (slightly smaller) bz2 version of the whole planet.

http://osm.baebler.net/data/log.txt

Previous benchmark of compression algorithms on planet files:
On Jan 1, 2008 3:47 PM, Bruce Cowan <lists at bcowan.fastmail.co.uk> wrote:
> Testing on the latest UK planet (I'm not going to download the real
> planet), I got the following results (all the default settings):
>
> Original:
> 650625406 bytes
>
> gzip:
>  74858343 bytes, 37.198s to compress, 33.005s to decompress.
> 88.5% saving.
>
> bzip2:
>  57581322 bytes, 7m25.169s to compress, 45.983s to decompress.
> 91.1% saving, 23.0% over gzip.
>
> lzma:
>  38175614 bytes, 21m39.595s to compress, 36.187s to decompress.
> 94.1% saving, 49.0% over gzip, 33.7% over bzip2.

Bruce's test: UK planet decompression time with native Debian tools:
bz2 / gz ratio = 46s / 33s = 1.39 ... 39% "overtime"

My test: whole planet decompression (+extracting bbox and writing it
to bz2!) time with Osmosis:
bz2 / gz ratio = 6h / 2h = 3 ... 200% "overtime"
At least.
If we assume that processing the decompressed OSM data and writing out
the extracted bbox (compressed to bz2 in both cases) took roughly 1h in
each run, and subtract that from the total times to isolate the
decompression time, the ratio gets even worse:
bz2 / gz ratio = 5h / 1h = 5 ... 400% "overtime"
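
For reference, the ratios above can be recomputed with a few lines of
Python. This is only a sketch of the arithmetic: the helper name
overtime() is made up for illustration, and the 1h of shared work is
the rough guess from this mail, not a measurement.

```python
def overtime(slow, fast):
    """Return (ratio, extra time in percent) of slow relative to fast."""
    ratio = slow / fast
    return ratio, (ratio - 1.0) * 100.0

# Bruce's decompression times with native tools, in seconds.
print(overtime(45.983, 33.005))

# Whole Osmosis runs, in hours (rough figures from this mail).
print(overtime(6, 2))

# Same runs after subtracting an assumed 1h of shared work from each.
shared = 1
print(overtime(6 - shared, 2 - shared))
```

The last two calls give ratios of 3 and 5, i.e. 200% and 400% extra time.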

It seems that Apache's bz2 implementation, which is used in Osmosis, is
very slow compared to the gz implementation. Is this simply due to
Java, or are there faster bz2 implementations in Java?
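
One way to get a baseline for that question is to time gz and bz2
decompression of the same data outside of Java. The sketch below uses
Python's standard gzip and bz2 modules (which wrap the native zlib and
libbz2 libraries) on synthetic OSM-like XML. The sample data and the
helper time_decompress() are assumptions for illustration only; the
numbers will differ from a real planet file and say nothing by
themselves about the Java implementations.

```python
import bz2
import gzip
import time

def time_decompress(decompress, blob, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best

# Synthetic OSM-like XML, purely a stand-in for a real planet extract.
sample = b"".join(
    b'<node id="%d" lat="46.%04d" lon="14.%04d"/>\n'
    % (i, i % 10000, (i * 7) % 10000)
    for i in range(50000)
)

# Compress the same input with both formats at their default settings.
gz_blob = gzip.compress(sample)
bz2_blob = bz2.compress(sample)

gz_time = time_decompress(gzip.decompress, gz_blob)
bz2_time = time_decompress(bz2.decompress, bz2_blob)

print("gz  %.3fs" % gz_time)
print("bz2 %.3fs" % bz2_time)
print("bz2 / gz ratio = %.2f" % (bz2_time / gz_time))
```

If the ratio measured this way stays close to Bruce's 1.39 while Osmosis
shows 3 or more, that would point at the Java side rather than at the
bz2 format itself.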

Stefan



