[OSM-dev] Optimal free compression algorithm for OSM XML data

David Earl david at frankieandshadow.com
Thu May 10 09:34:16 BST 2007


And the conclusion is to change it, then?

One thing that would also speed up processing planet files is to correct the
UTF-8 so the need for the utf8 sanitizer goes away. There are only a small
number of errors. Of course, this requires incoming changes to be checked and
rejected so that new errors don't get introduced - maybe this happens already,
I don't know. A small change to the sanitizer would turn it into a checker.
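
As a rough illustration of what such a checker could look like, here is a
minimal Python sketch. It is not the existing sanitizer: the function name
find_invalid_utf8 is made up for this example, and it assumes the planet file
can simply be read line by line in binary mode. It only reports the offending
lines and changes nothing.

```python
def find_invalid_utf8(path):
    """Yield (line_number, raw_bytes) for every line that is not valid UTF-8.

    Hypothetical helper for illustration only. Splitting on newlines in
    binary mode is safe because the byte 0x0A never occurs inside a
    multi-byte UTF-8 sequence.
    """
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                # Strict decoding raises on any malformed byte sequence.
                raw.decode("utf-8", errors="strict")
            except UnicodeDecodeError:
                yield lineno, raw


def report(path):
    """Print each invalid line and return how many were found."""
    count = 0
    for lineno, raw in find_invalid_utf8(path):
        count += 1
        print(f"line {lineno}: {raw[:80]!r}")
    return count
```

Calling report() on a planet file would list the entries that need fixing; the
same decode test could in principle be applied to data on input, so that
invalid UTF-8 is rejected before it gets in.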

I will volunteer to locate and correct all the problem entries if someone
else would put in utf8 validation on input.

David

> -----Original Message-----
> From: dev-bounces at openstreetmap.org
> [mailto:dev-bounces at openstreetmap.org]On Behalf Of Nick Hill
> Sent: 10 May 2007 09:26
> To: OSM-dev
> Subject: [OSM-dev] Optimal free compression algorithm for OSM XML data
>
>
> After a brief discussion at the developers conference in Oxford regarding
> compression algorithms for planet.osm, I decided to perform a series of tests
> using different algorithms available in free software.
>
> I chose the latest planet-070509.osm.
>
> I tested the following algorithms:
> bzip2
> gzip -6
> zip (pkzip)
> gzip -9
> 7z
>
> I measured the CPU time to compress and uncompress, and the compressed file size:
>
>
>                    Time   Time     File size
>                    pack   unpack   (MiB)
>
> bzip2              7144   347      246
> 7-zip              4436   114      218.6
> gzip -6             186    36      345
> gzip -9             474    36      322
> zip                Failed - file size too large
>
> From the above, the user will likely have the best experience using 7-zip:
> smallest download size, whilst costing only 1/3rd of bzip2 to unpack. For OSM,
> 7-zip is less costly, taking only 62% of the time that the current bzip2
> takes to compress.
>
> gzip unpacked much faster than my hard drive could write; 7-zip was a little
> faster than my hard drive.
>
> These tests were performed on a 32-bit AMD Athlon 2200+ from the year 2002,
> running at a clock speed of 1498 MHz.
>
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
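
For anyone wanting to check the "1/3rd" and "62%" figures quoted above, the
ratios can be recomputed from the table. This is only a small arithmetic
sketch: the numbers are copied from Nick's table, while the dictionary layout
and the ratio() helper are made up for the example.

```python
# Timings and sizes copied from the table in the quoted message.
results = {
    "bzip2":   {"pack": 7144, "unpack": 347, "size_mib": 246.0},
    "7-zip":   {"pack": 4436, "unpack": 114, "size_mib": 218.6},
    "gzip -6": {"pack": 186,  "unpack": 36,  "size_mib": 345.0},
    "gzip -9": {"pack": 474,  "unpack": 36,  "size_mib": 322.0},
}


def ratio(metric, tool, baseline="bzip2"):
    """Return the tool's value for a metric as a fraction of the baseline's."""
    return results[tool][metric] / results[baseline][metric]


for metric in ("pack", "unpack", "size_mib"):
    print(f"7-zip / bzip2 {metric}: {ratio(metric, '7-zip'):.1%}")
# 7-zip / bzip2 pack: 62.1%
# 7-zip / bzip2 unpack: 32.9%
# 7-zip / bzip2 size_mib: 88.9%
```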




