[OSM-dev] Optimal free compression algorithm for OSM XML data
david at frankieandshadow.com
Thu May 10 09:34:16 BST 2007
And the conclusion is to change it, then?
One thing that would also speed up processing planet files is to correct the
invalid UTF-8 in the data, so that the UTF-8 sanitizer is no longer needed.
There are only a small number of errors. Of course, this requires incoming
changes to be validated and rejected if they are invalid, so that new errors
don't get introduced - maybe this happens already, I don't know. A small
change to the sanitizer would turn it into a checker.
I will volunteer to locate and correct all the problem entries if someone
else would put in UTF-8 validation on input.
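To illustrate what a checker (as opposed to a sanitizer) might look like,
here is a minimal sketch in Python. It is not the existing sanitizer; the
function name check_utf8_lines and the sample data are made up for
illustration. It reports only the first bad byte on each line and changes
nothing.

```python
def check_utf8_lines(lines):
    """Yield (line_number, byte_offset, reason) for each line that is not valid UTF-8.

    `lines` is any iterable of bytes objects, e.g. a file opened in binary mode.
    Only the first decoding error on each line is reported.
    """
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")  # strict decoding: raises on any invalid sequence
        except UnicodeDecodeError as err:
            yield number, err.start, err.reason


# Hypothetical sample: the second line carries a Latin-1 byte (0xE9) instead of UTF-8.
sample = [
    b'<node id="1"/>\n',
    b'<tag k="name" v="Caf\xe9"/>\n',
    '<tag k="name" v="Caf\u00e9"/>\n'.encode("utf-8"),
]

for number, offset, reason in check_utf8_lines(sample):
    print(f"line {number}, byte {offset}: {reason}")
```

The same function could be pointed at a file opened with open(path, "rb"),
which yields one bytes object per line without loading the whole file.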
> -----Original Message-----
> From: dev-bounces at openstreetmap.org
> [mailto:dev-bounces at openstreetmap.org]On Behalf Of Nick Hill
> Sent: 10 May 2007 09:26
> To: OSM-dev
> Subject: [OSM-dev] Optimal free compression algorithm for OSM XML data
> After a brief discussion at the developers conference in Oxford regarding
> compression algorithms for planet.osm, I decided to perform a series of
> tests using different algorithms available in free software.
> I chose the latest planet-070509.osm.
> I tested the following algorithms:
> bzip2
> 7-zip
> gzip -6
> gzip -9
> zip (pkzip)
> I measured the CPU time to compress and to uncompress, and the compressed
> file size:
>            Time     Time     File size
>            pack     unpack   (mebibytes)
> bzip2      7144     347      246
> 7-zip      4436     114      218.6
> gzip -6    186      36       345
> gzip -9    474      36       322
> zip        Failed - file size too large
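A comparison of this kind can be approximated with Python's standard
library, as a rough sketch only. The figures quoted above came from the
command-line tools; here zlib stands in for gzip, bz2 for bzip2, and lzma
for the LZMA family that 7-zip uses (the lzma module writes .xz streams,
not .7z archives), so absolute numbers will differ. The function name
benchmark and the sample data are made up for illustration.

```python
import bz2
import lzma
import time
import zlib


def benchmark(data: bytes):
    """Return (name, pack_seconds, unpack_seconds, packed_size) for each codec."""
    codecs = {
        "zlib level 6": (lambda d: zlib.compress(d, 6), zlib.decompress),
        "zlib level 9": (lambda d: zlib.compress(d, 9), zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = []
    for name, (pack, unpack) in codecs.items():
        start = time.process_time()  # CPU time, not wall-clock time
        packed = pack(data)
        pack_seconds = time.process_time() - start

        start = time.process_time()
        restored = unpack(packed)
        unpack_seconds = time.process_time() - start

        if restored != data:
            raise ValueError(f"{name}: round trip did not reproduce the input")
        results.append((name, pack_seconds, unpack_seconds, len(packed)))
    return results


# Hypothetical stand-in for a planet file: a repetitive XML fragment.
sample_data = b'<node id="1" lat="51.75" lon="-1.25"/>\n' * 2000

for name, pack_s, unpack_s, size in benchmark(sample_data):
    print(f"{name:14} pack {pack_s:.4f}s  unpack {unpack_s:.4f}s  {size} bytes")
```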
> From the above, the user will likely have the best experience using 7-zip:
> it gives the smallest download size, whilst costing only a third of the
> CPU time of bzip2 to unpack. For OSM, 7-zip is also less costly, taking
> only 62% of the time that bzip2, the current format, takes to compress.
> gzip unpacked much faster than my hard drive could write; 7-zip was a
> little faster than my hard drive.
> These tests were performed with a 32-bit AMD Athlon 2200+ from the year
> 2002, running at a clock speed of 1498 MHz.
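The ratios quoted above can be checked directly against the table. The
sketch below only redoes that arithmetic from the reported figures; the
dictionary layout and the helper name ratio are made up for illustration.

```python
# Figures copied from the table above: (pack time, unpack time, size in MiB).
reported = {
    "bzip2": (7144, 347, 246.0),
    "7-zip": (4436, 114, 218.6),
    "gzip -6": (186, 36, 345.0),
    "gzip -9": (474, 36, 322.0),
}

PACK, UNPACK, SIZE = 0, 1, 2


def ratio(metric, tool, baseline="bzip2"):
    """Return the tool's figure for `metric` as a fraction of the baseline's."""
    return reported[tool][metric] / reported[baseline][metric]


print(f"7-zip pack time vs bzip2:   {ratio(PACK, '7-zip'):.0%}")
print(f"7-zip unpack time vs bzip2: {ratio(UNPACK, '7-zip'):.0%}")
print(f"7-zip file size vs bzip2:   {ratio(SIZE, '7-zip'):.0%}")
```

This gives roughly 62% for compression time and about a third for
decompression time, matching the figures in the message.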