[OSM-dev] Optimal free compression algorithm for OSM XML data

Nick Hill nick at nickhill.co.uk
Thu May 10 10:38:10 BST 2007


Hello David

I'll upload a 7z version of the current planet. If nobody finds serious problems 
with 7-zip, then I'll update the planet creation script to use 7z instead.

David Earl wrote:
> And the conclusion is to change it, then?
> 
> One thing that would also speed up processing planet files is to correct the
> UTF-8 so that the need for the utf8 sanitizer goes away. There's only a small
> number of errors. Of course, this requires incoming changes to be checked,
> and invalid ones rejected, so that new errors don't get introduced - maybe
> this happens already, I don't know. A small change to the sanitizer would
> turn it into a checker.
> 
> I will volunteer to locate and correct all the problem entries if someone
> else would put in utf8 validation on input.
> 
> David
> 
>> -----Original Message-----
>> From: dev-bounces at openstreetmap.org
>> [mailto:dev-bounces at openstreetmap.org]On Behalf Of Nick Hill
>> Sent: 10 May 2007 09:26
>> To: OSM-dev
>> Subject: [OSM-dev] Optimal free compression algorithm for OSM XML data
>>
>>
>> After a brief discussion at the developers conference in Oxford regarding
>> compression algorithms for planet.osm, I decided to perform a series of
>> tests using different algorithms available in free software.
>>
>> I chose the latest planet-070509.osm.
>>
>> I tested the following algorithms:
>> bzip2
>> gzip -6
>> zip (pkzip)
>> gzip -9
>> 7z
>>
>> I measured the CPU time to compress and to uncompress, and the compressed
>> file size:
>>
>>
>>                    Time   Time     File size
>>                    pack   unpack   (MiB)
>>
>> bzip2              7144   347      246
>> 7-zip              4436   114      218.6
>> gzip -6             186    36      345
>> gzip -9             474    36      322
>> zip                Failed - file size too large
>>
>> From the above, the user will likely have the best experience using 7-zip:
>> it gives the smallest download size, whilst costing only about a third of
>> the bzip2 time to unpack. For OSM, 7-zip is also less costly, taking only
>> 62% of the time it currently takes to compress with bzip2.
>>
>> gzip unpacked much faster than my hard drive could write; 7-zip was a
>> little faster than my hard drive.
>>
>> These tests were performed on a 32-bit AMD Athlon 2200+ from the year 2002,
>> running at a clock speed of 1498 MHz.
>>
>>
>> _______________________________________________
>> dev mailing list
>> dev at openstreetmap.org
>> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
> 
> 
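
The ratios quoted in the benchmark above ("62%" of the bzip2 pack time, about
a third of the bzip2 unpack time) can be rechecked from the table. The
following is only a small sketch: the numbers are copied from the table, the
unit of the time columns is not stated in the post, and the names `results`
and `ratio` are made up for this illustration.

```python
# Figures copied from the benchmark table: (pack time, unpack time, size in MiB).
# The post does not state the time unit, so only ratios are computed here.
results = {
    "bzip2":   (7144, 347, 246.0),
    "7-zip":   (4436, 114, 218.6),
    "gzip -6": (186, 36, 345.0),
    "gzip -9": (474, 36, 322.0),
}

PACK, UNPACK, SIZE = 0, 1, 2


def ratio(candidate, baseline, column):
    """Return candidate/baseline for one column of the table."""
    return results[candidate][column] / results[baseline][column]


pack_ratio = ratio("7-zip", "bzip2", PACK)
unpack_ratio = ratio("7-zip", "bzip2", UNPACK)
size_ratio = ratio("7-zip", "bzip2", SIZE)

print(f"7-zip pack time vs bzip2:   {pack_ratio:.0%}")
print(f"7-zip unpack time vs bzip2: {unpack_ratio:.0%}")
print(f"7-zip file size vs bzip2:   {size_ratio:.0%}")
```

Comparing against bzip2 as the baseline matches the post, since bzip2 is the
format the planet file is currently published in.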

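David's suggestion of a UTF-8 checker (report invalid input rather than
repair it) could look roughly like the sketch below. This is not the existing
utf8 sanitizer, whose code is not shown in this thread; the function name
`find_invalid_utf8` and the sample lines are hypothetical, and a real check
would read the planet file in binary mode instead of using an in-memory list.

```python
def find_invalid_utf8(lines):
    """Return (line_number, reason) for every line that is not valid UTF-8.

    `lines` is any iterable of bytes objects, for example a file opened
    in binary mode. Nothing is repaired; problems are only reported.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((number, exc.reason))
    return problems


# Hypothetical sample: line 2 carries a Latin-1 encoded "e acute" (0xE9),
# which is not a valid UTF-8 sequence; line 3 is the same text in UTF-8.
sample = [
    b"<node id='1'/>\n",
    b"<tag k='name' v='Caf\xe9'/>\n",
    "<tag k='name' v='Caf\u00e9'/>\n".encode("utf-8"),
]

problems = find_invalid_utf8(sample)
for number, reason in problems:
    print(f"line {number}: {reason}")
```

Reporting line numbers instead of fixing the bytes is what would let someone
locate and correct the problem entries by hand, as offered above.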