[OSM-dev] Optimal free compression algorithm for OSM XML data
Nick Hill
nick at nickhill.co.uk
Thu May 10 10:38:10 BST 2007
Hello David
I'll upload a 7z version of the current planet. If nobody finds serious problems
with 7-zip, then I'll update the planet creation script to use 7z instead.
David Earl wrote:
> And the conclusion is to change it, then?
>
> One thing that would also speed up processing of planet files is to correct the
> invalid UTF-8 so that the utf8 sanitizer is no longer needed. There are only a
> small number of errors. Of course, this requires incoming changes to be checked,
> and invalid ones rejected, so that new errors don't get introduced - maybe this
> happens already, I don't know. A small change to the sanitizer would turn it into
> a checker.
>
> I will volunteer to locate and correct all the problem entries if someone
> else would put in utf8 validation on input.
>
> David
>
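A minimal Python sketch of the kind of UTF-8 checker described above, assuming the data can be read as raw byte lines (as a planet file can). find_invalid_utf8 is a hypothetical name: it is not the existing utf8 sanitizer, and how it would be wired into input validation on the server is an assumption. Rejecting an edit would then mean refusing any input for which it returns a non-empty list.

```python
def find_invalid_utf8(lines):
    """Report malformed UTF-8 in an iterable of raw byte lines.

    Returns a list of (line_number, byte_offset, bad_bytes) tuples, one per bad
    line. Only the first error on each line is reported.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")  # strict decoding raises on any malformed sequence
        except UnicodeDecodeError as err:
            # err.start/err.end delimit the offending bytes within this line
            problems.append((lineno, err.start, raw[err.start:err.end]))
    return problems


# Usage with a hypothetical file name; iterating a binary file yields byte lines,
# so the planet is streamed rather than loaded into memory:
#     with open("planet.osm", "rb") as f:
#         for lineno, offset, bad in find_invalid_utf8(f):
#             print(f"line {lineno}, byte {offset}: {bad!r}")
```

Python's strict UTF-8 decoder also rejects overlong encodings and encoded surrogates, which is what a validator (as opposed to a sanitizer that repairs bytes) should do.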
>> -----Original Message-----
>> From: dev-bounces at openstreetmap.org
>> [mailto:dev-bounces at openstreetmap.org]On Behalf Of Nick Hill
>> Sent: 10 May 2007 09:26
>> To: OSM-dev
>> Subject: [OSM-dev] Optimal free compression algorithm for OSM XML data
>>
>>
>> After a brief discussion at the developers' conference in Oxford regarding
>> compression algorithms for planet.osm, I decided to perform a series of tests
>> using different algorithms available in free software.
>>
>> I chose the latest planet-070509.osm.
>>
>> I tested the following algorithms:
>> bzip2
>> gzip -6
>> zip (pkzip)
>> gzip -9
>> 7z
>>
>> I measured the CPU time to compress and to uncompress, and the compressed file size:
>>
>>
>> Algorithm   Pack time   Unpack time   Size (MiB)
>>
>> bzip2            7144           347          246
>> 7-zip            4436           114        218.6
>> gzip -6           186            36          345
>> gzip -9           474            36          322
>> zip         failed: file size too large
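For anyone who wants to repeat a similar comparison on a smaller extract, here is a minimal Python sketch. It assumes the input fits in memory and uses the standard-library bz2, gzip and lzma modules in place of the command-line tools; lzma writes the .xz container (LZMA2) rather than .7z, so it only approximates 7-zip. CODECS and benchmark are hypothetical names, and its timings will not reproduce the figures above.

```python
import bz2
import gzip
import lzma
import time

# Hypothetical codec table: standard-library stand-ins for the command-line tools.
# Each entry maps a label to (compress, decompress) callables over bytes.
# lzma writes the .xz container (LZMA2), not .7z, so it only approximates 7-zip.
CODECS = {
    "bzip2": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    "gzip -6": (lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
    "gzip -9": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "lzma (xz)": (lambda d: lzma.compress(d, preset=9), lzma.decompress),
}


def benchmark(data):
    """Return {label: (pack CPU s, unpack CPU s, compressed MiB)} for each codec."""
    results = {}
    for label, (pack, unpack) in CODECS.items():
        t0 = time.process_time()  # CPU time of this process, not wall-clock time
        packed = pack(data)
        t1 = time.process_time()
        restored = unpack(packed)
        t2 = time.process_time()
        if restored != data:  # every codec must round-trip losslessly
            raise ValueError(f"{label} did not round-trip")
        results[label] = (t1 - t0, t2 - t1, len(packed) / 2**20)
    return results


# Usage with a hypothetical extract file:
#     with open("extract.osm", "rb") as f:
#         for label, (p, u, mib) in benchmark(f.read()).items():
#             print(f"{label:10} {p:8.2f} {u:8.2f} {mib:8.2f}")
```

time.process_time is used rather than a wall-clock timer because the table reports CPU time; wall-clock figures would also include disk I/O, which matters for the observation about the hard drive.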
>>
>> From the above, users will likely have the best experience with 7-zip: it
>> gives the smallest download while taking only about a third of bzip2's time to
>> unpack. For OSM, 7-zip is also cheaper to produce, taking only 62% of the time
>> that the current bzip2 takes to compress.
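The "about a third" and "62%" figures follow directly from the table. A small Python check, with the figures copied from the table; RESULTS and relative_to are hypothetical names introduced only for this illustration:

```python
# Figures copied from the table: (pack time, unpack time, size in MiB).
RESULTS = {
    "bzip2": (7144, 347, 246),
    "7-zip": (4436, 114, 218.6),
    "gzip -6": (186, 36, 345),
    "gzip -9": (474, 36, 322),
}


def relative_to(baseline, other, results=RESULTS):
    """Return other's pack time, unpack time and size as fractions of baseline's."""
    return tuple(o / b for o, b in zip(results[other], results[baseline]))


pack, unpack, size = relative_to("bzip2", "7-zip")
print(f"pack {pack:.0%}, unpack {unpack:.0%}, size {size:.0%}")
# prints: pack 62%, unpack 33%, size 89%
```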
>>
>> gzip unpacked much faster than my hard drive could write; 7-zip unpacked a
>> little faster than my hard drive could write.
>>
>> These tests were performed on a 32-bit AMD Athlon 2200+ from 2002, running at
>> a clock speed of 1498 MHz.
>>
>>
>> _______________________________________________
>> dev mailing list
>> dev at openstreetmap.org
>> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>
>