[OSM-dev] Optimal free compression algorithm for OSM XML data

lizard lizard at furcon.de
Fri May 11 14:33:39 BST 2007


hi,
since i always do stupid stuff, i tried a quick shot at creating a binary
osm format.
with 5 simple replace commands in perl i saved ~400 MB in the uncompressed
format.
if we build a "real" bin format for this (not just replacing tags :) )
and then run a normal compressor on that bin file, i think we can get it
even smaller. Additionally, it is easy to use on systems with less power,
because it doesn't have as much overhead as xml.
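
to give an idea of what a "real" bin format could look like, here is a minimal sketch that packs one node into a fixed-width record with perl's pack(). the record layout, the function names pack_node/unpack_node and the 1e-7 degree scaling are all assumptions made up for illustration, not an agreed format. it needs perl 5.10 or newer for the "l>" template.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record layout (an assumption, not an agreed OSM format):
#   1 byte   record type (0x01 = node)
#   4 bytes  node id, unsigned 32-bit big-endian
#   4 bytes  latitude  in 1e-7 degree units, signed 32-bit big-endian
#   4 bytes  longitude in 1e-7 degree units, signed 32-bit big-endian
sub pack_node {
    my ($id, $lat, $lon) = @_;
    # round to the nearest 1e-7 degree before storing as an integer
    return pack('C N l> l>', 0x01, $id,
                sprintf('%.0f', $lat * 1e7),
                sprintf('%.0f', $lon * 1e7));
}

sub unpack_node {
    my ($record) = @_;
    my ($type, $id, $lat, $lon) = unpack('C N l> l>', $record);
    return ($id, $lat / 1e7, $lon / 1e7);
}

# made-up sample node, only for demonstration
my $record = pack_node(123456, 51.5074, -0.1278);
my ($id, $lat, $lon) = unpack_node($record);
printf "%d bytes: id=%d lat=%.7f lon=%.7f\n", length($record), $id, $lat, $lon;
```

with this layout a node takes 13 bytes (1 + 4 + 4 + 4). tags and timestamps are left out here and would need their own record types.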

here are my 2 simple (proof of concept) scripts :)

lizard at lizard-desktop:~/osm$ cat osm2bin.pl 
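#!/usr/bin/perl
use strict;
use warnings;

open (my $ifp, '<', 'planet.osm') or die "planet.osm: $!";
open (my $ofp, '>', 'planet.bosm') or die "planet.bosm: $!";
binmode ($ofp); # the output contains raw control bytes

# write a header with version info
print $ofp 'OSM' . chr(0) . chr(0) . chr(1);

while (my $line = <$ifp>)
{
  # replace the most frequent attribute names with one-byte tokens
  $line =~ s/^  <node id=/\x01/;
  $line =~ s/ lat=/\x02/;
  $line =~ s/ lon=/\x03/;
  $line =~ s/ timestamp=/\x04/;
  print $ofp $line;
}

close ($ofp) or die "planet.bosm: $!";
close ($ifp);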


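#!/usr/bin/perl
use strict;
use warnings;

open (my $ifp, '<', 'planet.bosm') or die "planet.bosm: $!";
open (my $ofp, '>', 'planet-verify.osm') or die "planet-verify.osm: $!";
binmode ($ifp); # the input contains raw control bytes

read ($ifp, my $header, 6); ## just ignore fileinfo :)

while (my $line = <$ifp>)
{
  # expand the one-byte tokens back into the attribute names
  $line =~ s/\x04/ timestamp=/;
  $line =~ s/\x03/ lon=/;
  $line =~ s/\x02/ lat=/;
  $line =~ s/\x01/  <node id=/;
  print $ofp $line;
}

close ($ofp) or die "planet-verify.osm: $!";
close ($ifp);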
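
a quick way to check that the two scripts are really inverse to each other is to run both substitution sets on one line and compare the result with the input. this is only a sketch: encode_line and decode_line are hypothetical helper names wrapping the same four replacements, and the sample node line is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# same four substitutions as the encoding script
sub encode_line {
    my ($line) = @_;
    $line =~ s/^  <node id=/\x01/;
    $line =~ s/ lat=/\x02/;
    $line =~ s/ lon=/\x03/;
    $line =~ s/ timestamp=/\x04/;
    return $line;
}

# the inverse substitutions, as in the decoding script
sub decode_line {
    my ($line) = @_;
    $line =~ s/\x04/ timestamp=/;
    $line =~ s/\x03/ lon=/;
    $line =~ s/\x02/ lat=/;
    $line =~ s/\x01/  <node id=/;
    return $line;
}

# made-up sample line, only for demonstration
my $sample  = qq{  <node id="1" lat="51.5" lon="-0.1" timestamp="2007-05-09T12:00:00+01:00"/>\n};
my $encoded = encode_line($sample);
my $decoded = decode_line($encoded);

printf "original %d bytes, encoded %d bytes, round trip %s\n",
    length($sample), length($encoded),
    $decoded eq $sample ? 'ok' : 'FAILED';
```

each full node line gets 28 bytes shorter (10 + 4 + 4 + 10). the round trip only works as long as the input never contains the bytes \x01 to \x04 itself, which should hold for valid xml 1.0.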



have fun, and let me know what you think about this :)


On Thu, 2007-05-10 at 15:53 +0100, Nick Hill wrote:
> Hello Shaun
> 
> Thank you for the pointers for Mac users and 7-zip.
> 
> I have uploaded a copy of the current planet.osm as 7z, where I have further 
> increased compression using bigger dictionary etc.
> 
> planet files are at:
> http://planet.openstreetmap.org/
> 
> The URL for the current planet.osm in 7z format is:
> http://planet.openstreetmap.org/planet-070509.osm.7z
> 
> The new file is 183 MB vs. 235 MB for bzip2.
> 