[OSM-dev] Final kinks in osmosis planet dumping

Brett Henderson brett at bretth.com
Mon Sep 10 11:35:00 BST 2007


Keith Sharp wrote:
> Aside from the historical regex parsers why do we have any space
> indenting in the planet.osm file?  In the first planet file I found on
> my filesystem (late July) it has 110478125 lines, at a guesstimate
> average of 4 bytes indenting per line that's 441912500 bytes of wasted
> space!
>
> It would be a nice task to write a simple tool that stripped the
> indenting and new lines from the most recent planet.osm file to allow
> accurate comparison of file sizes.  If I get time I'll try and do this
> in the next couple of days - unless someone beats me to it.
>
> Keith.
>   
If you have a Java build environment (i.e. a JDK and Ant), change
com.bretth.osmosis.core.xml.impl.ElementWriter.INDENT_SPACES_PER_LEVEL
(e.g. to 0) and recompile. You can then rewrite the planet file with this
command line:
osmosis --read-xml file=planetin.osm --write-xml planetout.osm

Dates are of constant length and don't otherwise matter for this comparison,
so you can make it much quicker by adding the enableDateParsing=no option:
osmosis --read-xml enableDateParsing=no file=planetin.osm --write-xml planetout.osm
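
Keith's other idea, a standalone tool that strips the indenting and newlines,
needs no osmosis rebuild at all. Below is a minimal hypothetical sketch in
Java (the class name IndentStripper is made up; it is not part of osmosis).
It assumes every element starts on its own line and that no attribute value
contains a raw line break, which is how planet files are laid out. It also
reports how many characters it dropped. They are all ASCII whitespace, so
under UTF-8 that is also the number of bytes saved, giving an exact figure
in place of the 4-bytes-per-line estimate.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone tool (not part of osmosis): copies OSM XML from
// stdin to stdout, dropping line breaks and the indentation at the start of
// each line, then reports how many characters were removed.
public class IndentStripper {

    // Copies in to out without indentation or line breaks; returns the number
    // of characters dropped. Every dropped character is ASCII whitespace, so
    // with UTF-8 output this equals the number of bytes saved.
    static long strip(Reader in, Writer out) throws IOException {
        long removed = 0;
        boolean atLineStart = true;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n' || c == '\r') {
                removed++;              // drop the line terminator
                atLineStart = true;
            } else if (atLineStart && (c == ' ' || c == '\t')) {
                removed++;              // drop leading indentation
            } else {
                atLineStart = false;    // first real character on this line
                out.write(c);
            }
        }
        out.flush();
        return removed;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        Writer out = new BufferedWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        long removed = strip(in, out);
        System.err.println("Removed " + removed + " characters");
    }
}
```

Usage, with the same file names as above:
java IndentStripper < planetin.osm > planetout.osm

It works character by character rather than line by line, so the count stays
exact even if the file doesn't end in a newline or uses CRLF line endings.
Whitespace is only removed at the start of a line, so spaces inside tags and
attribute values are left alone.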
