[OSM-dev] Final kinks in osmosis planet dumping
Brett Henderson
brett at bretth.com
Mon Sep 10 11:35:00 BST 2007
Keith Sharp wrote:
> Aside from the historical regex parsers why do we have any space
> indenting in the planet.osm file? In the first planet file I found on
> my filesystem (late July) it has 110478125 lines, at a guesstimate
> average of 4 bytes indenting per line that's 441912500 bytes of wasted
> space!
>
> It would be a nice task to write a simple tool that stripped the
> indenting and new lines from the most recent planet.osm file to allow
> accurate comparison of file sizes. If I get time I'll try and do this
> in the next couple of days - unless someone beats me to it.
>
> Keith.
>
If you have a Java build environment (i.e. JDK + Ant), modify
com.bretth.osmosis.core.xml.impl.ElementWriter.INDENT_SPACES_PER_LEVEL
(e.g. set it to 0 to drop the indenting) and recompile. You can then
rewrite the planet file using this command line.
osmosis --read-xml file=planetin.osm --write-xml planetout.osm
Dates are of constant length but not otherwise important for this, so you
can make it much quicker by adding the enableDateParsing=no option.
osmosis --read-xml enableDateParsing=no file=planetin.osm --write-xml planetout.osm
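
If you don't want to rebuild osmosis, a rough measurement is also possible
with a small script. The following is only a minimal sketch in Python, not
part of osmosis: the function name strip_indent is made up for
illustration, and it assumes each element sits on its own line so that
leading spaces and tabs are pure indenting. It keeps the newlines and only
reports how many bytes the indenting took up.

```python
import io
from typing import BinaryIO, Tuple


def strip_indent(src: BinaryIO, dst: BinaryIO) -> Tuple[int, int]:
    """Copy src to dst with leading spaces/tabs removed from every line.

    Returns (line_count, bytes_saved). Hypothetical helper, assuming that
    leading whitespace is never significant (one XML element per line).
    """
    lines = 0
    saved = 0
    for raw in src:  # binary files iterate line by line, newline included
        stripped = raw.lstrip(b" \t")
        saved += len(raw) - len(stripped)
        lines += 1
        dst.write(stripped)
    return lines, saved


if __name__ == "__main__":
    # Tiny in-memory stand-in for a planet file.
    sample = io.BytesIO(
        b"<osm>\n"
        b"  <way id='1'>\n"
        b"    <nd ref='2'/>\n"
        b"  </way>\n"
        b"</osm>\n"
    )
    out = io.BytesIO()
    line_count, bytes_saved = strip_indent(sample, out)
    print(line_count, bytes_saved)
```

For a real planet file you would pass files opened with open(path, "rb")
and open(path, "wb") instead of the in-memory buffers, which streams the
data without loading it all into memory.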