[OSM-dev] planet info

Thu Feb 28 22:46:14 GMT 2008

On Thu, 2008-02-28 at 23:09 +0100, Frederik Ramm wrote:
> With the gzip and bunzip2 formats, the following holds (quote from
> manpage):
> 
> > bunzip2 will correctly decompress a file which is the concatenation
> > of two or more compressed files.  The result is the concatenation of
> > the corresponding uncompressed files.

This could be a useful feature, provided all the other bzip2
decompression libraries implement this too.
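
As a quick check of the manpage property in one library, here is a
minimal sketch using Python's standard bz2 module, whose
bz2.decompress() accepts multi-stream input from Python 3.3 onwards.
The XML fragments are made-up sample data, and this says nothing about
how other decompression libraries behave.

```python
import bz2

# Two independently compressed pieces (illustrative data, not real planet content).
part_a = bz2.compress(b"<osm>\n")
part_b = bz2.compress(b"<node id='1'/>\n</osm>\n")

# Plain byte-level concatenation of the two compressed streams.
combined = part_a + part_b

# bz2.decompress() reads every stream in the input and joins the results.
result = bz2.decompress(combined)
```

Here result holds the two uncompressed pieces joined in order.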

> One could thus run the standard job and omit the opening <osm> tag,
> and after it is finished, create a mini XML that has "<osm><statistics
> nodes='1234' ways='3456' relations='4444' />" in it, zip that and
> concatenate it with the big file.
> 
> Another option would of course be to simply write a second file with
> statistics after you've written the big one. David et al could then
> read that first if they need it.
> 
> > We could split the dump into multiple files each containing one of:-
> > - nodes
> > - ways 
> > - relations
> > - other metadata (like counts)
> > 
> > Is this a good idea or should it be left as a job for some post
> > processing script? 
> 
> This would of course make it more difficult to "simply pipe the
> planet" into something (e.g. Osmosis would then have to be called with
> something like --rxn=nodes.osm --rxw=ways.osm --rxr=relations.osm
> etc.) - then again maybe "simply piping the planet" to something is a
> dying fashion.
> 
Maybe we could extend the scheme above. If we created separate bzip2
streams for nodes, ways and relations, we could concatenate them to
produce a valid file. In theory we could then prepend another bzip2
stream of compressed metadata, including the offsets at which each of
the other bz2 streams starts.

This might allow a clever reader to seek forward in the file and read
out each of the sub-streams in an arbitrary sequence. Tools which do not
implement this would just see the contents as before but with a few new
bits of XML data at the front which they can ignore.
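
The idea can be sketched roughly as follows, in Python with the
standard bz2 module. Everything about the layout is an assumption made
for illustration: the <index> and <stream> element names, the choice to
put the opening <osm> tag in the metadata stream, and the choice to
measure offsets from the end of the metadata stream (so the index does
not need to know its own compressed size). It also works on bytes held
in memory; a real reader would seek in the file instead.

```python
import bz2
import re


def build_archive(sections):
    """Build one metadata stream followed by one bzip2 stream per section.

    sections is a list of (name, xml_bytes) pairs. The <index>/<stream>
    elements are a hypothetical format, not an existing OSM convention.
    """
    streams = [(name, bz2.compress(xml)) for name, xml in sections]
    entries = []
    offset = 0  # measured from the end of the metadata stream
    for name, blob in streams:
        entries.append(
            "<stream name='%s' offset='%d' length='%d'/>" % (name, offset, len(blob))
        )
        offset += len(blob)
    index_xml = "<osm>\n<index>" + "".join(entries) + "</index>\n"
    header = bz2.compress(index_xml.encode("utf-8"))
    return header + b"".join(blob for _, blob in streams)


def read_index(archive):
    """Decompress only the first stream; return (index dict, base offset)."""
    dec = bz2.BZ2Decompressor()
    index_xml = dec.decompress(archive).decode("utf-8")
    # Whatever follows the first stream is left untouched in unused_data.
    base = len(archive) - len(dec.unused_data)
    index = {
        name: (int(off), int(length))
        for name, off, length in re.findall(
            r"<stream name='(\w+)' offset='(\d+)' length='(\d+)'/>", index_xml
        )
    }
    return index, base


def read_section(archive, index, base, name):
    """Jump straight to one sub-stream and decompress just that stream."""
    off, length = index[name]
    start = base + off
    return bz2.decompress(archive[start:start + length])


archive = build_archive([
    ("nodes", b"<node id='1'/>\n"),
    ("ways", b"<way id='2'/>\n"),
    ("relations", b"<relation id='3'/>\n</osm>\n"),
])
index, base = read_index(archive)
ways_only = read_section(archive, index, base, "ways")
```

A tool that knows nothing about the index can still call
bz2.decompress(archive) and get the whole document, with the extra
<index> element at the front.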

I'll try implementing a proof of concept and let you know how I get on.

	Jon