[OSM-dev] planet info

Frederik Ramm frederik at remote.org
Thu Feb 28 22:09:06 GMT 2008

Hi Jon,

> It is more tricky than that unfortunately. The planet dump code streams
> data to STDOUT. It can not simply seek back to the beginning. We have
> similar disk space issues when generating the file so it gets piped
> directly to gzip so again there is no trivial way to update the contents
> after it is written. 

With both the gzip and bzip2 formats, the following holds (quote
from the bzip2 manual page):

> bunzip2 will correctly decompress a file which is the concatenation
> of two or more compressed files.  The result is the concatenation of
> the corresponding uncompressed files.

One could thus run the standard job while omitting the opening <osm>
tag, then, once it has finished, create a mini XML file containing
"<osm><statistics nodes='1234' ways='3456' relations='4444' />",
gzip that, and prepend it to the big file.
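The concatenation trick can be demonstrated from the shell; the file
names and the statistics element below are purely illustrative:

```shell
# Two gzip members: a mini "header" with the stats, and the big body.
tmp=$(mktemp -d)
printf '<node id="1"/><node id="2"/></osm>\n' | gzip > "$tmp/body.osm.gz"
printf '<osm><statistics nodes="2" ways="0" relations="0" />\n' \
    | gzip > "$tmp/head.osm.gz"

# Concatenating the compressed members yields a valid multi-member
# gzip file; decompressing it yields the concatenated plain text.
cat "$tmp/head.osm.gz" "$tmp/body.osm.gz" > "$tmp/planet.osm.gz"
gunzip -c "$tmp/planet.osm.gz"
```

Decompressing planet.osm.gz prints the stats header followed by the
body, i.e. one well-formed document, without ever seeking back into
the big file.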

Another option would of course be to simply write a second file with
statistics after you've written the big one. David et al could then
read that first if they need it.

> We could split the dump into multiple files each containing one of:-
> - nodes
> - ways 
> - relations
> - other metadata (like counts)
> Is this a good idea or should it be left as a job for some post
> processing script? 

This would of course make it more difficult to "simply pipe the
planet" into something (e.g. Osmosis would then have to be called with
something like --rxn=nodes.osm --rxw=ways.osm --rxr=relations.osm
etc.) - then again maybe "simply piping the planet" to something is a
dying fashion.

To David, who complained about PHP not properly handling files
larger than 2 GB, my advice would be (1) don't use PHP, and (2) if
you must, try:

$size = (float) exec("stat -c %s ".escapeshellarg($planetfilename));
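The stat call underlying that workaround can be checked from the
shell; note that "-c %s" is the GNU coreutils syntax (BSD/macOS stat
uses "-f %z" instead):

```shell
# Create a small file of known size and ask GNU stat for its byte count.
f=$(mktemp)
printf 'hello' > "$f"
stat -c %s "$f"
```

Because stat reports the size as text, PHP receives it via exec()
and casts it to float, sidestepping the 32-bit integer limit.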

> OTOH I like the fact that
> it is all in a single file at the moment.

Me too.


Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'
