[OSM-dev] planet info
frederik at remote.org
Thu Feb 28 22:09:06 GMT 2008
> It is more tricky than that unfortunately. The planet dump code streams
> data to STDOUT. It can not simply seek back to the beginning. We have
> similar disk space issues when generating the file so it gets piped
> directly to gzip so again there is no trivial way to update the contents
> after it is written.
With the gzip and bzip2 formats, the following holds (quoting from the bzip2 manual):
> bunzip2 will correctly decompress a file which is the concatenation
> of two or more compressed files. The result is the concatenation of
> the corresponding uncompressed files.
One could thus run the standard job, omitting the opening <osm> tag; after
it has finished, create a mini XML file containing "<osm><statistics
nodes='1234' ways='3456' relations='4444' />", gzip that, and prepend it
to the big file.
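To illustrate why this works (a minimal Python sketch with made-up fragments, not the actual planet tooling): gzip readers treat a byte-for-byte concatenation of compressed members as one continuous stream.

```python
import gzip

# Hypothetical fragments standing in for the real planet dump pieces:
# a tiny gzipped "header" member carrying the statistics, followed by
# the big gzipped body that was streamed out without its opening tag.
header = gzip.compress(b"<osm><statistics nodes='1234' ways='3456' relations='4444' />")
body = gzip.compress(b"<node id='1'/></osm>")

# Concatenating the two compressed members yields a valid gzip file;
# a conforming reader decompresses it as one continuous stream.
combined = header + body
print(gzip.decompress(combined))
```

The same property holds for bzip2, per the quoted manual text, so the trick does not depend on which of the two compressors the dump job uses.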
Another option would of course be to simply write a second file with
statistics after you've written the big one. David et al could then
read that first if they need it.
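That second-file option is trivial to produce; a minimal sketch (the filename and counts here are invented for illustration):

```python
# Hypothetical sidecar file with the dump statistics, written after the
# main dump so consumers can read the counts without scanning the planet.
stats = {"nodes": 1234, "ways": 3456, "relations": 4444}
with open("planet-statistics.xml", "w") as f:
    f.write(
        "<statistics nodes='{nodes}' ways='{ways}' relations='{relations}' />\n"
        .format(**stats)
    )
```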
> We could split the dump into multiple files each containing one of:-
> - nodes
> - ways
> - relations
> - other metadata (like counts)
> Is this a good idea or should it be left as a job for some post
> processing script?
This would of course make it more difficult to "simply pipe the planet"
into something (e.g. Osmosis would then have to be called with something
like --rxn=nodes.osm --rxw=ways.osm --rxr=relations.osm etc.) - then
again, maybe "simply piping the planet" into something is a rare use
case anyway.
To David, who complained about PHP not properly working with files
larger than 2 GB, my advice would be (1) don't use PHP, and (2) if you
must use PHP, determine the file size with something like:
$size = (float) exec("stat -c %s ".escapeshellarg($planetfilename));
> OTOH I like the fact that
> it is all in a single file at the moment.
Frederik Ramm ## eMail frederik at remote.org ## N49°00.09' E008°23.33'