[OSM-dev] planet info

Thu Feb 28 20:00:06 GMT 2008

On Thu, 2008-02-28 at 12:27 +0000, David Earl wrote:
> How feasible would it be to put a set of attributes, either on the top
> level element or on an element created for the purpose, which tells me
> how many nodes, ways and relations there are in the file? If you have
> the counts to hand at the beginning, great. If not, you could write
> '... nodecount="000000000000" waycount="000000000000"
> relationcount="000000000000"' at the beginning, count the elements as
> you output them, and at the end seek back and replace the zeros with
> the counts.
> 
> This would enable me and others to do progress reporting on making a
> pass through the file. (I can't do it by file size and read position
> because PHP's filesize function can't handle files bigger than 2 GB,
> and I can't count the elements before I start without completely
> decompressing the file first, which I no longer have enough free disk
> space to do.)
> 

Unfortunately it is trickier than that. The planet dump code streams
data to STDOUT, so it cannot simply seek back to the beginning. We have
similar disk space issues when generating the file, so the output gets
piped directly to gzip; again, there is no trivial way to update the
contents after they have been written.
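
To illustrate the difference, here is a minimal Python sketch (not the
planet dump code itself; the function name and the single nodecount
attribute are just for illustration). It writes a zero-padded
placeholder and patches it at the end, which only works when the output
is a regular file. On a pipe, as when writing to gzip, the stream is not
seekable and the zeros stay in place.

```python
import os


def write_with_backpatch(out, elements):
    """Write elements to a binary stream, then patch the count if possible.

    Returns True if the placeholder was replaced, False if `out` is not
    seekable (for example a pipe to gzip) and the zeros were left as-is.
    """
    seekable = out.seekable()
    out.write(b'<osm nodecount="')
    # Remember where the 12-digit placeholder starts (files only).
    pos = out.tell() if seekable else None
    out.write(b"0" * 12 + b'">\n')

    count = 0
    for element in elements:
        out.write(element + b"\n")
        count += 1
    out.write(b"</osm>\n")

    if not seekable:
        return False

    # Overwrite the zeros in place; the width must match the placeholder.
    out.seek(pos)
    out.write(b"%012d" % count)
    out.seek(0, os.SEEK_END)
    return True
```

Called with a temporary file this returns True and the header reads
nodecount="000000000002" for two elements; called with the write end of
os.pipe() it returns False.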

It may be possible to include an estimate of the number of
nodes/ways/relations at the top of the file by running an additional
query at the start of the dump. The numbers may be slightly inaccurate
because the DB is not locked, so things may get modified during the
dump.
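
A rough sketch of what such a query step could look like, in Python
against any DB-API connection. The table names (current_nodes,
current_ways, current_relations) and the <estimate> element are
assumptions for the sake of the example, not what the dump code does;
the attribute names follow David's suggestion.

```python
# Attribute name -> table to count. The table names are assumed here.
COUNT_TABLES = (
    ("nodecount", "current_nodes"),
    ("waycount", "current_ways"),
    ("relationcount", "current_relations"),
)


def estimated_counts(conn):
    """Run one COUNT(*) per table and return the results as a dict.

    The DB is not locked, so these are only estimates of what the dump
    will eventually contain.
    """
    cur = conn.cursor()
    counts = {}
    for attr, table in COUNT_TABLES:
        cur.execute("SELECT COUNT(*) FROM " + table)
        counts[attr] = cur.fetchone()[0]
    return counts


def counts_element(counts):
    """Format the counts as a single (hypothetical) XML element."""
    attrs = " ".join('%s="%d"' % (name, n) for name, n in counts.items())
    return "<estimate %s/>" % attrs
```

The element would be written once, before the first node, so a reader
can use it for progress reporting without any seeking on the writer's
side.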

I had a similar request from someone to consider putting the relations
at the head of the file (or in a different file). If you are looking for
particular relations then having them at the end makes it difficult to
know which nodes/ways are important as you scan the file. 

We could split the dump into multiple files each containing one of:-
- nodes
- ways 
- relations
- other metadata (like counts)

Is this a good idea or should it be left as a job for some post
processing script? 
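
For comparison, the post-processing route could be as small as the
following Python sketch. It makes one streaming pass over a planet file
and writes nodes, ways and relations to separate files plus a small
counts file. The output file names and the <counts> element are made up
for the example.

```python
import os
import xml.etree.ElementTree as ET

KINDS = ("node", "way", "relation")


def split_planet(source, out_dir):
    """Split an OSM XML stream into nodes.osm, ways.osm and relations.osm.

    `source` is a filename or a binary file object (for example the
    result of gzip.open(path, "rb")). Returns a dict of element counts.
    """
    counts = dict.fromkeys(KINDS, 0)
    outs = {k: open(os.path.join(out_dir, k + "s.osm"), "wb") for k in KINDS}
    try:
        for f in outs.values():
            f.write(b"<osm>\n")
        events = ET.iterparse(source, events=("start", "end"))
        _, root = next(events)  # the first event is the start of <osm>
        for event, elem in events:
            # Children are nd/tag/member, so these tags are top-level only.
            if event == "end" and elem.tag in outs:
                elem.tail = "\n"
                outs[elem.tag].write(ET.tostring(elem))
                counts[elem.tag] += 1
                root.clear()  # drop finished elements to keep memory flat
        for f in outs.values():
            f.write(b"</osm>\n")
    finally:
        for f in outs.values():
            f.close()
    with open(os.path.join(out_dir, "counts.xml"), "w") as f:
        f.write('<counts node="%d" way="%d" relation="%d"/>\n'
                % (counts["node"], counts["way"], counts["relation"]))
    return counts
```

The catch is that the counts are only known once the pass has finished,
so this helps a second pass over the data but not the first one.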

Having the relations in a different file would make it easier for
osm2pgsql to parse the multipolygon relations. OTOH I like the fact that
it is all in a single file at the moment.

	Jon