[OSM-dev] planet info

Dave Stubbs osm.list at randomjunk.co.uk
Fri Feb 29 09:23:28 GMT 2008


On Thu, Feb 28, 2008 at 10:47 PM, Jason Reid
<osm at bowvalleytechnologies.com> wrote:
>
> David Earl wrote:
>  > How feasible would it be to put a set of attributes, either on the
>  > top-level element or on an element created for the purpose, that tells
>  > me how many nodes, ways and relations there are in the file? If you have
>  > the counts to hand at the beginning, great. If not, you could write '...
>  > nodecount="000000000000" waycount="000000000000"
>  > relationcount="000000000000"' at the beginning, count the elements as
>  > you output them, and at the end seek back and replace the zeros with
>  > the counts.
>  >
>  > This would enable me and others to do progress reporting when making a
>  > pass through the file. (I can't do it by file size and read position
>  > because the filesize function won't go beyond 2 GB in PHP, and I
>  > can't count the elements before I start without completely decompressing
>  > the file first, which I no longer have enough free disk space to do.)
>  >
>  > David
>  >
>  > _______________________________________________
>  > dev mailing list
>  > dev at openstreetmap.org
>  > http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>  >
>  There is the planet statistics script I wrote a while back (in Python)
>  that I still need to get around to putting into SVN. It doesn't
>  currently count nodes or relations, only ways, but that wouldn't be hard
>  to add (plus it would give it something to do, since 92.5% of the
>  objects in the dump are nodes and it currently scans over them
>  silently). It could be modified to sit between the output of the planet
>  script and gzip and calculate the counts as the file is being compressed:
>  the script uses a stream-consuming parser to read stdin (currently piped
>  from bzcat in my use) and could pass the stream back out on stdout
>  unmodified.
>
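
A minimal sketch of the pass-through counter described above, written in Python like Jason's script but not taken from it. It assumes the planet XML arrives as raw bytes, counts <node>, <way> and <relation> start tags with the standard-library expat parser, and copies every byte to the output unchanged. The function name count_and_pass_through is hypothetical.

```python
import xml.parsers.expat

# Assumption: the top-level planet objects are <node>, <way> and <relation>.
# Child tags such as <nd>, <tag> and <member> are not counted.
COUNTED = ("node", "way", "relation")


def count_and_pass_through(src, dst, chunk_size=1 << 16):
    """Copy binary stream src to dst byte-for-byte while counting elements.

    src/dst are binary file objects, e.g. sys.stdin.buffer and
    sys.stdout.buffer. Returns a dict of element counts.
    """
    counts = dict.fromkeys(COUNTED, 0)

    def on_start(name, attrs):
        if name in counts:
            counts[name] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)            # pass the stream on unmodified
        parser.Parse(chunk, False)  # feed the same bytes to the parser
    parser.Parse(b"", True)         # tell expat the document has ended
    return counts
```

Wired to sys.stdin.buffer and sys.stdout.buffer (with the totals printed to stderr, since stdout carries the data) it could sit in a pipeline such as `bzcat planet.osm.bz2 | python3 count_osm.py | gzip > planet.osm.gz`, where the script name is hypothetical. The totals only exist once the last byte has gone through, so they still cannot be placed at the start of the compressed stream.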


I think if we wanted counts, it would be simpler to just add them to the
C code rather than pipe through another application, which actually has
the same limitations (no knowledge of the counts at the start, and no
seek).
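David's placeholder-and-seek-back idea only works when the writer can seek, i.e. when it writes an uncompressed regular file rather than into a pipe feeding gzip. A minimal Python sketch under that assumption: the attribute names nodecount/waycount/relationcount come from the proposal and are not part of any existing planet format, and write_with_counts is a hypothetical helper.

```python
WIDTH = 12  # fixed width of each placeholder, as in "000000000000"


def write_with_counts(path, elements):
    """Write an OSM-like XML file whose <osm> tag carries element counts.

    elements: iterable of (kind, xml_text) pairs, kind being "node",
    "way" or "relation". The file must be seekable, so no compression.
    """
    counts = {"node": 0, "way": 0, "relation": 0}
    zeros = "0" * WIDTH
    with open(path, "wb") as f:
        f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n<osm ')
        offset = f.tell()  # remember where the placeholders start
        header = (f'nodecount="{zeros}" waycount="{zeros}" '
                  f'relationcount="{zeros}">\n')
        f.write(header.encode())
        for kind, text in elements:
            counts[kind] += 1  # count while writing, as proposed
            f.write(text.encode() + b"\n")
        f.write(b"</osm>\n")
        # Seek back and overwrite the zeros. Zero-padding to the same width
        # keeps the replacement exactly as long, so nothing else moves.
        f.seek(offset)
        real = (f'nodecount="{counts["node"]:0{WIDTH}d}" '
                f'waycount="{counts["way"]:0{WIDTH}d}" '
                f'relationcount="{counts["relation"]:0{WIDTH}d}"')
        f.write(real.encode())
    return counts
```

Once the bytes have gone into a gzip or bzip2 stream there is no equivalent of the final seek, which is why the counts would have to be known before writing starts.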

The other possibility would be to write a whole sequence of files, each
compressed, and just tar the results together with a stats metadata file
to make a single file for download. Most processors could be modified to
read tarballs quite easily, and if not, you could untar them first; it
would basically be an OSM Jar but with a choice of compression. Just a
random thought... I'm sure you can think of many holes.
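A rough sketch of that bundle layout using Python's tarfile module. It assumes each part has already been written and counted, so the stats member can go first in an uncompressed tar and be read without decompressing any data. The member names, the JSON stats format and the helper names are all hypothetical, gzip stands in for whichever compression is chosen (bz2.compress would slot in the same way), and parts are held in memory only to keep the example small.

```python
import gzip
import io
import json
import tarfile


def build_bundle(tar_path, parts):
    """Pack compressed parts plus a stats file into one plain tar.

    parts: dict mapping a part name (e.g. "nodes.osm") to
    (xml_bytes, counts). The stats member is written first.
    """
    stats = {name + ".gz": counts for name, (_, counts) in parts.items()}
    members = [("stats.json", json.dumps(stats).encode())]
    members += [(name + ".gz", gzip.compress(data))
                for name, (data, _) in parts.items()]
    with tarfile.open(tar_path, "w") as tar:  # tar itself is uncompressed
        for name, payload in members:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def read_stats(tar_path):
    """Return the stats without touching any of the compressed parts."""
    with tarfile.open(tar_path, "r") as tar:
        return json.load(tar.extractfile("stats.json"))
```

A reader doing progress reporting could call read_stats first and then stream each .gz member through its existing parser.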

Don't forget there's also
http://www.openstreetmap.org/stats/data_stats.html -- if you just want
a rough guess at the number of nodes/ways and you are dealing with a
recent planet, then you could just scrape that to get the numbers.

Dave
