[OSM-dev] Messing with planet
Andy Robinson (blackadder)
blackadderajr at googlemail.com
Sat Dec 22 12:52:16 GMT 2007
Jon Burgess [mailto:jburgess777 at googlemail.com] wrote:
>Sent: 22 December 2007 12:39 PM
>To: Andy Robinson (blackadder)
>Cc: 'osm-dev List'
>Subject: Re: [OSM-dev] Messing with planet
>
>On Sat, 2007-12-22 at 11:24 +0000, Andy Robinson (blackadder) wrote:
>> I tried to uncompress the latest planet last night (first time in quite a
>> while) which failed when the disk on my hand-me-down dev box was full :-(
>>
>> Would be helpful if those with planet experience could share your
>> current method of processing and analysing planet and the amount of
>> disk and memory you think is the current minimum.
>
>It depends, but in general I think you want your tools to be able to
>read a .bz2 or .gz file directly or be able to read from STDIN (so you
>can pipe from bzcat etc). When I was working a lot with the planet files
>I would convert the .bz2 to a .gz file since that is much faster to
>decompress (if you're going to be reading it a lot).
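>For example, a one-off recode and then repeated streaming runs (the
>filename and "my-analysis-tool" here are just placeholders):
>
>  bzcat planet.osm.bz2 | gzip > planet.osm.gz   # recode once
>  zcat planet.osm.gz | ./my-analysis-tool       # stand-in for your tool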
>
>The memory requirement entirely depends on how much state information
>you need to keep while processing the file. Don't expect to be able to
>keep the whole planet file in RAM unless you can devise a very compact
>representation of the data (even then you'll probably need >1GB of ram).
>
>> For instance, do all the tools require an
>> uncompressed planet (I'm thinking osmosis I guess now?).
>
>Osmosis will read .gz or .bz2 files directly:
> --read-xml compressionMethod=bzip2 ...
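>
>e.g. a complete invocation which recodes bz2 to gz in one pass
>(untested, so check the osmosis documentation for the exact task
>options in your version):
>
>  osmosis --read-xml file=planet.osm.bz2 compressionMethod=bzip2 \
>          --write-xml file=planet.osm.gz compressionMethod=gzip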
>
>> Also any guidance on free HDD space size and memory settings when
>> processing planet and also if importing it into a blank mysql
>> database. Is import of the whole planet into the rails database
>> schema a workable option for an individual anymore?
>
>I have not looked at importing the planet into MySQL for a few months.
>Even then, it took several hours to import into MySQL and used a
>couple of GB of disk space. The planet file has grown to several times
>that size since then, so I imagine it might easily take a day to
>import now and need over 10GB of disk space.
>
>I think you'd be much better off importing a subset, like the UK planet:
>http://nick.dev.openstreetmap.org/downloads/planet/
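>
>If you need a different area you can cut your own extract with
>osmosis, roughly like this, where the bounding box values are just an
>example (again, double-check the task syntax for your osmosis
>version):
>
>  osmosis --read-xml file=planet.osm.bz2 compressionMethod=bzip2 \
>          --bounding-box left=-6 right=2 bottom=49.9 top=59 \
>          --write-xml file=uk.osm.gz compressionMethod=gzip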
>
>
>> These are questions that were not really an issue 6 months ago when
>> planet was so much smaller. I appreciate once I have got back up to
>> speed I could be using the diff files to limit requirements.
>>
>> Specifically I wanted to be working on two aspects over the festive
>> season:
>> 1. some evaluation of the rails stuff now that I have a working rails port
>> 2. some tag analysis and user stats stuff
>
>The user information is not present in the planet dumps so that may be
>tricky.
>
>My recommendation for any analysis would be to use a streaming solution,
>reading the file from STDIN.
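>
>For example, a crude tag-key frequency count can be done entirely in a
>shell pipeline. This matches the raw XML text rather than parsing it
>(and assumes GNU grep), so treat the numbers as approximate:
>
>  bzcat planet.osm.bz2 | grep -o 'k="[^"]*"' \
>      | sort | uniq -c | sort -rn | head -20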
>
>Alternatively osm2pgsql could be useful for you. It will import the
>planet file in a few hours*. The keys get mapped into columns in a
>couple of tables. Depending on what you need the existing tables may be
>OK for you, or you might need to adjust the list of exported tags in the
>source if you need more keys. If you are good with SQL then you can get
>all kinds of stats from the database tables,
> e.g. to retrieve the top 10 highway= values:
>
>gis=> select highway, count(highway) as num from planet_osm_roads
>gis-> group by highway order by num desc limit 10;
> highway | num
>----------------+--------
> secondary | 323227
> primary | 175034
> motorway_link | 119377
> motorway | 53166
> trunk | 36471
> trunk_link | 11810
> primary_link | 8441
> secondary_link | 458
> residential | 433
> unclassified | 113
>(10 rows)
>
>This query took just a few seconds to run.
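>
>For reference, the import itself is a single command, roughly like
>this (flags from memory, so check osm2pgsql --help; if your build
>can't read .bz2 directly, decompress the file first):
>
>  osm2pgsql -d gis planet.osm.bz2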
>
>Because the map features are stored in a spatial format it is also
>relatively easy to obtain geo-referenced results without needing to
>deal with the node+way hierarchy. For more details see:
>http://trac.openstreetmap.org/browser/applications/utils/export/osm2pgsql/README.txt
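>
>For instance, assuming a default osm2pgsql import where the geometry
>column is called "way" and is stored in latlong (your projection may
>differ), a bounding-box query looks something like this, with the
>coordinates being just example values:
>
>gis=> select name, highway from planet_osm_roads
>gis-> where way && SetSRID('BOX3D(-2.1 52.3, -1.7 52.6)'::box3d, 4326);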
>
> Jon
>
>
>
>* The import of the planet on tile.openstreetmap.org this week took 8
>hours. The Postgres DB is 7GB, but the peak disk usage is probably in
>the region of 10 - 20GB during the import. Tile has 2GB of RAM and I'd
>recommend this as the useful minimum for the full planet import. 1GB is
>probably the real minimum but will need 1-2GB of swap space and will be
>slower.
>
Jon, thanks for this. Most useful. Quite a bit of it is not evident on
the wiki (mainly because planet has grown so much so quickly, I guess),
so I'll add some updates once the other responses are in.
Cheers
Andy