[OSM-dev] Messing with planet
Jon Burgess
jburgess777 at googlemail.com
Sat Dec 22 12:39:09 GMT 2007
On Sat, 2007-12-22 at 11:24 +0000, Andy Robinson (blackadder) wrote:
> I tried to uncompress the latest planet last night (first time in quite a
> while) which failed when the disk on my hand-me-down dev box was full :-(
>
> Would be helpful if those with planet experience could share your current
> method of processing and analysing planet and the amount of disk and memory
> you think is the current minimum.
It depends, but in general I think you want your tools to be able to
read a .bz2 or .gz file directly, or to read from STDIN (so you can
pipe from bzcat etc.). When I was working a lot with the planet files
I would convert the .bz2 to a .gz file, since gzip is much faster to
decompress (worthwhile if you're going to be reading the file a lot).
The memory requirement depends entirely on how much state information
you need to keep while processing the file. Don't expect to be able to
keep the whole planet file in RAM unless you can devise a very compact
representation of the data (even then you'll probably need >1GB of RAM).
> For instance, do all the tools require an
> uncompressed planet (I'm thinking osmosis I guess now?).
Osmosis will read .gz or .bz2 files directly:
--read-xml compressionMethod=bzip2 ...
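For example, something along these lines should cut a bounding box
straight out of the compressed planet and write it back out gzipped
(untested, so check the exact option names against your osmosis
version; the coordinates are just a rough UK box):

osmosis --read-xml file=planet.osm.bz2 compressionMethod=bzip2 \
  --bounding-box left=-11 right=2 bottom=49 top=61 \
  --write-xml file=uk.osm.gz compressionMethod=gzip
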
> Also any guidance
> on free HDD space size and memory settings when processing planet and also
> if importing it into a blank mysql database. Is import of the whole planet
> into the rails database scheme a workable option for an individual anymore?
I have not looked at importing the planet into MySQL for a few months.
Even then it took several hours to import into MySQL and used a couple
of GB of disk space. The planet file has grown to several times that
size since, so I imagine it might easily take a day to import now and
need over 10GB of disk space.
I think you'd be much better off importing a subset, like the UK planet:
http://nick.dev.openstreetmap.org/downloads/planet/
> These are questions that were not really an issue 6 months ago when planet
> was so much smaller. I appreciate once I have got back up to speed I could
> be using the diff files to limit requirements.
>
> Specifically I wanted to be working on two aspects over the festive season.
> 1. some evaluation of the rails stuff now that I have a working rails port
> 2. some tag analysis and user stats stuff
The user information is not present in the planet dumps, so that may be
tricky.
My recommendation for any analysis would be to use a streaming solution,
reading the file from STDIN.
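As a quick-and-dirty sketch of that (it leans on each <tag k=... v=.../>
element sitting on its own line, which is the case in the planet dumps;
a proper SAX parser is the more robust route), something like this
counts the highway= values straight off the compressed file:

bzcat planet.osm.bz2 | grep -o 'k="highway" v="[^"]*"' \
  | sort | uniq -c | sort -rn | head
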
Alternatively, osm2pgsql could be useful for you. It will import the
planet file in a few hours*. The keys get mapped into columns in a
couple of tables. Depending on what you need, the existing tables may be
OK for you, or you might need to adjust the list of exported tags in the
source if you need more keys. If you are good with SQL then you can get
all kinds of stats from the database tables, e.g. to retrieve the top 10
highway= values:
gis=> select highway,count(highway) as num from planet_osm_roads group by highway order by num desc limit 10;
    highway     |  num
----------------+--------
 secondary      | 323227
 primary        | 175034
 motorway_link  | 119377
 motorway       |  53166
 trunk          |  36471
 trunk_link     |  11810
 primary_link   |   8441
 secondary_link |    458
 residential    |    433
 unclassified   |    113
(10 rows)
This query took just a few seconds to run.
Because the map features are stored in a spatial format, it is also
relatively easy to obtain geo-referenced results without needing to deal
with the node+way hierarchy. For more details see:
http://trac.openstreetmap.org/browser/applications/utils/export/osm2pgsql/README.txt
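For instance, a query along these lines (again untested, and assuming
the default 'way' geometry column of the osm2pgsql schema; the lengths
come back in the units of whatever projection you imported with) gives
a rough total length per road type:

gis=> select highway, sum(length(way)) as total from planet_osm_roads group by highway order by total desc limit 5;
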
Jon
* The import of the planet on tile.openstreetmap.org this week took 8
hours. The Postgres DB is 7GB, but the peak disk usage is probably in
the region of 10-20GB during the import. Tile has 2GB of RAM and I'd
recommend this as the useful minimum for the full planet import. 1GB is
probably the real minimum but will need 1-2GB of swap space and will be
slower.