[OSM-dev] An alternative rendering-database setup and update strategy!
Sven Geggus
lists at fuchsschwanzdomain.de
Mon Nov 25 11:16:56 UTC 2013
Hello,
on my day-job I recently had to solve the Problem of setting up a Postgis
database contaning a full-planet extract using one of those cheap 180GB SSD
on a semi-potent machine (only 8GB of RAM).
I first tried to use osm2pgsql for this purpose which is almost impossible
for a couple of reasons:
* I would have needed more than 180GB of disk space (at least during import)
* The import would take a _very_ long time (several days rather than hours)
So I had to look for a backup strategy!
I found one which proved to be that good, that I would like to discuss it
here as an alternative way for rendering-database setup.
The first thing I discovered is the fact, that we currently use roughly twice
the disk-space for intermediate tables (or file in case of flatnode) than
for the processed data itself.
Here is how this looks like on tile.openstreetmap.de:
relation | total_size
---------------------------+------------
public.planet_osm_line | 61 GB
public.planet_osm_polygon | 59 GB
public.planet_osm_point | 10 GB
public.planet_osm_roads | 8 GB
public.planet_osm_ways | 174 GB
public.planet_osm_rels | 4 GB
+ flatnode.dat 20 GB
processed data: 61GB+59GB+10GB+8GB=138GB
intermediate data: 20GB+174GB+4GB=202GB
The most annoying part is the 174GB planet_osm_ways table.
So what I did to solve my problem was using pg_dump for the processed data
tables only and setting up my target database using pg_restore.
Advantages:
* Very fast data import even on machines where import using osm2pgsql would
be practically impossible
* Decent size of database dump (32GB in case of a --hstore-match-only database
which is about the same size as a planet dump)
Disadvantages:
* Currently no update strategy available
* Will need a "master" osm2pgsql database
* Will "inherit" table scheme from "master" database
Conclusion:
IMO the disadvantages can be resolved or at least mitigated in the following
way:
* we generate a downloadable dump of the database on
tile.openstreetmap.de (e.g. weekly)
* osm2pgsql needs to be patched to output the changes to the processed data
tables as SQL commands which can then be used to replicated the slave databases
* We already use --hstore-match-only database format. So flexibility in
table-layout is not that much of a concern as views to hstore column can be used
for rendering instead of tables.
I would like to hear your comments to this proposial.
Regards
Sven
--
Threading is a performance hack.
(The Art of Unix Programming by Eric S. Raymond)
/me is giggls at ircnet, http://sven.gegg.us/ on the Web
--
"Thinking of using NT for your critical apps?
Isn't there enough suffering in the world?"
(Advertisement of Sun Microsystems in Wall Street Journal)
/me is giggls at ircnet, http://sven.gegg.us/ on the Web
More information about the dev
mailing list