[OSM-dev] dev Digest, Vol 105, Issue 11

Michael Kussmaul kussmaul.list at nix.ch
Wed Dec 18 12:37:16 UTC 2013


I have a very similar setup: A cheap machine ($600 - 16GB RAM) and some SSDs - although I also started with a single 200GB SSD. I have this machine in production for a year now and produce weekly updates to my data - it's not super fast, but a weekly diff import takes 1-2 h. Full import still takes approx. 48 h the last time I tried.

1.) Why not buy a second SSD? You could then either strip those two SSD together on operating-system level (e.g. RAID-0 or mdadm stripe) or just use several table-spaces on those different disks.

2.) Table-spaces: PostgreSQL and osm2pgsql supports table-spaces - this basically means you can decide on table granularity which tables you want to store on normal disk or SSD. You can even migrate those tables later from one tablespace to another. In osm2pgsql take a look at the --tablespace-XXXX parameters. Additionally use the flat-node mode, as this uses less disk-space. My osm2pgsql command for initial import looks like this:

./osm2pgsql -r pbf --create --latlong --database planet --username gis --prefix planet --slim --style /home/map/import/tools/hstore.style --cache 12000 --number-processes 4 --tablespace-main-index vertex --tablespace-main-data vertex --tablespace-slim-index samsung --tablespace-slim-data samsung --hstore --flat-nodes /media/vertex/import_index/node.cache planet.osm.pbf

(I have two SSDs and created two tablespaces: vertex and samsung - to reflect my SSDs) - of course if you place more data on normal disk, the import-time will suffer.

3.) Perhaps you can take a look at http://imposm.org - an alternative to osm2pqsql - have not tried it yet though...

Michael

> Hello,
> 
> on my day-job I recently had to solve the Problem of setting up a Postgis
> database contaning a full-planet extract using one of those cheap 180GB SSD
> on a semi-potent machine (only 8GB of RAM).
> 
> I first tried to use osm2pgsql for this purpose which is almost impossible
> for a couple of reasons:
> 
> * I would have needed more than 180GB of disk space (at least during
> import)
> * The import would take a _very_ long time (several days rather than hours)
> 
> So I had to look for a backup strategy!
> 
> I found one which proved to be that good, that I would like to discuss it
> here as an alternative way for rendering-database setup.
> 
> The first thing I discovered is the fact, that we currently use roughly
> twice
> the disk-space for intermediate tables (or file in case of flatnode) than
> for the processed data itself.
> 
> Here is how this looks like on tile.openstreetmap.de:
> 
>         relation          | total_size
> ---------------------------+------------
> public.planet_osm_line    | 61 GB
> public.planet_osm_polygon | 59 GB
> public.planet_osm_point   | 10 GB
> public.planet_osm_roads   |  8 GB
> 
> public.planet_osm_ways    | 174 GB
> public.planet_osm_rels    |   4 GB
> 
> + flatnode.dat 20 GB
> 
> processed data: 61GB+59GB+10GB+8GB=138GB
> intermediate data: 20GB+174GB+4GB=202GB
> 
> The most annoying part is the 174GB planet_osm_ways table.
> 
> So what I did to solve my problem was using pg_dump for the processed data
> tables only and setting up my target database using pg_restore.
> 
> Advantages:
> * Very fast data import even on machines where import using osm2pgsql would
>  be practically impossible
> * Decent size of database dump (32GB in case of a --hstore-match-only
> database
>  which is about the same size as a planet dump)
> 
> Disadvantages:
> * Currently no update strategy available
> * Will need a "master" osm2pgsql database
> * Will "inherit" table scheme from "master" database
> 
> Conclusion:
> 
> IMO the disadvantages can be resolved or at least mitigated in the
> following
> way:
> 
> * we generate a downloadable dump of the database on
>  tile.openstreetmap.de (e.g. weekly)
> * osm2pgsql needs to be patched to output the changes to the processed data
>  tables as SQL commands which can then be used to replicated the slave
> databases
> * We already use --hstore-match-only database format. So flexibility in
>  table-layout is not that much of a concern as views to hstore column can
> be used
>  for rendering instead of tables.
> 
> I would like to hear your comments to this proposial.
> 
> Regards
> 
> Sven
> 
> --
> Threading is a performance hack.
> (The Art of Unix Programming by Eric S. Raymond)
> 
> /me is giggls at ircnet, http://sven.gegg.us/ on the Web
> 
> 
> --
> "Thinking of using NT for your critical apps?
>                                  Isn't there enough suffering in the
> world?"
>                   (Advertisement of Sun Microsystems in Wall Street
> Journal)
> /me is giggls at ircnet, http://sven.gegg.us/ on the Web
> 
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> https://lists.openstreetmap.org/listinfo/dev



More information about the dev mailing list