[OSM-dev] production site disk use question

Michal Migurski mike at teczno.com
Tue Jan 1 19:36:13 GMT 2013


Hi Jeff,

One approach I've taken in this situation in the past is to do my importing on a temporary EC2 instance to get a bit of extra disk space, then save only the parts I need. EC2 is not known for speedy I/O, but the instance storage they make available (typically mounted at /mnt) is actually local to the running instance and quite fast to read from once you've written to it. You can also take advantage of spot pricing to bring your costs down substantially for these temporary machines, and use the high-RAM instance types to give yourself a bit of extra breathing room.
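
As a rough way to size the scratch disk on such a temporary machine, the figures reported further down this thread (a 1.7GB .pbf that had already consumed about 86GB of apidb storage when the disk filled) can be turned into a lower-bound estimate. This is only a sketch: the function names and the 1.5x headroom are illustrative assumptions, and the real ratio is higher than the one computed here because that import never finished.

```python
import shutil

# Figures reported in this thread: a 1.7 GB .pbf had consumed about 86 GB
# of apidb storage when the disk filled, so the true ratio is higher still.
OBSERVED_PBF_GB = 1.7
OBSERVED_DB_GB = 86.0


def min_expansion_factor(pbf_gb=OBSERVED_PBF_GB, db_gb=OBSERVED_DB_GB):
    """Lower bound on the on-disk growth from .pbf to apidb."""
    return db_gb / pbf_gb


def estimated_db_gb(pbf_gb, factor=None, headroom=1.5):
    """Rough disk budget; `headroom` is an arbitrary safety margin (assumption)."""
    if factor is None:
        factor = min_expansion_factor()
    return pbf_gb * factor * headroom


def has_room(path, needed_gb):
    """True if the filesystem holding `path` has at least `needed_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


if __name__ == "__main__":
    need = estimated_db_gb(1.7)
    print(f"budget at least {need:.0f} GB")
    # On an EC2 instance the path would typically be /mnt; "." is used
    # here so the sketch runs anywhere.
    print("enough room:", has_room(".", need))
```

Running the check against the target mount point before starting a 30-hour import is cheap insurance compared with discovering the shortfall when the disk fills.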

-mike.

On Jan 1, 2013, at 9:52 AM, Jeff Meyer wrote:

> Frederik, Sly - 
> 
> Thanks for that help! I clearly need to do something differently. Upped my VPS disk size & the same action filled up the disk after using ~86GB. 1.7GB .pbf to 86GB+ apidb! Yikes! I've added some info & warnings to the Rails Port page - http://wiki.openstreetmap.org/wiki/The_Rails_Port#Populating_the_database
> 
> Frederik - 
> 
> My guess is that I only need current tables - I'm setting up an instance of the OSM Stack to see how useful it might be for creating a shared environment for mapping cities and territories through history. (http://wiki.openstreetmap.org/wiki/OSM-Historic) My plan was to take the shorelines and natural features out of planet.osm, use that as a baseline, and then let people add historical information from there, as well as figuring out what tools need to be built to support this time-enabled concept. So, it seems like I'll need both the history & the current tables. In a non-history planet.osm extract, should those tables be the same in an initial import?
> 
> Thanks!
> Jeff
> 
> 
> 
> On Tue, Jan 1, 2013 at 3:42 AM, Frederik Ramm <frederik at remote.org> wrote:
> Hi,
> 
> 
> On 31.12.2012 18:25, Jeff Meyer wrote:
> > For example, I just tried importing a 1.7GB planet-reduced.pbf into my
> > rails port osm db and it failed after ~30 hrs because I ran out of disk
> > space after it had eaten up 50GB of disk. Bad planning on my part, but
> > how should I budget for this?
> 
> In addition to Sly's data:
> 
> A typical "apidb" setup has two sets of tables - "current" tables that have only the last version of each object, and "history" tables that contain every version (they don't have "history" in the name - the current nodes table is called current_nodes, and the history nodes table is called just nodes).
> 
> This means that if you import data from a non-history planet into an apidb database, you'll have everything twice.
> 
> Depending on what you want to do with the data, you might really need that - or you might not. For example, if you wanted to run a read-only API that gives you data for a given bbox, only the "current" tables are required. For some other types of queries, only the history tables are required.
> 
> So it might be possible for you to take a shortcut by importing things only once. Osmosis has an option called "populateCurrentTables" which is on by default, but you can switch that off and it will only create history tables. If you have a use case that only needs current tables, Osmosis doesn't offer that, but you could achieve it by creating views on the history tables instead of copies. This will save time and space; of course, if you do that, then you can't apply updates to your database without breaking the views.
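
The view idea described above can be sketched as follows. This is a toy illustration, not the real apidb schema: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the column names and sample rows are simplified assumptions. The "current" view simply selects, for each node, the row with the highest version from the history table.

```python
import sqlite3


def build_demo():
    """Create a toy history table plus a 'current' view over it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Simplified stand-in for the apidb history table "nodes".
        CREATE TABLE nodes (
            node_id   INTEGER NOT NULL,
            version   INTEGER NOT NULL,
            latitude  INTEGER NOT NULL,
            longitude INTEGER NOT NULL,
            visible   INTEGER NOT NULL,
            PRIMARY KEY (node_id, version)
        );

        -- "Current" view: for each node, only the row with the highest version.
        CREATE VIEW current_nodes AS
        SELECT n.node_id AS id, n.version, n.latitude, n.longitude, n.visible
        FROM nodes AS n
        WHERE n.version = (
            SELECT MAX(h.version) FROM nodes AS h WHERE h.node_id = n.node_id
        );
    """)
    # Hypothetical sample data: node 1 has two versions, node 2 has one.
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 490025000, 83925000, 1),
        (1, 2, 490026000, 83925000, 1),
        (2, 1, 476000000, -1223000000, 1),
    ])
    return conn


def latest_versions(conn):
    """Return (id, version) pairs as seen through the view."""
    return conn.execute(
        "SELECT id, version FROM current_nodes ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    print(latest_versions(build_demo()))  # [(1, 2), (2, 1)]
```

After a non-history import every object has exactly one version, so such a view returns the same rows as the history table without storing them a second time.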
> 
> Bye
> Frederik
> 
> -- 
> Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"
> 
> 
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/listinfo/dev
> 
> 
> 
> -- 
> Jeff Meyer
> Global World History Atlas
> www.gwhat.org
> jeff at gwhat.org
> 206-676-2347
> 

----------------------------------------------------------------
michal migurski- contact info and pgp key:
sf/ca            http://mike.teczno.com/contact.html