[OSM-dev] Problems with Planet-Excerpt Import into apidb

Fri May 22 15:53:04 BST 2009

Hi,

Am Fr, 22. Mai 2009 schrieb Brett Henderson:
> Holger Schöner wrote:
> > Okay, now the import of Europe (still the same data of about two weeks
> > ago) worked at least until before copying the data into the current
> > tables. Because this copying took forever again, I am now going to try
> > it without first running the second part of the setup scripts (creation
> > of keys and indexes).
[...]
> Let us know how you go.  If you come up with a good sequence of steps to
> follow we can document it as notes to the --write-apidb task on the
> osmosis detailed usage page.
> http://wiki.openstreetmap.org/wiki/Osmosis/DetailedUsage#--write-apidb_.2
>8--wd.29

I have now succeeded in importing the Europe extract (I think at least, I 
have not used it for anything ...). If my further usage of the database is 
going to work well and nobody has objections or knows a better way, I can 
put the steps into the wiki. The sequence of steps I used is (altogether 
taking about one and a half days, with breaks when I was not at the 
computer):

- Create db:
  sudo -u postgres createdb -E UTF8 -O <pguser> <dbname>

- Load first part of db schema into db (containing everything up to the
  last COPY statement):
  psql -d <dbname> -f apidb06-pgsql-latest-1.sql

- Import into history tables (one line ...):
  bzcat <osmdata>.osm.bz2 | JAVACMD_OPTIONS='-Xmx1536m' osmosis --read-
xml-0.6 file=/dev/stdin --write-apidb-0.6 lockTables=yes 
populateCurrentTables=no validateSchemaVersion=no database=<dbname>

- Copy  into current tables (e.g. in psql command line; is this the same
  osmosis is doing with the populateCurrentTables=yes option?):
  INSERT INTO current_nodes (id, latitude, longitude, changeset_id, visible, 
timestamp, tile, version) SELECT id, latitude, longitude, changeset_id, 
visible, timestamp, tile, version FROM nodes;
  INSERT INTO current_node_tags (id, k, v) SELECT id, k, v FROM node_tags;
  INSERT INTO current_relations (id, changeset_id, timestamp, visible, 
version) SELECT id, changeset_id, timestamp, visible, version FROM 
relations;
  INSERT INTO current_relation_tags (id, k, v) SELECT id, k, v FROM 
relation_tags;
  INSERT INTO current_relation_members (id, member_id, member_role, 
member_type, sequence_id) SELECT id, member_id, member_role, member_type, 
sequence_id FROM relation_members;
  INSERT INTO current_ways (id, changeset_id, timestamp, visible, version) 
SELECT id, changeset_id, timestamp, visible, version FROM ways;
  INSERT INTO current_way_tags (id, k, v) SELECT id, k, v FROM way_tags;
  INSERT INTO current_way_nodes (id, node_id, sequence_id) SELECT id, 
node_id, sequence_id FROM way_nodes;

- Create keys and indexes (everything after the last COPY statement):
  psql -d <dbname> -f apidb06-pgsql-latest-2.sql       

> I'd like to create a new task that creates PostgreSQL COPY files instead
> of talking directly to the db.  These are typically a much faster way of
> doing bulk imports.  But I won't get to it for a while.

This sounds to be a really good idea, although I do not have enough 
experience with postgresql to really know.

-- 
Holger Schoener		numenor at ancalime.de