[OSM-dev] Potlatch - Richard... I know

Tue Mar 31 16:45:05 BST 2009

Matt Amos wrote:
> On Tue, Mar 31, 2009 at 4:11 PM, 80n <80n80n at gmail.com> wrote:
>> Let me see if I understand correctly what you are saying.  You think that
>> the currently proposed migration from the old server and old schema to the
>> new server with a new schema that includes some referential integrity
>> constraints will take longer than the time that has been planned for it?
> 
> i think that is what stefan is saying, yes.

[I already pressed the reply button, but I was distracted]

Yes, I am saying a weekend is optimistic. If you do want to do this in 
one weekend I would suggest the following. As you can see I am actively 
solving Foreign Key/Duplicate issues on the current dataset until I have 
a planetdump that 1:1 inserts and is ok. From that point on I'll apply 
changesets. If this data is known you could load this date too, then you 
have about two weeks to get the foreign key constraints on the 'clean' 
database to work, and apply all changesets in API0.6 style.

>> I assume that Matt and co. have done some benchmark tests on the migration
>> and have some idea of how long they think it will take.
>>
>> Matt how long do your benchmark tests suggest the migration will take?  How
>> much confidence do you have in your test figures?
> 
> on the computer i have here (2x quad core opteron, 16Gb ram, 4x1Tb
> SATA raid0) the import takes 15h, changeset synthesis takes another
> 15h and the integrity checks + FK take about 10h, so about 40h in
> total. of course, if something goes wrong it doesn't give us much
> chance to fix it.
> 
> on the new db server we're 3.5h into the import with only 1 table
> remaining, so i'm very hopeful that will finish well before 15h.
> changeset synthesis and integrity checks should get a similar speedup
> from the faster hardware.
> 
> i am confident we can do the migration in the 4-day window that we're
> committed to, but i can't guarantee anything.

The system that I am now working on in respect to disk I guess 5x 1T and 
  64GB of RAM, 8x E5410. The 'import' after osmsucker processed it to 
CSV takes 1h (current tables), the integrity checks (using select) in 
totally take ~5h [it is really useless to directly do an alter table, 
since you will only get *one* violator not all], the applying of the 
foreign keys currently takes a little bit more than that.

You might improve speed by fixing table by table, instead of inserting 
everything in one bunch. So for example:

copy_into nodes
apply pk of nodes
copy_into node_tags
apply unique on node, k(, v)
apply fk on node, nodes(id)

Opposed to copy_into everything in one bunch. I might even suggest to 
take the sequence of:
nodes, node_tags, way_nds, ways, way_tags.

Backed by the amount of records to check.

COPY 327150289 RECORDS INTO nodes_legacy from 
'/export/data1/konink/nodes.csv' USING DELIMITERS ',', '\n', '''';
COPY 823503337 RECORDS INTO node_tags from 
'/export/data1/konink/node_tags.csv' USING DELIMITERS ',', '\n', '''';
COPY 26397315 RECORDS INTO ways from '/export/data1/konink/ways.csv' 
USING DELIMITERS ',', '\n', '''';
COPY 187146049 RECORDS INTO way_tags from 
'/export/data1/konink/way_tags.csv' USING DELIMITERS ',', '\n', '''';
COPY 361253470 RECORDS INTO way_nds from 
'/export/data1/konink/way_nds.csv' USING DELIMITERS ',', '\n', '''';
COPY 89224 RECORDS INTO relations from 
'/export/data1/konink/relations.csv' USING DELIMITERS ',', '\n', '''';
COPY 375830 RECORDS INTO relation_tags from 
'/export/data1/konink/relation_tags.csv' USING DELIMITERS ',', '\n', '''';
COPY 74041 RECORDS INTO relation_members_node from 
'/export/data1/konink/relation_member_node.csv' USING DELIMITERS ',', 
'\n', '''';

Stefan