[OSM-dev] Ideas for speeding up the TIGER import

Jon Burgess jburgess777 at googlemail.com
Sat Sep 1 16:13:38 BST 2007


I think it may be wise to step back a minute and take a look at just how
to handle the TIGER data import. 

According to the stats page, TIGER is adding 758M objects:
http://dev.openstreetmap.org/~daveh/tiger/stats.html

Currently the OSM DB holds around 36M objects, so we are talking about
growing the DB by a factor of roughly 20. I think we can probably do
better than simply importing it all via the API on the production
system.

One idea I had to improve the import speed is as follows:

- Set up an independent machine with the API running on an empty DB.

- Tweak the Rails code so that it starts allocating all table IDs from
some offset like 100M+, which we know is not used in the present OSM DB
(maybe creating some phony DB entries with ID=100M would be sufficient
for this; see the first sketch after this list).

- Turn off unnecessary indexes (some will probably still be required,
since the Rails code will probably be checking that nodes are visible
while importing segments, etc.); see the second sketch below.

- Disable all disk syncs and enable aggressive write caching in the DB
engine (third sketch below). If we get a power failure or crash during
the import, we can simply pull the data in again (and if it still takes
weeks to import, backups can be taken at appropriate intervals instead
of starting from scratch).
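
For the ID offset, a DB-side alternative to patching Rails would be to
bump the auto-increment counters directly. A minimal sketch, assuming
the current MySQL schema with current_nodes / current_segments /
current_ways tables (table names may need adjusting):

    -- Start all new IDs past the existing OSM ID space so imported
    -- objects cannot collide with objects synced in later.
    ALTER TABLE current_nodes    AUTO_INCREMENT = 100000000;
    ALTER TABLE current_segments AUTO_INCREMENT = 100000000;
    ALTER TABLE current_ways     AUTO_INCREMENT = 100000000;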
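
For the indexes, MyISAM lets you suspend maintenance of the non-unique
keys during a bulk load and rebuild them in one pass afterwards, while
the primary keys stay active, so ID lookups (e.g. the node visibility
checks during segment import) keep working. Again with assumed table
names:

    -- Suspend non-unique index maintenance during the bulk load.
    ALTER TABLE current_nodes DISABLE KEYS;
    -- ... run the import ...
    ALTER TABLE current_nodes ENABLE KEYS;  -- rebuild in one pass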
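
For the syncs and write caching, something along these lines should do
on a throwaway import instance (standard MySQL server variables; the
values are guesses and would want tuning):

    SET GLOBAL innodb_flush_log_at_trx_commit = 0;  -- no fsync per commit
    SET GLOBAL delay_key_write = 'ALL';             -- lazy MyISAM key writes
    SET GLOBAL bulk_insert_buffer_size = 268435456; -- 256MB insert buffer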

The existing DB server has insufficient disk space to hold all the
TIGER data. There is a tentative plan to replace this server with a new
machine with much more disk space and RAM. Perhaps the best way to go
is to import the TIGER data directly onto this new machine and then
pull the current OSM data in on top of it.

	Jon





