[OSM-dev] Ideas for speeding up the TIGER import

Robert (Jamie) Munro rjmunro at arjam.net
Sat Sep 1 18:54:22 BST 2007

Jon Burgess wrote:
> I think our present Tiger import process is fine for medium sized data
> sets, just not for Tiger sized data sets. If we take the AND data as a
> more typical example, it has ~10M objects. The tiger import is
> progressing at ~30 objects/s which means the AND import would complete
> in around 4 days. This seems reasonable to me. 
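
As a quick sanity check on that figure, here is a small Python sketch of
the arithmetic. The function name is my own, and it assumes the import
rate stays constant at the ~30 objects/s quoted above, which a real
import may not manage.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def import_days(object_count, objects_per_second):
    """Estimated days to import object_count objects at a steady rate."""
    if objects_per_second <= 0:
        raise ValueError("rate must be positive")
    return object_count / objects_per_second / SECONDS_PER_DAY


# Figures quoted above: ~10M objects in AND, ~30 objects/s import rate.
and_days = import_days(10_000_000, 30)
print(f"AND import: about {and_days:.1f} days")  # prints "AND import: about 3.9 days"
```

That agrees with the "around 4 days" estimate.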

One way to look at it is that TIGER isn't one data set; it's roughly
3000 small data sets, one for each US county.

> I don't see how many more TIGER sized imports we are likely to get. If
> we do get some more then it is worth spending some time hand-holding the
> import process. 

TIGER covers about 6% of the world's land area, and probably more than
that percentage of the world's roads. I don't think many data sets exist
that are much bigger than that. Certainly not any free ones.

> Part of the reason why I wrote the original email was also to try to
> alert everyone to the sheer size of the data we are attempting to pull
> in. Once complete, the existing OSM data will be just 5% of the combined
> data set.
> We should not underestimate how many things are going to get broken by
> importing all the tiger data, e.g. 
> - the current disk space in DB.

Disks are cheap. I think the foundation could easily raise a few
thousand pounds and buy a new high-specification DB server. In fact,
they may have the money already.

Robert (Jamie) Munro
