[OSM-dev] Ideas for speeding up the TIGER import

Sat Sep 1 17:09:57 BST 2007

On 9/1/07, Jon Burgess <jburgess777 at googlemail.com> wrote:
> I think it may be wise to step back a minute and take a look at just how
> to handle the TIGER data import.

<snip>

> Currently the OSM DB has around 36M objects so we are talking about
> growing the DB by a factor of 20. I think we can probably do
> better than simply importing it all via the API on the production
> system.

Given the current setup, I don't think running the script on the main
server will be any faster than it is now. The bottleneck is the
database, and nothing will change that.

> One idea I had to improve the import speed is as follows:

It is absolutely true that we could import it faster. But I don't think
we should do something drastic.

- With the current method we have gradual growth of the DB, which
means if there's an issue we have more time to deal with it.
- The current method is weeding out bugs in the system. TIGER isn't
going to be the only import and the experience we get now will help
with later imports.
- In the long term the speed will come from something like osmosis
writing to the DB directly. It's not quite ready yet but one day it
will be.

And finally:
- In terms of bang for buck the biggest benefits are going to come
from a bulk-upload interface in the server.

Actually, I'm happy that the upload will take a year. It encourages
people in the US to request that their area be done earlier so that
they can work on it. In a sense, uploading data where there are no
users isn't very helpful, because no one is going to check it. I think
that if we want to build a community we can't hand over all the data
in one go; it gives people the idea there's nothing left to do. The
delay makes people think about the process, and that's always very
important.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/