[OSM-dev] Effort in the US wasted until TIGER import is complete?

Nathan Rover NRover at websitemakers.com
Wed Mar 28 05:25:10 BST 2007


Don,
Sorry, I've been a little busy this week; I'm going to start setting up 
the server on Thursday. I've got about 3/4 of the states downloaded (4.8 
GB zipped). I was going to load Debian on the server — are there any 
objections? After looking at the TIGER data, I've started thinking that 
it might work better to import the data as-is into a MySQL database. 
Some of the files are just records of keys pointing at records in other 
files, etc. This could be why the original scripts only 
imported info from one of the files. There is much more information 
available, but you need to be able to join data from different files. 
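A minimal sketch of that cross-file join, with sqlite3 standing in for MySQL. The table and column names (`chains`, `feature_names`, `tlid`) are illustrative only, loosely modeled on the idea that TIGER files share a common line ID — they are not the real TIGER/Line schema:

```python
# Sketch: two TIGER-style files loaded as-is into tables, joined on a
# shared record key. Schema is hypothetical; sqlite3 stands in for MySQL.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One file holds the chains (segments), keyed by a line ID...
cur.execute("CREATE TABLE chains (tlid INTEGER PRIMARY KEY, "
            "from_lon REAL, from_lat REAL, to_lon REAL, to_lat REAL)")
# ...while a second file holds the feature names, keyed by the same ID.
cur.execute("CREATE TABLE feature_names (tlid INTEGER, name TEXT)")

cur.executemany("INSERT INTO chains VALUES (?, ?, ?, ?, ?)", [
    (101, -90.10, 38.90, -90.08, 38.91),
    (102, -90.08, 38.91, -90.06, 38.92),
])
cur.executemany("INSERT INTO feature_names VALUES (?, ?)", [
    (101, "Interstate 70"),
    (102, "Interstate 70"),
])

# With everything in one database the join is trivial -- this is the
# information a file-at-a-time script cannot easily reach.
rows = cur.execute(
    "SELECT c.tlid, n.name, c.from_lon, c.from_lat, c.to_lon, c.to_lat "
    "FROM chains c JOIN feature_names n ON n.tlid = c.tlid "
    "ORDER BY c.tlid").fetchall()
for row in rows:
    print(row)
```

The same join would also let an importer collect every segment named "Interstate 70" across county files before emitting a way.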
Going county by county will also be a problem if the data is imported 
straight from the files. Case in point: look at I-70. How many counties 
does it pass through as it crosses the country? If you go county by 
county, each segment will become its own way rather than the whole road. 
If the data is pulled from a database with all the TIGER data loaded, 
then all of I-70 can be treated as a single way.

Another observation I have made is that the TIGER data in its basic 
format is a name (sometimes) and two sets of coordinates. To convert 
this to OSM, each set of coordinates would become a node. But three 
segments in a row would share two nodes; this was the problem with the 
last import. To solve it, perhaps a simple "select distinct" from a 
list of nodes loaded in the database would do. Let me know what you all 
think, but the more I think about it, the more I think it will be easier 
to first import the pure TIGER data with no changes, then write code, or 
stored procedures, to move the data to an OSM test db.
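The "select distinct" idea can be sketched like this — again a hypothetical schema with sqlite3 in place of MySQL: each segment contributes its two endpoints, SELECT DISTINCT collapses the shared ones, and each surviving coordinate pair becomes exactly one node that adjacent segments then reference:

```python
# Sketch of deduplicating shared segment endpoints with SELECT DISTINCT.
# Schema and coordinates are illustrative; sqlite3 stands in for MySQL.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE endpoints (lon REAL, lat REAL)")

# Three segments in a row: points B and C each appear twice.
segments = [
    ((-90.10, 38.90), (-90.08, 38.91)),  # A -> B
    ((-90.08, 38.91), (-90.06, 38.92)),  # B -> C
    ((-90.06, 38.92), (-90.04, 38.93)),  # C -> D
]
for a, b in segments:
    cur.execute("INSERT INTO endpoints VALUES (?, ?)", a)
    cur.execute("INSERT INTO endpoints VALUES (?, ?)", b)

# SELECT DISTINCT collapses the shared endpoints into one row each...
distinct = cur.execute(
    "SELECT DISTINCT lon, lat FROM endpoints ORDER BY lon").fetchall()
# ...so each unique coordinate pair is assigned exactly one node ID,
# and consecutive segments end up sharing nodes instead of duplicating them.
node_ids = {coord: i + 1 for i, coord in enumerate(distinct)}

print(len(distinct))  # 4 unique nodes out of 6 endpoint records
```

In practice the comparison would probably need rounding or a tolerance, since TIGER coordinates from different files may not match bit-for-bit, but the shape of the fix is the same.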

later, Nathan

Don Smith wrote:
> Is there a test machine setup?
> I'm still looking at the code. For simplicity's sake, I'd like to 
> remove the ability to daemonize the code, and just run it from the 
> command line as a regular process, with a high priority. If the idea 
> is to do this on a testing db instead of the main db, and then import 
> a dump of the db then I believe this makes sense. Another thing, 
> someone was talking about setting up a copy of osm to import into. 
> Instead I'd like to suggest that this data just be loaded into a blank 
> template until everything works, then worry about merging the US data, 
> either by using something like a temp tag to differentiate it and 
> merging manually, or by doing something programmatic after the fact. 
> Whatever the user contributions are, I would assume they're much 
> smaller, although probably more accurate, than the TIGER data.
>
> Don Smith
> On Mar 23, 2007, at 1:29 AM, Nathan Rover wrote:
>
>> I'm thinking we need to set up a server with a mirror of the current
>> database, then run a few counties, then eventually a large batch
>> (one state at a time?). Then we can conduct some rigorous testing, and if
>> the data looks good and we won't cause too much of a problem on the
>> production server, we could then ship the data across the pond (either
>> over the net or FedEx some DVDs). Then one night perhaps an admin could
>> import the data. I've never been a big fan of messing with production
>> servers, and it seems too costly, both in time and bandwidth, to try to
>> run exports from TIGER data on a box in central Missouri to the UK at
>> one- or three-second intervals, especially when this will require lots
>> of testing to make it work correctly.
>>
>> I have the hardware for the mirror, and I'm working on getting the TIGER
>> data. If this is a direction everyone agrees with then the next thing I
>> need is a little guidance on how to set up a server that will be a good
>> software mirror to the production one. The closer the software setup
>> matches the production machine (especially the database system and the
>> APIs), the better our testing can be and the less likely there will be a
>> problem down the road when we try to integrate the data onto one box.
>>
>> Can someone send me a copy of the ruby code?
>>
>> Nathan Rover
>>
>> Don Smith wrote:
>>> No, I'm currently not familiar with the data model, so I should look
>>> into that.
>>> As for a 1 sec/insert cycle, if we don't do it on the primary db, it
>>> makes no immediate sense, and I'd be interested in timings without
>>> it. I have no idea how many inserts are going on, but ballparking
>>> from the size of TIGER, I would guess you'd be right.
>>> Again, I'll look at the ruby code tonight, and if someone has a schema
>>> for OSM that'd be nice. If whoever did the original script could
>>> outline their thinking, that would be helpful as well.
>>> I am subscribed to the dev list.
>>> On Mar 23, 2007, at 12:06 AM, Thomas Lunde wrote:
>>>
>>>
>>>> On 3/22/07, Don Smith <dcsmith at gmail.com> wrote:
>>>>
>>>>> Thomas,
>>>>> do you have a machine setup? I'll look at the code tonight, but you
>>>>> seem to have a better grasp of the operations going on. Any ideas on
>>>>> what specifically needs to be done?
>>>>>
>>>>> Right now, the two tasks appear to be aggregation of related segments
>>>>> into ways (?) and the 1 sec insert cycle. I would suggest instead, if
>>>>> possible, that we load the data into MySQL and then, when it's ready,
>>>>> batch-import the SQL into the master db (does this make sense? Instead
>>>>> of running the script twice, run it once, dump the db, and then
>>>>> import the dump?).
>>>>>
>>>> Don -
>>>>
>>>> I do have a server that could be used, but it sounds like Nathan has a
>>>> better one.  He is downloading (or has already downloaded) the latest
>>>> TIGER data.  Both of us need a better understanding of the current OSM
>>>> data model than we have at present.  Pointers to particular
>>>> documentation would be helpful; otherwise I'll just look around the
>>>> site and the code.
>>>>
>>>> What I think I understand is that the 1 sec insert cycle of the old
>>>> Ruby code would still take weeks/months to do an import.  Is that
>>>> right?
>>>>
>>>> If so, it seems that there's got to be a better way. I agree with you
>>>> that using separate servers to do a higher-speed import and then
>>>> dumping the data from DB to DB directly would be the smarter
>>>> approach.
>>>>
>>>> Are you already familiar with the OSM data model and/or with the old
>>>> Ruby import code?
>>>>
>>>> thomas
>>>>
>>>>
>>>>
>>>> Nathan,Don -- if y'all are subscribed to the Dev list, let me know and
>>>> I shan't cc: you directly.
>>>>
>>>
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at openstreetmap.org
>>> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>>>
>>>
>>>
>>
>>
>>
>>
>> _______________________________________________
>> dev mailing list
>> dev at openstreetmap.org
>> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>
>
>

More information about the dev mailing list