[OSM-dev] Effort in the US wasted until TIGER import is complete?

Nathan Rover NRover at websitemakers.com
Wed Mar 28 05:26:40 BST 2007


Don,
yeah, the idea expressed in my last e-mail would also solve this problem.

Nathan

Don Smith wrote:
> TIGER contains roads, railroads, various other transportation
> features, and landmarks (such as churches, schools, parks, and
> cemeteries). More info here:
> http://www.census.gov/geo/www/tiger/tiger2006se/tgr2006se.html
>
> I didn't see the importer dealing with the other data, but I haven't
> checked too closely. A new TIGER file was released at the beginning of
> the month, which claimed to correct some of the data, though in the
> second half of the year they're moving from their own proprietary text
> format to shapefiles.
>
> Don Smith
> On Mar 27, 2007, at 11:16 PM, Cory Lueninghoener wrote:
>
>> As someone who has done a fair amount of US mapping over the last
>> couple of months (see the Chicago area), I'm curious: what exactly
>> does the TIGER database hold?  Is it just street "segments" with names
>> and endpoints?  Does it have interstates, house number information,
>> any other street information (size, direction, etc.) or anything else
>> of use?  I definitely look forward to having at least a base for the
>> whole country done within a matter of weeks (or months), but assuming
>> we'll still need to tag lots of information and add things like train
>> lines, interstates (?), parks, etc., I'll keep up my manual efforts
>> with plans to port them over when the time comes.
>>
>> On 3/27/07, Don Smith <dcsmith at gmail.com> wrote:
>>> Is there a test machine set up?
>>> I'm still looking at the code. For simplicity's sake, I'd like to
>>> remove the ability to daemonize the code and just run it from the
>>> command line as a regular process with a high priority. If the idea
>>> is to do this on a testing db instead of the main db, and then
>>> import a dump of the db, then I believe this makes sense. Another
>>> thing: someone was talking about setting up a copy of OSM to import
>>> into. Instead, I'd like to suggest that this data just be loaded
>>> into a blank template until everything works, and then worry about
>>> merging the US data, either by using something like a temp tag to
>>> differentiate it and merging manually, or by doing something
>>> programmatic after the fact. Whatever the user contributions are, I
>>> would assume they're much smaller, although probably more accurate,
>>> than the TIGER data.
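[The temp-tag merge idea above can be sketched roughly as follows. This is a minimal illustration, not the actual import code: the tag name `tiger:temp`, the `mark_imported`/`split_by_origin` helpers, and the dict-based element shape are all assumptions for the sake of the example, not the real OSM schema.]

```python
# Sketch of the "temp tag" merge idea: mark every element coming from
# the TIGER load with a throwaway tag, so imported data can later be
# separated from hand-mapped contributions programmatically.
# Tag name and element structure are hypothetical.

TEMP_TAG = "tiger:temp"

def mark_imported(element):
    """Attach the temp tag to an element produced by the TIGER import."""
    element["tags"][TEMP_TAG] = "yes"
    return element

def split_by_origin(elements):
    """Separate TIGER-imported elements from user contributions."""
    imported = [e for e in elements if e["tags"].get(TEMP_TAG) == "yes"]
    manual = [e for e in elements if e["tags"].get(TEMP_TAG) != "yes"]
    return imported, manual

elements = [
    mark_imported({"id": 1, "tags": {"highway": "residential"}}),
    {"id": 2, "tags": {"highway": "primary"}},  # hand-mapped element
]
imported, manual = split_by_origin(elements)
```

[Once the merge is verified, a final pass would strip the temp tag; manual review and a programmatic pass, as Don suggests, are then just two consumers of the same split.]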
>>>
>>> Don Smith
>>> On Mar 23, 2007, at 1:29 AM, Nathan Rover wrote:
>>>
>>> > I'm thinking we need to set up a server with a mirror of the
>>> > current database, then run a few counties, then eventually a large
>>> > batch (one state at a time?). Then we can conduct some rigorous
>>> > testing, and if the data looks good and we won't cause too much of
>>> > a problem on the production server, we could then ship the data
>>> > across the pond (either over the net or FedEx some DVDs). Then one
>>> > night perhaps an admin could import the data. I've never been a
>>> > big fan of messing with production servers, and it seems too
>>> > costly, both in time and bandwidth, to try and run exports from
>>> > TIGER data on a box in central Missouri to the UK at one- or
>>> > three-second intervals, especially when this will require lots of
>>> > testing to make it work correctly.
>>> >
>>> > I have the hardware for the mirror, and I'm working on getting the
>>> > TIGER data. If this is a direction everyone agrees with, then the
>>> > next thing I need is a little guidance on how to set up a server
>>> > that will be a good software mirror of the production one. The
>>> > closer the software setup matches the production machine
>>> > (especially the database system and the APIs), the better our
>>> > testing can be and the less likely there will be a problem down
>>> > the road when we try to integrate the data onto one box.
>>> >
>>> > Can someone send me a copy of the ruby code?
>>> >
>>> > Nathan Rover
>>> >
>>> > Don Smith wrote:
>>> >> No, I'm currently not familiar with the data model, so I should
>>> >> look into that.
>>> >> As for a 1 sec/insert cycle: if we don't do it on the primary db,
>>> >> it makes no immediate sense, and I'd be interested in timings
>>> >> without it. I have no idea how many inserts are going on, but
>>> >> from a ballpark estimate of the size of TIGER I would guess
>>> >> you'd be right.
>>> >> Again, I'll look at the Ruby code tonight, and if someone has a
>>> >> schema for OSM, that'd be nice. If whoever wrote the original
>>> >> script could outline their thinking, that would be helpful as
>>> >> well.
>>> >> I am subscribed to the dev list.
>>> >> On Mar 23, 2007, at 12:06 AM, Thomas Lunde wrote:
>>> >>
>>> >>
>>> >>> On 3/22/07, Don Smith <dcsmith at gmail.com> wrote:
>>> >>>
>>> >>>> Thomas,
>>> >>>> do you have a machine set up? I'll look at the code tonight,
>>> >>>> but you seem to have a better grasp of the operations going on.
>>> >>>> Any ideas on what specifically needs to be done?
>>> >>>>
>>> >>>> Right now, the two tasks appear to be aggregation of related
>>> >>>> segments into ways (?) and the 1 sec insert cycle. I would
>>> >>>> suggest instead, if possible, that we load the data into MySQL
>>> >>>> and then, when it's ready, batch import the SQL into the master
>>> >>>> db (does this make sense? Instead of running the script twice,
>>> >>>> run it once, dump the db, and then import the dump?).
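[The "run it once, dump the db, import the dump" idea amounts to staging rows in a local database at full speed and then replaying a single SQL dump into the master. A rough stand-in using Python's built-in sqlite3 module: the thread is about MySQL, where `mysqldump` would play the role that `iterdump()` plays here, and the table name and rows are made up for illustration.]

```python
import sqlite3

# Phase 1: batch-load staged TIGER rows into a local scratch db at
# full speed -- no throttled one-insert-per-second cycle.
staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(1, "Main St"), (2, "Oak Ave"), (3, "Elm St")]  # stand-in data
staging.executemany("INSERT INTO segments VALUES (?, ?)", rows)
staging.commit()

# Phase 2: dump the staged db as SQL text (the sqlite3 analogue of
# running mysqldump) and replay that dump into the "master" db.
dump_sql = "\n".join(staging.iterdump())
master = sqlite3.connect(":memory:")
master.executescript(dump_sql)

count = master.execute("SELECT COUNT(*) FROM segments").fetchone()[0]
```

[The point of the two phases is that the expensive, error-prone work all happens offline; the master only ever sees one well-tested dump file.]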
>>> >>>>
>>> >>> Don -
>>> >>>
>>> >>> I do have a server that could be used, but it sounds like Nathan
>>> >>> has a better one.  He is downloading (or has downloaded) the
>>> >>> latest TIGER data.  Both of us need a better understanding of
>>> >>> the current OSM data model than we have at present.  Pointers to
>>> >>> particular documentation would be helpful; otherwise I'll just
>>> >>> look around the site and the code.
>>> >>>
>>> >>> What I think I understand is that the 1 sec insert cycle of the
>>> >>> old Ruby code would still take weeks/months to do an import.  Is
>>> >>> that right?
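[The weeks/months estimate is easy to sanity-check. The segment count below is a placeholder (the actual number of inserts for a full TIGER import isn't stated anywhere in this thread); the arithmetic only shows why a fixed 1 sec/insert throttle dominates the total time:]

```python
# Back-of-the-envelope check on the 1 sec/insert cycle.
# The insert count is a hypothetical figure, not a TIGER statistic.
SECONDS_PER_DAY = 86_400

def import_days(num_inserts, seconds_per_insert=1.0):
    """Days needed to finish at a fixed per-insert delay."""
    return num_inserts * seconds_per_insert / SECONDS_PER_DAY

# e.g. 10 million inserts at one per second is already ~116 days:
days = import_days(10_000_000)
```

[At any plausible nationwide scale, the throttled cycle lands in the months-to-years range, which is what motivates the offline-staging-plus-dump approach discussed above.]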
>>> >>>
>>> >>> If so, it seems that there's got to be a better way. I agree
>>> >>> with you that using separate servers to do a higher-speed import
>>> >>> and then dumping the data from DB to DB directly would seem to
>>> >>> be the smarter approach.
>>> >>>
>>> >>> Are you already familiar with the OSM data model and/or with the
>>> >>> old Ruby import code?
>>> >>>
>>> >>> thomas
>>> >>>
>>> >>>
>>> >>>
>>> >>> Nathan, Don -- if y'all are subscribed to the dev list, let me
>>> >>> know and I shan't cc: you directly.
>>> >>>
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> dev mailing list
>>> >> dev at openstreetmap.org
>>> >> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>> --Cory Lueninghoener
>> Perl, C, & Linux Hacker
>> http://www.wirelesscouch.net/~cluening/
>
>
>






