[OSM-dev] Effort in the US wasted until TIGER import is complete?

Don Smith dcsmith at gmail.com
Wed Mar 28 06:44:24 BST 2007


I agree with the idea of putting it in SQL first. I think it would
produce more reasonable data, since you're right that dealing with
interstates, or even state routes across multiple counties, would be a
problem. My only concern is that going through a database is generally
slower than file I/O and memory, especially with large record counts.
However, fast and wrong is worse than slow and right.
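
To make the staging idea concrete, here is a minimal sketch, not a
definitive implementation. It uses Python's built-in sqlite3 module as
a stand-in for whatever database the test box ends up running, and the
"segment" table, its columns, and both function names are hypothetical.
It bulk-loads segments in a single transaction and then lists road
names that appear in more than one county, which are the ones a
county-by-county import would split.

```python
import sqlite3


def load_segments(conn, rows):
    """Bulk-insert (tlid, county, name, lat1, lon1, lat2, lon2) tuples.

    The table layout is an assumption for illustration only.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS segment ("
        " tlid INTEGER PRIMARY KEY, county TEXT, name TEXT,"
        " lat1 REAL, lon1 REAL, lat2 REAL, lon2 REAL)"
    )
    # One transaction for the whole batch: commits once on success,
    # rolls back if any insert fails.
    with conn:
        conn.executemany(
            "INSERT INTO segment VALUES (?, ?, ?, ?, ?, ?, ?)", rows
        )


def roads_spanning_counties(conn):
    """Return (name, county_count, segment_count) for multi-county names."""
    return conn.execute(
        "SELECT name, COUNT(DISTINCT county), COUNT(*)"
        " FROM segment GROUP BY name"
        " HAVING COUNT(DISTINCT county) > 1"
        " ORDER BY name"
    ).fetchall()
```

Matching on name alone is only a first pass; a real merge would also
have to check that the segments actually share endpoints.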

Did you get the latest release of the data (March 3?).

I assume you would need something to load the data. Looking at the
TIGER data dictionary, this does not seem too bad, as each column has a
fixed width and strings could be trimmed. Is the machine you have in
mind for the testing environment somewhat substantial? Also, no
objections to Debian.
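
A rough sketch of the kind of loader I mean, assuming nothing beyond
fixed-width columns and trimmed strings. The field names and offsets in
LAYOUT are placeholders, not the real record layout (that has to come
from the TIGER data dictionary), and the latin-1 encoding is also just
an assumption.

```python
# Hypothetical layout: (field name, start, end), 0-based, end exclusive.
# These offsets are placeholders; take the real ones from the
# TIGER data dictionary.
LAYOUT = [
    ("record_type", 0, 1),
    ("line_id", 1, 11),
    ("name", 11, 41),
    ("feature_type", 41, 45),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of trimmed strings."""
    return {field: line[start:end].strip() for field, start, end in layout}


def parse_file(path, layout=LAYOUT, encoding="latin-1"):
    """Yield one dict per non-blank line of a fixed-width text file.

    The encoding default is an assumption, not something checked
    against the actual files.
    """
    with open(path, encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line:
                yield parse_record(line, layout)
```

Because parse_file is a generator, the rows can be fed to a bulk insert
without holding a whole county file in memory.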

Don Smith
On Mar 28, 2007, at 12:26 AM, Nathan Rover wrote:

> Don,
> Yeah, the idea expressed in the last e-mail would also solve this problem.
>
> Nathan
>
> Don Smith wrote:
>> TIGER contains roads, railroads, various other transportation
>> features, and landmarks (such as churches, schools, parks, and
>> cemeteries).
>> More info here:
>> http://www.census.gov/geo/www/tiger/tiger2006se/tgr2006se.html
>>
>> I didn't see the importer dealing with the other data, but I
>> haven't checked too closely. A new TIGER file was released at the
>> beginning of the month, which claims to correct some of the data,
>> though in the second half of the year they're moving to shapefiles
>> from their own proprietary text format.
>>
>> Don Smith
>> On Mar 27, 2007, at 11:16 PM, Cory Lueninghoener wrote:
>>
>>> As someone who has done a fair amount of US mapping over the last
>>> couple of months (see the Chicago area), I'm curious: what exactly
>>> does the TIGER database hold?  Is it just street "segments" with
>>> names and endpoints?  Does it have interstates, house number
>>> information, any other street information (size, direction, etc.)
>>> or anything else of use?  I definitely look forward to having at
>>> least a base for the whole country done within a matter of weeks
>>> (months), but assuming we'll still need to tag lots of information
>>> and add things like train lines, interstates (?), parks, etc., I'll
>>> keep up my manual efforts with plans to port them over when the
>>> time comes.
>>>
>>> On 3/27/07, Don Smith <dcsmith at gmail.com> wrote:
>>>> Is there a test machine set up?
>>>> I'm still looking at the code. For simplicity's sake, I'd like to
>>>> remove the ability to daemonize the code, and just run it from the
>>>> command line as a regular process, with a high priority. If the
>>>> idea is to do this on a testing db instead of the main db, and then
>>>> import a dump of the db, then I believe this makes sense. Another
>>>> thing: someone was talking about setting up a copy of OSM to import
>>>> into. Instead I'd like to suggest that this data is just loaded
>>>> into a blank template until everything works, and then we worry
>>>> about merging the US data, either by using something like a temp
>>>> tag to differentiate it and merging manually, or by doing something
>>>> programmatic after the fact. Whatever the user contributions are, I
>>>> would assume they're much smaller, although probably more accurate,
>>>> than the TIGER data.
>>>>
>>>> Don Smith
>>>> On Mar 23, 2007, at 1:29 AM, Nathan Rover wrote:
>>>>
>>>> > I'm thinking we need to set up a server with a mirror of the
>>>> > current database, then just run a few counties, then eventually a
>>>> > large batch (one state at a time?). Then we can conduct some
>>>> > rigorous testing, and if the data looks good and we won't cause
>>>> > too much of a problem on the production server, we could then
>>>> > ship the data across the pond (either over the net or FedEx some
>>>> > DVDs). Then one night perhaps an admin could import the data.
>>>> > I've never been a big fan of messing with production servers, and
>>>> > it seems too costly both in time and bandwidth to try and run
>>>> > exports from TIGER data on a box in central Missouri, to the UK,
>>>> > at one or three second intervals, especially when this will
>>>> > require lots of testing to make it work correctly.
>>>> >
>>>> > I have the hardware for the mirror, and I'm working on getting
>>>> > the TIGER data. If this is a direction everyone agrees with, then
>>>> > the next thing I need is a little guidance on how to set up a
>>>> > server that will be a good software mirror of the production one.
>>>> > The closer the software setup matches the production machine
>>>> > (especially the database system and the APIs), the better our
>>>> > testing can be and the less likely there will be a problem down
>>>> > the road when we try to integrate the data onto one box.
>>>> >
>>>> > Can someone send me a copy of the Ruby code?
>>>> >
>>>> > Nathan Rover
>>>> >
>>>> > Don Smith wrote:
>>>> >> No, I'm currently not familiar with the data model, so I should
>>>> >> look into that.
>>>> >> As for a 1 sec/insert cycle, if we don't do it on the primary db,
>>>> >> it makes no immediate sense, and I'd be interested in timings
>>>> >> without it. I have no idea how many inserts are going on, but I
>>>> >> would guess from a ballpark on the size of TIGER that you'd be
>>>> >> right.
>>>> >> Again, I'll look at the Ruby code tonight, and if someone has a
>>>> >> schema for OSM that'd be nice. If whoever did the original script
>>>> >> could outline their thinking, that would be helpful as well.
>>>> >> I am subscribed to the dev list.
>>>> >> On Mar 23, 2007, at 12:06 AM, Thomas Lunde wrote:
>>>> >>
>>>> >>
>>>> >>> On 3/22/07, Don Smith <dcsmith at gmail.com> wrote:
>>>> >>>
>>>> >>>> Thomas,
>>>> >>>> do you have a machine set up? I'll look at the code tonight,
>>>> >>>> but you seem to have a better grasp of the operations going
>>>> >>>> on. Any ideas on what specifically needs to be done?
>>>> >>>>
>>>> >>>> Right now, the two tasks appear to be aggregation of related
>>>> >>>> segments into ways (?) and the 1 sec insert cycle. I would
>>>> >>>> suggest instead, if possible, that we load the data into
>>>> >>>> MySQL, and then when it's ready, batch import the SQL into the
>>>> >>>> master db (does this make sense? Instead of running the script
>>>> >>>> twice, run it once, dump the db, and then import the dump?).
>>>> >>>>
>>>> >>> Don -
>>>> >>>
>>>> >>> I do have a server that could be used, but it sounds like
>>>> >>> Nathan has a better one.  He is downloading (or has already
>>>> >>> downloaded) the latest TIGER data.  Both of us need to have a
>>>> >>> better understanding of the current OSM data model than at
>>>> >>> present.  Pointers to particular documentation would be
>>>> >>> helpful; otherwise I'll just look around the site and the code.
>>>> >>>
>>>> >>> What I think I understand is that the 1 sec insert cycle of the
>>>> >>> old Ruby code would still take weeks/months to do an import.
>>>> >>> Is that right?
>>>> >>>
>>>> >>> If so, it seems that there's got to be a better way. I agree
>>>> >>> with you that using separate servers to do a higher speed
>>>> >>> import and then to dump the data from DB to DB directly would
>>>> >>> seem to be the smarter approach.
>>>> >>>
>>>> >>> Are you already familiar with the OSM data model and/or with
>>>> >>> the old Ruby import code?
>>>> >>>
>>>> >>> thomas
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> Nathan, Don -- if y'all are subscribed to the Dev list, let me
>>>> >>> know and I shan't cc: you directly.
>>>> >>>
>>>> >>
>>>> >>
>>>> >> _______________________________________________
>>>> >> dev mailing list
>>>> >> dev at openstreetmap.org
>>>> >> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>>>> >>
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>> --Cory Lueninghoener
>>> Perl, C, & Linux Hacker
>>> http://www.wirelesscouch.net/~cluening/
>>
>>
>>
>
>
>




