[OSM-dev] OSM/US & TIGER (was NetSquared ominationN)

Mon Apr 23 20:46:45 BST 2007

Right,
I understand that these guys can tell me how they technically changed the
data. My concern is what kind of data OSM wants to store, and what
decisions the people running OSM want to make about the data. Is the right
answer to store street numbers for both sides? Or is it to go with what's
there, take the lowest number from either side and the highest from either
side, and lose data? Am I even correct about how OSM stores data? The
community seems to make the decisions about what data is stored, and as a
junior member of the community I'm just asking what other people think
about the data, not about technical issues.

On Mon, 2007-04-23 at 15:27 -0400, Andrew Turner wrote:
> I was thinking that you might direct these questions to the devs
> behind GraphServer to see how they've addressed them. They're actually
> building routing (another need OSM has been looking at), so I would
> assume they have dealt to some extent with addresses and long ways.
> 
> They then load this into a PostGIS database to do whatever they want with
> it (e.g. dump it out to OSM using the existing PostGIS output scripts).
> 
> 
> 
> On 4/23/07, Don Smith <dcsmith at gmail.com> wrote:
> > That's great!!!! My only question is what decisions they make with the
> > data. I really think we should have a discussion with someone who is
> > very familiar with how OSM stores various pieces of data, about what to
> > do with TIGER's data.
> >
> > Again, a few questions to start things off:
> > TIGER stores a beginning and ending street number for each side of the
> > street. The doc I saw for OpenStreetMap said there was a single range.
> > The same goes for ZIP codes. Also, how should very long interstates be
> > entered? Should they be one long way? Someone said that would be a
> > problem, but doesn't breaking them up cause a problem when using the
> > data for routing?
> >
> > Etc....
> > I don't want to be an astronaut with these questions; I'd just like
> > someone (or several people) much more familiar with the OSM format to
> > make a decree.
> >
> > Don Smith
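Splitting a road into several ways need not break routing, provided consecutive pieces share their end node. The sketch below uses hypothetical names and toy planar coordinates. It builds an undirected graph from ways and runs Dijkstra's algorithm over it. The edge weights are straight-line distances in degrees, which is a simplification rather than real road lengths.

```python
import heapq
import math
from collections import defaultdict


def build_graph(nodes, ways):
    """nodes: {node_id: (lon, lat)}; ways: {way_id: [node_id, ...]}.
    Each pair of consecutive nodes in a way becomes an undirected edge,
    weighted by planar distance (a toy metric, not road length)."""
    graph = defaultdict(list)
    for node_ids in ways.values():
        for a, b in zip(node_ids, node_ids[1:]):
            (x1, y1), (x2, y2) = nodes[a], nodes[b]
            d = math.hypot(x2 - x1, y2 - y1)
            graph[a].append((b, d))
            graph[b].append((a, d))
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns the node ids along the path, or None."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return path[::-1]
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None
```

Take an interstate split into two ways, `[1, 2, 3]` and `[3, 4, 5]`. Because node 3 is shared, a route from 1 to 5 crosses both ways. If the second piece began at a separate node that merely has the same coordinates, no route would be found. Shared nodes are what make split ways routable.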
> >
> > On Mon, 2007-04-23 at 14:55 -0400, Andrew Turner wrote:
> > > I haven't heard of this before, but very pertinent to this discussion:
> > >
> > > http://graphserver.sourceforge.net/
> > >
> > > "Graphserver is a webservice server providing shortest-path
> > > itineraries on large graphs. Graphserver currently comes packaged with
> > > scripts to load TIGER/line road maps, and transit data in the Google
> > > Transit Feed Specification format, though grapsherver is by no means
> > > limited to these formats."
> > >
> > > Looks like there is a chance the devs could be at WhereCamp as well.
> > > What do you think? Has anyone used this before?
> > >
> > > Andrew
> > >
> > > On 4/23/07, Mikel Maron <mikel_maron at yahoo.com> wrote:
> > > > Getting Tiger into OSM seems like an excellent hacking activity for WhereCamp in June
> > > >
> > > > http://wherecamp.pbwiki.com/WhereCampSF
> > > >
> > > > ----- Original Message ----
> > > > From: Andy Robinson <Andy_J_Robinson at blueyonder.co.uk>
> > > > To: Don Smith <dcsmith at gmail.com>; Andrew Turner <ajturner at highearthorbit.com>
> > > > Cc: 80n <80n80n at gmail.com>; SteveC <steve at asklater.com>; Mikel Maron <mikel_maron at yahoo.com>; Dev mail list <dev at openstreetmap.org>
> > > > Sent: Friday, April 20, 2007 11:31:33 PM
> > > > Subject: RE: NetSquared ominationN
> > > >
> > > > Don,
> > > >
> > > > Neat, all sounds good.
> > > >
> > > > Has anyone had any thoughts yet on what needs to be done to shift the data
> > > > to fit the OSM schema, i.e. the reuse of common nodes (segment end points)?
> > > >
> > > > Cheers
> > > >
> > > > Andy
> > > >
> > > > Andy Robinson
> > > > Andy_J_Robinson at blueyonder.co.uk
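Reusing common nodes can be sketched as snapping each segment end point to a node id keyed on its coordinates. Segments that meet at the same point then share one node. This minimal sketch assumes that endpoints meant to be joined have identical coordinates after rounding. The function name and the six-decimal precision are assumptions, not part of any existing import script.

```python
def assign_shared_nodes(segments, precision=6):
    """segments: list of ((lon, lat), (lon, lat)) endpoint pairs.
    Returns (nodes, pairs): nodes maps node_id -> rounded (lon, lat), and
    pairs gives each segment as (start_node_id, end_node_id). Endpoints
    with equal rounded coordinates get the same node id."""
    node_ids = {}
    nodes = {}
    pairs = []

    def node_for(pt):
        key = (round(pt[0], precision), round(pt[1], precision))
        if key not in node_ids:
            nid = len(node_ids) + 1
            node_ids[key] = nid
            nodes[nid] = key
        return node_ids[key]

    for start, end in segments:
        pairs.append((node_for(start), node_for(end)))
    return nodes, pairs
```

Keying on exact rounded coordinates is the simplest choice. Points that should be joined but differ slightly would need a tolerance-based match instead.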
> > > >
> > > > >-----Original Message-----
> > > > >From: Don Smith [mailto:dcsmith at gmail.com]
> > > > >Sent: 20 April 2007 10:35 PM
> > > > >To: Andrew Turner
> > > > >Cc: 80n; Andy Robinson; SteveC; Mikel Maron; Dev mail list
> > > > >Subject: Re: NetSquared ominationN
> > > > >
> > > > >I'm still working out the exact approach and learning Python as I do it.
> > > > >The data is not a 100% match: for example, TIGER specifies a ZIP code
> > > > >on the left and a ZIP code on the right, rather than one ZIP code for
> > > > >the line. Also, while each segment has only two endpoints (a start
> > > > >lon/lat and an end lon/lat), with the rest filled in by shape points
> > > > >along the segment, there appear to be multiple segments for the same
> > > > >road (I'm unsure why).
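One way to deal with multiple segments for the same road is to stitch same-named segments end-to-start into longer chains. This is a hypothetical greedy sketch. It assumes the segments are already oriented consistently and already expressed as node-id lists, for example after shared nodes have been assigned. At a fork it simply follows the first segment it indexed. Reversed segments are not handled.

```python
from collections import defaultdict


def stitch_chains(segments):
    """segments: list of (name, [node_id, ...]); each list runs from the
    segment's start endpoint through its shape points to its end endpoint.
    Returns {name: [chain, ...]}, where each chain is a list of node ids."""
    by_name = defaultdict(list)
    for name, node_ids in segments:
        by_name[name].append(tuple(node_ids))

    result = {}
    for name, parts in by_name.items():
        # Index segments by first node; at a fork only the first one is kept.
        by_start = {}
        for i, p in enumerate(parts):
            by_start.setdefault(p[0], i)
        end_nodes = {p[-1] for p in parts}
        # Start from segments that nothing leads into, then handle
        # leftovers (e.g. closed loops) as their own chains.
        heads = [i for i, p in enumerate(parts) if p[0] not in end_nodes]
        head_set = set(heads)
        order = heads + [i for i in range(len(parts)) if i not in head_set]

        used = set()
        chains = []
        for i in order:
            if i in used:
                continue
            used.add(i)
            chain = list(parts[i])
            nxt = by_start.get(chain[-1])
            while nxt is not None and nxt not in used:
                used.add(nxt)
                chain.extend(parts[nxt][1:])  # skip the shared joining node
                nxt = by_start.get(chain[-1])
            chains.append(chain)
        result[name] = chains
    return result
```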
> > > > >
> > > > >Also, someone said that long ways would not be a good idea. Why is this?
> > > > >
> > > > >Finally, as to the requirements: I believe the TIGER files are around
> > > > >4 GB zipped, and since they're text, they compress very well. A 3.3 MB
> > > > >zip file gives me about 28 MB unzipped. When loaded into the initial
> > > > >(non-OSM-schema) database, however, it takes about 2-3 MB (converting
> > > > >strings to numbers, trimming out irrelevant fields). So, space-wise, I
> > > > >would say something like 75 GB free if we are going to unzip everything
> > > > >at once and then load; alternatively, we could have the program unzip,
> > > > >load, and then remove the extracted files.
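If the 3.3 MB to 28 MB ratio (roughly 8.5x) held across the whole set, 4 GB of archives would expand to something like 34 GB. The unzip, load and remove option can go one step further with Python's standard `zipfile` module. It reads archive members directly, so nothing is extracted to disk at all. This is a minimal sketch. The `.RT1` member suffix (assumed here to be the TIGER/Line record type 1 files), the `latin-1` encoding and the function names are all assumptions.

```python
import io
import zipfile
from pathlib import Path
from typing import Callable, Iterator, Tuple


def iter_zip_lines(zip_path, member_suffix=".RT1") -> Iterator[Tuple[str, str]]:
    """Yield (member_name, line) for each line of every archive member whose
    name ends with member_suffix, streaming straight from the zip file."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.upper().endswith(member_suffix.upper()):
                continue
            with zf.open(name) as raw, \
                    io.TextIOWrapper(raw, encoding="latin-1", newline="") as text:
                for line in text:
                    yield name, line.rstrip("\r\n")


def load_all(zip_dir, handle_line: Callable[[str, str], None]) -> None:
    """Process every *.zip in zip_dir one archive at a time."""
    for zip_path in sorted(Path(zip_dir).glob("*.zip")):
        for name, line in iter_zip_lines(zip_path):
            handle_line(name, line)
```

With this approach the disk footprint is roughly the zipped archives plus the holding database. No unzipped copy is ever written.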
> > > > >This would get the data into a holding database, which will make it much
> > > > >easier to deal with.
> > > > >I honestly don't see this as a big-bang process, and I believe the data
> > > > >will have a few surprises in it (I've already found roads with no names,
> > > > >and I haven't figured out why they're there yet).
> > > > >To me, the most important first step is to get all road data (and only
> > > > >road data, with street numbers and ZIP codes) into holding tables in
> > > > >MySQL. This should be okay; however, TIGER has a provision that the
> > > > >primary description is of the line's most prominent feature, so there's
> > > > >a chance that this might miss something, but I don't think so. I believe
> > > > >doing this will allow us to tinker with the data until we're sure it's
> > > > >OK, and maybe do something like load all interstates into production
> > > > >first.
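The sketch below shows one way to load only road data into holding tables. It uses Python's built-in `sqlite3` as a stand-in for the MySQL holding database. It takes already-parsed records as dicts with hypothetical field names. It also assumes TIGER's CFCC convention, in which codes starting with `A` are roads and `A1x` codes are limited-access highways, a convenient "interstates first" subset. ZIP codes are kept as text so that leading zeros survive. House numbers are assumed to be purely numeric.

```python
import sqlite3

# Hypothetical column names for already-parsed TIGER road records.
ROAD_COLUMNS = ("tlid", "name", "cfcc", "fraddl", "toaddl",
                "fraddr", "toaddr", "zipl", "zipr")


def create_holding_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS road_holding ("
        "tlid INTEGER PRIMARY KEY, name TEXT, cfcc TEXT, "
        "fraddl INTEGER, toaddl INTEGER, fraddr INTEGER, toaddr INTEGER, "
        "zipl TEXT, zipr TEXT)"  # ZIPs as text to keep leading zeros
    )


def load_roads(conn, records):
    """Insert only road records (CFCC class 'A'); return how many were kept."""
    rows = [tuple(r.get(c) for c in ROAD_COLUMNS)
            for r in records if str(r.get("cfcc", "")).startswith("A")]
    conn.executemany(
        "INSERT OR REPLACE INTO road_holding VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        rows)
    conn.commit()
    return len(rows)


def interstate_candidates(conn):
    """Limited-access highways (CFCC A1x), e.g. for loading interstates first."""
    return conn.execute(
        "SELECT tlid, name FROM road_holding WHERE cfcc LIKE 'A1%' ORDER BY tlid"
    ).fetchall()
```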
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >On Thu, 2007-04-19 at 07:52 -0700, Andrew Turner wrote:
> > > > >> Agreed that the project didn't seem 'noble' enough, since it wasn't
> > > > >> mapping Africa or the like. Interesting that 'Maps 2.0' got picked.
> > > > >> Maybe they have money now to offer OSM ;)
> > > > >>
> > > > >> I do agree the US kick-start project is still needed. Timeframe-wise,
> > > > >> the idea would be to put together the plan and the beginning of the
> > > > >> import code, and then go around Where/WhereCamp invigorating the US geo
> > > > >> community to make it happen.
> > > > >>
> > > > >> With regard to boxxen, Anselm may have space; I can ping him. What
> > > > >> are the expected requirements, based on the existing hardware used?
> > > > >> Otherwise, one option may be a US donation drive to raise the
> > > > >> (small?) amount; good hosting is only about $30-50/month.
> > > > >>
> > > > >> I'm still in CA at Loc Int & web2.0 and can gather more thoughts after
> > > > >> I get back.
> > > > >> Andrew
> > > > >>
> > > > >> On 4/19/07, Don Smith <dcsmith at gmail.com> wrote:
> > > > >> > So,
> > > > >> > The last place this was left was that we were going to load the
> > > > >> > TIGER/Line data into MySQL, then run some stored procedures to get the
> > > > >> > data into OSM format.
> > > > >> > There was going to be a dedicated machine for the project, but I have
> > > > >> > no idea where that ended up. I switched my desktop with the idea that
> > > > >> > if the machine wasn't ready I'd start development, but I haven't
> > > > >> > gotten around to it yet. I guess my task will be to lay out the tables
> > > > >> > and write a loader in Python. Once that's done, the rest should be
> > > > >> > writing a few stored procedures to convert it to OSM data (which I
> > > > >> > haven't looked at closely yet).
> > > > >> >
> > > > >> > Don Smith
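For the last step, converting holding data to OSM data, here is a Python sketch that uses `xml.etree.ElementTree` instead of stored procedures. It writes nodes, plus ways made of `nd` references and `tag` elements, following the node/way layout of OSM XML. The exact schema version to target is an assumption left open here, including whether segments are needed, so no `version` attribute is emitted. Negative ids mark objects that do not yet exist on the server. The function name and `generator` value are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_osm_xml(nodes, ways):
    """nodes: {node_id: (lon, lat)}; ways: {way_id: (node_ids, tags)}.
    Returns an OSM-style XML string; ids are negated to mark new objects."""
    root = ET.Element("osm", generator="tiger-import-sketch")
    for nid, (lon, lat) in sorted(nodes.items()):
        ET.SubElement(root, "node", id=str(-nid),
                      lat=f"{lat:.6f}", lon=f"{lon:.6f}")
    for wid, (node_ids, tags) in sorted(ways.items()):
        way = ET.SubElement(root, "way", id=str(-wid))
        for nid in node_ids:
            ET.SubElement(way, "nd", ref=str(-nid))
        for k, v in sorted(tags.items()):
            ET.SubElement(way, "tag", k=k, v=v)
    return ET.tostring(root, encoding="unicode")
```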
> > > > >> >
> > > > >> > On Thu, 2007-04-19 at 09:25 +0100, 80n wrote:
> > > > >> > > Well, we weren't one of the winning 20. But hopefully the project
> > > > >> > > will have got some exposure from this process anyway. In any case, I'd
> > > > >> > > like to thank everybody for all the time and work they put into the
> > > > >> > > proposal.
> > > > >> > >
> > > > >> > > I think the main problem with our proposal was that it was focussed
> > > > >> > > on helping the project succeed in the US. If we had made a pitch for
> > > > >> > > creating a free map of some small village in Africa, it would have
> > > > >> > > come over better. Does anyone else have any thoughts or observations,
> > > > >> > > in retrospect, about what we did wrong and what we could do better
> > > > >> > > next time?
> > > > >> > >
> > > > >> > > The original goal of kick-starting OSM in the US still exists. Does
> > > > >> > > anyone have any other ideas or strategies that we can try? The
> > > > >> > > TIGER/Line data still needs to be dealt with and will continue to be
> > > > >> > > one of the obstacles until we address it. Don, I think you were
> > > > >> > > planning to work on this; what would help you to get it done?
> > > > >> > >
> > > > >> > > 80n
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >>
> > > > >>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
> 
>