[OSM-dev] OSM/US & TIGER (was NetSquared ominationN)

Tue Apr 24 12:19:28 BST 2007

Bearing in mind our simplified model (some might even suggest it's naïve
;-) ), do we really need to know which side of the street an address is on?
While we could set the left or right side with the appropriate reference
there is a danger that corruption could easily creep in over time as the raw
data is further edited and refined by contributors. If the information on
zip code and address numbers is referenced for the beginning and end of each
TIGER-Line street segment then logically we may be better off tagging the
appropriate end nodes with the number data.

If this simplified approach were taken then, for instance, if the "from" data
is 1100 and 1101 then the node would get tagged 1100;1101 or similar. The
node at the other end of the segment would carry say 1190;1191. Now this is
all fine and dandy for lone TIGER segments, but as soon as we have a side
street then we end up with an additional pair of numbers applying to the
applicable node, so perhaps we need to associate the address numbers either
with the way ID (I'm purposely ignoring OSM segments if we plan to deprecate
them at some point) or with the street name. I prefer the former because
it’s the unique identifier that all ways have, whereas not all streets in
the TIGER data necessarily have a street name. However using the way id
requires that the way is created before the node tags are created.
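
To make the shared-node problem concrete, here is a minimal Python sketch.
The segment tuples, the node names and the tag_end_nodes helper are all
invented for illustration; they are not real TIGER records and not part of
any OSM API. It only shows why the numbers on a node need to be keyed by
way id once a side street joins.

```python
from collections import defaultdict

# Hypothetical simplified segments:
# (way_id, from_node, to_node, from_numbers, to_numbers)
SEGMENTS = [
    (236779, "A", "B", (1100, 1101), (1190, 1191)),
    (236780, "B", "C", (100, 101), (198, 199)),  # side street sharing node B
]


def tag_end_nodes(segments):
    """Collect address numbers per end node, keyed by the owning way id."""
    nodes = defaultdict(dict)
    for way_id, from_node, to_node, from_nums, to_nums in segments:
        nodes[from_node][way_id] = from_nums
        nodes[to_node][way_id] = to_nums
    return dict(nodes)


node_numbers = tag_end_nodes(SEGMENTS)
```

Node B ends up holding one pair of numbers per way, which is exactly the
case where a bare 1190;1191 value would be ambiguous.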

So, how might it look?

First perhaps we want to make a point of noting the source, so let's put
TIGER in everything, as a namespace prefix on each tag. TIGER is so important
that we can simply adjust the standard renderers to pick up the namespace.

Which leads to something like:

For ways:
TIGER:highway=unclassified (is there any basis on which we can automatically
assign something more than unclassified?)
TIGER:name=streetname

For nodes:
TIGER:postal_code=WID;236779;21560;21561;WID;236780;21562;21563  (the "WID"
identifies that a way ID follows in the sequence of values. The following
number(s) are the zip code(s) that apply to the node, until another WID and
id number pair appears, should there be more than one way attached to the
node.)
TIGER:address_number=WID;236779;1900;1901;WID;236780;100;101;  (The id of
the way 

I'm sure someone can point out issues with the semicolon-separated list, and
perhaps a better separator/delimiter can be suggested. Or perhaps someone
has a more elegant way of presenting this. I guess a little more care would
also be needed if any of the address number data in TIGER-Line is not
numeric.
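
As a rough illustration of how the WID-delimited values could be written
and read back, here is a small Python sketch. The encode and decode
functions are hypothetical helpers, not existing OSM tooling, and the
layout is only my reading of the scheme above. Values are kept as strings
so that non-numeric address numbers survive the round trip.

```python
WID = "WID"  # marker from the proposal: a way id follows


def encode(values_by_way):
    """Build a 'WID;<way id>;<value>;...' string from {way_id: [values]}."""
    parts = []
    for way_id, values in values_by_way.items():
        parts.append(WID)
        parts.append(str(way_id))
        parts.extend(str(v) for v in values)
    return ";".join(parts)


def decode(tag_value):
    """Parse the string back into {way_id: [values]}.

    Values stay as strings, so an address number such as '12A' is kept.
    Empty fields (for example from a trailing ';') are skipped.
    """
    result = {}
    current = None
    tokens = [t.strip() for t in tag_value.split(";") if t.strip()]
    i = 0
    while i < len(tokens):
        if tokens[i] == WID:
            if i + 1 >= len(tokens):
                raise ValueError("WID marker without a way id")
            current = int(tokens[i + 1])
            result[current] = []
            i += 2
        else:
            if current is None:
                raise ValueError("value found before any WID marker")
            result[current].append(tokens[i])
            i += 1
    return result


parsed = decode("WID;236779;1900;1901;WID;236780;100;101;")
```

Two weaknesses show up straight away: an address "number" that is itself
the text WID, or one that contains a semicolon, cannot be represented
without some escaping rule.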

Anyway, those are my off-the-cuff thoughts.

Cheers

Andy

Andy Robinson
Andy_J_Robinson at blueyonder.co.uk 

>-----Original Message-----
>From: Don Smith [mailto:dcsmith at gmail.com]
>Sent: 23 April 2007 8:47 PM
>To: Andrew Turner
>Cc: Mikel Maron; Andy Robinson; 80n; SteveC; Dev mail list
>Subject: Re: OSM/US & TIGER (was NetSquared ominationN)
>
>Right,
>I understand that these guys will tell me about how they technically
>changed the data, my concern is what kind of data osm wants to store,
>and what decisions the people running osm want to make about the data?
>Is the right answer to store street numbers for both sides? Or is it to
>just go with what's there and take the lowest from either side and the
>highest from either side and lose data? Am I even correct about how osm
>stores data? The community seems to make decisions about what data is
>stored, and being a junior member of the community I'm just asking what
>other people think about the data, not technical issues.
>
>On Mon, 2007-04-23 at 15:27 -0400, Andrew Turner wrote:
>> I was thinking that you might direct these questions to the devs
>> behind GraphServer to see how they've addressed them. They're actually
>> building routing (another need OSM has been looking at) and so I would
>> assume have dealt somewhat with addresses & long-ways.
>>
>> They then get this into a PostGIS db to then do whatever with (e.g.
>> dump out to OSM using the existing PostGIS output scripts).
>>
>>
>>
>> On 4/23/07, Don Smith <dcsmith at gmail.com> wrote:
>> > That's great!!!! My only question is what decisions do they make with
>> > the data. I really think a discussion should be had, with someone who is
>> > very familiar with how osm is storing various pieces of data, and
>> > discuss what to do with tiger's data.
>> >
>> > Again a few questions to start things off:
>> > Tiger stores street number beginning and end for each side of the
>> > street. The doc I saw for openstreetmap said it was one range for each.
>> > Same for zip codes. Also, how should very long interstates be entered?
>> > Should they be one long way? Someone said that would be a problem? Well
>> > doesn't that cause a problem using the data for routing if it's broken
>> > up?
>> >
>> > Etc....
>> > I don't want to be an astronaut with these questions, I'd just like
>> > someone (or ones) much more familiar with the osm format to make a
>> > decree.
>> >
>> > Don Smith
>> >
>> > On Mon, 2007-04-23 at 14:55 -0400, Andrew Turner wrote:
>> > > I haven't heard of this before, but very pertinent to this discussion:
>> > >
>> > > http://graphserver.sourceforge.net/
>> > >
>> > > "Graphserver is a webservice server providing shortest-path
>> > > itineraries on large graphs. Graphserver currently comes packaged with
>> > > scripts to load TIGER/line road maps, and transit data in the Google
>> > > Transit Feed Specification format, though Graphserver is by no means
>> > > limited to these formats."
>> > >
>> > > Looks like there is a chance the devs could be at WhereCamp as well.
>> > > What do you think? Anyone used this before?
>> > >
>> > > Andrew
>> > >
>> > > On 4/23/07, Mikel Maron <mikel_maron at yahoo.com> wrote:
>> > > > Getting Tiger into OSM seems like an excellent hacking activity for WhereCamp in June
>> > > >
>> > > > http://wherecamp.pbwiki.com/WhereCampSF
>> > > >
>> > > > ----- Original Message ----
>> > > > From: Andy Robinson <Andy_J_Robinson at blueyonder.co.uk>
>> > > > To: Don Smith <dcsmith at gmail.com>; Andrew Turner
><ajturner at highearthorbit.com>
>> > > > Cc: 80n <80n80n at gmail.com>; SteveC <steve at asklater.com>; Mikel
>Maron <mikel_maron at yahoo.com>; Dev mail list <dev at openstreetmap.org>
>> > > > Sent: Friday, April 20, 2007 11:31:33 PM
>> > > > Subject: RE: NetSquared ominationN
>> > > >
>> > > > Don,
>> > > >
>> > > > Neat, all sounds good.
>> > > >
>> > > > Has anyone had any thoughts yet on what needs to be done to shift the data
>> > > > to fit the OSM schema, i.e. the reuse of common nodes (segment end points)?
>> > > >
>> > > > Cheers
>> > > >
>> > > > Andy
>> > > >
>> > > > Andy Robinson
>> > > > Andy_J_Robinson at blueyonder.co.uk
>> > > >
>> > > > >-----Original Message-----
>> > > > >From: Don Smith [mailto:dcsmith at gmail.com]
>> > > > >Sent: 20 April 2007 10:35 PM
>> > > > >To: Andrew Turner
>> > > > >Cc: 80n; Andy Robinson; SteveC; Mikel Maron; Dev mail list
>> > > > >Subject: Re: NetSquared ominationN
>> > > > >
>> > > > >I'm still working out the exact approach and learning python as I do it.
>> > > > >The data is not a 100% match as, for example, tiger specifies zip code
>> > > > >to left, and zip code to right as opposed to a zip code for a line. Also
>> > > > >while segments have only two points (a begin lon/lat, and an end) where
>> > > > >the rest is filled in by points along the segment, there appear to be
>> > > > >multiple segments for the same road (I'm unsure why this is).
>> > > > >
>> > > > >Also, someone said that long ways would not be a good idea, why is this?
>> > > > >
>> > > > >Finally as to the requirements, the tiger files I believe are around
>> > > > >4gig zipped, and since they're text, they zip very well. For a 3.3M zip
>> > > > >file I get about 28M unzipped. When loaded into the initial (NON OSM
>> > > > >SCHEMA) database, however, I get about 2-3M (Converting strings to
>> > > > >numbers, trimming out irrelevant stuff). So spacewise I would probably
>> > > > >say something like 75G free if we are going to unzip everything all at
>> > > > >once and then load, alternatively we could have the program unzip, load,
>> > > > >and then remove the extracted files.
>> > > > >This would get the data into a holding database, which will make it much
>> > > > >easier to deal with.
>> > > > >I honestly don't see this as a big bang process, and believe that the
>> > > > >data will have a few surprises in it (I've already found roads with no
>> > > > >names that I haven't figured out why they're there yet).
>> > > > >To me the most important first step is to get all road data (And only
>> > > > >road data (with street numbers and zipcodes)) into holding tables in mysql.
>> > > > >This should be okay, however tiger has a provision that the primary
>> > > > >description is of the line's most prominent feature, so there's a chance
>> > > > >that this might miss something, but I don't think so. I believe that
>> > > > >doing this will allow us to tinker with the data until we're sure it's
>> > > > >ok, and maybe do something like load all interstates to production
>> > > > >first.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >On Thu, 2007-04-19 at 07:52 -0700, Andrew Turner wrote:
>> > > > >> Agreed that the project didn't seem 'noble' enough since it wasn't
>> > > > >> mapping Africa or otherwise. Interesting that "Maps 2.0" got picked.
>> > > > >> Maybe they have money now to offer OSM ;)
>> > > > >>
>> > > > >> I do agree the project for US kickstart is still needed. Timeframe
>> > > > >> wise would be to put together the plan and the beginning of the import
>> > > > >> code and then go around Where/WhereCamp invigorating the US geo
>> > > > >> community to make it happen.
>> > > > >>
>> > > > >> With regards to boxxen, Anselm may have space - I can ping him. What
>> > > > >> are the expected requirements, based on existing h/w used? Otherwise,
>> > > > >> an option may be to do a US donation drive to build up the small?
>> > > > >> amount - good hosting is only like $30-50/month.
>> > > > >>
>> > > > >> I'm still in CA at Loc Int & web2.0 and can gather more thoughts after
>> > > > >> I get back.
>> > > > >> Andrew
>> > > > >>
>> > > > >> On 4/19/07, Don Smith <dcsmith at gmail.com> wrote:
>> > > > >> > So,
>> > > > >> > The last place this was left was that we were going to load the
>> > > > >> > tiger/line data into mysql, then run some stored procedures to get the
>> > > > >> > data into osm format.
>> > > > >> > There was going to be a dedicated machine for the project, but I have no
>> > > > >> > idea where that ended up. I switched my desktop with the idea if the
>> > > > >> > machine wasn't ready I'd start development, but haven't gotten around to
>> > > > >> > it yet. I guess my task will be to lay out the tables, and write a loader
>> > > > >> > in python. Once that's done, the rest should be the writing of a few
>> > > > >> > stored procedures to convert it to osm data (Which I haven't looked at
>> > > > >> > closely yet).
>> > > > >> >
>> > > > >> > Don Smith
>> > > > >> >
>> > > > >> > On Thu, 2007-04-19 at 09:25 +0100, 80n wrote:
>> > > > >> > > Well, we weren't one of the winning 20.  But hopefully the project
>> > > > >> > > will have got some exposure from this process anyway.  Anyway, I'd
>> > > > >> > > like to thank everybody for all the time and work they put into the
>> > > > >> > > proposal.
>> > > > >> > >
>> > > > >> > > I think the main problem with our proposal was that it was focussed on
>> > > > >> > > helping the project succeed in the US.  If we'd have made a pitch for
>> > > > >> > > creating a free map of some small village in Africa it would have come
>> > > > >> > > over better.  Anyone else have any thoughts or observations, in
>> > > > >> > > retrospect, about what we did wrong and what we could do better next
>> > > > >> > > time?
>> > > > >> > >
>> > > > >> > > The original goal of kick-starting OSM in the US still exists.  Does
>> > > > >> > > anyone have any other ideas or strategies that we can try?  The
>> > > > >> > > TIGER/Line data still needs to be dealt with and will continue to be
>> > > > >> > > one of the obstacles until we address it.  Don, I think you were
>> > > > >> > > planning to work on this, what would help you to get it done?
>> > > > >> > >
>> > > > >> > > 80n
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> >
>> > > > >> >
>> > > > >>
>> > > > >>
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> > >
>> >
>> >
>>
>>