[Talk-us] Tiger US address importing
Jim Brown
jim at cloudmade.com
Fri Sep 18 10:13:56 BST 2009
I think getting the address interpolation lines in from 2008 (rather than 2009) makes sense because it will match the geometry of unchanged lines currently in the DB (about 89% of the Tiger road data has not been edited as of now, it was 95% about 6 months ago)...
We could take the lines from 2009, but then matching them to the roads they go with could be more difficult... as a background to work on, 2008 might be better and easier to match up. Unless we are thinking of doing a full road update, I think we should view this as completing the 2008 import by getting the address interpolation lines in where possible.
If we associate the interpolation lines with the roads they apply to, then when the roads are edited in the future the addressing data is updated too. This is particularly important because we see a lot of aerial imagery editing of Tiger data (lots of people fixing roads and connecting the disconnected Tiger road segments). If this is how people are editing areas, then it is very difficult for them to get address data along with it. So it would be cool to get the address lines in with the data so their geometry is corrected along with the roads' geometry.
As for detecting which roads to import, one approach is:
1. Look at the last edit date and user id of the osm data; if they are all still from the import then the roads are original and are eligible for importing interpolation lines around. We could then further filter out any lines that have addressing data on them already, or that meet other criteria (road type, proximity to well edited areas etc). We should be as cautious as possible here and I think we will still hit most of the 89% yet to be edited.
2. On these eligible lines, if the tiger tags are still present, then we can use them to fetch the interpolation lines from Tiger. However, if they are not present for a significant portion of the eligible roads then we can do a geometry match (we have planet line tables at CloudMade in our data warehouse so we could do this...). If we are going to match on geometry however, we might just want to do this from the start against a load of the interpolation lines to see which are valid and skip step 1 (not sure)...
3. In the end we produce the list of interpolation lines to import and associate.
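To make steps 1 and 2 a bit more concrete, here is a rough sketch only. It assumes each way has already been pulled out as a dict with "id", "user", "timestamp" and "tags"; the import account name and the cutoff date are placeholders, and using the tiger:tlid tag as the "tiger tags still present" test is my assumption, not something we have agreed on.

```python
from datetime import datetime

# Hypothetical placeholders: substitute the real import account names and cutoff date.
IMPORT_USERS = {"tiger_import_account"}
IMPORT_CUTOFF = datetime(2008, 12, 31)


def is_untouched(way):
    """Step 1: the last edit still belongs to the import (by user and date)."""
    return way["user"] in IMPORT_USERS and way["timestamp"] <= IMPORT_CUTOFF


def has_addressing(way):
    """True if the way already carries any addr:* tag."""
    return any(key.startswith("addr:") for key in way["tags"])


def classify(ways):
    """Step 2: split eligible ways into those we can look up by TIGER tag
    and those that would need a geometry match instead."""
    by_tag, by_geometry = [], []
    for way in ways:
        if not is_untouched(way) or has_addressing(way):
            continue  # edited since the import, or already addressed: skip
        if "tiger:tlid" in way["tags"]:
            by_tag.append(way["id"])
        else:
            by_geometry.append(way["id"])
    return by_tag, by_geometry
```

The two lists together would be the candidates for step 3. Note that this only looks at the way's own last edit, so it does not by itself prove that the geometry is unchanged.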
I think the idea here is exactly the same as the Tiger import.... It would be to give a background data set for people to work against and as the tiger roads are fixed, the addressing data would be fixed at the same time...
We would also have a useful set of US addresses in OSM to work with that would match the Tiger data and evolve with the Tiger data. As we know, showing data like addressing in routing, maps and geocoding will encourage people to get in and fix it :) ...
As for your point about county data, I agree completely... we should also be importing county data sources as they become available and can be validated. Tools that help do this are important. I think that the tiger data is a valuable backdrop to do this against.
Jim
Aschiell wrote>>
From: Apollinaris Schoell [mailto:aschoell at gmail.com]
Sent: 17 September 2009 19:26
To: Jim Brown
Cc: talk-us at openstreetmap.org; tbook at libero.it; Matt Amos
Subject: Re: [Talk-us] Tiger US address importing
definitely something which should be continued. but it's not trivial.
some considerations
- 2009 update for tiger data should come soon. and hopefully it has fewer errors. makes sense to wait for it
- areas where data hasn't been touched are also very likely the areas with bad tiger data. adding more broken data doesn't help osm at all.
- areas which haven't been touched are most likely areas no one is interested in. why clutter the database with more broken data?
- how do you check that tiger data was not changed? You can't take the version of a way and assume there was no change in geometry. there is an ongoing discussion about deep history and this is a non trivial problem. since all tiger nodes are cleaned of their obsolete tags, any tiger node will be pushed to version 2 or more. You can not rely on the version number. you must verify the position itself against an old history dump.
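A minimal sketch of that position check, assuming the node coordinates of a way have already been extracted from both the current planet and an old dump as lists of (lat, lon) pairs. The function names and the 1 m tolerance are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6371000


def moved_metres(a, b):
    """Approximate distance in metres between two (lat, lon) points,
    using an equirectangular approximation (fine for tiny offsets)."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_M * math.hypot(x, y)


def geometry_unchanged(current, original, tolerance_m=1.0):
    """True if the way has the same number of nodes as in the old dump
    and no node has moved by more than tolerance_m."""
    if len(current) != len(original):
        return False  # nodes were added or removed
    return all(moved_metres(c, o) <= tolerance_m
               for c, o in zip(current, original))
```

This compares positions only, so a node bumped to version 2 by a tag cleanup still counts as unchanged.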
From: Apollinaris Schoell [mailto:aschoell at gmail.com]
Sent: 17 September 2009 21:04
To: Richard Shank
Cc: Jim Brown; tbook at libero.it; Matt Amos; talk-us at openstreetmap.org
Subject: Re: [Talk-us] Tiger US address importing
I know user nmixter has started to compile a list for california of free county gis data. Can't connect to the wiki right now. But easy to find from the California page.
On Thu, Sep 17, 2009 at 11:32 AM, Richard Shank <develop at zestic.com> wrote:
Apollinaris Schoell wrote:
- more and more counties make their data available for the public. I hope such data is authoritative and much more useful. wherever possible this should be used instead.
Is there a compiled list of these counties? Since everything is handled county by county anyway, this may be the place to start.