[OSM-talk] TIGER 101

David Groom reviews at pacific-rim.net
Fri Dec 1 13:56:08 GMT 2006




> Ben Gimpert wrote:
>> On Thu, 30 Nov 06 @02:49pm, Schuyler Erle wrote:
>>> * On 30-Nov-2006 at  2:18PM PST, Ben Gimpert said:
>>>> Every time you encounter a TIGER lat/long point, you'll need to do a
>>>> search across the lat/long of already-imported nodes within, say,
>>>> 1000 miles.  (Beware of very long, straight rural roads.)  If you find a
>>>> node with the same lat/long -- where "same" is some function incorporating
>>>> the rural-ness of the area -- then you can reuse that node.
>>> That was my point. Since TIGER/Line *is* topological by design, just
>>> like OSM, it is definitely possible to reuse nodes out of the box.
>>> More to the point, you don't need to keep track of every node in the
>>> universe, just every node in a simple TIGER/Line file. Not impossible
>>> at all.
>>
>> You're assuming that -- for example -- there are no roads that cross
>> county borders, no roads that might span more than one FIPS .RT1/2 file.
>> I doubt this is the case.
>
> Why don't we just write something that, once the TIGER import has finished,
> downloads chunks of the USA through the API, looks for nodes that are closer
> than, say, 10m to each other, and merges them, replacing them with a node
> at their average point? If 10m isn't wide enough for some areas of the
> country, fine; it doesn't matter, we're still better off than not having
> run the process.
>
> Later we could refine the script to use some sort of heuristic to
> determine the best threshold for an area, based on the amount of data
> around them, for example. So if there are two nodes within half a mile and
> no other nodes for 100 miles around, merge the 2 nodes. If we run this
> new script, as long as it doesn't produce false positives, it doesn't
> matter if the end result isn't perfect. People can fix it later.
>
> I think what is important is that we get all of Tiger into the system
> roughly so that people can then work on improving it.

I'd have to disagree with the above point.  Certainly in cities laid out on
a standard gridiron block pattern, it seems to me there will be four times as
many nodes as needed (each intersection gets a separate node for each of the
four segments meeting there), and, with each segment being a way, an
indeterminate multiplication in the number of ways.  The data storage
requirements, together with the processing requirements of any planet.osm,
would seem to me to be enough reason to stop the TIGER import.
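A rough check of the "four times" figure, as a minimal sketch. It assumes an idealised square grid of intersections, that every TIGER segment is imported as its own way with its own two endpoint nodes (no node reuse), that intermediate shape points are ignored, and, for comparison, that a "proper" import would use one shared node per intersection and one way per straight street. The function name grid_counts is hypothetical.

```python
def grid_counts(rows: int, cols: int) -> dict:
    """Compare node/way counts for a rows x cols grid of intersections.

    Assumptions (illustrative only): every block-length segment becomes its
    own way with two private endpoint nodes, versus one shared node per
    intersection and one way per straight street.
    """
    # Horizontal segments join neighbours along each row, vertical ones
    # join neighbours along each column.
    segments = rows * (cols - 1) + cols * (rows - 1)
    shared_nodes = rows * cols       # topological: one node per intersection
    unshared_nodes = 2 * segments    # each segment brings its own two endpoints
    street_ways = rows + cols        # one way per straight street
    return {
        "segments": segments,
        "shared_nodes": shared_nodes,
        "unshared_nodes": unshared_nodes,
        "street_ways": street_ways,
        "node_ratio": unshared_nodes / shared_nodes,
    }


if __name__ == "__main__":
    for n in (3, 10, 100):
        c = grid_counts(n, n)
        print(f"{n}x{n}: {c['shared_nodes']} vs {c['unshared_nodes']} nodes, "
              f"{c['street_ways']} vs {c['segments']} ways")
    # 3x3: 9 vs 24 nodes, 6 vs 12 ways
    # 10x10: 100 vs 360 nodes, 20 vs 180 ways
    # 100x100: 10000 vs 39600 nodes, 200 vs 19800 ways
```

Under these assumptions the node ratio tends towards 4 as the grid grows (3.96 for a 100x100 grid), because each interior intersection is where four segments meet, while the way count grows with the number of blocks rather than the number of streets.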

Even if you treat the TIGER data already in OSM as a source of information
from which to create "proper" OSM data, we currently don't have the tools to
do this effectively.

David


> Otherwise we are
> going to run into problems where someone has manually added a bunch of
> roads, and Tiger wants to add the same roads. Unless we write something
> very clever, we are going to get all the roads twice, and we won't even
> be sure which is the most accurate.
>
> Robert (Jamie) Munro 
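The merge step proposed above (find nodes closer than about 10 m and replace them with one node at their average point) could look roughly like the sketch below. It is not an existing OSM tool: the names merge_close_nodes and distance_m are hypothetical, nodes are assumed to be a plain dict of integer id to (lat, lon) already downloaded, distances use an equirectangular approximation that is only sensible at this short range and away from the 180th meridian, and "closer than 10 m" is interpreted transitively via union-find, so a chain of points each under 10 m apart collapses into one node.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0
M_PER_DEG = EARTH_RADIUS_M * math.pi / 180.0  # metres per degree of latitude


def distance_m(a, b):
    """Approximate distance in metres between two (lat, lon) points.

    Equirectangular approximation: fine at the ~10 m scale discussed here,
    not for long distances or across the 180th meridian.
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_M * math.hypot(x, y)


def merge_close_nodes(nodes, threshold_m=10.0):
    """Cluster nodes closer than threshold_m and replace each cluster by its mean.

    nodes: dict mapping integer node id -> (lat, lon).
    Returns (positions, remap): positions maps each surviving id (the
    smallest id in its cluster) to its new (lat, lon); remap maps every
    original id to its surviving id, so ways can be rewritten afterwards.
    """
    parent = {nid: nid for nid in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest id becomes the root

    # Bucket nodes into grid cells at least threshold_m wide (longitude cells
    # are sized for the highest latitude present), so any pair closer than
    # the threshold lies in the same or an adjacent cell.
    max_abs_lat = max((abs(lat) for lat, _ in nodes.values()), default=0.0)
    lat_cell = threshold_m / M_PER_DEG
    lon_cell = threshold_m / (M_PER_DEG * max(math.cos(math.radians(max_abs_lat)), 1e-6))
    cells = defaultdict(list)
    for nid, (lat, lon) in nodes.items():
        cells[(math.floor(lat / lat_cell), math.floor(lon / lon_cell))].append(nid)

    # Compare each node only with nodes in its own and the 8 neighbouring cells.
    for (cy, cx), members in cells.items():
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                for other in cells.get((cy + dy, cx + dx), ()):
                    for nid in members:
                        if nid < other and distance_m(nodes[nid], nodes[other]) < threshold_m:
                            union(nid, other)

    clusters = defaultdict(list)
    for nid in nodes:
        clusters[find(nid)].append(nid)

    positions, remap = {}, {}
    for root, ids in clusters.items():
        # Plain mean of coordinates; adequate for points a few metres apart.
        positions[root] = (
            sum(nodes[i][0] for i in ids) / len(ids),
            sum(nodes[i][1] for i in ids) / len(ids),
        )
        for i in ids:
            remap[i] = root
    return positions, remap


if __name__ == "__main__":
    nodes = {
        1: (40.000000, -75.000000),
        2: (40.000050, -75.000000),  # roughly 5.6 m north of node 1: merged
        3: (40.001000, -75.000000),  # roughly 111 m away: kept separate
    }
    positions, remap = merge_close_nodes(nodes)
    print(remap)  # {1: 1, 2: 1, 3: 3}
```

The grid bucketing keeps the neighbour search local instead of comparing every pair of nodes; the density-based threshold mentioned in the thread would replace the fixed threshold_m with a per-area value, which this sketch does not attempt.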






