[OSM-talk] TIGER 101

Fri Dec 1 14:40:20 GMT 2006

Robert (Jamie) Munro wrote:
>Sent: 01 December 2006 2:22 PM
>To: David Groom; talk at openstreetmap.org
>Subject: Re: [OSM-talk] TIGER 101
>
>David Groom wrote:
>>
>>Robert (Jamie) Munro wrote:
>>> Later we could refine the script to use some sort of heuristic to
>>> determine the best threshold for an area, based on the amount of data
>>> around them, for example. So if there's two nodes within half a mile and
>>> no other nodes for 100 miles around, merge the 2 nodes. If we run this
>>> new script, as long as it doesn't produce false positives, it doesn't
>>> matter if the end result isn't perfect. People can fix it later.
>>>
>>> I think what is important is that we get all of Tiger into the system
>>> roughly so that people can then work on improving it.
>>
>> I'd have to disagree with the above point.  Certainly in cities laid out on
>> a standard gridiron block pattern it seems to me there will be four times as
>> many nodes as needed, and with each segment being a way, an indeterminate
>> multiplication in the number of ways.  The data storage requirements,
>> together with the processing requirements of any planet.osm would seem to me
>> to be enough reason to stop the TIGER import.
>>
>> Even if you look at TIGER data that is in OSM as a source of
>> information from which to create "proper" OSM data, we currently don't have
>> the tools to effectively do this.
>
>Yes, but we could write them, and should write them, not only for Tiger
>but for other badly entered manual data. Storage really isn't a big
>issue. The storage for Tiger can only be a fraction of storing GPS
>points for a similar area if we were to gather them at the kind of
>resolution needed to derive a map as good as the tiger map.
>
>Also, you ignored my last point, which is probably the most important:
>
>>> Otherwise we are
>>> going to run into problems where someone has manually added a bunch of
>>> roads, and Tiger wants to add the same roads. Unless we write something
>>> very clever, we are going to get all the roads twice, and we won't even
>>> be sure which is the most accurate.
>
>We need to get the data in, so that people can see it, then we can clean
>it, either manually, assisted (with tools like the new automatic
>waymaker for JOSM - we could have an automatic node-merger for josm for
>example), or fully automatically by a bot.
>

The right thing to do is take stock. We kicked off the TIGER import
originally to deal with the concern about duplication of data over time, and
so it is important we get it sorted soon. But on the other hand, if we load
in a pile of data which cannot reasonably be further edited then it's a
waste of effort. Much better to take stock, rewrite the import tools and any
cleaning-up processes, and then run again. The amount of duplicate data
that's going to occur in, say, a few weeks is going to be trivial compared
with the volume of data in the import itself, and it is the import that does
need to do its job first time around.
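
To make the node-merging idea from the thread concrete, here is a minimal
sketch of a cleaning-up pass that merges nodes lying within a fixed distance
of each other. It is only an illustration under stated assumptions: the
function names, the dict-of-coordinates input and the threshold value are all
hypothetical, not part of any existing import script, and the all-pairs
comparison would be far too slow for real TIGER volumes.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def merge_close_nodes(nodes, threshold_m):
    """Map every node id to the id of the node it should be merged into.

    nodes: dict of node id -> (lat, lon)   (hypothetical input format)
    threshold_m: merge distance in metres  (assumed fixed, not adaptive)
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the group's representative, shortening the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    ids = sorted(nodes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if haversine_m(nodes[a], nodes[b]) <= threshold_m:
                ra, rb = find(a), find(b)
                if ra != rb:
                    # Keep the lowest id as the surviving node of the group.
                    parent[max(ra, rb)] = min(ra, rb)
    return {n: find(n) for n in ids}


# Two nodes about a metre apart and one far away, with a 5 m threshold.
example = {1: (40.0, -75.0), 2: (40.00001, -75.0), 3: (41.0, -75.0)}
mapping = merge_close_nodes(example, 5.0)
```

Note that grouping like this is transitive, so a chain of nodes each just
inside the threshold would all collapse into one. That is exactly the sort of
false positive a real tool would need to guard against, which is why the
threshold has to be chosen conservatively.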

Cheers

Andy Robinson
Andy_J_Robinson at blueyonder.co.uk 
