[OSM-dev] Effort in the US wasted until TIGER import is complete?

Wed Mar 21 10:32:33 GMT 2007

Thomas,

The issue that caused the data to be deleted was that the two end nodes of
each TIGER segment were placed in the database with their own unique IDs. As
a result there was no connection between adjacent segments making up a
street. That is, where two adjacent segments met there would be two nodes,
one at the end of each segment, rather than one common node.

The reason given for importing the data this way was that the TIGER data is
not that precise, so there was a question as to how the common node position
could be identified if the two adjacent segments did not have exactly the
same end co-ordinates. It was also suggested that the positional error of
supposedly common points in TIGER varies depending on whether you are in a
rural or an urban location.

Before the data was deleted I took a very close look at several places where
data had been imported. I could not find a discernible difference in
position between duplicate nodes.

The best approach is therefore probably to modify the original import script
so that it looks for and reuses the closest existing node (if one exists)
lying within, say, a 1 metre radius of the required new node position. That
should cover the vast majority of the data without too many issues.
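
To illustrate the node reuse idea, here is a rough Python sketch. It is not
the original import script: NodeIndex, find_or_create, the grid cell size
and the co-ordinates are all made up for the example, and it keeps nodes in
memory rather than in the database.

```python
import math

EARTH_RADIUS_M = 6371000.0
# Grid cell size in degrees (assumed value): about 11 m of latitude, and
# still more than 1 m of longitude anywhere in the US, so a match within
# the radius is always in the same cell or one of its eight neighbours.
CELL_DEG = 1e-4


def distance_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance in metres (good enough at metre scale)."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(dx, dy)


class NodeIndex:
    """Hypothetical in-memory index of the nodes created so far."""

    def __init__(self, radius_m=1.0):
        self.radius_m = radius_m  # must stay well below the cell size
        self.nodes = {}           # node id -> (lat, lon)
        self.grid = {}            # (row, col) -> list of node ids
        self.next_id = 1

    @staticmethod
    def _cell(lat, lon):
        return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

    def find_or_create(self, lat, lon):
        """Return the closest node within radius_m, or create a new one."""
        row, col = self._cell(lat, lon)
        best_id, best_dist = None, self.radius_m
        for r in (row - 1, row, row + 1):
            for c in (col - 1, col, col + 1):
                for node_id in self.grid.get((r, c), ()):
                    d = distance_m(lat, lon, *self.nodes[node_id])
                    if d <= best_dist:
                        best_id, best_dist = node_id, d
        if best_id is not None:
            return best_id
        node_id = self.next_id
        self.next_id += 1
        self.nodes[node_id] = (lat, lon)
        self.grid.setdefault((row, col), []).append(node_id)
        return node_id


# Two segment ends about 0.14 m apart (made-up co-ordinates).
index = NodeIndex(radius_m=1.0)
first = index.find_or_create(40.0, -75.0)
second = index.find_or_create(40.000001, -75.000001)
print(first == second)  # True: the second end reuses the first node
```

The reused node keeps the position it was first given. Whether it should
instead move to the average of the duplicate positions is a separate
decision that this sketch does not try to make.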

It really needs a coder to revisit the original script and modify it to
reuse nodes that have already been created by the import. I believe it
should also seek to join up adjacent segments that carry the same tags into
ways, since the original import did not create whole street ways. Obviously
for very long streets it would be beneficial to limit the length of any
individual way.
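
As a sketch of the joining step, assuming segments arrive as (node_a,
node_b, tags) tuples that already share node IDs where they meet, something
along these lines would chain them. The function names and the max_nodes cap
are invented for illustration, and the cap counts nodes rather than
measuring distance.

```python
from collections import defaultdict


def _walk(adjacency, used, node, seg_index, other, max_nodes):
    """Follow unused segments from node, cutting a new chain at max_nodes."""
    chains = []
    chain = [node, other]
    used.add(seg_index)
    # Keep going only through plain "middle of the street" nodes.
    while len(adjacency[other]) == 2:
        unused = [(i, n) for i, n in adjacency[other] if i not in used]
        if not unused:
            break
        if len(chain) >= max_nodes:
            chains.append(chain)
            chain = [other]  # the next way starts where this one was cut
        seg_index, other = unused[0]
        used.add(seg_index)
        chain.append(other)
    chains.append(chain)
    return chains


def build_ways(segments, max_nodes=250):
    """Join segments with identical tags into ways of at most max_nodes nodes.

    Segment direction is ignored. Returns a list of (node_list, tags).
    """
    groups = defaultdict(list)
    for a, b, tags in segments:
        groups[tuple(sorted(tags.items()))].append((a, b))

    ways = []
    for tag_key, segs in groups.items():
        adjacency = defaultdict(list)  # node -> [(segment index, far node)]
        for i, (a, b) in enumerate(segs):
            adjacency[a].append((i, b))
            adjacency[b].append((i, a))
        used = set()
        # Start at dead ends and junctions so each chain runs end to end.
        for node in list(adjacency):
            if len(adjacency[node]) != 2:
                for i, other in list(adjacency[node]):
                    if i not in used:
                        for chain in _walk(adjacency, used, node, i, other,
                                           max_nodes):
                            ways.append((chain, dict(tag_key)))
        # Anything still unused belongs to a closed loop.
        for i, (a, b) in enumerate(segs):
            if i not in used:
                for chain in _walk(adjacency, used, a, i, b, max_nodes):
                    ways.append((chain, dict(tag_key)))
    return ways


segments = [
    (1, 2, {"name": "Main St"}),
    (2, 3, {"name": "Main St"}),
    (3, 4, {"name": "Oak Ave"}),
]
for nodes, tags in build_ways(segments):
    print(tags["name"], nodes)
```

Here a way only continues through a node where exactly two same-tagged
segments meet, so it stops at junctions and dead ends. Note that this relies
on the node reuse described above, since segments with separate end nodes
never appear adjacent.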

There should also be some more consideration as to what tag data should be
imported. As far as I recall, the original import covered the street name
and the start zip and end zip for each segment. Is there other data in the
TIGER set that would be useful to import at the same time?

Once a modified script is ready then a small set of sample county TIGER data
needs to be imported, both for an urban location and then for a rural one.
This should iron out the remaining issues. Picking a couple of counties with
the smallest data sets should allow an import and deletion cycle, if
required, without taking up too much time and resource.

For the full roll out of TIGER data there needs to be some thought given to
how the script should run. Originally, I think, the script inserted at 1
second intervals. This was slowed to 3 second intervals when server load
became a problem. If we wanted to import the data quickly, then we could
take a tiles at home style approach where counties are doled out and
uploaded by different machines running the same script. However, for this
approach to work we may have to consider what platform improvements are
necessary to support it without making the platform too slow for other
users.
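
A minimal sketch of how counties might be doled out and uploads throttled is
given here. CountyQueue and upload_throttled are hypothetical names, the
county IDs are placeholders, and a real coordinator would have to live on a
server rather than in one process.

```python
import time
from collections import deque


class CountyQueue:
    """Hypothetical coordinator handing each county to exactly one client."""

    def __init__(self, county_ids):
        self.pending = deque(county_ids)
        self.in_progress = {}  # county id -> client name
        self.done = set()

    def claim(self, client):
        """Give the next pending county to client, or None if none are left."""
        if not self.pending:
            return None
        county = self.pending.popleft()
        self.in_progress[county] = client
        return county

    def complete(self, county):
        """Mark a claimed county as fully uploaded."""
        del self.in_progress[county]
        self.done.add(county)

    def release(self, county):
        """Put a claimed county back if its client gives up part-way."""
        del self.in_progress[county]
        self.pending.append(county)


def upload_throttled(items, upload, interval_s=3.0, sleep=time.sleep):
    """Call upload(item) for each item, pausing interval_s between calls."""
    for count, item in enumerate(items):
        if count:
            sleep(interval_s)
        upload(item)


queue = CountyQueue(["county-A", "county-B"])
print(queue.claim("machine-1"))  # county-A
print(queue.claim("machine-2"))  # county-B
print(queue.claim("machine-3"))  # None: nothing left to hand out
```

With several clients each running at its own interval the combined load on
the server is the sum of their insert rates, so the interval would probably
need to be set centrally rather than by each machine.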

Although the original approach, on one machine at a 1 sec insert cycle,
produces a huge amount of data quite quickly (it by far swamped the volume
of data for the rest of the world put together in the time it ran), it will
still take a considerable time to import everything: many, many months
rather than weeks.
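
The arithmetic behind that is simple enough to sketch. The insert count used
here is a placeholder chosen purely to show the calculation, not a real
TIGER figure, and the function name is invented.

```python
SECONDS_PER_DAY = 86400.0


def import_duration_days(insert_count, interval_s):
    """Days needed for insert_count inserts at one insert per interval_s."""
    return insert_count * interval_s / SECONDS_PER_DAY


# Placeholder only: NOT the actual number of TIGER nodes or segments.
PLACEHOLDER_INSERTS = 10_000_000

for interval in (1, 3):
    days = import_duration_days(PLACEHOLDER_INSERTS, interval)
    print(f"{interval} s interval: {days:.0f} days")
```

For every ten million inserts this comes to roughly 116 days at a 1 second
interval and 347 days at 3 seconds on a single machine.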

Obviously all of this needs to acknowledge that some US users have already
contributed their own data and we would not want to see their hard work
wasted or corrupted. Imports to counties where data already exists should
automatically fall over until the users can be contacted so that a community
decision can be made as to whether the county will be initially mapped from
TIGER data or developed from GPS & aerial imagery data.
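
One way the script could fall over is sketched below, assuming each county
can be approximated by a bounding box and that the positions of existing
non-import nodes are available. The names, the bounding box test and the
co-ordinates are all illustrative; a real check would use the county
boundary itself.

```python
class ExistingDataError(Exception):
    """Raised when a county already holds data that the import did not create."""


def assert_county_is_clear(county, bbox, existing_nodes):
    """Stop the import if any existing node lies inside the county's box.

    bbox is (min_lat, min_lon, max_lat, max_lon); existing_nodes is an
    iterable of (lat, lon) pairs for nodes contributed by users.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    inside = [
        (lat, lon)
        for lat, lon in existing_nodes
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    ]
    if inside:
        raise ExistingDataError(
            f"{county}: {len(inside)} existing node(s) found, import halted "
            "until the local mappers have been contacted"
        )


# Made-up county box and user node, for illustration only.
try:
    assert_county_is_clear("Example County", (40.0, -76.0, 41.0, -75.0),
                           [(40.5, -75.5)])
except ExistingDataError as err:
    print(err)
```

Raising an exception rather than skipping silently means the county stays
unimported until somebody makes a deliberate decision about it.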

Cheers

Andy

Andy Robinson
Andy_J_Robinson at blueyonder.co.uk 

>-----Original Message-----
>From: dev-bounces at openstreetmap.org [mailto:dev-bounces at openstreetmap.org]
>On Behalf Of Thomas Lunde
>Sent: 21 March 2007 12:09 AM
>To: dev at openstreetmap.org
>Subject: [OSM-dev] Effort in the US wasted until TIGER import is complete?
>
>Hello -
>
>The TIGER page [1] indicates that the initial effort was removed from
>the database because of data corruption.
>
>Other postings seem to indicate that a new TIGER import will overwrite
>existing US data.[2]
>
>The last message on the mailing lists I can find about the TIGER
>import is from Jan 16, 2007 and indicates that the import is on an
>indefinite hold.[3]
>
>The status page[4] seems to confirm this, as it shows the last header as:
>TIGER -> OSM Import Status
>(as of Thu Nov 30 14:47:41 +0000 2006)
>
>
>Is there anything a US-oriented would-be OSM participant can do to
>help with the import?  Is the project stalled by a lack of CPU, disk,
>person-hours for coding, or something else?
>
>
>Thanks for any pointers you can provide to ways that I can help.
>
>Thomas
>
>
>
>
>[1] http://wiki.openstreetmap.org/index.php/Tiger
>
>[2]  Sorry, I can't find this again... I think it was on the Talk list.
>
>[3]  http://lists.openstreetmap.org/pipermail/talk/2007-January/010246.html
>
>[4] http://svn.openstreetmap.org/utils/tiger_import/status
>
