[Talk-us] Using TIGER to find missing road segments in OSM after license change

Skye Book skye.book at gmail.com
Thu Mar 29 15:51:56 BST 2012


On Mar 29, 2012, at 10:10 AM, Josh Doe wrote:

> On Thu, Mar 29, 2012 at 9:45 AM, Ian Dees <ian.dees at gmail.com> wrote:
>> After loading Cook County TIGER road features and OSM linear features into
>> PostGIS, I ran a simple query to find how well the roads matched:
>> 
>> SELECT a.name, b.fullname, ST_HausdorffDistance(a.geom, b.geom) as dist
>>     FROM cook_tiger a, cook_osm b
>>     WHERE (a.geom && b.geom) AND ST_HausdorffDistance(a.geom, b.geom) <
>> 0.0005
>>     LIMIT 50
>> 
>> This returned results that made sense (the names matched in all 50 results).
>> 
>> I removed the LIMIT clause and let it run before going to work to see how
>> many of the TIGER records match existing OSM features.
>> 
>> Next up is building a table of TIGER -> OSM matches and using that to find
>> TIGER rows that don't have a corresponding OSM feature.
>> 
>> If anyone has any ideas for speeding this up I'd love to hear it. It took
>> well over a couple hours to run one county. There are a lot of counties in
>> the US.
> 
> Very cool! To speed this up perhaps try limiting the number of times
> ST_HausdorffDistance is executed. First only run it for ways which are
> "close", such as falling inside a buffer, or even faster inside a
> bounding box. For a trivial speedup generate a table with distances
> first, then use the WHERE clause. However I have no idea how to form
> such queries!

If you're using one of the prereleases of PostGIS 2 + PostgreSQL 9.1 you can also do a nearest neighbor query (if you've build a GiST index on the coordinates).  If you haven't already built the index, however, better off just going with a bounding box function as Josh mentioned..

-Skye


More information about the Talk-us mailing list