[Talk-us] Using TIGER to find missing road segments in OSM after license change

Martijn van Exel m at rtijn.org
Thu Mar 29 15:49:04 BST 2012


On Thu, Mar 29, 2012 at 8:21 AM, Ian Dees <ian.dees at gmail.com> wrote:

> On Thu, Mar 29, 2012 at 9:10 AM, Josh Doe <josh at joshdoe.com> wrote:
>> On Thu, Mar 29, 2012 at 9:45 AM, Ian Dees <ian.dees at gmail.com> wrote:
>> > After loading Cook County TIGER road features and OSM linear features into
>> > PostGIS, I ran a simple query to find how well the roads matched:
>> >
>> > SELECT a.name, b.fullname, ST_HausdorffDistance(a.geom, b.geom) AS dist
>> >      FROM cook_tiger a, cook_osm b
>> >      WHERE (a.geom && b.geom)
>> >        AND ST_HausdorffDistance(a.geom, b.geom) < 0.0005
>> >      LIMIT 50;
>> >
>> > This returned results that made sense (the names matched in all 50 results).
>> >
>> > I removed the LIMIT clause and let it run before going to work to see how
>> > many of the TIGER records match existing OSM features.
>> >
>> > Next up is building a table of TIGER -> OSM matches and using that to find
>> > TIGER rows that don't have a corresponding OSM feature.
>> >
>> > If anyone has any ideas for speeding this up I'd love to hear them. It took
>> > well over a couple of hours to run one county. There are a lot of counties
>> > in the US.
>> Very cool! To speed this up perhaps try limiting the number of times
>> ST_HausdorffDistance is executed. First only run it for ways which are
>> "close", such as falling inside a buffer, or even faster inside a
>> bounding box. For a trivial speedup generate a table with distances
>> first, then use the WHERE clause. However I have no idea how to form
>> such queries!
> The bounds overlap check (a.geom && b.geom) speeds things up drastically,
> but because Cook County contains Chicago (which is very road-dense), I
> imagine there are tons of ST_HausdorffDistance calls that don't need to
> happen. If I thought I was going to run this tons of times I could generate
> a table of all possible Hausdorff distances, but there would be a lot of
> rows (if I remember my high school stats, it would be len(cook_tiger) *
> len(cook_osm) rows).
>
> I may try switching to one of PostGIS's "overlap" or "touching" calls to
> limit the number of calls even more, but I think I'd miss lots of possible
> matches that way (if the roads are offset enough to never touch).
I'm going to look at this same problem for Salt Lake County to see whether
different issues arise in a different geography, and I hope to provide some
more input soon.
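
For what it's worth, the "find TIGER rows that don't have a corresponding OSM
feature" step quoted above might be sketched as an anti-join. This is only a
sketch under assumptions, not a tested recipe: it reuses the cook_tiger and
cook_osm tables, the geom column, and the 0.0005 threshold from the quoted
query; the index names are made up; and it swaps the && bounding-box test for
ST_DWithin. A Hausdorff distance below 0.0005 implies the two lines come
within 0.0005 of each other, so ST_DWithin should not discard any pair the
Hausdorff test would accept, and it can use a GiST index. It has not been
timed against real county data.

```sql
-- Assumption: no spatial indexes exist yet; these index names are hypothetical.
CREATE INDEX cook_tiger_geom_idx ON cook_tiger USING gist (geom);
CREATE INDEX cook_osm_geom_idx ON cook_osm USING gist (geom);

-- TIGER rows with no OSM feature within the Hausdorff threshold.
-- Table names, the geom column, and 0.0005 come from the quoted query.
SELECT a.*
FROM cook_tiger a
WHERE NOT EXISTS (
    SELECT 1
    FROM cook_osm b
    -- Cheap, index-assisted prefilter: minimum distance within the threshold.
    WHERE ST_DWithin(a.geom, b.geom, 0.0005)
      -- Expensive check, evaluated only for pairs that pass the prefilter.
      AND ST_HausdorffDistance(a.geom, b.geom) < 0.0005
);
```

Turning the subquery into a plain join on the same two conditions would give
the TIGER -> OSM match table instead of the list of unmatched rows.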


martijn van exel
geospatial omnivore
1109 1st ave #2
salt lake city, ut 84103