[Talk-us] Using TIGER to find missing road segments in OSM after license change

Ian Dees ian.dees at gmail.com
Thu Mar 29 15:21:15 BST 2012


On Thu, Mar 29, 2012 at 9:10 AM, Josh Doe <josh at joshdoe.com> wrote:

> On Thu, Mar 29, 2012 at 9:45 AM, Ian Dees <ian.dees at gmail.com> wrote:
> > After loading Cook County TIGER road features and OSM linear features into
> > PostGIS, I ran a simple query to find how well the roads matched:
> >
> > SELECT a.name, b.fullname, ST_HausdorffDistance(a.geom, b.geom) as dist
> >      FROM cook_tiger a, cook_osm b
> >      WHERE (a.geom && b.geom) AND ST_HausdorffDistance(a.geom, b.geom) < 0.0005
> >      LIMIT 50
> >
> > This returned results that made sense (the names matched in all 50 results).
> >
> > I removed the LIMIT clause and let it run before going to work to see how
> > many of the TIGER records match existing OSM features.
> >
> > Next up is building a table of TIGER -> OSM matches and using that to find
> > TIGER rows that don't have a corresponding OSM feature.
> >
> > If anyone has any ideas for speeding this up I'd love to hear it. It took
> > well over a couple hours to run one county. There are a lot of counties in
> > the US.
>
> Very cool! To speed this up perhaps try limiting the number of times
> ST_HausdorffDistance is executed. First only run it for ways which are
> "close", such as falling inside a buffer, or even faster inside a
> bounding box. For a trivial speedup generate a table with distances
> first, then use the WHERE clause. However I have no idea how to form
> such queries!


The bounding-box overlap check (a.geom && b.geom) speeds things up
drastically, but because Cook County contains Chicago (which is very
road-dense), I imagine there are still tons of ST_HausdorffDistance calls
that don't need to happen. If I thought I was going to run this many times
I could generate a table of all possible Hausdorff distances, but there
would be a lot of rows (if I remember my high school stats, one per pair of
features, so len(cook_tiger) * len(cook_osm) rows).
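
A rough, untested sketch of a middle ground: store only the pairs whose
bounding boxes overlap instead of the full cross product. The table name
cook_candidates, the index names, and the id columns tlid and osm_id are
assumptions here; the real column names depend on how the data was loaded.

```sql
-- Assumed: GiST indexes on both geometry columns so && can use them.
CREATE INDEX cook_tiger_geom_idx ON cook_tiger USING GIST (geom);
CREATE INDEX cook_osm_geom_idx ON cook_osm USING GIST (geom);

-- One row per bbox-overlapping pair, not one per possible pair.
-- tlid and osm_id are hypothetical id columns.
CREATE TABLE cook_candidates AS
SELECT a.tlid,
       b.osm_id,
       ST_HausdorffDistance(a.geom, b.geom) AS dist
FROM cook_tiger a
JOIN cook_osm b ON a.geom && b.geom;
```

With a table like that, different thresholds could be tried with a plain
WHERE dist < 0.0005 without recomputing any distances.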

I may try switching to one of PostGIS's "overlap" or "touching" predicates
(ST_Overlaps, ST_Touches) to limit the number of calls even more, but I
think I'd miss lots of possible matches that way (if the roads are offset
enough to never touch).
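
One prefilter that might avoid that problem is ST_DWithin, which can use a
spatial index and does not require the geometries to touch. The Hausdorff
distance between two geometries is never smaller than their minimum
distance, so any pair with a Hausdorff distance under 0.0005 is also within
0.0005 of each other, and the prefilter should not drop matches. An
untested sketch that finds TIGER rows with no matching OSM feature,
assuming the same tables and threshold as the query quoted above:

```sql
-- TIGER rows for which no OSM feature is both nearby and similar in shape.
-- 0.0005 is in the units of the geometry column, as in the original query.
SELECT t.*
FROM cook_tiger t
WHERE NOT EXISTS (
    SELECT 1
    FROM cook_osm o
    WHERE ST_DWithin(t.geom, o.geom, 0.0005)                 -- cheap, indexable
      AND ST_HausdorffDistance(t.geom, o.geom) < 0.0005      -- expensive check
);
```

Whether this is faster than the && version on a county as dense as Cook
would need to be measured; it mainly avoids building a separate match table.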