Hi,<br><br><div class="gmail_quote">On Thu, Mar 29, 2012 at 8:21 AM, Ian Dees <span dir="ltr">&lt;<a href="mailto:ian.dees@gmail.com">ian.dees@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_quote"><div class="im">On Thu, Mar 29, 2012 at 9:10 AM, Josh Doe <span dir="ltr">&lt;<a href="mailto:josh@joshdoe.com" target="_blank">josh@joshdoe.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>On Thu, Mar 29, 2012 at 9:45 AM, Ian Dees &lt;<a href="mailto:ian.dees@gmail.com" target="_blank">ian.dees@gmail.com</a>&gt; wrote:<br>
> After loading Cook County TIGER road features and OSM linear features into<br>
> PostGIS, I ran a simple query to find how well the roads matched:<br>
><br>
> SELECT a.name, b.fullname, ST_HausdorffDistance(a.geom, b.geom) as dist<br>
> FROM cook_tiger a, cook_osm b<br>
> WHERE (a.geom &amp;&amp; b.geom) AND ST_HausdorffDistance(a.geom, b.geom) &lt; 0.0005<br>
> LIMIT 50<br>
><br>
> This returned results that made sense (the names matched in all 50 results).<br>
><br>
> I removed the LIMIT clause and let it run before going to work to see how<br>
> many of the TIGER records match existing OSM features.<br>
><br>
> Next up is building a table of TIGER -> OSM matches and using that to find<br>
> TIGER rows that don't have a corresponding OSM feature.<br>
><br>
> If anyone has ideas for speeding this up, I'd love to hear them. It took<br>
> well over a couple of hours to run one county, and there are a lot of<br>
> counties in the US.<br>
<br>
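A minimal sketch of what the query above computes, plus the planned next step (a TIGER -> OSM match table, then the TIGER rows that have no OSM counterpart). It is written in plain Python rather than SQL so it runs without a database. It assumes each road is a list of (x, y) vertices in the same degree units as the 0.0005 threshold, and it approximates the Hausdorff distance from vertex-to-line distances, which I believe is close to what the discrete algorithm behind ST_HausdorffDistance does. The names hausdorff and match_roads and the sample coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Line = List[Point]  # hypothetical representation: a road as a list of vertices


def point_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def point_to_line(p: Point, line: Line) -> float:
    """Distance from p to the nearest point of a polyline."""
    if len(line) == 1:
        return math.hypot(p[0] - line[0][0], p[1] - line[0][1])
    return min(point_to_segment(p, a, b) for a, b in zip(line, line[1:]))


def hausdorff(a: Line, b: Line) -> float:
    """Symmetric Hausdorff distance, sampled at the vertices of each line."""
    return max(max(point_to_line(p, b) for p in a),
               max(point_to_line(p, a) for p in b))


def match_roads(tiger: Dict[str, Line], osm: Dict[str, Line], threshold: float):
    """Return (matches, unmatched_tiger_ids) for a Hausdorff threshold."""
    matches = []
    # Every TIGER/OSM pair is compared, like the join with no extra filtering.
    for t_id, t_line in tiger.items():
        for o_id, o_line in osm.items():
            d = hausdorff(t_line, o_line)
            if d < threshold:
                matches.append((t_id, o_id, d))
    # TIGER ids that never appear in a match have no OSM counterpart.
    matched_ids = {t_id for t_id, _, _ in matches}
    unmatched = sorted(t_id for t_id in tiger if t_id not in matched_ids)
    return matches, unmatched


# Hypothetical sample data: T1 has an OSM twin offset by 0.0002, T2 has none.
tiger = {"T1": [(0.0, 0.0), (0.01, 0.0)], "T2": [(0.0, 0.01), (0.01, 0.01)]}
osm = {"O1": [(0.0, 0.0002), (0.01, 0.0002)]}
matches, unmatched = match_roads(tiger, osm, 0.0005)
print(matches)    # T1 matches O1 at about 0.0002
print(unmatched)  # ['T2']
```

The nested loop evaluates the distance for every pair, len(tiger) * len(osm) calls, which is why any filter that skips pairs up front pays off.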
</div>Very cool! To speed this up, perhaps try limiting the number of times<br>
ST_HausdorffDistance is executed. First, only run it for ways that are<br>
"close", such as those falling inside a buffer or, even faster, inside a<br>
bounding box. For a trivial speedup, generate a table with distances<br>
first, then apply the WHERE clause to it. However, I have no idea how to form<br>
such queries!</blockquote><div><br></div></div><div>The bounds overlap check (a.geom &amp;&amp; b.geom) speeds things up drastically, but because Cook County contains Chicago (which is very road-dense), I imagine there are tons of ST_HausdorffDistance calls that don't need to happen. If I thought I was going to run this many times, I could generate a table of all possible Hausdorff distances, but there would be a lot of rows (if I remember my high school stats, it would be len(cook_tiger) * len(cook_osm) rows).</div>
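A sketch of the bounding-box idea, tightened beyond the plain overlap test: if two roads are within Hausdorff distance d of each other, each road's bounding box must fit inside the other's box grown by d on every side. That check costs a few comparisons and never rejects a true match, but it does reject pairs such as a long arterial and a short street crossing it, whose boxes overlap and so pass the && test. This is plain Python again; the function names and coordinates are hypothetical, and mapping it onto PostGIS (for instance with ST_Expand and a bounding-box containment test) is untested.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Line = List[Point]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bbox(line: Line) -> BBox:
    """Axis-aligned bounding box of a polyline's vertices."""
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """Plain bounding-box overlap, the test that a.geom && b.geom performs."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def inside_expanded(inner: BBox, outer: BBox, d: float) -> bool:
    """True if inner fits inside outer grown by d on every side."""
    return (inner[0] >= outer[0] - d and inner[1] >= outer[1] - d
            and inner[2] <= outer[2] + d and inner[3] <= outer[3] + d)


def may_match(a: Line, b: Line, d: float) -> bool:
    """Cheap necessary condition for a Hausdorff distance below d.

    If every point of a lies within d of b, then a's box must fit inside b's
    box grown by d, and symmetrically for b. A pair failing either check can
    be discarded without computing the Hausdorff distance at all.
    """
    box_a, box_b = bbox(a), bbox(b)
    return inside_expanded(box_a, box_b, d) and inside_expanded(box_b, box_a, d)


# Hypothetical example: a long east-west arterial and a short cross street.
arterial = [(0.0, 0.0), (0.05, 0.0)]
cross_street = [(0.02, -0.001), (0.02, 0.001)]
print(boxes_overlap(bbox(arterial), bbox(cross_street)))  # True: && keeps the pair
print(may_match(arterial, cross_street, 0.0005))          # False: pair is skipped
```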
<div><br></div><div>I may try switching to one of PostGIS's "overlaps" or "touches" predicates (ST_Overlaps, ST_Touches) to cut the number of calls even further, but I think I'd miss many possible matches that way (wherever the two roads are offset far enough that they never touch).</div>
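On the worry about offset roads: an intersects/touches-style filter does drop parallel roads that never meet, but a minimum-distance filter does not have that problem. The Hausdorff distance between two lines is never smaller than the minimum distance between them, so keeping only pairs that come within 0.0005 of each other cannot lose a pair the original query would have matched. In PostGIS this role is usually played by ST_DWithin, which can use the spatial index. The Python below only illustrates the reasoning: a plain intersection test stands in for ST_Touches/ST_Overlaps (it does not reproduce their exact DE-9IM semantics), and the names and coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Line = List[Point]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); its sign says which side of o->a b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def point_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def segment_distance(p1: Point, p2: Point, q1: Point, q2: Point) -> float:
    """Minimum distance between segments p1-p2 and q1-q2."""
    # Proper crossing: each segment's endpoints lie strictly on opposite sides
    # of the other segment's supporting line.
    if (cross(q1, q2, p1) * cross(q1, q2, p2) < 0
            and cross(p1, p2, q1) * cross(p1, p2, q2) < 0):
        return 0.0
    # Otherwise the closest pair involves an endpoint; touching and collinear
    # overlaps give 0 here, up to floating-point rounding.
    return min(point_to_segment(p1, q1, q2), point_to_segment(p2, q1, q2),
               point_to_segment(q1, p1, p2), point_to_segment(q2, p1, p2))


def line_distance(a: Line, b: Line) -> float:
    """Minimum distance between two polylines with at least two vertices each."""
    return min(segment_distance(p1, p2, q1, q2)
               for p1, p2 in zip(a, a[1:])
               for q1, q2 in zip(b, b[1:]))


def meets_filter(a: Line, b: Line) -> bool:
    """Stand-in for a touches/intersects predicate: keep only lines that meet."""
    return line_distance(a, b) == 0.0


def within_filter(a: Line, b: Line, d: float) -> bool:
    """Keep pairs whose lines come within d. Since the Hausdorff distance is
    never below line_distance(a, b), no pair with Hausdorff < d is dropped."""
    return line_distance(a, b) <= d


# Hypothetical pair: the same road digitized 0.0002 degrees apart.
tiger_road = [(0.0, 0.0), (0.01, 0.0)]
osm_road = [(0.0, 0.0002), (0.01, 0.0002)]
print(meets_filter(tiger_road, osm_road))           # False: pair would be missed
print(within_filter(tiger_road, osm_road, 0.0005))  # True: pair is kept
```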
</div>
<br>_______________________________________________<br>
Talk-us mailing list<br>
<a href="mailto:Talk-us@openstreetmap.org">Talk-us@openstreetmap.org</a><br>
<a href="http://lists.openstreetmap.org/listinfo/talk-us" target="_blank">http://lists.openstreetmap.org/listinfo/talk-us</a><br>
<br></blockquote></div><br>I'm going to look at the same problem for Salt Lake County to see whether any different issues arise in a different geography, and I hope to provide more input soon.<br><br>Martijn<br clear="all">
<br>-- <br>martijn van exel<br>geospatial omnivore<br>1109 1st ave #2<br>salt lake city, ut 84103<br>801-550-5815<br><a href="http://oegeo.wordpress.com" target="_blank">http://oegeo.wordpress.com</a><br>