[OSM-dev] Simplifying data for routing

Sat Jul 28 15:55:23 BST 2007

Hopefully this doesn't turn into a spin-off of the "dropping  
segments" discussion. :) I'm trying to work wit the data model as it  
is now, and will refine for changes if or when they happen.

Since writing my last post, I've tweaked my GPS navigation app to use  
SQL geometric data types. I'm now storing everything as POIs (single  
points not connected to segments and not tagged as a road), paths  
(all segments on a way combined into a single linestring) and areas  
(which I don't support but which is just short-hand for closed ways  
so I can quickly determine all the contained locations a point is in.)

I've been reading about shortest path algorithms, and as they all  
seem to work on graphs, I've been thinking about the graph of  
unprocessed OSM data and wondering how much post-processing I need to  
do to make routing possible. I'm better at grokking concepts than  
code, which is why I've not yet looked at the recently-announced JOSM  
navigator plugin.

Since I'm dropping most of the OSM nodes, I'm thinking that the  
vertices on the graph should be intersections. Part of my current  
post-processing involves doing a brute-force check of any newly-saved  
Path for any that intersect it. I then create an intersection feature  
as a point, linking to all the "branches." As such, I can now find  
and describe nearby intersections in terms of how many "ways" they  
have and render that description as something like "5-way  
intersection: E 51st St. with Duval St. and Guadalupe St."

I have several questions. First, is using intersections for the  
vertices of the routing graph the right way (no pun intended) to do  
it? I thought about using nodes, but if I understand the model,  
there's no reason to expect nodes to be placed at intersections.

Second, I'm still trying to grok a*/Dijkstra algorithms for shortest  
path. Seems like several routing apps I've checked out prefer  
Dijkstra, but is there any reason for picking it over a*? My  
understanding is that a* stops investigating paths that aren't  
meaningful, so wouldn't move toward a point to the north by going  
south. What am I missing? Is a* impractical for some reason?

Third, I'm not quite sure how costly these algorithms are, and  
whether it would be better to calculate intersections on the fly or  
do so ahead of time. Ahead of time seems to make the import take much  
longer (the TIGER data for Travis County, just over 50 megs, takes  
about an hour for me to import sans intersection calculation and  
clocks in at just over 20 megs in the database, while intersection  
calculation adds another 10 megs and 2 hours.)

Finally, can anyone suggest any database optimizations I might make  
to help PostgreSQL cope with a bunch of inserts? I'm using Ruby/ 
ActiveRecord, so there really isn't a way to dump a bunch of SQL  
commands and do an import that way. I'd actually like to pre- 
calculate the intersection data as it makes finding intersections  
much easier and more syntactically beautiful (Intersection.find 
(:all, :conditions => "distance(geometry, '#{self.geometry.as_wkt}')  
< 0.0003") vs. a larger function finding all nearby paths, checking  
intersection status, creating an intersection geometry then checking  
again to handle intersections with more than 4 ways.) I just don't  
want users spending three hours importing data first, and since my  
initial target was a high-end embedded system (though I'm starting to  
think I'll need a tiny subnotebook now) smaller storage requirements  
are key. It seems like I should be able to optimize *something* out  
of PostgreSQL for this type of environment since it's only being used  
by one user and for a specific domain, but I can't find what. Or  
would MySQL be a better fit for a lower-spec environment? It'd  
definitely simplify installation, not needing to install postgis, but  
I couldn't get 5.0 to support the distance function. Is it that  
"distance" is under a different name, or is it truly not supported?

Thanks a bunch.