[Geocoding] Agenda

Tom Hughes tom at compton.nu
Mon Jul 13 21:45:47 BST 2009


To follow up on the discussions at SOTM, I thought I would post a 
summary of where we are now and what I see as the challenges we need to 
address to move forward with improving geocoding on the web site.

To start with we obviously have the current namefinder. There are a 
number of problems with that, mostly technical issues with how the 
database schema is arranged. Those issues are:

   - Because the data is held in MyISAM tables, there are serious
     lock contention issues when doing updates on the same database
     that is used for searches. I believe that David plans to
     address this, as a short-term solution, by using two parallel
     databases and flipping back and forth between them.

   - Because the schema works by searching for each word in the
     query separately, the resulting join basically defeats the
     MySQL optimiser and causes it to do table scans. Switching
     to PostgreSQL should help here, as it can do bitmap index
     scans and use an index for each word. Searches for common
     words will still be expensive, however.
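
To illustrate the second point, here is a minimal sketch of the query 
shape that a word-per-row schema leads to. The table word_index and its 
columns are hypothetical and are not the real namefinder schema, and 
SQLite is used only so that the example runs on its own. The thing to 
note is that every extra word in the query adds another self-join.

```python
import sqlite3


def build_word_join_query(words):
    """Build a query with one self-join per query word.

    Assumes a hypothetical table word_index(word, place_id); this is
    not the real namefinder schema.
    """
    if not words:
        raise ValueError("need at least one word")
    joins = []
    conditions = []
    params = []
    for i, word in enumerate(words):
        alias = f"w{i}"
        if i == 0:
            joins.append(f"FROM word_index AS {alias}")
        else:
            # Each additional word joins the index table to itself again.
            joins.append(
                f"JOIN word_index AS {alias} ON {alias}.place_id = w0.place_id"
            )
        conditions.append(f"{alias}.word = ?")
        params.append(word.lower())
    sql = (
        "SELECT DISTINCT w0.place_id "
        + " ".join(joins)
        + " WHERE "
        + " AND ".join(conditions)
    )
    return sql, params


# Tiny in-memory data set, invented purely for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_index (word TEXT, place_id INTEGER)")
conn.executemany(
    "INSERT INTO word_index VALUES (?, ?)",
    [
        ("high", 1), ("street", 1),
        ("high", 2), ("road", 2),
        ("station", 3), ("road", 3),
    ],
)

sql, params = build_word_join_query(["High", "Street"])
matches = [row[0] for row in conn.execute(sql, params)]
```

With this data only place 1 contains both words. A three word query 
would produce two joins, and so on, which is the pattern the optimiser 
has to cope with.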

There are almost certainly also enhancements that could be made to the 
actual indexing and query parsing, but those are things that can be 
addressed incrementally, at least to some extent.

Due to the above issues I experimented with a PostgreSQL-based geocoder 
that used Postgres text indexes to allow whole phrases to be searched 
for and PostGIS geo-indexes to help resolve the location side of 
queries. That work was then taken over by Brian Quinion (twain47), who 
has done substantial work on improving it.
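
As a rough sketch of the kind of query this approach allows, the 
function below builds a single SQL statement that combines a Postgres 
text search match with an optional PostGIS distance filter. The table 
places and its columns id, name and geom are assumptions made up for 
the example and are not taken from the actual geocoder. The code only 
builds the SQL string and its parameters, so it runs without a 
database. Note that plainto_tsquery requires all of the words to be 
present rather than requiring them to be adjacent.

```python
def build_phrase_query(phrase, lon=None, lat=None, radius_deg=0.5):
    """Build a text search query with an optional spatial filter.

    Assumes a hypothetical table places(id, name, geom) with geom
    stored in SRID 4326; this is not the schema of the real geocoder.
    Placeholders use the %s style of common Python PostgreSQL drivers.
    """
    # The whole phrase goes through the text search machinery in one
    # go, instead of one join per word.
    sql = (
        "SELECT id, name FROM places "
        "WHERE to_tsvector('simple', name) @@ plainto_tsquery('simple', %s)"
    )
    params = [phrase]
    if lon is not None and lat is not None:
        # Restrict results to places near a point. The distance is in
        # degrees because the geometry is assumed to be in SRID 4326.
        sql += (
            " AND ST_DWithin(geom, "
            "ST_SetSRID(ST_MakePoint(%s, %s), 4326), %s)"
        )
        params += [lon, lat, radius_deg]
    return sql, params


# Example: search for a phrase near a given longitude and latitude.
sql, params = build_phrase_query("High Street", lon=-0.1, lat=51.5)
```

For this to be fast, the text expression and the geometry column would 
each need a suitable index (for example GIN and GiST respectively), 
which is again an assumption about how such a table might be set up.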

The current version of that work is running on one of our machines with 
a frontend on dev, but it looks like it's not actually working right 
now. When it is working, it can be found at:

   http://apis.dev.openstreetmap.org/~twain/

Moving forward our challenges are obviously primarily how we resolve the 
issues with the current geocoder and build on the work that David and 
Brian have done to create the best possible geocoder for the site.

It might perhaps be helpful if David and Brian could each give a brief 
summary of where they think the two current codebases are, and how they 
think we should best move forward.

We also need to consider where the Geocommons work fits into all this 
and whether we can leverage and/or contribute to their work at all.

Basically however my message to all of you is to please talk about where 
you think we are and where you think we should be going and how we can 
get there.

Tom

-- 
Tom Hughes (tom at compton.nu)
http://www.compton.nu/



