[Geocoding] Agenda
Tom Hughes
tom at compton.nu
Mon Jul 13 21:45:47 BST 2009
To follow up on the discussions at SOTM, I thought I would post a
summary of where we are now and what I see as the challenges we need to
address to move forward with improving geocoding on the web site.
To start with we obviously have the current namefinder. There are a
number of problems with that, mostly technical issues with how the
database schema is arranged. Those issues are:
- Because the data is held in MyISAM tables there are serious
lock contention issues when doing updates on the same database
that is used for searches. I believe that David plans to
address this by using two parallel databases and flipping
back and forth between them as a short-term solution.
- Because the schema works by searching for each word in the
query separately, the resulting join basically defeats the
MySQL optimiser and causes it to do table scans. Switching
to PostgreSQL should help here, as it can do bitmap index
scans and use an index for each word. Searches for common
words will still be expensive, however.
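To illustrate the second point, here is a rough in-memory sketch of the "one index per word, then combine" idea. It is only an analogy for what a bitmap index scan buys us, not how PostgreSQL or namefinder actually implement it; the function names and the sample street names are made up for the example.

```python
from collections import defaultdict

def build_index(names):
    """Map each lowercased word to the set of row ids whose name contains it."""
    index = defaultdict(set)
    for row_id, name in names.items():
        for word in name.lower().split():
            index[word].add(row_id)
    return index

def search(index, query):
    """Return the ids of rows that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Look up each word in the index, then intersect the id sets,
    # starting with the rarest word so intermediate results stay small.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for ids in postings[1:]:
        result &= ids
        if not result:
            break
    return result

# Hypothetical sample data, not taken from the real database.
names = {
    1: "High Street",
    2: "Station Road",
    3: "High Road",
    4: "Compton Road",
}
index = build_index(names)
print(sorted(search(index, "high road")))
```

Note that a common word such as "road" still yields a large id set that has to be read and intersected, which is why such searches stay expensive even with an index for each word.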
There are almost certainly enhancements that could be made to
the actual indexing and query parsing, but those are things that can be
addressed incrementally at least to some extent.
Because of the above issues, I experimented with a PostgreSQL-based geocoder
that used Postgres text indexes to allow whole phrases to be searched
for and PostGIS geo-indexes to help resolve the location side of
queries. That work was then taken over by Brian Quinion (twain47), who
has done substantial work on improving it.
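As a rough sketch of the idea only (this is not the code Brian is running), the approach amounts to matching the whole phrase against names and restricting the candidates to an area. The Place type, the function names, the coordinates and the bounding box below are all invented for illustration, and the linear scan stands in for the work that the text and PostGIS indexes would do in the database.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Place:
    name: str
    lat: float
    lon: float

def contains_phrase(name, phrase):
    """True if the words of phrase appear consecutively, in order, in name."""
    words = name.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return n > 0 and any(words[i:i + n] == target
                         for i in range(len(words) - n + 1))

def in_bbox(place, bbox):
    """bbox is (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= place.lat <= max_lat and min_lon <= place.lon <= max_lon

def geocode(places, phrase, bbox):
    """Return the places inside bbox whose name contains the whole phrase."""
    return [p for p in places
            if in_bbox(p, bbox) and contains_phrase(p.name, phrase)]

# Made-up example data; the coordinates are illustrative only.
places = [
    Place("High Street", 52.2053, 0.1218),
    Place("High Street", 51.7520, -1.2577),
    Place("Station Road", 52.1943, 0.1371),
]
area = (52.15, 0.05, 52.25, 0.20)
hits = geocode(places, "high street", area)
print(hits)
```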
The current version of that work is running on one of our machines with
a frontend on dev, but it looks like it's not actually working right
now. When it is, it can be found at:
http://apis.dev.openstreetmap.org/~twain/
Moving forward, our main challenge is how we resolve the
issues with the current geocoder and build on the work that David and
Brian have done to create the best possible geocoder for the site.
It might perhaps be helpful if David and Brian could each give a brief
summary of where they think the two current codebases are, and how they
think we should best move forward.
We also need to consider where the Geocommons work fits into all this
and whether we can leverage and/or contribute to their work at all.
Basically, however, my message to all of you is to please talk about where
you think we are and where you think we should be going and how we can
get there.
Tom
--
Tom Hughes (tom at compton.nu)
http://www.compton.nu/