[Geocoding] Agenda
Brian Quinion
openstreetmap at brian.quinion.co.uk
Tue Jul 14 01:09:44 BST 2009
Hi,
> http://apis.dev.openstreetmap.org/~twain/
First of all - I've put this (hopefully!) back to a working state so
you can have a play if you wish. UK data only at the moment and the
whole thing is slow because I'm doing a full planet import to another
database. The fact that this causes performance problems is obviously
an issue in its own right.
> It might perhaps be helpful if David and Brian could each give a brief
> summary of where they think the two current codebases are, and how they
> think we should best move forward.
Possibly it is best to start with a bit of background. Ideally I'd
have done this over a pint at SotM if only I'd been there, but here we
go...
I originally started working on this because I'd built a very basic
geocoder for my work based on php and postgresql. I was aware that
there were some performance problems with namefinder and approached
various people on IRC to see if I could take a stab at the problem. I
had a look at namefinder, and Tom let me have a copy of his original
gazetteer. Of the two, Tom's code seemed to match most closely with
what I'd already written, plus I liked that the import routine would
be written in C for performance.
By the hack weekend I had something that seemed to work for the UK and
was allowed to use a spare server to try implementing it for the whole
planet.
Unfortunately it turned out that my original technique didn't scale
very well (putting it mildly) and I've ended up rewriting it
completely over the last month.
The current codebase consists of:
a postgresql import module for osm2pgsql
a postgresql module with a C helper function
a set of plpgsql database functions and triggers that handle indexing
a short php script that performs the search queries and returns the results
The code has support for simple text queries, 'near' queries, house
level addressing (including interpolation from number ranges), various
levels of postcode to handle the different countries' standards and
special support for interpolating unknown UK postcodes. It can handle
the various name:en, name:fr, etc. standards used in OSM and return
address strings based on the browser accept-language settings. There
is support for alt names, common names and similar.
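To give a rough idea of the accept-language handling, here is a
minimal Python sketch. It is not the actual php/plpgsql code: the
function names parse_accept_language and pick_name, and the fallback
order (full language tag, then primary language, then plain name),
are assumptions made for illustration only.

```python
def parse_accept_language(header):
    """Return language codes from an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        lang, _, params = piece.partition(";")
        lang = lang.strip().lower()
        quality = 1.0  # a missing q value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        if lang and lang != "*" and quality > 0:
            # Sort by descending quality, then by original header order.
            weighted.append((-quality, position, lang))
    weighted.sort()
    return [lang for _, _, lang in weighted]


def pick_name(tags, accept_language):
    """Pick the best name:xx tag for the browser, falling back to name."""
    for lang in parse_accept_language(accept_language):
        # Try the full tag (e.g. "fr-fr"), then the primary part ("fr").
        for candidate in (lang, lang.split("-")[0]):
            name = tags.get("name:" + candidate)
            if name:
                return name
    return tags.get("name")
```

With tags of name=London, name:fr=Londres, a browser sending
"fr-FR,fr;q=0.8,en;q=0.5" would be given "Londres", while one sending
"de" would fall back to the plain name tag.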
Search performance is generally fairly good, with queries taking around
0.01 to 0.1 of a second on the database side, plus some additional time
spent presenting the data using php. Index generation performance is
dreadful, and it can take up to 5 days to reindex a full planet file.
The UK is processed in around 6 hours. I have some ideas on how to
improve this but recently decided to stop fiddling with the code and
try to get something actually finished even if it isn't perfect.
Although the UK version has had some testing, the full planet version
hasn't been tested at all yet. I was hoping to make a test version
available some time early next week and get some feedback on how well
it works for international addresses, although I'll happily put this
on hold if needed until we have worked out where we are going.
Look forward to hearing from everyone else.
Cheers,
--
Brian