[OSM-dev] Anyone with a speedy gazetteer

David Earl david at frankieandshadow.com
Mon Jan 12 15:11:11 GMT 2009


On 12/01/2009 14:21, Milo van der Linden wrote:
> I strongly suggest that you read the postgresql text search[1] chapter 
> in depth. You will find that a lot of textual and multilingual 
> confusions can be solved with that function set. the name "text search" 
> is by far too simple for what it covers...

I haven't finished yet, but reading the beginning tells me that much of 
what this does is in fact what I am doing explicitly already - breaking 
up the strings into tokens, canonical forms, thesaurus type stuff, 
indexing the tokens, and so on.
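
As a rough illustration of that explicit approach, here is a minimal Python 
sketch. It is not the actual gazetteer code: the abbreviation table, the 
function names and the sample place names are all assumptions made up for 
the example, and a real table would need to deal with ambiguous 
abbreviations (e.g. "st" as "street" or "saint").

```python
import re
from collections import defaultdict

# Hypothetical canonical-form table (assumption for illustration only).
CANONICAL = {"st": "street", "rd": "road", "ave": "avenue"}


def tokenize(name):
    """Lower-case a name, split it into word tokens, map each to its canonical form."""
    words = re.findall(r"\w+", name.lower())
    return [CANONICAL.get(word, word) for word in words]


def build_index(places):
    """Map each canonical token to the set of place ids whose name contains it."""
    index = defaultdict(set)
    for place_id, name in places.items():
        for token in tokenize(name):
            index[token].add(place_id)
    return index


def search(index, query):
    """Return the ids of places whose names contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

With a small made-up table such as {1: "Mill Rd", 2: "Mill Road"}, a search 
for "mill road" would match both entries, because "rd" and "road" are 
reduced to the same token before indexing.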

As Tom says, the language issue may be a problem, but OTOH I see it can 
in principle support user-defined alternatives to words, and we're 
dealing with a limited set of terms, so there may well be mileage here. 
It would also massively simplify the client side: let the DB do a lot 
of what I am doing.

It's not clear it would necessarily be faster, either. I could spend ages 
rewriting to use it and find their implementation is no faster than 
mine. OK, not likely I admit, but the advantage of having everything in 
SQL on the server side and optimised by their developers could be offset 
by the generality needed in their searching.

I need to read more. The big problem here is that it means a complete 
re-write, so it ain't quick to do.

David




