[Geocoding] make_standard_name() and Virginia

Brian Quinion openstreetmap at brian.quinion.co.uk
Mon May 7 15:39:59 BST 2012


On 7 May 2012 04:10, Brian DeRocher <brian at derocher.org> wrote:
> I disagree that make_standard_string() needs to work for all major
> languages.  I mean it does, but you should know the language ahead of time.
>  Since the problem of standardizing words is language related, can you first
> use the HTTP header Accept-Language to pick the language (or use geoip), and
> then standardize according to rules of that language?

I think you may be slightly misinterpreting the purpose of
make_standard_string.  It is not so much an attempt to clean up the
string as an attempt to generate a standard, simplified search
token with as high a chance as possible of matching the required
search result and as few collisions with other results as possible.
It probably makes more sense to think of it as a specialised
Metaphone-type algorithm.
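
As a rough illustration only (this is not the actual implementation;
the function name simple_token and the rule table are made up for the
example), a token generator of this kind might look like the following
Python sketch.  It folds case and accents, drops punctuation, and then
applies a few abbreviation rules:

```python
import re
import unicodedata

# Hypothetical rule table (full form -> short form), not the real one.
ABBREVIATIONS = {
    "street": "st",
    "saint": "st",
    "road": "rd",
}

def simple_token(name: str) -> str:
    """Reduce a name to a simplified search token (illustrative only)."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase, then turn every run of characters other than
    # ASCII letters and digits into a single space.
    words = re.sub(r"[^a-z0-9]+", " ", stripped.lower()).split()
    # Replace each word by its short form when a rule exists.
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

print(simple_token("Saint-Étienne Street"))  # st etienne st
```

Note that "Saint" and "Street" end up as the same token here.  That is
deliberate for recall, but it is also exactly where unwanted collisions
come from.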

The fact that it produces such poor tokens for the cases given is an
issue.  We really do need to do something about that, since at the
moment short tokens give us too large a search space.  It may be as
simple as adding a length constraint to some of the rules - although
the overlap between US states and words for 'the' in other languages
is annoying!
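
To make the length-constraint idea concrete, here is a minimal Python
sketch.  Everything in it is an assumption for the sake of the example:
the rule entries, the apply_rules name and the min_len threshold are
hypothetical and are not taken from the real rule set.  A rule is only
applied when the short form it produces is long enough:

```python
# Hypothetical rules: the real rule table is not shown in this thread.
RULES = {
    "louisiana": "la",
    "virginia": "va",
    "boulevard": "blvd",
}

def apply_rules(words, min_len=3):
    """Abbreviate a word only if its short form has at least min_len characters."""
    out = []
    for word in words:
        short = RULES.get(word)
        if short is not None and len(short) >= min_len:
            out.append(short)   # rule passes the length constraint
        else:
            out.append(word)    # no rule, or short form too short: keep the word
    return out

print(apply_rules(["virginia", "boulevard"]))  # ['virginia', 'blvd']
```

With min_len=3 the two-letter state forms are never generated, so
"louisiana" no longer shares a token with the French or Spanish "la".
The cost is longer tokens for those names.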

If you are able to find a way to improve the token generator (given
these constraints), it would obviously be welcome!
--
 Brian


