[Geocoding] Support to stop words?

Sarah Hoffmann lonvia at denofr.de
Mon Apr 10 19:13:08 UTC 2017


Hi,

On Mon, Apr 10, 2017 at 01:30:48PM -0300, Vitor George wrote:
> Hi Sarah,
> 
> Thank you very much for the response.
> 
> Another related question: is this abbreviation wiki page being used as
> a data source in Nominatim?
> 
> https://wiki.openstreetmap.org/wiki/Name_finder:Abbreviations
> 
> The text says it is, but I couldn't find any reference to it in the docs
> in the GitHub repository.

There are a few abbreviations in use, see
https://github.com/openstreetmap/Nominatim/blob/master/module/tokenstringreplacements.inc

But I haven't updated the list in a long time for pretty much the
same reason: when used without care they can really cause havoc.
For example, French 'rue' (street) and English 'river' both have
'r' as abbreviation. There have been cases where rivers showed up
in the results when somebody was searching for a street in France.
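
To make the collision concrete, here is a minimal sketch in Python. The
abbreviation table and the place names are made up for illustration; they
are not the contents of tokenstringreplacements.inc, and the matching is
far simpler than what Nominatim actually does.

```python
# Hypothetical abbreviation table (illustrative only, not Nominatim's list).
ABBREVIATIONS = {
    "rue": "r",     # French: street
    "river": "r",   # English: river
}


def normalize(name):
    """Lower-case a name and replace each word by its abbreviation, if any."""
    return " ".join(ABBREVIATIONS.get(word, word) for word in name.lower().split())


def search(query, names):
    """Return every indexed name whose normalized form equals the normalized query."""
    wanted = normalize(query)
    return [name for name in names if normalize(name) == wanted]


# Made-up index entries: a street and a river sharing the same proper name.
names = ["Rue Loire", "River Loire"]

# Both entries normalize to "r loire", so a street search also finds the river.
print(search("rue loire", names))
```

Once both words collapse to 'r', nothing in the normalized form says
which language or feature type was meant.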

Kind regards

Sarah

> On Wed, Apr 5, 2017 at 4:42 PM, Sarah Hoffmann <lonvia at denofr.de> wrote:
> 
> > Hi,
> >
> > On Wed, Apr 05, 2017 at 08:37:52AM -0300, Vitor George wrote:
> > > When searching "Biblioteca Prestes Maia" [1], the best result should be
> > > "Biblioteca Prefeito Prestes Maia" [2]; instead the results are other
> > > *bibliotecas* (libraries) with different names. This greatly affects the
> > > usability of Nominatim in Portuguese.
> > >
> > > Is there support for stop words? Could Nominatim be using PostgreSQL full
> > > text search?
> >
> > Nominatim has limited support for searching for partial words and also
> > for a few stop words. However, the latter is very difficult to implement
> > in a system that has to work with arbitrary languages. As it happens,
> > the search in your example trips over stop words. What happens is this:
> >
> > 'en' is marked as a stop word in Nominatim because it is 'and' in
> > some languages. Stop words are handled in Nominatim by removing them
> > completely from search terms and queries. Nominatim can also handle
> > so-called special phrases which are used for POI search. These allow
> > you to enter phrases like 'restaurant near Trafalgar Square' or
> > 'supermarket in Berlin'. One of the Spanish special phrases is
> > 'biblioteca in'. Because of the stop word deletion, that gets shortened
> > to 'biblioteca'. So one of the interpretations of your search
> > query above becomes 'find me a library in Prestes Maia'. And those are
> > the results you see.
> >
> > To resolve this, the stop word handling in Nominatim needs to be
> > changed to not unconditionally delete stop words but leave them
> > in where they are essential. This is unfortunately not a simple
> > change and requires rewriting some of the fundamentals of how
> > query normalisation works. Until it's done I'm rather reluctant
> > to add new stop words or special terms.
> >
> > Kind regards
> >
> > Sarah
> >
> >
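
The stop word behaviour described in the quoted message can be sketched
roughly as follows, in Python. This is only an illustration under stated
assumptions: the special phrase is written in a Spanish form, 'biblioteca
en' (the message gives it as 'biblioteca in'), the class name 'library' and
all function names are hypothetical, and Nominatim's real query
normalisation is considerably more involved.

```python
STOP_WORDS = {"en"}  # 'en' is the stop word named in the message

# Hypothetical special phrase table: phrase -> POI class (assumed entry).
SPECIAL_PHRASES = {"biblioteca en": "library"}


def normalize(text, drop_stop_words):
    """Lower-case the text and optionally delete all stop words."""
    words = text.lower().split()
    if drop_stop_words:
        words = [word for word in words if word not in STOP_WORDS]
    return " ".join(words)


def interpretations(query, drop_stop_words=True):
    """List possible readings of a query: a plain name search, plus one
    POI search for every special phrase the query starts with."""
    q = normalize(query, drop_stop_words)
    readings = [("name", q)]
    for phrase, poi_class in SPECIAL_PHRASES.items():
        p = normalize(phrase, drop_stop_words)
        if q.startswith(p + " "):
            # The rest of the query is read as the place to search in.
            readings.append((poi_class, q[len(p) + 1:]))
    return readings


# Unconditional deletion: the phrase shrinks to 'biblioteca' and matches.
print(interpretations("Biblioteca Prestes Maia"))

# Stop words kept: only the plain name search remains for this query.
print(interpretations("Biblioteca Prestes Maia", drop_stop_words=False))
```

With deletion switched on, the query gains a second reading, a library
search in 'prestes maia'. Keeping the stop word in the phrase removes that
reading, while a query such as 'biblioteca en campinas' would still be
recognised as a POI search.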
