[Geocoding] Support to stop words?

Sarah Hoffmann lonvia at denofr.de
Wed Apr 5 19:42:38 UTC 2017


Hi,

On Wed, Apr 05, 2017 at 08:37:52AM -0300, Vitor George wrote:
> When searching "Biblioteca Prestes Maia" [1] the best result should be
> "Biblioteca Prefeito Prestes Maia" [2], instead the results are other
> *bibliotecas* (libraries) with different names. This affects very much
> the usability of Nominatim in Portuguese.
> 
> Is there support to stop words? Could Nominatim be using PostgreSQL full
> text search?

Nominatim has limited support for searching for partial words and also
for a few stop words. However, the latter is very difficult to implement
in a system that has to work with arbitrary languages. As it happens,
the search in your example trips over stop words. What happens is this:

'en' is marked as a stop word in Nominatim because it is 'and' in
some languages. Stop words are handled in Nominatim by removing them
completely from search terms and queries. Nominatim can also handle
so-called special phrases which are used for POI search. These allow
you to enter phrases like 'restaurant near Trafalgar Square' or
'supermarket in Berlin'. One of the Spanish special phrases is
'biblioteca en' ('library in'). Because of the stop word deletion that
gets shortened to 'biblioteca'. So, one of the interpretations of your
search query above becomes 'find me a library in Prestes Maia'. And
those are the results you see.
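
The effect can be reproduced with a rough sketch. This is not Nominatim's
actual code: the function names, the one-word stop list and the single
special phrase (assumed here to be stored as 'biblioteca en', with 'en'
as the stop word) are hypothetical and only illustrate how unconditional
stop word deletion produces the unwanted interpretation.

```python
# Illustrative only: these names and data are hypothetical,
# not Nominatim's real stop word list or special phrase table.
STOP_WORDS = {"en"}
SPECIAL_PHRASES = {"biblioteca en": ("amenity", "library")}


def strip_stop_words(text):
    """Lower-case the text and drop every stop word unconditionally."""
    return " ".join(w for w in text.lower().split() if w not in STOP_WORDS)


# Special phrases pass through the same normalisation as queries,
# so 'biblioteca en' ends up stored as plain 'biblioteca'.
NORMALISED_PHRASES = {
    strip_stop_words(phrase): tag for phrase, tag in SPECIAL_PHRASES.items()
}


def interpret(query):
    """Return the possible readings of a query as (poi_tag, name) pairs.

    poi_tag is None for the reading that treats the whole query as a name.
    """
    words = strip_stop_words(query).split()
    readings = [(None, " ".join(words))]
    for i in range(1, len(words)):
        prefix = " ".join(words[:i])
        if prefix in NORMALISED_PHRASES:
            readings.append((NORMALISED_PHRASES[prefix], " ".join(words[i:])))
    return readings


print(interpret("Biblioteca Prestes Maia"))
# [(None, 'biblioteca prestes maia'), (('amenity', 'library'), 'prestes maia')]
```

The second reading is the 'library in Prestes Maia' interpretation: once
'en' has been deleted from the phrase, the bare word 'biblioteca' is
enough to trigger a POI search.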

To resolve this, the stop word handling in Nominatim needs to be
changed to not unconditionally delete stop words but leave them
in where they are essential. This is unfortunately not a simple
change and requires rewriting some of the fundamentals of how
query normalisation works. Until it's done I'm rather reluctant
to add new stop words or special terms.
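
To give an idea of the direction, here is a toy sketch of what 'leaving
stop words in where they are essential' could look like: special phrases
are matched against the raw tokens first, and stop words are only dropped
afterwards. All names and data are hypothetical, and the sketch ignores
everything that makes the real change hard (arbitrary languages, partial
words, the existing normalisation code).

```python
# Hypothetical data and names for illustration; not Nominatim's code.
STOP_WORDS = {"en"}
# Phrases are kept as token tuples with their stop words intact.
SPECIAL_PHRASES = {("biblioteca", "en"): ("amenity", "library")}


def interpret(query):
    """Return (poi_tag, name); poi_tag is None for a plain name search."""
    tokens = query.lower().split()
    # Match special phrases on the raw tokens, stop words included.
    for i in range(1, len(tokens)):
        tag = SPECIAL_PHRASES.get(tuple(tokens[:i]))
        if tag is not None:
            rest = [t for t in tokens[i:] if t not in STOP_WORDS]
            return tag, " ".join(rest)
    # No special phrase matched: only now are stop words dropped.
    return None, " ".join(t for t in tokens if t not in STOP_WORDS)


print(interpret("Biblioteca Prestes Maia"))
print(interpret("biblioteca en Prestes Maia"))
```

With this ordering the first query stays a plain name search, while the
second one, which really contains 'en', is still read as a POI search.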

Kind regards

Sarah


