[Geocoding] Regarding issue #967

Sarah Hoffmann lonvia at denofr.de
Wed Apr 1 06:12:43 UTC 2020


Hi Rahul,

On Wed, Apr 01, 2020 at 05:36:00AM +0530, K Rahul Reddy wrote:
> For issue #967 <https://github.com/osm-search/Nominatim/issues/967>, these
> are the points I have found so far:
> 
>     In Geocode.php lookup(),
> 
> 1) sNormQuery is created with PHP's Transliterator.
> 
> 2) The normalization function make_standard_name is applied to phrases at
> line 630. It is an SQL function that returns
> trim(public.gettokenstring(public.transliteration(name))).
> 
>     We need to replace the characters %09-%0d in phrases. This can be done
> simply by adding
> 
>                 $sPhrase = preg_replace('/[\x09-\x0d]/', ' ', $sPhrase);
> 
>     before the normalization function is called.
> 
> 3) The other solution would be to change the normalization (this breaks the
> DB). transliteration() uses utfasciitable.h.
> 
>     Changing UTFASCIILOOKUP by replacing the elements at positions 9-13
> with '2' does the job.
> 
> 
> I have tested both approaches, and both seem to work as expected. What
> should I do now?

Go for solution 3). It is true that it breaks the DB, but only for places
that have the characters %09-%0d in their names. That is basically data that
is already broken in the OSM database and should be fixed. Therefore it is
okay to make an exception to the rule not to change the normalization.
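To illustrate what solution 3 amounts to, here is a minimal sketch in C (the language of utfasciitable.h) of a table-driven transliteration step in which the five control characters %09-%0d are looked up as a space, the same result the preg_replace() in solution 2 produces. The table `lookup` and the functions `init_lookup()` and `transliterate()` are hypothetical: they do not reproduce the layout of the real UTFASCIILOOKUP, the sketch only handles ASCII input, and it assumes that setting entries 9-13 to '2' has the effect of emitting a space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table, NOT the real UTFASCIILOOKUP layout.
 * Index = ASCII code, value = replacement string, NULL = drop the byte. */
static const char *lookup[128];
static char identity[128][2];

static void init_lookup(void)
{
    for (int c = 0; c < 128; c++) {
        if (c < 0x20 || c == 0x7f) {
            lookup[c] = NULL;               /* other control chars: dropped */
        } else {
            identity[c][0] = (char)c;       /* printable ASCII: kept as is */
            identity[c][1] = '\0';
            lookup[c] = identity[c];
        }
    }
    /* The change from solution 3 (assumed effect): %09-%0d become a space. */
    for (int c = 0x09; c <= 0x0d; c++)
        lookup[c] = " ";
}

/* Copy `in` to `out` through the table; output is truncated to fit outsz. */
static void transliterate(const char *in, char *out, size_t outsz)
{
    assert(outsz > 0);                      /* need room for the terminator */
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        /* Non-ASCII bytes are skipped in this simplified sketch. */
        const char *rep = (*p < 128) ? lookup[*p] : NULL;
        if (rep == NULL)
            continue;
        size_t len = strlen(rep);
        if (n + len >= outsz)
            break;
        memcpy(out + n, rep, len);
        n += len;
    }
    out[n] = '\0';
}
```

After calling `init_lookup()`, `transliterate("Main\tStreet", buf, sizeof buf)` writes "Main Street" to `buf`. Because the mapping lives in the table used at both import and query time, names and queries are normalized the same way, which is why this change requires reimporting but needs no extra preprocessing in Geocode.php.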

Cheers

Sarah


