[Geocoding] Regarding issue #967

Wed Apr 1 17:49:52 UTC 2020

Hi,

I have written test cases in test/bdd, but I found something else while 
doing so. The setNearPointFromQuery function, which detects LatLon pairs, 
is run separately from the phrase normalization. This causes the last 
two examples in the following Scenario to fail.

     Scenario Outline: Search with white space characters
         When sending json search query "<data>"
         Then exactly 1 result is returned

     Examples:
       | data |
       | amerlugalpe, N 47.15739° E 9.61264° |
       | amerlugalpe,    N 47.15739° E 9.61264° |
       |     amerlugalpe    ,     N 47.15739° E 9.61264° |
       | amerlugalpe, N 47.15739°         E 9.61264° |
       | amerlugalpe, N 47.15739° E    9.61264° |

This could be fixed either by using preg_replace inside the 
setNearPointFromQuery function in SearchContext.php or by applying 
preg_replace to $sQuery. The former will fix LatLon detection, but the 
main query string will still contain the extra white space characters.
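
As a rough sketch of the second option (not tested against Nominatim 
itself; the helper name normalizeQueryWhitespace is hypothetical and does 
not exist in the code base), the white space could be collapsed before 
the query is parsed any further:

```php
<?php
// Hypothetical helper, not part of Nominatim: collapse white space runs
// in the raw query before LatLon detection and phrase splitting.
function normalizeQueryWhitespace(string $sQuery): string
{
    // Match runs of space and the control characters \x09-\x0d and
    // replace each run with a single space. These bytes never occur
    // inside a multi-byte UTF-8 sequence, so no /u modifier is needed.
    $sQuery = preg_replace('/[ \x09-\x0d]+/', ' ', $sQuery);

    // Drop the padding left at both ends of the query.
    return trim($sQuery);
}

$sQuery = normalizeQueryWhitespace("amerlugalpe, N 47.15739°\t   E    9.61264°");
// $sQuery is now "amerlugalpe, N 47.15739° E 9.61264°"
```

Note that this only collapses runs of white space; a space before the 
comma, as in the third example, would still be left in place.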

Which approach should I follow? Or should I ignore this, since it only 
affects the LatLon part, which would not normally contain other white 
space characters?

Regards,

Rahul

On 01/04/20 11:42 am, Sarah Hoffmann wrote:
> Hi Rahul,
>
> On Wed, Apr 01, 2020 at 05:36:00AM +0530, K Rahul Reddy wrote:
>> For issue #967 <https://github.com/osm-search/Nominatim/issues/967>, these
>> are some points I found so far:
>>
>>      In Geocode.php lookup(),
>>
>> 1) The sNormQuery is made by using PHP's Transliterator.
>>
>> 2) The normalization method make_standard_name is used on phrases in line
>> 630. This is an sql function which returns
>> trim(public.gettokenstring(public.transliteration(name))).
>>
>>      We need to replace %09-%0d characters in phrases. This can be done
>> simply by adding
>>
>>                  $sPhrase = preg_replace('/[\x09-\x0d]/', ' ', $sPhrase);
>>
>>      before normalization function is called.
>>
>> 3) The other solution would be to change the normalization (this breaks
>> the DB). The transliteration() function uses utfasciitable.h.
>>
>>      Changing UTFASCIILOOKUP by replacing the elements at positions 9-13
>> with '2' does the job.
>>
>>
>> I have tested both the ways, and both seem to work as expected. What should
>> I do now?
> Go for solution 3). It is true that it breaks the DB but only for places
> that have characters %09-%0d in their name. That's basically data that is
> broken in the OSM database already and should be fixed. Therefore it is
> okay to make an exception to the rule not to change the normalization.
>
> Cheers
>
> Sarah