[Geocoding] Regarding issue #967
K Rahul Reddy
k_rahul_reddy at outlook.com
Wed Apr 1 17:49:52 UTC 2020
Hi,
I have written test cases in test/bdd, but I found something else while
doing so. The setNearPointFromQuery function, which is used to detect
LatLon pairs, runs separately from the phrase normalization. This causes
the last two examples in the following Scenario to fail.
Scenario Outline: Search with white space characters
When sending json search query "<data>"
Then exactly 1 result is returned
Examples:
| data |
| amerlugalpe, N 47.15739° E 9.61264° |
| amerlugalpe, N 47.15739° E 9.61264° |
| amerlugalpe , N 47.15739° E 9.61264° |
| amerlugalpe, N 47.15739° E 9.61264° |
| amerlugalpe, N 47.15739° E 9.61264° |
This could be fixed either by using preg_replace in the
setNearPointFromQuery function in SearchContext.php or by applying
preg_replace to $sQuery. The former will fix the LatLon detection, but
the main query string will still contain those characters.
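A minimal sketch of the second option, assuming the replacement is applied
to the raw query string before any further processing. The helper name
normalizeWhitespace is hypothetical and not part of Nominatim. The pattern
uses a character-class range, because a '|' inside [...] is matched
literally rather than acting as a separator:

```php
<?php
// Hypothetical helper (not part of Nominatim): replaces the control
// whitespace characters %09-%0d (tab, LF, VT, FF, CR) with a plain space.
function normalizeWhitespace(string $sQuery): string
{
    // \x09-\x0d is a byte range; these bytes never occur inside a
    // multi-byte UTF-8 sequence, so the degree sign is left untouched.
    return preg_replace('/[\x09-\x0d]/', ' ', $sQuery);
}

// Example: a tab between the place name and the coordinates.
$sQuery = "amerlugalpe,\tN 47.15739° E 9.61264°";
echo normalizeWhitespace($sQuery), "\n";
```

Applied to $sQuery, this would cover both the LatLon detection and the
main query string, whereas the same call inside setNearPointFromQuery
would only cover the former.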
Which approach should I follow? Or should I ignore this, since it only
affects the LatLon part, which would not generally contain other white
space characters?
Regards,
Rahul
On 01/04/20 11:42 am, Sarah Hoffmann wrote:
> Hi Rahul,
>
> On Wed, Apr 01, 2020 at 05:36:00AM +0530, K Rahul Reddy wrote:
>> For issue #967 <https://github.com/osm-search/Nominatim/issues/967>, these
>> are some points I have found so far:
>>
>> In Geocode.php lookup(),
>>
>> 1) The sNormQuery is made by using PHP's Transliterator.
>>
>> 2) The normalization method make_standard_name is used on phrases in line
>> 630. This is an SQL function which returns
>> trim(public.gettokenstring(public.transliteration(name))).
>>
>> We need to replace %09-%0d characters in phrases. This can be done
>> simply by adding
>>
>> $sPhrase = preg_replace('/[\x09-\x0d]/', ' ', $sPhrase);
>>
>> before the normalization function is called.
>>
>> 3) The other solution would be to change the normalization (this breaks
>> the DB). transliteration() uses utfasciitable.h.
>>
>> Changing UTFASCIILOOKUP by replacing the elements at positions 9-13 with
>> '2' does the job.
>>
>>
>> I have tested both approaches, and both seem to work as expected. What
>> should I do now?
> Go for solution 3). It is true that it breaks the DB but only for places
> that have characters %09-%0d in their name. That's basically data that is
> broken in the OSM database already and should be fixed. Therefore it is
> okay to make an exception to the rule not to change the normalization.
>
> Cheers
>
> Sarah