[Geocoding] Regarding issue #967
Sarah Hoffmann
lonvia at denofr.de
Thu Apr 2 19:51:08 UTC 2020
Hi,
On Wed, Apr 01, 2020 at 11:19:52PM +0530, K Rahul Reddy wrote:
> I have written test cases in test/bdd, but I found something else while
> doing so. The setNearPointFromQuery function, which detects LatLon pairs, is
> applied separately. This causes the last two examples in the following
> Scenario to fail.
>
> Scenario Outline: Search with white space characters
> When sending json search query "<data>"
> Then exactly 1 result is returned
>
> Examples:
> | data |
> | amerlugalpe, N 47.15739° E 9.61264° |
> | amerlugalpe, N 47.15739° E 9.61264° |
> | amerlugalpe , N 47.15739° E 9.61264° |
> | amerlugalpe, N 47.15739° E 9.61264° |
> | amerlugalpe
, N 47.15739° E 9.61264° |
>
>
> This could be fixed by using preg_replace in the setNearPointFromQuery
> function in SearchContext.php, or by applying preg_replace to $sQuery. The
> former fixes LatLon parsing, but the main query string will still contain
> those characters.
Looks like the regexes in parseLatLong() are rather picky there and
only accept real spaces. They could use the more generic '\s' instead.
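A minimal sketch of the difference, using a hypothetical, simplified pattern for the "N 47.15739° E 9.61264°" form rather than the actual regexes in parseLatLong(). The constant and function names are made up for illustration; only the separators differ between the two patterns:

```php
<?php
// Hypothetical, simplified patterns for "N <lat>° E <lon>°".
// These are NOT the real regexes from parseLatLong().

// Picky version: only a literal space separates the parts.
const STRICT_PATTERN = '/^([NS]) ([0-9]+\.[0-9]+)° ([EW]) ([0-9]+\.[0-9]+)°$/u';

// Generic version: \s also accepts tab, LF, VT, FF and CR (%09-%0d).
const LOOSE_PATTERN = '/^([NS])\s+([0-9]+\.[0-9]+)°\s+([EW])\s+([0-9]+\.[0-9]+)°$/u';

// Returns [lat, lon] as floats, or null if the pattern does not match.
function parseCoords(string $pattern, string $s): ?array
{
    if (preg_match($pattern, $s, $m) !== 1) {
        return null;
    }
    $lat = (float) $m[2] * ($m[1] === 'S' ? -1 : 1);
    $lon = (float) $m[4] * ($m[3] === 'W' ? -1 : 1);
    return [$lat, $lon];
}

$withTab = "N\t47.15739° E 9.61264°";
echo json_encode(parseCoords(STRICT_PATTERN, $withTab)), "\n"; // null
echo json_encode(parseCoords(LOOSE_PATTERN, $withTab)), "\n";  // [47.15739,9.61264]
```

Because '\s' covers the whole %09-%0d range as well as the plain space, a tab or vertical tab between the coordinate parts no longer makes the match fail.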
Cheers
Sarah
>
> Which approach should I follow? Or should I ignore this, since it only
> affects LatLon, which would not normally contain other whitespace characters?
>
> Regards,
>
> Rahul
>
> On 01/04/20 11:42 am, Sarah Hoffmann wrote:
> > Hi Rahul,
> >
> > On Wed, Apr 01, 2020 at 05:36:00AM +0530, K Rahul Reddy wrote:
> > > For issue #967 <https://github.com/osm-search/Nominatim/issues/967>, these
> > > are the points I have found so far:
> > >
> > > In Geocode.php lookup(),
> > >
> > > 1) sNormQuery is built using PHP's Transliterator.
> > >
> > > 2) The normalization function make_standard_name is applied to phrases at
> > > line 630. It is an SQL function which returns
> > > trim(public.gettokenstring(public.transliteration(name))).
> > >
> > > We need to replace the %09-%0d characters in phrases. This can be done
> > > simply by adding
> > >
> > > $sPhrase = preg_replace('/[\x09-\x0d]/', ' ', $sPhrase);
> > >
> > > before the normalization function is called.
> > >
> > > 3) The other solution would be to change the normalization (this breaks
> > > the DB). transliteration() uses utfasciitable.h.
> > >
> > > Changing UTFASCIILOOKUP by replacing the elements at positions 9-13 with
> > > '2' does the job.
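For option 2), a minimal PHP sketch; the wrapper name replaceControlWhitespace is hypothetical. Writing the character class as a range keeps it short and avoids a pitfall of the alternation style: inside square brackets '|' is an ordinary character, so a class like [\x09|\x0a|...] would also turn every pipe into a space.

```php
<?php
// Hypothetical helper: replace the ASCII control characters %09-%0d
// (tab, line feed, vertical tab, form feed, carriage return) with a space.
// Works byte-wise; UTF-8 multibyte sequences never contain bytes below 0x80,
// so characters like '°' are left untouched.
function replaceControlWhitespace(string $sPhrase): string
{
    return preg_replace('/[\x09-\x0d]/', ' ', $sPhrase);
}

echo replaceControlWhitespace("amerlugalpe\x0b, N 47.15739°"), "\n";
// amerlugalpe , N 47.15739°
```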
> > >
> > >
> > > I have tested both ways, and both seem to work as expected. What should
> > > I do now?
> > Go for solution 3). It is true that it breaks the DB but only for places
> > that have characters %09-%0d in their name. That's basically data that is
> > broken in the OSM database already and should be fixed. Therefore it is
> > okay to make an exception to the rule not to change the normalization.
> >
> > Cheers
> >
> > Sarah