[Geocoding] support tab as space delimiter #967
Sarah Hoffmann
lonvia at denofr.de
Sat Mar 21 11:31:05 UTC 2020
Hi,
On Fri, Mar 20, 2020 at 09:36:43PM +0530, K Rahul Reddy wrote:
> Hi all!
>
> I am working on issue #967
> <https://github.com/openstreetmap/Nominatim/issues/967>. I have been asked
> to work on the /lib/Phrase.php constructor.
>
> But I noticed that Phrase.php constructor does not receive %09-%13
> characters. They are somehow removed.
>
> On further inspection, I found that this constructor is called by
> Geocode.php. The parameter passed itself does not have these characters. I
> found that
>
> $sPhrase = $this->oDB->getOne(
> 'SELECT make_standard_name(:phrase)',
> array(':phrase' => $sPhrase),
> 'Cannot normalize query string (is it a UTF-8 string?)'
> );
>
> in Geocode.php alters the $sPhrase and removes those characters. Should I go
> ahead and look into the implementation further, or is there any other
> way?
You are on the right track there. The normalisation is done in the
PostgreSQL module that can be found in the module/ directory. There is a
huge table there that does the translation from UTF-8 to ASCII. That would
be the best place to translate these special characters to a space.
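To illustrate the idea only: the following is a minimal, standalone sketch
in C and not the code of the actual module. The function name
normalize_whitespace is made up, and it assumes the characters in question
are the ASCII whitespace controls 0x09-0x0D (tab through carriage return).
The real fix would add the mapping to the existing translation table rather
than introduce a separate pass.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of Nominatim: copy `in` to `out`,
 * turning ASCII control whitespace (assumed range 0x09-0x0D: tab, LF,
 * VT, FF, CR) into a space, collapsing runs of spaces, and trimming
 * leading and trailing spaces. `out` is always NUL-terminated when
 * outsize > 0. */
static void normalize_whitespace(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    int last_was_space = 1; /* start as 1 so leading spaces are dropped */

    if (outsize == 0)
        return;

    for (; *in != '\0' && n + 1 < outsize; in++) {
        unsigned char c = (unsigned char)*in;

        if (c == ' ' || (c >= 0x09 && c <= 0x0D)) {
            /* emit at most one space per run of whitespace */
            if (!last_was_space) {
                out[n++] = ' ';
                last_was_space = 1;
            }
        } else {
            out[n++] = (char)c;
            last_was_space = 0;
        }
    }

    /* drop a single trailing space left by the loop */
    if (n > 0 && out[n - 1] == ' ')
        n--;
    out[n] = '\0';
}
```

With this sketch, an input such as "main<TAB>street" comes out as
"main street", so the tab acts as a word delimiter in the same way a
space does.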
> PS: I initially planned to replace the return statement of the getString
> function in ParameterParser.php with
>
> return preg_replace('/[\x09\x10\x11\x12\x13]/', ' ',
> $this->aParams[$sName]);
I'm not entirely opposed to hacking this in like this. However, the
approach above of doing it in the normalization step has the advantage
that the same is applied during the import of the OSM data. So we'd
also get rid of bad characters there.
Sarah