[OSM-talk] Name finder and home page search working again

Thu Mar 27 11:52:52 GMT 2008

On 27/03/2008 09:07, Stefan Baebler wrote:
> On Thu, Mar 27, 2008 at 12:55 AM, David Earl <david at frankieandshadow.com> wrote:
>> On 26/03/2008 19:31, OJ W wrote:
>>  > Is it doing anything with the multilingual names in OSM (name:de=... and
>>  > similar)?
>>  Yes, that should have been on my list in the previous message.
>>  This always was included. e.g. try searching for Cologne and Köln (or Koln)
> 
> Venice in Italy has name:sl=Benetke
> Italy has name:sl=Italija
> 
> Searching either for "benetke" or "italija" works as expected.
> searching for "benetke, italija" (sl, sl) fails miserably - returning
> empty page.

I suspect the empty page was because the search timed out. It should 
have said no matches I guess.

> However searching for "benetke, italy" (sl, en) or "benetke, italia"
> (sl, it) works flawlessly.
> 
> how exactly is the context determined?
> a) is_in tag on the node, thus requiring additional (imo redundant)
> is_in:sl="Italija, Evropa" tag on Venice
> b) closeness of the nodes (might be ok for cities, but countries vary
> a lot in size)
> c) inclusion in a context polygon (country, city)

The name finder wiki entry outlines the method: 
http://wiki.openstreetmap.org/index.php/Name_finder

There are two different kinds of search: unqualified and qualified (the 
latter with a comma or "near"). In the first case, the name is just 
looked up, with variations. In the second it looks for the qualifying 
place (place=city,town,small_town,village,suburb or hamlet) and then 
searches for the bit before the comma "close to" the place or places found.
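
To illustrate the two kinds of search described above, here is a minimal Python sketch of how a query string might be split. It is not the name finder's actual code: the function name parse_query and the exact splitting rules are assumptions made for illustration only.

```python
import re


def parse_query(query):
    """Split a search string into (name, qualifiers).

    Hypothetical helper: an empty qualifier list means an
    unqualified search, anything else a qualified one.
    """
    # Treat the word "near" like a comma, then split on commas.
    normalised = re.sub(r"\s+near\s+", ",", query.strip(), flags=re.IGNORECASE)
    parts = [p.strip() for p in normalised.split(",") if p.strip()]
    if not parts:
        return "", []
    # The first part is the thing searched for; the rest qualify it.
    return parts[0], parts[1:]
```

With this sketch, "benetke" yields no qualifiers (unqualified), while "benetke, italija" or "pub near Cambridge" each yield one qualifying place.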

Searches can also be further qualified using is_in (NOTE: only is_in - I 
don't recognize is_in:lang, as it's the first I've heard of it).

For a full search that's two commas: "Hinton Road, Fulbourn, UK". If the 
is_in qualified search fails, though, it tries again without the 
qualifier (because there are so many is_in's missing or inaccurate).

The reason "benetke, italy" or "benetke, italia" work, though, is that 
they aren't quite working "flawlessly". What these are actually doing is 
looking for "benetke" near a place called "italy", which doesn't exist as 
a place (settlement, as above), so it then tries the search again 
without qualification. So what you're seeing is the same as if you 
simply searched for "benetke".
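
The fallback just described can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionaries SETTLEMENTS and NAMES, the functions lookup and search, the rough coordinates, and the crude "within max_deg degrees" test for "close to" are all hypothetical stand-ins, not the real index or distance logic.

```python
# Hypothetical toy data standing in for the real index.
# Only settlements (place=city, town, village, ...) can act as qualifiers.
SETTLEMENTS = {"fulbourn": (52.18, 0.22)}
NAMES = {
    "benetke": [("Venezia", 45.44, 12.33)],
    "hinton road": [("Hinton Road", 52.18, 0.22),
                    ("Hinton Road", 51.45, -0.10)],
}


def lookup(name):
    """Unqualified search: just look the name up."""
    return list(NAMES.get(name.lower(), []))


def search(name, place=None, max_deg=0.5):
    """Qualified search that falls back to an unqualified one."""
    if place is not None:
        centre = SETTLEMENTS.get(place.lower())
        if centre is not None:
            # Keep only hits "close to" the qualifying settlement
            # (assumed here to mean within max_deg degrees).
            return [hit for hit in lookup(name)
                    if abs(hit[1] - centre[0]) <= max_deg
                    and abs(hit[2] - centre[1]) <= max_deg]
    # No qualifier, or the qualifier is not a known settlement
    # (e.g. "italy"): search again without qualification.
    return lookup(name)
```

In this sketch search("benetke", "italy") returns exactly what search("benetke") does, because "italy" is not among the settlements, whereas search("Hinton Road", "Fulbourn") keeps only the nearby hit.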

> Similar problem is with "dunaj, avstrija" (sl, sl) = "vienna, austria"
> (en, en) = "wien, osterreich" (de, de)
> 
> Perhaps matches in the same language should be ranked higher than
> matches from mixed languages
> 
> Bigger entities should also be ranked higher (continent > country >
> city > town > village > street ...) if no difference in context is
> found

Indeed, and they are. But countries and regions are not places 
(settlements) for the purposes of searching. You are searching "near to" 
a place, not "within a" country - as you say, the country is too big for 
this to work at all reliably.

Having said all that, I think I can improve this: in particular with 
respect to language variations in country names (the information needs 
to come from somewhere if it isn't in alternate is_in forms though), and 
to interpret the form "a, b" as 'place is_in country' as well as 'object 
near place'.

> Try searching for "europe" or "austria". For the latter unique
> "avstrija" (sl) gives far better results :)

That's because though I don't use the country as a qualifier, it is 
still a node with a name that gets put in the general search index. 
You'll presumably get back "Austria Street" and "National Gallery of 
Austria" in the same search as well, but Austria will come first because 
it is an exact match while the others have additional words. Osterreich 
will also work, I presume, because the node is name=Österreich 
and name:en=Austria.
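
As a rough illustration of why the exact match comes first, here is a small Python sketch that orders matches by how many additional words they carry. The function rank and its scoring are assumptions for illustration; the real ranking takes more into account than this.

```python
def rank(query, names):
    """Order matching names so that those with fewer extra words come first."""
    wanted = query.lower().split()

    def extra_words(name):
        # Number of words in the name beyond those in the query.
        return len(name.lower().split()) - len(wanted)

    # A name matches if it contains every word of the query.
    matches = [n for n in names
               if all(w in n.lower().split() for w in wanted)]
    return sorted(matches, key=extra_words)
```

Given the names "Austria Street", "National Gallery of Austria" and "Austria", a search for "austria" puts "Austria" first in this sketch, since it has no additional words.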

> Performance is much better than yesterday!

You probably were using it yesterday while it was still constructing the 
index update, which went on until about 10am. Today it failed (the 
machine ran out of memory) and I restarted it about 9am; I expect it 
will continue for some time: it's on 20% at present though the early 
stages often take the longest (because there are so many "1st street" 
entries and the like).

David