[OSM-talk] Name finder and home page search working again
David Earl
david at frankieandshadow.com
Thu Mar 27 11:52:52 GMT 2008
On 27/03/2008 09:07, Stefan Baebler wrote:
> On Thu, Mar 27, 2008 at 12:55 AM, David Earl <david at frankieandshadow.com> wrote:
>> On 26/03/2008 19:31, OJ W wrote:
>> > Is it doing anything with the multilingual names in OSM (name:de=... and
>> > similar)?
>> Yes, that should have been on my list in the previous message.
>> This always was included. e.g. try searching for Cologne and Köln (or Koln)
>
> Venice in Italy has name:sl=Benetke
> Italy has name:sl=Italija
>
> Searching either for "benetke" or "italija" works as expected.
> searching for "benetke, italija" (sl, sl) fails miserably - returning
> empty page.
I suspect the empty page was because the search timed out. It should
have said no matches I guess.
> However searching for "benetke, italy" (sl, en) or "benetke, italia"
> (sl, it) works flawlessly.
>
> how exactly is the context determined?
> a) is_in tag on the node, thus requiring additional (imo redundant)
> is_in:sl="Italija, Evropa" tag on Venice
> b) closeness of the nodes (might be ok for cities, but countries vary
> a lot in size)
> c) inclusion in a context polygon (country, city)
The name finder wiki entry outlines the method:
http://wiki.openstreetmap.org/index.php/Name_finder
There are two different kinds of search: unqualified and qualified (the
latter with a comma or "near"). In the first case, the name is just
looked up, with variations. In the second it looks for the qualifying
place (place=city,town,small_town,village,suburb or hamlet) and then
searches for the bit before the comma "close to" the place or places found.
Searches can also be further qualified using is_in (NOTE: only is_in - I
don't recognize is_in:lang, as it's the first I've heard of it).
For a full search that's two commas: "Hinton Road, Fulbourn, UK". If the
is_in qualified search fails, though, it tries again without the
qualifier (because there are so many is_in's missing or inaccurate).
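To make the query forms concrete, here is a minimal sketch in Python of how such a string might be split into its parts. It is not the name finder's actual code: the function name parse_query and the returned keys are invented for illustration, and treating "near" exactly like the first comma is an assumption.

```python
import re


def parse_query(query):
    """Split a search string into name, place and is_in parts.

    Hypothetical helper, for illustration only.
    """
    # Assumption: " near " is equivalent to the first comma.
    normalised = re.sub(r"\s+near\s+", ", ", query.strip(),
                        count=1, flags=re.IGNORECASE)
    parts = [p.strip() for p in normalised.split(",") if p.strip()]
    return {
        # Unqualified search: only this part is present.
        "name": parts[0] if parts else "",
        # Qualified search: the settlement to search "close to".
        "place": parts[1] if len(parts) > 1 else None,
        # Full search: a further is_in qualifier after a second comma.
        "is_in": parts[2] if len(parts) > 2 else None,
    }
```

With this sketch, "Hinton Road, Fulbourn, UK" splits into the name "Hinton Road", the place "Fulbourn" and the is_in qualifier "UK", while a bare "benetke" leaves place and is_in empty.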
The reason "benetke, italy" or "benetke, italia" work though is because
they aren't quite working "flawlessly". What these are actually doing is
looking for "benetke" near a place called "italy" which doesn't exist as
a place (settlement, as above), so it then tries the search again
without qualification. So what you're seeing is the same as if you
simply searched for "benetke".
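The fallback just described can be sketched roughly as follows. This is a toy illustration, not the real implementation: the index layout, the function names lookup and qualified_search, and the crude distance test in degrees are all assumptions made for the example.

```python
import math

# Settlement types accepted as a qualifying place (from the list above).
SETTLEMENTS = {"city", "town", "small_town", "village", "suburb", "hamlet"}


def lookup(index, name):
    """Unqualified search: every entry carrying this name in any language."""
    return [e for e in index if name.lower() in e["names"]]


def qualified_search(index, name, qualifier, max_deg=0.5):
    """Search for name close to the qualifier, retrying unqualified
    when the qualifier is not a known settlement."""
    places = [e for e in lookup(index, qualifier)
              if e.get("place") in SETTLEMENTS]
    if not places:
        # e.g. "italy" is a country, not a settlement: drop the qualifier.
        return lookup(index, name)
    # Assumption: "close to" is a plain lat/lon difference threshold.
    return [e for e in lookup(index, name)
            if any(math.hypot(e["lat"] - p["lat"], e["lon"] - p["lon"]) <= max_deg
                   for p in places)]
```

In this sketch, searching "benetke" with the qualifier "italy" returns exactly what a bare search for "benetke" returns, because the only entry named "italy" is not a settlement.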
> Similar problem is with "dunaj, avstrija" (sl, sl) = "vienna, austria"
> (en, en) = "wien, osterreich" (de, de)
>
> Perhaps matches in the same language should be ranked higher than
> matches from mixed languages
>
> Bigger entities should also be ranked higher (continent > country >
> city > town > village > street ...) if no difference in context is
> found
Indeed, and they are. But countries and regions are not places
(settlements) for the purposes of searching. You are searching "near to"
a place, not "within a" country - as you say, the country is too big for
this to work at all reliably.
Having said all that, I think I can improve this: in particular with
respect to language variations in country names (the information needs
to come from somewhere if it isn't in alternate is_in forms though), and
to interpret the form "a, b" as 'place is_in country' as well as 'object
near place'.
> Try searching for "europe" or "austria". For the latter unique
> "avstrija" (sl) gives far better results :)
That's because though I don't use the country as a qualifier, it is
still a node with a name that gets put in the general search index.
You'll presumably get back "Austria Street" and "National Gallery of
Austria" in the same search as well but Austria will come first because
it is an exact match while the others have additional words. Osterreich
will also work, I presume, because the node is name=Österreich
and name:en=Austria.
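As a rough illustration of why the exact match comes first, the sketch below orders matching names by how many extra words they carry. The function name rank and the scoring rule are assumptions for the example; the text above only states that the exact match beats names with additional words, not how the remaining results are ordered.

```python
def rank(query, names):
    """Return the names containing every query word, exact matches first.

    Hypothetical scoring: fewer additional words sorts earlier.
    """
    wanted = query.lower().split()
    matches = [n for n in names
               if all(w in n.lower().split() for w in wanted)]
    # An exact match has zero extra words, so it leads the list.
    return sorted(matches, key=lambda n: len(n.split()) - len(wanted))
```

Given the names "National Gallery of Austria", "Austria Street" and "Austria", a search for "austria" puts "Austria" at the head of the list under this rule.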
> Performance is much better than yesterday!
You probably were using it yesterday while it was still constructing the
index update, which went on until about 10am. Today it failed (the
machine ran out of memory) and I restarted it about 9am; I expect it
will continue for some time: it's on 20% at present though the early
stages often take the longest (because there are so many "1st street"
entries and the like).
David