[Geocoding] How does Nominatim determine importance of places?

Sarah Hoffmann lonvia at denofr.de
Tue Aug 6 19:08:48 UTC 2013


Hi, 

On Sun, Aug 04, 2013 at 11:34:06AM -0400, Alex Weissman wrote:
> I'm using Nominatim to reverse-geocode natural language location
> descriptions for a research project. I spent some time looking through the
> source code (in particular, website/search.php), but I can't seem to make
> heads or tails of how the "importance" score is calculated.
> 
> From what I can tell, there is some baseline calculation and then numerous
> tweaks - one line, for example, says
> 
> $aResult['importance'] = $aResult['importance'] + ($iCountWords*0.1);
> // 0.1 is a completely arbitrary number but something in the range 0.1 to 0.5
> // would seem right
> 
> I also noticed in the documentation that Nominatim will use Wikipedia to
> improve the ranking of results, but once again nothing specific beyond "the
> importance value is calculated as log(totalcount)/log(max totalcount)." I
> assume that "totalcount" is the number of internal links to an article
> about a specific location in the result set, and "max totalcount" is the
> maximum of that value across the entire result set. But this only tells me
> the scoring contribution from Wikipedia, and not how the baseline score is
> calculated.
> 
> My question is, what properties of the OSM data go into the calculation,
> and then how is the importance score actually calculated? What special
> tweaks and thresholds should I be aware of?

Most of the importance does indeed come from the Wikipedia link count.
If no article can be found for an object, the base score is derived from
the object's rank (country, county, city, etc.).
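
As a rough sketch of that base score in Python (not the actual Nominatim
code): the log ratio is the formula quoted from the documentation above,
while the function names and the linear rank fallback with its constants
0.75 and 40 are assumptions made for illustration only.

```python
import math


def wikipedia_importance(total_count, max_total_count):
    """log(totalcount) / log(max totalcount), as quoted from the docs."""
    if total_count < 1 or max_total_count <= 1:
        return 0.0
    return math.log(total_count) / math.log(max_total_count)


def rank_fallback(rank, offset=0.75, scale=40.0):
    """Assumed fallback: a lower (more general) rank gives a higher score.

    The constants are illustrative and not taken from this message.
    """
    return offset - rank / scale


def base_importance(total_count, max_total_count, rank):
    """Use the Wikipedia score if there is one, otherwise fall back to rank."""
    if total_count:
        return wikipedia_importance(total_count, max_total_count)
    return rank_fallback(rank)
```

With these assumptions an article with 100 links, in a data set where the
best-linked article has 10000, gets log(100)/log(10000) = 0.5.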

There are a few minor tweaks to this Wikipedia importance. The one
you found (the line you cited above) is the reranking by exactness of
match with the query. The more words from the query appear verbatim
in the display name of the result (that's the one including the address),
the higher the result is ranked.
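
A minimal Python sketch of that reranking, assuming case-insensitive
whole-word matching and splitting the query on whitespace and commas. The
tokenization and the function names are my assumptions here; only the 0.1
weight per matching word comes from the line you cited.

```python
import re


def count_matching_words(query, display_name):
    """Count the query words that appear verbatim in the display name."""
    words = [w for w in re.split(r"[\s,]+", query.lower()) if w]
    name = display_name.lower()
    return sum(
        1 for w in words
        if re.search(r"\b" + re.escape(w) + r"\b", name)
    )


def rerank_by_match(importance, query, display_name, weight=0.1):
    """Add `weight` to the importance for every matching query word."""
    return importance + weight * count_matching_words(query, display_name)
```

For the query "Main Street Berlin" and the display name
"Main Street, Springfield, Illinois, United States", two words match, so a
result with importance 0.3 would end up at 0.5.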

The second reranking is related to the viewbox. If you supply a viewbox
parameter, then anything within or close to the viewbox is ranked higher.
(e.g. https://github.com/twain47/Nominatim/blob/master/website/search.php#L976)
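
To illustrate the idea only: the sketch below gives a bonus to results
inside the viewbox and a smaller one to results close to it. The bonus
values, the definition of "close" (the box grown by its own width and
height) and all names are hypothetical, so check the linked line for what
search.php really does.

```python
from collections import namedtuple

# Hypothetical box type; the field order is not the order of the API parameter.
Box = namedtuple("Box", "min_lon min_lat max_lon max_lat")


def contains(box, lon, lat):
    """True if the point lies inside the box (edges included)."""
    return box.min_lon <= lon <= box.max_lon and box.min_lat <= lat <= box.max_lat


def grow(box, factor):
    """Return the box enlarged on every side by factor * width/height."""
    dx = (box.max_lon - box.min_lon) * factor
    dy = (box.max_lat - box.min_lat) * factor
    return Box(box.min_lon - dx, box.min_lat - dy, box.max_lon + dx, box.max_lat + dy)


def viewbox_bonus(box, lon, lat, inside=0.5, near=0.25, margin=1.0):
    """Assumed bonus: `inside` within the box, `near` just outside, else 0."""
    if contains(box, lon, lat):
        return inside
    if contains(grow(box, margin), lon, lat):
        return near
    return 0.0
```

The bonus would then be added to the importance of the result before
sorting.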

There is also a small tweak that takes the importance of the address members
into account, but it only has an effect when objects have equal importance.
(e.g. https://github.com/twain47/Nominatim/blob/master/website/search.php#L1241)
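
One way to picture a tweak that only matters on ties is a secondary sort
key, as in this Python sketch. The dictionary keys 'importance' and
'address_importance' are made up for the example, and the real code may
well combine the two values differently.

```python
def sort_results(results):
    """Sort by importance, using the address importance only to break ties."""
    return sorted(
        results,
        key=lambda r: (r["importance"], r["address_importance"]),
        reverse=True,
    )


results = [
    {"name": "A", "importance": 0.5, "address_importance": 0.2},
    {"name": "B", "importance": 0.5, "address_importance": 0.7},
    {"name": "C", "importance": 0.6, "address_importance": 0.1},
]
ordered = [r["name"] for r in sort_results(results)]
```

Here C comes first because of its higher importance, and B is placed
before A only because the two tie on importance.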

I think that's about all that there is.

Sarah


