[Geocoding] [OpenStreetMap] #5015: Russian geocoding support in Nominatim

OpenStreetMap trac at noreply.openstreetmap.org
Sat Oct 19 20:14:20 UTC 2013


#5015: Russian geocoding support in Nominatim
----------------------------+-------------------------
 Reporter:  Zibrnbernstein  |      Owner:  geocoding@…
     Type:  defect          |     Status:  new
 Priority:  major           |  Milestone:
Component:  nominatim       |    Version:
 Keywords:                  |
----------------------------+-------------------------
 As of now, Russian geocoding support in Nominatim is totally broken. I'm
 filing this meta-ticket to track progress on individual tickets and to
 gather relevant information.

 The background is that I have been trying to conduct a sociological study
 that involves computing coordinates for hundreds of thousands of
 addresses. For that, I planned to deploy a local Nominatim instance, but
 it turned out that it simply doesn't work for most addresses. For now, I
 resort to the geocoding API of Yandex (Russia's #1 search engine), which
 works like a charm but is not suitable for bulk queries. Another point is
 that desktop applications under development (GNOME Maps, for example) use
 the geocode-glib library, which in turn uses the Nominatim API.

 The problem is that Russian address nomenclature is very diverse and
 informal. Here is a brief summary; if needed, I can create a wiki article
 on that.

 1) The "street" term includes not only "улица" (a street proper), but also
 "переулок" (side-street), "проезд" (passage), "проспект" (avenue), "шоссе"
 (highway), "тупик" (cul-de-sac), "мост" (bridge), "площадь" (square) and
 some others. These are used in full or abbreviated form ("улица" -> "ул.",
 "проспект" -> "пр-т"), and can be either prepended or appended to the
 name. Sometimes "Большой" (major) or "Малый" (minor) is part of the name,
 and the word order is arbitrary. Thus, "Большой Ордынский пер." and
 "Ордынский Большой переулок" refer to the same street. #4703

 Examples: "ул. Арбат", "Красная Площадь", "Филиппов пер.", "Энтузиастов
 шоссе"
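
 A minimal sketch of how street names could be reduced to a comparable
 key, assuming a small hand-written abbreviation table. The table and the
 function name canonical_street are hypothetical; this is not how
 Nominatim works today, only an illustration of the idea.

```python
# Hypothetical table: maps full and abbreviated street-type words (only the
# forms mentioned in this ticket) to one canonical spelling.
STREET_TYPES = {
    "улица": "улица", "ул": "улица",
    "переулок": "переулок", "пер": "переулок",
    "проезд": "проезд",
    "проспект": "проспект", "пр-т": "проспект",
    "шоссе": "шоссе",
    "тупик": "тупик",
    "мост": "мост",
    "площадь": "площадь",
}


def canonical_street(raw: str) -> tuple[str, tuple[str, ...]]:
    """Return (street type, sorted name words) for a street string.

    Sorting the remaining words makes the key independent of word order,
    so a prepended or appended type and a moved "Большой" compare equal.
    """
    street_type = ""
    words = []
    for token in raw.lower().split():
        token = token.strip(".,")  # drop the period of "ул." and stray commas
        if not token:
            continue
        if token in STREET_TYPES:
            street_type = STREET_TYPES[token]
        else:
            words.append(token)
    return street_type, tuple(sorted(words))


print(canonical_street("Большой Ордынский пер."))
print(canonical_street("Ордынский Большой переулок"))
```

 Both calls print the same key. Sorting the words is a deliberate
 simplification: it would also merge two different streets whose names
 consist of the same words, and a query that omits the street type
 entirely ("Арбат") gets an empty type and needs separate handling.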

 2) The building number nomenclature is also very diverse. Usually, there
 is a top-level prefix: "дом" (house) or "владение" (property), followed by
 the main number. These prefixes can be abbreviated as "д." or "вл." or
 even omitted. #4647

 Besides the main number, there can also be letter indexes, various
 sub-numbers and combinations of those:
 - a letter index is a letter (usually "а", "б", "в") appended to the
 building number without a space;
 - a sub-building is either a "строение" or a "корпус". These are similar,
 but not interchangeable. They can be spelled in full ("дом 1 строение 2")
 or abbreviated in different ways: "д. 1 стр. 2", "д. 1с2", "3 корп. 1",
 "3к1". As you see, there are a short form ("стр. 3") and a one-letter
 form ("с3"); both the period and the space can be omitted when appending
 it to the main number. Moreover, a sub-building number can have a letter
 index itself;
 - finally, the slash syntax is used when the building has a dual address.
 For example, a building on the corner of two streets can be addressed as
 both "ул. Малая Ордынка 30" and "Большой Ордынский пер., 6с1", while the
 full address is "Малая Ордынка 30/6с1".
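
 The building number forms above can be sketched as one regular
 expression. This is an illustration only: HOUSE_RE, parse_house and the
 field names are hypothetical, and the pattern covers just the spellings
 listed in this ticket.

```python
import re

# Optional prefix, main number with optional letter index, optional
# "/second number", optional sub-building (строение or корпус) with its
# own optional letter index.
HOUSE_RE = re.compile(
    r"""^\s*
    (?:(?P<prefix>дом|д\.?|владение|вл\.?)\s*)?
    (?P<number>\d+)(?P<letter>[а-я])?
    (?:/(?P<second>\d+)(?P<second_letter>[а-я])?)?
    (?:\s*(?P<kind>строение|стр\.?|с|корпус|корп\.?|к)\s*
       (?P<sub>\d+)(?P<sub_letter>[а-я])?)?
    \s*$""",
    re.VERBOSE,
)

PREFIXES = {"д": "дом", "в": "владение"}
KINDS = {"с": "строение", "к": "корпус"}


def parse_house(raw: str):
    """Split a building number string into its parts, or return None."""
    match = HOUSE_RE.match(raw.lower())
    if match is None:
        return None
    prefix, kind = match.group("prefix"), match.group("kind")
    second, sub = match.group("second"), match.group("sub")
    return {
        # The first letter is enough to tell the two prefixes (and the
        # two sub-building kinds) apart in every spelling accepted above.
        "prefix": None if prefix is None else PREFIXES[prefix[0]],
        "number": int(match.group("number")),
        "letter": match.group("letter"),
        "second": None if second is None else int(second),
        "second_letter": match.group("second_letter"),
        "kind": None if kind is None else KINDS[kind[0]],
        "sub": None if sub is None else int(sub),
        "sub_letter": match.group("sub_letter"),
    }


print(parse_house("д. 1 стр. 2") == parse_house("д. 1с2"))
print(parse_house("30/6с1"))
```

 In "1с2" the "с" could be read as a letter index; the pattern resolves
 this by backtracking, because a digit may follow only a sub-building
 marker. For a dual address such as "30/6с1" the sketch reports the
 sub-building as a flat field and leaves it to the caller to decide which
 of the two numbers it belongs to.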

 3) Occasionally, ranges are used as building numbers. For example,
 there is a single building with the address "Лесная ул., 10-16". This
 means that this building should be a hit for requests like "Лесная, 12" or
 "Лесная, 14" (but not "Лесная, 11": a street usually has an even side and
 an odd side).
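
 A minimal sketch of the range check, assuming the tagged number is
 either a single integer or "low-high" and that the whole range lies on
 one side of the street (same parity as its first number). The function
 name house_in_range is hypothetical.

```python
def house_in_range(requested: int, tagged: str) -> bool:
    """Return True if `requested` is covered by a tagged number like "10-16"."""
    low_text, separator, high_text = tagged.partition("-")
    low = int(low_text)
    if not separator:
        # Plain number, no range: only an exact match counts.
        return requested == low
    high = int(high_text)
    # Inside the range and on the same (even or odd) side as its start.
    return low <= requested <= high and (requested - low) % 2 == 0


print(house_in_range(12, "10-16"))  # True
print(house_in_range(11, "10-16"))  # False
```

 Letter indexes and sub-buildings are left out here; a real
 implementation would have to strip them before comparing.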

 4) The "е" (ie) and "ё" (yo) letters should be treated as identical; the
 queries should be case insensitive. #2467 #4819 #2758
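
 Folding could look like the sketch below; the function name fold is
 hypothetical. The NFC step is an extra assumption of mine: it covers
 input where "ё" arrives as "е" plus a combining diaeresis.

```python
import unicodedata


def fold(text: str) -> str:
    """Make a string comparable regardless of case and of "е" vs "ё"."""
    # NFC composes "е" + U+0308 into the single character "ё" first,
    # so the replace below catches both encodings.
    composed = unicodedata.normalize("NFC", text)
    return composed.casefold().replace("ё", "е")


print(fold("Берёзовая") == fold("БЕРЕЗОВАЯ"))  # True
```

 The same function has to be applied both to the query and to the indexed
 names, otherwise the comparison is still lopsided.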

 As a solution, I can imagine some code that canonicalizes the requested
 address. For this to work, all the Russian addresses in OSM will need to
 be canonicalized, too (probably, with the help of the same code).

-- 
Ticket URL: <https://trac.openstreetmap.org/ticket/5015>
OpenStreetMap <http://www.openstreetmap.org/>
OpenStreetMap is a free editable map of the whole world


