[Geocoding] [OpenStreetMap] #5015: Russian geocoding support in Nominatim
OpenStreetMap
trac at noreply.openstreetmap.org
Sat Oct 19 20:14:20 UTC 2013
#5015: Russian geocoding support in Nominatim
----------------------------+-------------------------
Reporter: Zibrnbernstein | Owner: geocoding@…
Type: defect | Status: new
Priority: major | Milestone:
Component: nominatim | Version:
Keywords: |
----------------------------+-------------------------
As of now, Russian geocoding support in Nominatim is totally broken. I'm
filing this meta-ticket to track progress on individual tickets and to
gather relevant information.
The background is that I tried to conduct a sociological study that
involved computing coordinates for hundreds of thousands of addresses. For
that, I planned to deploy a local Nominatim instance, but it turned out
that it simply doesn't work for most addresses. For now, I resort to
using the geocoding API of Yandex (Russia's #1 search engine), which works
like a charm but is not suitable for bulk queries. Another point is that
desktop applications are being developed (GNOME Maps, for example) that
use the geocode-glib library, which, in turn, uses the Nominatim API.
The problem is that Russian address nomenclature is very diverse and
informal. Here is a brief summary; if needed, I can create a wiki article
on that.
1) The "street" term includes not only "улица" (a street proper), but also
"переулок" (side-street), "проезд" (passage), "проспект" (avenue), "шоссе"
(highway), "тупик" (cul-de-sac), "мост" (bridge), "площадь" (square) and
some others. These are used in full or abbreviated form ("улица" -> "ул.",
"проспект" -> "пр-т"), and can be either appended or prepended to the name.
Sometimes "Большой" (major) or "Малый" (minor) is part of the name,
and the word order is arbitrary. Thus, "Большой Ордынский пер." and
"Ордынский Большой переулок" refer to the same street. #4703
Examples: "ул. Арбат", "Красная Площадь", "Филиппов пер.", "Энтузиастов
шоссе"
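The rules in point 1 can be sketched as a small canonicalizer. This is only
a minimal illustration, not Nominatim code: the STREET_TYPES table and the
canonical_street function are hypothetical names, the table lists only the
terms and abbreviations mentioned above (plus "пер." from the examples), and
ignoring word order by sorting the remaining words is an assumption about
how matching could work.

```python
# Hypothetical lookup table: full or abbreviated street term -> full term.
# Only the terms named in this ticket are listed; real data needs more.
STREET_TYPES = {
    "улица": "улица", "ул": "улица",
    "переулок": "переулок", "пер": "переулок",
    "проезд": "проезд",
    "проспект": "проспект", "пр-т": "проспект",
    "шоссе": "шоссе",
    "тупик": "тупик",
    "мост": "мост",
    "площадь": "площадь",
}


def canonical_street(name):
    """Return (street_type, sorted_name_words) for a street name.

    street_type is None when no known street term is present.
    Sorting the remaining words makes the result independent of
    word order, so "Большой" may come before or after the name.
    """
    street_type = None
    words = []
    for token in name.lower().split():
        token = token.strip(".,")  # drop abbreviation periods and commas
        if not token:
            continue
        if street_type is None and token in STREET_TYPES:
            street_type = STREET_TYPES[token]
        else:
            words.append(token)
    return street_type, tuple(sorted(words))
```

With this sketch, "Большой Ордынский пер." and "Ордынский Большой переулок"
produce the same key. Names in which the street term is itself part of the
proper name would need extra care.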
2) The building number nomenclature is also very diverse. Usually, there
is a top-level prefix: "дом" (house) or "владение" (property), followed by
the main number. These prefixes can be abbreviated as "д." or "вл." or
even omitted. #4647
Besides the main number, there can also be letter indexes, different
sub-numbers and combinations of those:
- letter index is a letter (usually "а", "б", "в") appended to the
building number without a space;
- sub-building is either a "строение" or a "корпус". These are similar, but
not interchangeable. They can be spelled in full ("дом 1 строение 2")
or abbreviated in different ways: "д. 1 стр. 2", "д. 1с2", "3 корп. 1",
"3к1". As you can see, there are a short form ("стр. 3") and a one-letter
form ("с3"); both the period and the space can be omitted when appending it
to the main number. Moreover, a sub-building number can have a letter index
itself;
- finally, the slash syntax is used when the building has a dual address.
For example, a building on the corner of two streets can be addressed as
both "ул. Малая Ордынка 30" and "Большой Ordынский пер., 6с1", while the
full address is "Малая Ордынка 30/6с1".
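The building-number forms in point 2 can be sketched as one regular
expression. This is a minimal illustration under stated assumptions, not an
existing parser: parse_building and the field names are hypothetical, only
the spellings listed above are recognized, and the sketch does not decide
which street a sub-building after a slash belongs to.

```python
import re

# Hypothetical mappings from every spelling in the ticket to a full form.
_PREFIX = {"дом": "дом", "д": "дом", "владение": "владение", "вл": "владение"}
_SUB = {"строение": "строение", "стр": "строение", "с": "строение",
        "корпус": "корпус", "корп": "корпус", "к": "корпус"}

_BUILDING = re.compile(
    r"""
    ^\s*
    (?:(?P<prefix>дом|д|владение|вл)\.?\s*)?          # optional prefix
    (?P<number>\d+)(?P<letter>[а-яё])?                # main number, letter index
    (?:/(?P<second>\d+)(?P<second_letter>[а-яё])?)?   # second number after slash
    \s*
    (?:(?P<kind>строение|стр|с|корпус|корп|к)\.?\s*   # sub-building kind
       (?P<sub>\d+)(?P<sub_letter>[а-яё])?)?          # its number, letter index
    \s*$
    """,
    re.VERBOSE,
)


def parse_building(text):
    """Parse a building number into a dict, or return None if unrecognized."""
    m = _BUILDING.match(text.lower())
    if m is None:
        return None
    g = m.groupdict()
    return {
        "prefix": _PREFIX.get(g["prefix"]),        # None when omitted
        "number": int(g["number"]),
        "letter": g["letter"],
        "second": int(g["second"]) if g["second"] else None,
        "second_letter": g["second_letter"],
        "sub_kind": _SUB.get(g["kind"]),           # "строение" or "корпус"
        "sub_number": int(g["sub"]) if g["sub"] else None,
        "sub_letter": g["sub_letter"],
    }
```

In this sketch "д. 1с2" and "дом 1 строение 2" parse to the same dict, and
"3к1" is kept distinct from "3с1". A letter directly followed by a digit is
read as a sub-building marker, so "1с2" is building 1, строение 2, while
"12а" is building 12 with letter index "а".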
3) Rarely, ranges are used as building numbers. For example,
there is one single building with the address "Лесная ул., 10-16". This
means that this building should be a hit for requests like "Лесная, 12" or
"Лесная, 14" (but not "Лесная, 11", since streets usually have an even
side and an odd side).
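The range rule in point 3 can be sketched as follows. This is a minimal
illustration with a hypothetical function name; it handles plain numbers
only, and treating a range whose endpoints differ in parity as covering
every number in between is an assumption, not something stated above.

```python
def building_matches(tagged, requested):
    """Check whether a requested number hits a tagged number or range.

    tagged    -- building number as text, e.g. "7" or "10-16"
    requested -- the integer building number from the query
    """
    if "-" in tagged:
        low_text, high_text = tagged.split("-", 1)
        low, high = int(low_text), int(high_text)
        # Endpoints of equal parity: only that side of the street matches.
        # Assumption: mixed-parity endpoints match every number in range.
        step = 2 if (high - low) % 2 == 0 else 1
        return requested in range(low, high + 1, step)
    return int(tagged) == requested
```

For the building tagged "10-16", requests for 12 and 14 match while 11
does not.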
4) The "е" (ie) and "ё" (yo) letters should be treated as identical; the
queries should be case insensitive. #2467 #4819 #2758
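The folding in point 4 can be sketched with the Python standard library
alone. The function name fold is hypothetical; the NFC normalization step
is an addition of this sketch, so that "ё" written as "е" plus a combining
diaeresis is handled as well.

```python
import unicodedata


def fold(text):
    """Fold case and map "ё" to "е" so both spellings compare equal."""
    # NFC composes "е" + combining diaeresis into the single letter "ё".
    composed = unicodedata.normalize("NFC", text)
    # casefold() lowercases, so only the lowercase "ё" is left to replace.
    return composed.casefold().replace("ё", "е")
```

Applying fold to both the query and the stored name before comparing makes
"Берёзовая" and "БЕРЕЗОВАЯ" equal.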
As a solution, I can imagine some code that canonicalizes the requested
address. For this to work, all the Russian addresses in OSM will need to
be canonicalized, too (probably, with the help of the same code).
--
Ticket URL: <https://trac.openstreetmap.org/ticket/5015>
OpenStreetMap <http://www.openstreetmap.org/>
OpenStreetMap is a free editable map of the whole world