[Geocoding] GSoC 2020 Proposal

Sarah Hoffmann lonvia at denofr.de
Mon Mar 23 20:44:58 UTC 2020


Hi Rahul,

and welcome to GSoC.

On Mon, Mar 23, 2020 at 05:18:23PM +0000, Rahul Reddy wrote:
> I am K Rahul Reddy, third year Computer Science student at National Institute of Technology Karnataka, Surathkal, India. I have been going through Nominatim source code and the proposed GSoC projects. I would like to work on the Search suggestions project. I am currently familiarizing myself with the photon project and going through the FSA data representation demonstrated in https://blog.burntsushi.net/transducers/. I’ll share my proposal draft if the project is available.
> 
> Please let me know if the project is available.

Yes, the project is definitely still available. Here is a bit more background
on it:

The target audience for osm.org is first of all mappers.
For the search box that means that mappers should be able to
immediately find new data that they have just entered into
OpenStreetMap. And given that OSM is a world-wide database, it
has to work in any language.

Providing a minutely updated database is actually quite a
difficult requirement to fulfil, both for the implementation
and for resource use. So my suggestion here would be to work
around the problem by really only providing suggestions through
the new service but do the actual lookup still through the
Nominatim API. So somebody would start typing and get a number
of suggestions from the new service. If the user chooses one
of them, we'd send the suggested string to the Nominatim API
and get results based on the latest OSM data. With such a
setup, it would be okay if the database for suggestions is
a week or two behind the latest OSM data. That means that it
would be okay to create a new static database every one or
two weeks and use that. It might simplify the task.
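
To make the two-step flow concrete, here is a minimal sketch in Python.
It is only an illustration, not a proposed design: the name list, the
sorted-list prefix lookup and the function names are all made up for the
example, and a real service would read a much larger static snapshot
(for example an FST). The only real piece is the Nominatim /search
endpoint with its q and format parameters.

```python
import bisect
from urllib.parse import urlencode

# Hypothetical static snapshot of place names, rebuilt every week or two.
# A real suggestion database would be far larger and stored more compactly.
NAMES = sorted(["Berlin", "Bern", "Bergen", "Bremen", "Dresden"], key=str.casefold)
# Case-folded keys in the same order as NAMES, used for binary search.
KEYS = [name.casefold() for name in NAMES]


def suggest(prefix, limit=5):
    """Return up to `limit` names from the static snapshot that start with `prefix`."""
    folded = prefix.casefold()
    start = bisect.bisect_left(KEYS, folded)
    results = []
    for i in range(start, len(KEYS)):
        if len(results) >= limit or not KEYS[i].startswith(folded):
            break
        results.append(NAMES[i])
    return results


def nominatim_search_url(chosen, base="https://nominatim.openstreetmap.org/search"):
    """Build the search URL for the chosen suggestion.

    The actual lookup still goes to the Nominatim API, so the results
    reflect the latest OSM data even if the suggestions are a bit stale.
    """
    return base + "?" + urlencode({"q": chosen, "format": "json"})


if __name__ == "__main__":
    options = suggest("ber")
    print(options)
    print(nominatim_search_url(options[0]))
```

The point of the split is that only suggest() depends on the static
snapshot, so that part may lag behind without affecting the final
search results.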

We also have some technical constraints. Search is currently powered
by two servers. We used to have one in active production and the
other as cold standby. The plan always was that the standby
server should be used to provide the suggestion database and API.
Due to the increased load we now have both in production. For
this project I'd still assume that we will get a third server
and once more have a cold standby for search suggestions.
Creating or updating the database should not interfere too much
with such a server. It is okay to stop updating the server with
new OSM data for a couple of hours to create a new suggestions
database but it shouldn't take days.

There is no hard requirement to use photon for the project
but the suggestions must be based on the Nominatim database
to ensure that suggestion and search API deliver similar
enough results to not confuse the user. Photon can already
deliver on that. So, my suggestion would be to start from
Photon and quickly set up a prototype of search with
suggestions. After that, however, I would suggest forking Photon
and concentrating on developing a version that is specifically
geared at providing suggestions. Photon is a full-featured
geo-search engine and contains a lot of data and code that
is not required for the task. The smaller the database can
be made in the end, the better.

That was quite a bit of information but I hope it helps you
understand a bit better the scope of the project. Feel free
to ask about more details if parts are still unclear.

Kind regards

Sarah


