[Geocoding] GSoC 2021 - Extracting QA reports from Nominatim

Thu Mar 25 11:19:53 UTC 2021

Hi Antonin,

On Wed, Mar 24, 2021 at 11:00:34PM +0100, Antonin Jolivat wrote:
> First I would like to talk about my approach on the projects list. At the
> beginning I planned to focus on the project "Interface for reporting search
> bugs for Nominatim" but I saw that Yash Srivastava was already focused on
> it. I discussed with him and told him that I will go for the project
> "Extracting QA reports from Nominatim" because I think that it is
> counterproductive to make a proposal on the same project for both of us,
> and I don't mind as the two projects interest me a lot. However we were
> wondering if Nominatim has enough slots available to possibly take both of
> us. If you have enough vision on it and it is something you can communicate
> on, it would be very motivating to have this information (It is only in a
> theoretical way, of course it would be under your decision based on our
> proposals).

Two students for the Nominatim project (on different proposals) are possible
in terms of mentoring capacity. However, note that this is not the only factor
for you getting accepted or not. You are competing against all students applying
for a project under the OSM org and we will choose from those the ones with
the strongest proposals. That is not to discourage you. It is just a reminder
that you will not automatically get accepted just because it looks like you
are the only one interested in one specific project idea. There is still strong
competition out there, this year more than ever. Choose the project that
interests you most and where you think you can really create a good proposal
and make a good contribution.

> My current questions are:
> - If I understood well, should this tool interfere during the data import
> process (so when the data are processed)?

No, the tool should be run regularly against an existing database. The idea
is the following: we have the global instance of Nominatim running at
https://nominatim.openstreetmap.org. Once a night, we would stop the regular
update, let the tool run over the database and create the error reports
in the form of vector tiles that are suitable for being displayed with
Osmoscope, then continue updating the database with the newest data from OSM.
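
To make the order of operations concrete, here is a minimal sketch of such a
nightly run. The three callables are hypothetical placeholders: how updates
are actually paused and resumed on the server is not specified here. The only
point is that updates should resume even when the report extraction fails.

```python
import time
from typing import Callable


def run_nightly_report(pause_updates: Callable[[], None],
                       extract_reports: Callable[[], None],
                       resume_updates: Callable[[], None]) -> float:
    """Pause updates, extract the QA reports, then resume updates.

    All three callables are hypothetical hooks supplied by the caller.
    Returns the time in seconds between pausing and resuming.
    """
    pause_updates()
    start = time.monotonic()
    try:
        extract_reports()
    finally:
        # Resume updates no matter what, so a bug in the extraction
        # never leaves the database stale.
        resume_updates()
    return time.monotonic() - start
```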

The reports are then copied to wherever the instance of Osmoscope runs
and served from there. Hopefully, we will have some official URL like
https://nominatim.openstreetmap.org/osmoscope or something like that.

> - I know I will need to strengthen my mapping skills and my understanding
> of how Nominatim processes data, but in the meantime, could you give me an
> example of how one inconsistency can be discovered in Nominatim, like
> "admin boundaries with more than one member with role 'label'" for example?

You should look through the database structure of Nominatim. There is
unfortunately still very little documentation. The most important table
to look at is the 'placex' table. This contains all the "direct" information
we keep about the searchable objects. 'place_addressline' is needed to build
the hierarchy (this street is in this city, where 'street' and 'city' are
both OSM objects contained in the placex table), although this might not be
needed for the current QA report suggestions. Anything about members needs
to be looked up in the table 'planet_osm_rels'.
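
As a rough illustration for the 'label' example from your question: assuming
'planet_osm_rels' uses the legacy osm2pgsql layout, where the 'members' column
is a flat text array alternating member reference and role (for example
'n123', 'label', 'w456', 'outer'), the check could be sketched as below. The
function names are made up for this example, the database access is left out,
and the rows would still have to be restricted to administrative boundaries
beforehand.

```python
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple


def member_roles(members: Sequence[str]) -> Iterator[Tuple[str, str]]:
    """Yield (member_ref, role) pairs from a flat members array.

    Assumes the layout ['n123', 'label', 'w456', 'outer', ...].
    A dangling last element without a role is ignored.
    """
    for i in range(0, len(members) - 1, 2):
        yield members[i], members[i + 1]


def count_role(members: Sequence[str], role: str) -> int:
    """Count how many members carry the given role."""
    return sum(1 for _, r in member_roles(members) if r == role)


def relations_with_multiple_labels(
        rows: Iterable[Tuple[int, Optional[Sequence[str]]]]) -> List[int]:
    """Return the ids of relations with more than one 'label' member.

    rows: (relation_id, members) tuples, e.g. as fetched from the
    relations table; members may be None for empty relations.
    """
    return [rel_id for rel_id, members in rows
            if count_role(members or [], 'label') > 1]
```

For example, a relation with members ['n1', 'label', 'n2', 'label', 'w3',
'outer'] would be reported, one with a single 'label' member would not.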

> - I saw that osmoscope could be a good candidate, does the expected result
> look like this one of Brouter: http://brouter.de/osmoscope/?

Yes, that's the idea.

> - I have some questions about it as I don't know much about the ecosystem,
> I only want to understand the needs better. I see that osmoscope is not
> maintained and updated anymore since 2 years isn't it a problem?

It has been dormant but the maintainer is willing to take PRs at short
notice for the project, should there be bugs or feature requests.
Osmoscope is not a strong requirement but it has the advantage that it is
available and already has a convenient interface.

The focus of this project shouldn't be so much on the UI but more on setting
up the infrastructure for extracting the errors. This is not trivial. It
needs to be efficient because the tool to extract the reports should not
run longer than half an hour. There might be quite a lot of errors. You can't
dump everything into a single big file. It would kill the browser when the
user tries to view it. It needs to be stored in 'vector tiles' and simplified
for lower zoom levels (i.e. get a good overview when you look at the whole
world).
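
To give an idea of what the tiling means in practice, here is a small sketch
that sorts error locations into the standard slippy-map (Web Mercator) tile
grid and counts them per tile. At a low zoom level this yields a handful of
counts instead of every single error. The function names are invented for the
example, and it only shows the bucketing step, not the encoding of actual
vector tiles.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) slippy-map tile containing a WGS84 coordinate."""
    # Web Mercator is only defined up to about 85.0511 degrees latitude.
    lat = max(min(lat, 85.0511), -85.0511)
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def bucket_by_tile(points: Iterable[Tuple[float, float]], zoom: int) -> Counter:
    """Count (lon, lat) points per (zoom, x, y) tile."""
    counts: Counter = Counter()
    for lon, lat in points:
        x, y = lonlat_to_tile(lon, lat, zoom)
        counts[(zoom, x, y)] += 1
    return counts
```

At the highest zoom levels each tile would carry the individual errors, while
the lower levels would only carry such aggregated or thinned-out data.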

> And do you
> really think that a lot of mappers consult it in order to make corrections
> on wrong data? Could you elaborate on what would be your ideal result (even
> if it is not the simpler) about the presentation of our QA reports? Like
> would it be a specific custom tool or only by using osmoscope or maybe
> another thing that I don't think about? My global idea of it is that it
> should be easily accessible and intuitive so that any random mapper can use
> it and modify data accordingly, it would be a pity if these reports are not
> used.

Mappers are used to working with all kinds of QA tools and Osmoscope was
designed with mappers in mind. So it hopefully solves the presentation side.
You'd still need to think about how to present and describe the errors so
that it is obvious to the mappers how they can fix them.

Sarah

> 
> Thanks for your time and your answers.
> Antonin
