[Imports] Import of Flemish Government data (building footprints and addresses)
Pieter Vander Vennet
pietervdvn at gmail.com
Thu Nov 1 14:39:09 UTC 2018
Hello everyone,
Original Poster here.
Thanks for your remarks. A lot of people active in the Belgian community
are following this discussion with interest and some bewilderment. Like the
first message, this is a reply that has been written by several members
together.
We did have a hard time filtering the actual discussion about our import
from the thread. We would kindly invite everyone to take the discussion of
other imports and the more philosophical points to a more appropriate
place (e.g. the talk-list) and stick to the topic of the Flemish buildings
import.
We have been working on this for two years and don't mind working on it for
a few more months, before running the actual integration. We are, in the
first place, looking for advice and guidance so as to further improve our
methods - together with a go-ahead once all issues are resolved.
In case this was not clear before: this is _not_ a flat import where all
the data is dumped into OSM automatically. This is an effort of the Belgian
community, where mappers select a subset of the data they choose to
integrate piecemeal, usually one street at a time. The data is loaded in
JOSM and checked against aerial imagery, with the mapper applying common
sense.
As an example, this is a screenshot of the tool in action:
https://matrix.org/_matrix/media/v1/download/matrix.org/WmemnfTQZOSfoKoIlByFBHmv
You can see that the tool reuses tags from OSM and offers the mapper the
choice of which geometry to use (the mapper can decline the imported
geometry by not saving it). In combination with aerial imagery, this should
cause little to no problems with regard to the original OSM data.
Mateusz and Frederik, your points regarding documentation are being
addressed as we speak.
# External IDs
The most discussed point seems to be the IDs we wanted to include.
We believe the IDs will significantly ease updates to our buildings when
the GRB is updated, and make the whole process more robust.
They provide a way to update and cross-reference the data now and in the
future. Precisely by keeping these IDs, updating data down the road will be
smoother, and later changes by OSM contributors will not be overwritten.
As an added benefit, keeping IDs makes the import tool-agnostic. It will
also make it much easier to flag errors in the source data.
We aren't afraid that people will refrain from editing source-tagged
objects: new users (using the iD editor) will probably not notice those
tags in the first place; advanced users will know about them. And
intermediate users will probably look them up. Merging and splitting are
rather rare operations – once the geometry is accurate, they should become
unnecessary operations. Finally, as the data set is available for free
under an open license, anyone can verify the data independently (though not on
the ground, of course).
We will address the worries about the external IDs on a point by point
basis below:
From *Frederik Ramm:*
- *If you delete a building that has such an ID, how will you ensure it
isn't brought in again through a later "update import"? Etc.*
There will not be "an update import". The tool is built for continuous use.
It will improve the geometry of existing buildings and already looks at the
source tags. The tool is built for heavy mappers who will be monitored by
our closely knit community. Importing and updating will be done on a
street-by-street or block-by-block basis. We believe we can trust these
mappers to analyze situations like this on a case by case basis. In this
case, whether or not the deleted building had an ID does not matter much.
The following points, by Frederik Ramm, Christoph Hormann and Mateusz
Konieczny respectively, are quite similar. *We'll answer them together.*
- *The idea of having an "audit trail" for every single geometry by way
of an Id for that individual geometry is interesting, but I think that it
is totally sufficient if a changeset carries the information that this
changeset has been imported from XYZ data source at time stamp T;
everything else can be researched down the line if the need should*
- *For this purpose it is completely unnecessary to bother the OSM
community with external IDs. If you want to check if the data has been
unchanged since you added it then do exactly that - check if there are any
newer versions of the objects that have originally been added in the
import.*
- *What is the point of adding tags like source:geometry:entity given
that like any other tags they may be edited once added to OSM?*
It is not, and has never been, the intention to fully automate this. As we
are trying to make clear, this info is needed and used all the time, and
not at some vague "maybe we do an update sometime". We are not using the
IDs to make a direct (full database) comparison to see what exists in one
and not in the other on a gigantic scale. We are not looking for an 'auto
update OSM with all new buildings added to GRB'. We use the IDs on a
SINGLE BUILDING comparison basis only, to see what has changed (geometry
touch-ups, or entirely replacing a building).
"THEY WON'T BE STABLE ANYWAY"
If the external references are edited, the tool will flag the building as
needing an update. The only thing a (deliberate or accidental) unneeded
change of the ID would do, is alert the tool's users that something doesn't
add up. Either the building has been replaced in reality (and thus has a
new UIDN), in which case you can improve the geometry of the new building,
or no apparent change to the physical building can be traced, in which case
the UIDN can be restored. The number of 'false positives' due to unintended
edits of the IDs is expected not to come remotely close to the number of
useful flags.
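To make the single-building check described above concrete, here is a
minimal Python sketch. It is not the tool's actual code: the tag key
`source:geometry:ref`, the function name `check_building` and the shape of
the inputs are all assumptions made purely for illustration.

```python
# Hypothetical tag key holding the GRB identifier (UIDN) on an OSM building.
# The real key used by the tool may differ; this is an assumption.
REF_KEY = "source:geometry:ref"


def check_building(osm_tags, grb_uidns):
    """Classify one OSM building against the UIDNs currently in GRB.

    osm_tags:  dict of tags on a single OSM building
    grb_uidns: set of UIDN strings present in the current GRB extract
    """
    ref = osm_tags.get(REF_KEY)
    if ref is None:
        # Never linked to GRB: nothing to compare, the mapper decides.
        return "not linked"
    if ref in grb_uidns:
        # Same entity on both sides: at most a geometry touch-up is needed.
        return "linked"
    # The reference was edited, or the building was replaced in GRB.
    # Either way a mapper should take a look.
    return "needs review"


current_grb = {"1001", "1002"}
print(check_building({"building": "house", REF_KEY: "1001"}, current_grb))
print(check_building({"building": "house", REF_KEY: "9999"}, current_grb))
print(check_building({"building": "house"}, current_grb))
```

In this sketch an edited or outdated reference never causes an automatic
change; it only marks the building for a mapper to review, which matches the
manual, case-by-case workflow described in this mail.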
Without the tags, it's hard to tell which buildings have been imported: you
would need complicated spatial heuristics because we don't blindly copy
buildings, we improve them through other sources. Once mapped, the geometry
changing would happen more often than the tags changing, so we'd have a lot
more false positives.
"WHY NOT JUST CREATE A DATABASE OF LINKS EXTERNALLY?"
In theory it would be possible to have the tool keep a register of which
OSM ID maps to which ID in the GRB or to evaluate changesets to get similar
info. Some issues with this approach:
- Because we aren't doing an automatic import but are manually adding the
buildings through JOSM, this would require us to copy the OSM ID of each
building manually, or rely on heuristics.
- We don't want to centralize the link between OSM and GRB. There isn't
one single person doing the import, it's something several people in the
community work on. Anyone could host the tool if its current maintainer
disappears.
- Pushing the analysis of what exactly has happened towards a changeset
is an unnecessary burden on a mapper. The changeset will not just contain
"here be new buildings", but also "in this case, we just used part of the
geometry of the GRB building, but we left part of the building geometry
intact because that was better in OSM". After having analysed the
changeset, the mapper would still have to look up in an import database
what exactly the relationship between the OSM and GRB objects was at the
time of import.
- An external database providing the link between OSM and GRB objects
would be outdated after a day and almost impossible to update with changes
on the OSM side.
"WHY NOT JUST COMPARE GEOMETRIES?"
Doing a geometrical analysis to find differences would be impractical,
not just because it would be computationally heavy, but also because it
would lead to too many false positives, for example because of tiny
changes, and because we do not blindly use source geometries: we first
address overnoding.
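The overnoding point can be illustrated with a small, self-contained Python
sketch. The coordinates are made up and the helper `ring_area` is ours, not
part of any tool; the sketch only shows that a naive node-by-node comparison
reports a difference for a building whose shape has not changed at all.

```python
def ring_area(ring):
    """Shoelace area of a ring given as a list of (x, y) tuples.

    The ring is treated as closed: the last node connects to the first.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Made-up source geometry with a redundant node halfway along one wall.
source_ring = [(0, 0), (5, 0), (10, 0), (10, 6), (0, 6)]
# The same building after the redundant node was removed before upload.
cleaned_ring = [(0, 0), (10, 0), (10, 6), (0, 6)]

# A naive comparison of the node lists sees two different geometries...
same_nodes = source_ring == cleaned_ring
# ...although the outline, and therefore the area, is identical.
same_area = ring_area(source_ring) == ring_area(cleaned_ring)

print(same_nodes, same_area)
```

A more robust geometric comparison is of course possible, but it needs
tolerances and heuristics for every such case, whereas comparing an ID tag
is a plain equality test.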
I hope that all confusion is cleared up now and that we can move this issue
forward.
With Regards,
The Belgian Mappers,
Pieter Vander Vennet