[OSM-legal-talk] Collective databases (was: Using OSM data, to generate game worlds.)

Christoph Hormann chris_hormann at gmx.de
Thu Feb 11 20:34:54 UTC 2021


On Thursday 11 February 2021, Frederik Ramm wrote:
>
> This is a very interesting question and I am not sure what my own
> opinion on it is.
>
> Your reasoning seems to be that, because the LIDAR dataset only says
> "a height of 123m has been observed at lon=x, lat=y", you can only
> match it to OSM by merging it with the OSM geometry, and therefore
> the two data sets are not sufficiently independent. Is that correct?

Formulated generically in pseudo-SQL syntax, the height of a
building with an OSM-based footprint footprint_osm would be estimated
with the help of LIDAR point cloud data points_lidar using something
like

height = F(ST_Intersection(footprint_osm, points_lidar))

where F() is some kind of function - in the simplest case calculating
the range of z coordinates in the data.  Now ST_Intersection() is
commutative - hence height is either a derivative of footprint_osm and
points_lidar or not a derivative of either.  It would be weird to claim
that it is a derivative of points_lidar but not of footprint_osm.
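
A minimal Python sketch of the expression above, assuming planar
coordinates, a footprint given as a simple polygon vertex list, and F()
being the range of z values.  The function names and the sample
coordinates are made up for illustration; a real setup would more likely
use PostGIS or a spatial library.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def estimate_height(footprint_osm, points_lidar):
    """F(intersection): range of z over the LIDAR points inside the footprint."""
    zs = [z for (x, y, z) in points_lidar
          if point_in_polygon(x, y, footprint_osm)]
    if not zs:
        return None  # no LIDAR point falls inside the footprint
    return max(zs) - min(zs)


# Hypothetical sample data, not taken from any real data set.
footprint_osm = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
smaller_footprint = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
points_lidar = [
    (2.0, 2.0, 101.0),
    (5.0, 5.0, 112.5),
    (8.0, 3.0, 109.0),
    (20.0, 20.0, 140.0),  # outside both footprints
]

height = estimate_height(footprint_osm, points_lidar)
height_smaller = estimate_height(smaller_footprint, points_lidar)
```

With the same LIDAR points, the two footprints give different results
(11.5 and 0.0 here), so the computed height depends on the OSM geometry
just as much as on the point cloud.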

> A frequent example for a collective database is taking a list of
> restaurants and their locations from OSM, and then from a different
> source adding reviews for these restaurants. Would you agree that
> this is a collective database?
> https://wiki.openstreetmap.org/wiki/Collective_Database_Guideline
> does the same with phone numbers as an example.
>
> In the restaurant example, the review source will likely identify the
> restaurants by name or address, and you will have to look at OSM's
> name and address in order to create the link that makes the two data
> sets into a collection.
>
> Is there a fundamental difference between using a name from OSM to
> match it with the review data I am taking from a different source, or
> using a geography from OSM to match it with LIDAR data from a
> different source?

You already indicate the answer in your scenario, I think.  In the case
of the restaurants a spatial intersection would not work, nor would any
other form of spatial matching - for example with two restaurants in
the same building.  You rely on having a 1:1 relationship between the
entries in both databases a priori.  That is evidently the case in your
example if both data sets are without errors, so the task is only to
determine that relationship and to deal gracefully with errors in
either of the data sets.  So this is a case of direct application of
the Fairhurst doctrine:

https://wiki.openstreetmap.org/wiki/Open_Data_License/Metadata_Layers_-_Guideline#Fairhurst_doctrine

which essentially says that determining a 1:1 match between databases
(one that pre-exists due to the very nature of the two databases
referring to the same real world things) never represents a work of
substance according to database law.  And such a pre-existing 1:1 match
is definitely not what you have in the building footprint + LIDAR
example.
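
For contrast, the restaurant case can be sketched as a plain key-based
join.  This is only an illustration under assumptions: the record
layout, the normalize() helper and all names and addresses are
hypothetical, and real matching would need more tolerant comparison.

```python
def normalize(text):
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def match_one_to_one(osm_restaurants, reviewed_places):
    """Link each reviewed place to the single OSM restaurant with the same key."""
    index = {}
    for r in osm_restaurants:
        key = (normalize(r["name"]), normalize(r["addr"]))
        index.setdefault(key, []).append(r["osm_id"])

    links, unmatched = {}, []
    for p in reviewed_places:
        key = (normalize(p["name"]), normalize(p["addr"]))
        candidates = index.get(key, [])
        if len(candidates) == 1:
            links[p["review_id"]] = candidates[0]
        else:
            # Missing or ambiguous: an error in one of the data sets.
            unmatched.append(p["review_id"])
    return links, unmatched


# Two restaurants in the same building: same address, different names.
osm_restaurants = [
    {"osm_id": 1, "name": "Cafe Uno", "addr": "Main Street 5"},
    {"osm_id": 2, "name": "Trattoria Due", "addr": "Main Street 5"},
]
reviewed_places = [
    {"review_id": "a", "name": "cafe  uno", "addr": "main street 5"},
    {"review_id": "b", "name": "Trattoria Due", "addr": "Main Street 5"},
    {"review_id": "c", "name": "Closed Diner", "addr": "Side Road 1"},
]

links, unmatched = match_one_to_one(osm_restaurants, reviewed_places)
```

The join only recovers a correspondence that already exists because
both records describe the same real world place; no geometry is
involved, and nothing new is computed from the OSM data.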

Unfortunately (at least to my knowledge) the LWG and the OSMF board
did not, during development of the Collective Database Guideline,
openly discuss with the community what limitations the Fairhurst
doctrine needs to make sure it stays compatible with the spirit and
letter of the ODbL.  As a result the Collective Database Guideline
essentially adopts the Fairhurst doctrine without any limits, not even
restricting it (as I did and as I think Richard also did when he
introduced the idea) to pre-existing 1:1 relationships due to the
nature of the data, and this way - in combination with the other
guidelines - essentially can be read to abolish share-alike in all
practically relevant cases.

So to answer your earlier question:  Yes, I think the match of OSM
restaurants with proprietary reviews should be considered a collective
database because it is based on a pre-existing 1:1 match between the
OSM restaurants and the places the reviews refer to due to the real
world identity between the two things.  Because of this identity the
reference between the two databases itself does not represent anything
of substance and does not reduce their independence IMO.

--
Christoph Hormann
https://www.imagico.de/


