[OSM-legal-talk] OSM for training ML machines

Wed Apr 10 08:35:39 UTC 2019

I will add my 2 cents in the same pot as Kathleen.

A typical "learned" model, based on an ML algorithm and a substantial
extract of OSM data:
That seems like a Produced Work to me.

Hence...
- licence for the training inputs (underlying database, data
structures built before learning): release under ODbL (Derivative
Database; publish the entire database, or the alterations, or the
algorithm that makes them)
- licence for the model (weights, internal data structures built
during learning): Produced Work, release under any license that you
like (Share Alike: no), required to credit OpenStreetMap (Attribution:
yes)
- licence for the results (outputs): provided they are an
insubstantial extract or contain no OSM data, release under any
license that you like (Share Alike: no), not required to credit
OpenStreetMap (Attribution: no)

If the results (outputs) are used to create a new database that
contains the whole or a substantial part of the contents of the OSM
database, this new database would be considered a Derivative Database
and would trigger share-alike obligations under section 4.4.b of the
ODbL. [shameless plug of Geocoding guideline]

In fact, I think the Geocoding guideline is a very good starting point
and could be extended to cover other applications (ML-based or not).
Geocoder underlying database ~equivalent~ training inputs
Geocoder application ~equivalent~ ML-based model
Geocoding results ~equivalent~ model outputs

This is my understanding or interpretation of the current materials:
https://opendatacommons.org/licenses/odbl/1.0/
https://wiki.osmfoundation.org/wiki/Licence/Community_Guidelines/Produced_Work_-_Guideline
https://wiki.openstreetmap.org/wiki/Open_Data_License/Produced_Work_-_Guideline
https://wiki.osmfoundation.org/wiki/Licence/Community_Guidelines/Geocoding_-_Guideline

-- althio

On Tue, 9 Apr 2019 at 15:35, Kathleen Lu via legal-talk
<legal-talk at openstreetmap.org> wrote:
>
> My two cents:
> I'm not sure what you mean by internal data structures. If OSM data is used to train an ML algorithm, then I would think that the training inputs could be a substantial extract (possibly a trivial transformation of an extract). But what is trained would be an algorithm/weights, which I generally do not think of as a database at all? But since it uses an OSM database, a Produced Work seems the right concept:
> "a work (such as an image, audiovisual material, text,
> or sounds) resulting from using the whole or a Substantial part of the
> Contents (via a search or other query) from this Database, a Derivative
> Database, or this Database as part of a Collective Database."
> -Kathleen
>
>
>
> On Tue, Apr 9, 2019 at 5:06 AM Frederik Ramm <frederik at remote.org> wrote:
>>
>> Hi,
>>
>> is it a community consensus that, when someone uses OSM to train their
>> machine learning "black box", the internal data structures built during
>> learning constitute a derivative database? Or are there people who argue
>> that somehow the "black box" can ingest OSM data at will and still
>> remain 100% intellectual property of its operator?
>>
>> Further, assuming that we have a system that has ingested OSM by deep
>> learning and we say that this means its internal database is ODbL, what
>> would this mean for the output later produced by the same machine?
>>
>> Bye
>> Frederik
>>
>> --
>> Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"
>>
>> _______________________________________________
>> legal-talk mailing list
>> legal-talk at openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/legal-talk