[OSM-legal-talk] ODbL: Where do we stand regarding collective/derivative databases

Tue Jul 28 02:31:23 BST 2009

On 7/27/09, Frederik Ramm <frederik at remote.org> wrote:
>     generally good progress on ODbL; many things have been cleared up
> and we will soon be at a point where the proposal for a license change
> is not some cloudy abstract thing any longer but a very concrete
> proposal that people can evaluate.

i'm glad you think so :-)

> After the LWG has made an effort to resolve the questions about what is
> substantial and what is a derived work, in my eyes there's one big issue
> that remains, and that is "what is a derivative database".

LWG cannot entirely resolve these questions, as they need open
discussion and community consensus (which we obviously can't provide
on our own). even then, final interpretation is up to the courts.

> To recap, my understanding is that if I produce (+publish) works based
> on a derivative database that I have created then I have to make that
> database available, fully, under ODbL. If I, on the other hand, produce
> works based on a collective database that is half ODbL and half
> proprietary, then I only have to make the ODbL part available. Is that
> everyone else's reading as well?

it's my understanding also.

> Let us look at someone who mixes OpenStreetMap and Navteq data. Say I
> produce map tiles (clearly a produced work, no?) where all the streets
> come from Navteq, but all the footways come from OpenStreetMap. There
> are a number of ways to do this, all leading to the exact same result,
> and nobody from the outside can see which of 1,2,3 I am using:
>
> 1. Configure my Mapnik tile generator so that it accesses two different
> postgis databases - one containing Navteq and one containing OSM - to
> produce merged map tiles.

or, equally, configure two different mapnik instances to produce two
different tilesets (one with a transparent background) and composite
the two in a later postprocessing stage.
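
as a rough illustration of that postprocessing stage, here is a minimal
sketch of compositing an overlay tile onto a base tile. it assumes tiles
are plain lists of rows of (r, g, b, a) tuples with values 0-255; the
names over() and composite_tiles() are hypothetical and not part of
mapnik.

```python
def over(top, bottom):
    """Composite one straight-alpha RGBA pixel over another ('over' operator)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    a_top = ta / 255.0
    a_bot = ba / 255.0
    # Resulting coverage of the two stacked pixels.
    a_out = a_top + a_bot * (1.0 - a_top)
    if a_out == 0.0:
        return (0, 0, 0, 0)  # both pixels fully transparent

    def blend(t, b):
        return round((t * a_top + b * a_bot * (1.0 - a_top)) / a_out)

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), round(a_out * 255))


def composite_tiles(overlay, base):
    """Composite an overlay tile (transparent background) onto a base tile."""
    return [
        [over(t, b) for t, b in zip(overlay_row, base_row)]
        for overlay_row, base_row in zip(overlay, base)
    ]
```

the only point of the sketch is that the two inputs meet as pixels, after
each has been rendered separately, and not as rows in a database.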

> 2. Pour OSM and Navteq data into the same postgis instance but have
> different tables (e.g. planet_osm_roads and navteq_roads) which are
> joined by Mapnik's SELECT statement.
>
> 3. Extract all footway geometries from OSM and insert them into my
> postgis database containing Navteq street data, then run Mapnik on the
> resulting database.
>
> The way I read the license, option 1 would be definitely ok, option 3
> would definitely lead to my having to release the Navteq data, and
> option 2 would be somewhere in between (probably ok until, unknown to me,
> Matt comes along and makes Mapnik internally create temporary tables on
> the fly for better performance in which case I'd be creating temporary
> derivative databases without even noticing...)
>
> Evil business genius that I am, I would of course claim to be doing 1
> even when doing 3 and nobody would have the right to challenge me,
> right? Which would ultimately mean that:

i'd say that if 1 were technically plausible, then what's the real
difference between that and 2 or 3?

> "If there is any conceivable way that a produced work could have been
> created by using a collective rather than a derivative database, then
> only the ODbL licensed part of the data source has to be released."
>
> This is becoming interesting, we're very much into real-world business
> scenarios now. There are lots of people who'd shy away from using OSM
> outright but if they could use a Navteq basemap and sprinkle that with
> any additional detail that OSM might have that would be just great for
> them.

indeed, so your use-case with navteq roads and OSM footways wouldn't
require the navteq data to be opened.

unless you don't like the way the navteq and OSM data don't quite
match up: at the point where you alter the OSM or navteq data to make
them match, options 1, 2 and 3 all become inconceivable and the navteq
data must be released (along with the whole of, or the diffs to, the
OSM data).

it gets more difficult when we consider the collision detection in
mapnik - does the fact that an item isn't rendered in one layer
because it would occlude a previously rendered item constitute a
derivative database inside mapnik, or is it simply part of the
creative process of produced works?

> Let us look at someone who has a Navteq and an OSM data base, and runs a
> comparing analysis which results in *removing* all features from the OSM
> database which were also in Navteq. He clearly creates a derivative
> database but one which has no data added, just data deleted. He now
> employs technique #1 from above to merge the Navteq data set and the
> reduced OSM data set into one that contains the "best of both worlds".
> Since he is clearly operating on a collective database, he only has to
> release the derived OSM database under ODbL - the value of which is
> almost zero to the community since it has no data added (the only thing
> you can do with it is find out which of OSM's features are present in
> Navteq as well).
>
> Is everything I write here correct and compatible with what others are
> thinking? Is there some lawyer opinion on cases like this documented
> somewhere in the vast depths of our Wiki and LWG minutes?

my reading would be that the deletions from the OSM data are a
derivative database of both the OSM data and the navteq data and that
the combination of navteq + (OSM - derivative) constitutes a public
use of that derivative database, requiring the release of the navteq
data.
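
a toy sketch of why i read it that way, assuming features can be matched
by a simple identifier (real matching would be geometric and fuzzy, and
the names here are made up for illustration):

```python
def reduce_osm(osm_features, navteq_features):
    """Keep only the OSM features that have no counterpart in navteq."""
    return osm_features - navteq_features


osm = {"footway_a", "street_b", "street_c"}
navteq = {"street_b", "street_c", "street_d"}

reduced = reduce_osm(osm, navteq)

# Same OSM input, different navteq input: the "reduced OSM" output changes,
# so the result is a function of both databases, not of OSM alone.
reduced_other = reduce_osm(osm, {"street_b"})
```

the reduced set can't be computed without the navteq data, which is what
makes me think of it as derived from both.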

as far as i know, we haven't had counsel on this specific case.

> (I'm just trying to determine what exactly ODbL mandates - not trying to
> find out what would be desirable in an ideal world.)

ask N people, get N * (N + 1) answers ;-)

my philosophy with respect to derivative databases hinges on the
definition of "produced work" - everything that comes out of an ODbL
database is one of: non-"substantial", a "derivative database" or a
"produced work". where the boundaries between these lie is open for
debate. currently, the guidelines on the wiki suggest that any
"substantial" extract from the database which isn't in an image format
(abbreviated - see details on wiki) is a derivative database.
therefore, the point of rendering to an image format is the crucial
point. if the output P = a(b(X) @ c(Y)) where X and Y are input
databases (X is ODbL derived), a, b and c are functions and @ is an
operator, then we have the following cases:

1) b and c are rendering functions and @ is an operator on "images". then
X must be released but not Y. a can be any function.

2) a is a rendering function and @ is an operator on data. X and Y
must both be released. b and c can be any non-rendering functions.

3) a and b are rendering functions and @ is an operator between data
and an "image". X must be released, but not Y (as it is combined with
a produced work, not the original data).

4) a and c are rendering functions and @ is an operator between data
and an "image". i'm not so sure about this, but imho X and c(Y) must
be released, but maybe not the original Y.
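
the four cases can be written down as a small lookup. this is only a
sketch of my own reading above, not of the licence text: must_release()
is a hypothetical helper, and for cases 2-4 it assumes that a is a
rendering function.

```python
def must_release(b_renders, c_renders):
    """Return what has to be released for P = a(b(X) @ c(Y)).

    b_renders / c_renders say whether b and c are rendering functions,
    i.e. whether b(X) and c(Y) are "images" rather than data.
    X is the ODbL-derived input, Y is the other input.
    """
    if b_renders and c_renders:
        return {"X"}            # case 1: @ combines two images
    if not b_renders and not c_renders:
        return {"X", "Y"}       # case 2: @ combines data with data
    if b_renders:
        return {"X"}            # case 3: image of X combined with Y data
    return {"X", "c(Y)"}        # case 4: X data combined with image of Y


cases = {
    1: must_release(True, True),
    2: must_release(False, False),
    3: must_release(True, False),
    4: must_release(False, True),
}
```

case 4 is the one i'm least sure of, so treat that branch as a guess.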

disclaimer: i am not a lawyer, these views are my own, etc...

cheers,

matt