[Osmf-talk] CC BY SA 2.0 and backup plan

Sat Dec 5 20:03:12 UTC 2009

80n wrote:
> On Sat, Dec 5, 2009 at 7:07 PM, Matt Amos <matt at asklater.com> wrote:
> 
>     80n wrote:
> 
>         On Sat, Dec 5, 2009 at 6:15 PM, Matt Amos <matt at asklater.com>
>         wrote:
>            you probably *can* publish a map based on a collective
>            database of CC BY-SA data and ODbL data, and the produced
>            work will be CC BY-SA licensed.
> 
>            i think you *can* publish a map based on a collective
>            database of CC BY-SA data and a derivative of ODbL data, and
>            the produced work will be CC BY-SA licensed and the whole
>            dump, or diff, of the derivative ODbL data must be available.
> 
>            i think you *can* publish a map based on a collective
>            database of CC BY-SA derivative data and a derivative of ODbL
>            data as long as the derivative doesn't "represent, in terms
>            of obtaining, verification or presentation, significant
>            investment", and the produced work is CC BY-SA, and the
>            diff/dump of the ODbL data is made available.
> 
>         I'm still thinking this through but how would you create the
>         ODbL derivative database?  You'd have to do it without any
>         recourse to the CC BY-SA data and I don't see how that would be
>         possible.
> 
> 
>     the clearest example is if you create it with no recourse to the CC
>     BY-SA data. for example, if you run osm2pgsql to create the two
>     independent databases, you can generate (CC BY-SA) tiles from mapnik
>     and distribute them. i don't see any license conflict here.
> 
> So how does mapnik combine the two datasets?  In memory.  I don't think 
> the definition of a database requires that it be resident on a physical 
> disk.  Mapnik has to either derive something from the CC-BY-SA data or 
> something from the ODbL data, it can't use magic.

no, not really. your argument is incomplete for these reasons:

1) the definition of a database doesn't require it to be resident on 
disk, but does require that the data be "arranged in a systematic or 
methodical way and individually accessible by electronic or other 
means". the data in mapnik's processing pipeline isn't arranged by 
mapnik in any way, and isn't accessible from outside mapnik. for these 
reasons i don't think mapnik's internal dataset is a database.

2) mapnik doesn't combine the datasets in memory, it rasterises features 
from each data source into an in-memory image which is the produced 
work. this is the same thing as rendering a set of images from one 
database, then the other, then compositing them. since the compositing 
occurs after they've been converted to produced works, there's no 
combination of data.
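
to illustrate point 2, here is a rough sketch of "rasterise each source, 
then composite". it is not mapnik's actual code or API: the function 
names, the point-only features and the colour-name pixels are all 
assumptions made up for this example. the thing to notice is that 
composite() only ever receives pixels, never the features themselves.

```python
from typing import List, Optional, Tuple

Pixel = Optional[str]            # None means a transparent pixel
Image = List[List[Pixel]]        # rows of pixels, indexed image[y][x]
Feature = Tuple[int, int]        # hypothetical feature: a single (x, y) point


def rasterise(features: List[Feature], colour: str,
              width: int, height: int) -> Image:
    """Draw one data source into its own image (hypothetical helper)."""
    image: Image = [[None] * width for _ in range(height)]
    for x, y in features:
        if 0 <= x < width and 0 <= y < height:
            image[y][x] = colour
    return image


def composite(bottom: Image, top: Image) -> Image:
    """Lay one image over another; the inputs are pixels only."""
    return [
        [t if t is not None else b for b, t in zip(bottom_row, top_row)]
        for bottom_row, top_row in zip(bottom, top)
    ]


# Two independent sources, standing in for the two osm2pgsql databases.
cc_by_sa_features = [(0, 0), (1, 1)]
odbl_features = [(1, 1), (2, 0)]

layer_a = rasterise(cc_by_sa_features, "red", width=3, height=2)
layer_b = rasterise(odbl_features, "blue", width=3, height=2)
tile = composite(layer_a, layer_b)
```

in this sketch the two feature lists are never merged or joined; each is 
turned into an image on its own, and only the images meet.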

>     a less clear example is if the modification doesn't "represent, in
>     terms of obtaining, verification or presentation, significant
>     investment". for example, if i have two apidb format databases and
>     take the list of elements in the ODbL database and delete them from
>     the CC BY-SA database then (it's my understanding that) i can still
>     render CC BY-SA tiles from the resulting collective database. and,
>     since CC BY-SA doesn't require that i release the database, i don't
>     have anything else to release.
> 
>     it gets even murkier if i do it the other way around, but if the
>     list of elements in the CC BY-SA database doesn't "represent ...
>     significant investment" then i think it's likely i can delete them
>     from the ODbL database and only release that list. again, the tiles
>     produced from the resulting collective database would be CC BY-SA.
> 
> The "significant investment" argument is very thin.  Any judge would 
> immediately see that both datasets represent significant investment.  
> You would not just be dealing with simple isolated facts, but rather 
> using the whole of the information in the dataset.

i'm not arguing that the complete dataset doesn't represent significant 
investment, i'm arguing the list of element IDs doesn't - the IDs are 
assigned by the server and represent no creative or intentional 
contributor input. see [1] for further details, or [2].
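
as a rough sketch of the "delete by ID list" operation described above: 
the dict layout, the (type, id) keys and the delete_by_id name are 
assumptions for illustration, not the apidb schema. the only thing taken 
from the ODbL side is its set of element IDs; none of its tags or 
geometry is read.

```python
from typing import Dict, Set, Tuple

ElementId = Tuple[str, int]          # e.g. ("node", 1); hypothetical key shape
Database = Dict[ElementId, dict]     # element ID -> element data (made-up layout)


def delete_by_id(database: Database, ids_to_delete: Set[ElementId]) -> Database:
    """Return a copy of database without the listed element IDs."""
    return {eid: data for eid, data in database.items()
            if eid not in ids_to_delete}


# Toy stand-ins for the two databases.
cc_by_sa_db: Database = {
    ("node", 1): {"name": "a"},
    ("node", 2): {"name": "b"},
    ("way", 7): {"highway": "residential"},
}
odbl_db: Database = {
    ("node", 2): {"name": "b, resurveyed"},
    ("way", 7): {"highway": "tertiary"},
    ("node", 9): {"name": "c"},
}

# Only the keys (server-assigned IDs) of the ODbL data are used here.
odbl_ids = set(odbl_db)
remaining_cc_by_sa = delete_by_id(cc_by_sa_db, odbl_ids)
```

whether a bare ID list like odbl_ids counts as a "significant 
investment" is the legal question under discussion; the sketch only 
shows how little of the other dataset the operation touches.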

> Whatever technical mechanism you try to use, the net result will be a 
> derivative of both datasets.  That's using information that represents a 
> "significant investment" however you look at it.

i don't agree. i think that for something to be truly derivative it 
needs to derive from meaningful information in both datasets. both 
datasets contain information which isn't meaningful (e.g. numeric IDs 
assigned by the server) and therefore the data can be combined in 
certain, limited ways.

cheers,

matt

[1] 
http://lists.openstreetmap.org/pipermail/legal-talk/2009-October/002896.html
[2] 
http://www.amazon.co.uk/Information-Technology-Law-Diane-Rowland/dp/1859417566