[OSM-legal-talk] ODbL and publishing source data

Thu Dec 1 00:33:19 GMT 2011

On 1 December 2011 00:18, Jonathan Harley <jon at spiffymap.net> wrote:

> By way of analogy: suppose I sent you a private email which included a
> license saying if you publicly use my email, you must share with me any
> other emails you combine it with. My email sits in your inbox together with
> other emails, and you can do searches across all of them. If it's a unix
> system, they're probably all in one single file. But have you really
> "combined" my email with your other private emails? My email is sitting
> there unmodified and completely independent of all your other private
> emails; it is not itself combined with them. So no, "storing next to" is
> not "combining". You can safely share a screenshot of your mail program
> without having to send me all your other private emails.

The problem with analogies is that they are analogies and aren't the same
as the original thing. As a similar one, what if instead you sent me your
whole mailbox rather than a single email, and I imported all your mail into
mine (so they are probably stored in the same file)? Although none of the
actual data (the emails) has changed, they are stored together (possibly
even in a SQL database rather than flat files).

I don't know if that would count as two collective databases or a single
derived database.

>> If the rendering of the second output depends on the first dataset, the
>> Produced Work created from the second dataset is not independent of the
>> first dataset.
>>
>
> No, the produced work isn't independent of it, but the datasets are still
> independent of each other, that's my point.
>

My point is that to actually do the rendering, you will have created a
single database containing both datasets in the process (albeit possibly
only as transient in-memory data structures). I don't think we're really
disagreeing, just both unsure as to where the line is and guessing on
different sides :)

>
>> I guess it's possible the rendering algorithm for the second dataset
>> could use the Produced Work from the first rather than the first dataset
>> directly, which may be okay except that it's arguable whether you are
>> reverse engineering part of the first database.
>>
>>
> Yes, I think that point's arguable - if the combined produced work is
> based on two previous renderings from the two datasets, you might be able
> to argue that those renderings are themselves databases, particularly if
> they use vectors - though it's a very murky area, because rendering usually
> involves throwing away lots of information. (For example, suppose two roads
> meet at a T junction, and I render them both in the same line style without
> names; you can no longer tell their IDs, tags, or even how many ways in OSM
> make up that rendered T shape.)

To give a more explicit example, consider if you rendered a map with a
white background containing all the OSM data excluding hotels (you can just
release the algorithm to remove hotels, which is trivial). You then did a
rendering of your hotel data, using an algorithm that tries to avoid
putting the hotels on a non-white section of the image.  The rendering then
indirectly depends on the OSM data, but not on the OSM database itself.
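
As a rough sketch of what I mean (purely illustrative: the greyscale grid,
the WHITE value, the search radius and the function names are all made up
here and don't come from any real renderer), the hotel pass only ever looks
at pixels of the first rendering, never at the OSM database:

```python
WHITE = 255  # assumed background value in a greyscale raster


def place_marker(base, x, y, max_radius=3):
    """Return the white pixel nearest to (x, y), or None if there is none
    within max_radius. `base` is the already-rendered map as rows of ints."""
    height, width = len(base), len(base[0])
    best, best_dist = None, None
    for row in range(max(0, y - max_radius), min(height, y + max_radius + 1)):
        for col in range(max(0, x - max_radius), min(width, x + max_radius + 1)):
            if base[row][col] != WHITE:
                continue  # something was drawn here, so avoid it
            dist = (col - x) ** 2 + (row - y) ** 2
            if best_dist is None or dist < best_dist:
                best, best_dist = (col, row), dist
    return best


def render_hotels(base, hotels):
    """Map each hotel name to a pixel position that avoids non-white pixels."""
    return {name: place_marker(base, x, y) for name, (x, y) in hotels.items()}


if __name__ == "__main__":
    base = [
        [255, 255, 255, 255],
        [0, 0, 0, 0],  # a rendered road
        [255, 255, 255, 255],
    ]
    print(render_hotels(base, {"Hotel A": (1, 1), "Hotel B": (2, 0)}))
```

Here "Hotel A" would sit on the road, so it gets moved to a neighbouring
white pixel, while "Hotel B" stays where it is.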

>> I guess that's the question: if you write a program that reads data from
>> two sources and uses both to produce its output, are the temporary
>> in-memory data structures considered derived or collective for the purposes
>> of copyright and database right law?
>>
>
> Yes, I've puzzled over that one too. If in-memory structures count as
> derivative databases, then that would be the one you would have to ask for
> under ODbL, which only requires licensees to release the last in any chain
> of derivative databases. It would make that whole side of ODbL pretty much
> impossible to enforce if so.

Assuming that it does count as a derivative database for this purpose, it's
likely to be hard or even impossible for them to release it, since it only
existed briefly while the rendering ran. It's quite possible that the whole
thing never existed at any single point in time, because the data was
processed progressively.
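
To illustrate "processed progressively" (again just a sketch under my own
assumptions: the (tile, payload) row format, the sorted-by-tile inputs and
the function name are hypothetical), a renderer could merge the two sources
as streams and only ever hold one tile's worth of combined rows:

```python
from heapq import merge
from itertools import groupby


def render_progressively(osm_rows, hotel_rows):
    """Yield (tile, osm_count, hotel_count) one tile at a time.

    Both inputs are iterables of (tile, payload) sorted by tile. The merged
    stream is consumed lazily, so the full combined dataset is never held
    in memory at once.
    """
    tagged_osm = ((tile, "osm", payload) for tile, payload in osm_rows)
    tagged_hotels = ((tile, "hotel", payload) for tile, payload in hotel_rows)
    combined = merge(tagged_osm, tagged_hotels, key=lambda row: row[0])
    for tile, rows in groupby(combined, key=lambda row: row[0]):
        rows = list(rows)  # only this tile's rows exist together
        osm_count = sum(1 for _, source, _ in rows if source == "osm")
        yield tile, osm_count, len(rows) - osm_count


if __name__ == "__main__":
    osm = [(0, "road"), (0, "park"), (1, "road")]
    hotels = [(1, "Hotel A"), (2, "Hotel B")]
    for result in render_progressively(osm, hotels):
        print(result)
```

A real renderer would draw each tile rather than count rows, but the point
is the same: there is no moment at which someone could be handed "the"
combined database.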

>
>> The answer probably depends on how the program is implemented, but given
>> that we won't know the implementation, how can we ever determine whether
>> someone's Produced Work requires them to release their database? If we say
>> we can't determine that, aren't we essentially saying that it's impossible
>> to enforce that part of the ODbL?
>>
>
> Even the attribution part of ODbL isn't necessarily easy to enforce - I
> suspect that the more complete OSM gets, the more difficult it will be
> sometimes to tell that a map with no attribution used OSM data. And yes,
> the share-alike part is much harder to enforce than that. It's not so
> gloomy really, though - in some cases we will know the implementation
> because it'll be open source, and most companies are professional and
> consider obeying the law pretty important.

Probably more important than the actual enforcement is the ability to tell
potential consumers that X is okay and Y isn't. Even if they are
professional and happy to obey the law, they are fairly likely to want to
know what the law says they need to do.

-- 
James