[Before addressing these technical legal issues, I should note that I
represent the Wikimedia Foundation, not OSM/the OSM community. While I hope
that in most cases the perspective of the WMF and the perspective of OSM
are in alignment, OSM members and the OSMF should definitely seek their own
legal counsel.

I'm also required by my ethical obligations to note that I'd be happy to
discuss some of these issues directly with a lawyer representing OSMF, but
my understanding is that there is no such lawyer at this time. If that
changes, please let me know and I can happily discuss with them.]

First, my comments to Paul, and then some comments/questions of my own.

> See
> https://wiki.openstreetmap.org/wiki/Open_Data_License/Substantial_-_Guideline
> for guideline text.
> > The Open Data License defines a term 'Substantial' which is then used
> > in the License to define a threshold about when certain clauses come
> > into effect.
> Substantial is a term defined in the relevant law, similar to fair use
> or fair dealing under copyright law. We're not referencing the law at
> all in the guideline. If the use is insubstantial, than the ODbL doesn't
> come into play at all as you need no license.

Reminder that Simon has pointed out here quite recently that ODBL claims to
be a binding contract that can apply when no license is necessary.

> Is there any relevant case law on substantial?

Three qualifications here: I'm certainly not an expert in EU law; I'm
trying to summarize the state of things at email length, not
treatise-length; and it is not entirely clear that a court should or would
rely on EU law to interpret this part of the license agreement.

That said: my understanding is that there is not much EU CJ caselaw; as of
2012, only seven cases altogether about the database directive, and only
two that touch heavily on the scope of "substantial". The key case on
"substantial" is British Horseracing Board v. William Hill Organization
("BHB"). (There are surely local decisions that may also help inform an
interpretation, but you'd probably have to talk to a local lawyer in your
jurisdiction to analyze those, and even those seem to be fairly thin on the
ground, especially post-BHB.)

Per the directive and caselaw (paralleled by the ODBL), something can be
substantial in three ways: it can be quantitatively substantial,
qualitatively substantial, or substantial as a result of repeated and
systematic extraction of insubstantial parts. (The trial court, and some
commentators, had seemed to think it had to be *both* quantitative and
qualitative - see Derclaye, p.111 below - but BHB is pretty clear that
either is enough.)

The BHB court had this to say about what "quantitative" means in this
context:

The expression ‘substantial part, evaluated quantitatively’, of the
contents of a database ... refers to the volume of data extracted from the
database and/or re-utilised, and must be assessed in relation to the volume
of the contents of the whole of that database. (Para 70)

This strongly suggests that a European court would evaluate "substantial"
in the quantitative sense with regard to the entire 2B records in OSM, not
with regard to the database the information was put into. It would be
interesting to see what courts around Europe are finding as "substantial"
in this sense; I see one reference to a French court that found that taking
15% was not quantitatively substantial, and the GRADE paper linked to from
the wiki suggests it would have to be > 50%. But I suspect this would vary
a lot based on the facts of the case, and that a skilled lawyer could raise
or lower the number. And of course in the case of a database as large as
OSM a court might try to change their mind.

For qualitative, the key passage of BHB is:

[S]ubstantial part, evaluated qualitatively, of the contents of a database
refers to the scale of the investment in the obtaining, verification or
presentation of the contents of the subject of the act of extraction and/or
re-utilisation, regardless of whether that subject represents a
quantitatively substantial part of the general contents of the protected
database. A quantitatively negligible part of the contents of a database
may in fact represent, in terms of obtaining, verification or presentation,
significant human, technical or financial investment. (Para 71)

In other words, a small chunk of a large database can be qualitatively
substantial if the cost of "obtaining, verification, or presentation" of
that small chunk was substantial. The court goes on to say that it doesn't
matter if the small chunk is, by itself, valuable - what matters is the work
done to put it into the database. What qualifies as a substantive
"investment" is left as an exercise for the lower courts. (One German case
I've found seemed to presume that 39,000 Euro was a substantive investment,
but that was not the primary point being argued in that case so I wouldn't
rely on the number being that low.)

For "repeated and systematic", the BHB court said:

The provision ... prohibits acts of extraction ... which, because of their
repeated and systematic character, would lead to the reconstitution of the
database as a whole or, at the very least, of a substantial part of it ...
(Para 87)

So repeated/systematic extraction that does not allow someone to
reconstitute a substantial part of the database would not be substantial.
The court justified this by saying that the purpose of this part of the
directive was to prevent circumvention of the first two definitions of
substantial, rather than to create a separate type of infringement.

The other EUCJ case that I'm aware of that has touched on the question is
Apis-Hristovich EOOD v Lakorda AD -
http://curia.europa.eu/juris/document/document.jsf?docid=77503&doclang=en ;
decent summary here:
http://www.mondaq.com/x/75750/IT+internet/New+ECJ+Database+Decision . But the
case does not appear to be terribly relevant.

Some pretty decent summaries of BHB and other relevant caselaw, FYI:
- http://www.ivir.nl/publications/hugenholtz/EIPR_2005_3_databaseright.pdf
- The Legal Protection of Databases: A Comparative Analysis, By Estelle
Derclaye - available in the US on Google Books; search for "substantial
- Survey of French cases: http://ssrn.com/abstract=1989031

> If we accept this definition of
> insubstantial as being true for geospatial databases in general, then
> their entire database could be extracted. If its true for OSM but not
> all other geospatial databases, we need to explain why.

I think it is pretty clear that this rule is only for OSM/ODBL, but it
wouldn't hurt to make that more explicit. (It *has* to be only about OSM,
because you can't judge whether something is substantial without knowing
about the nature of the database (quantitative) and how the data was
obtained (qualitative).)

A few other comments:

   - It might be helpful to link to
   http://wiki.openstreetmap.org/wiki/Map_features when talking about
   Features, assuming those are the same concept, which I admit I'm still not
   100% sure about?
   - It might be helpful to explain better why the page is focused on
   insubstantial rather than substantial.
   - The village/town distinction doesn't seem very helpful to me. If the
   goal really is to push out commercial projects, very few commercial
   projects are going to be viable at the town level - the vast majority will
   be national level, with a few exceptions for London/Paris/NY-level cities.
   So saying "you can use towns" would still block out most commercial use
   while perhaps allowing some small governments to do useful things. But I
   may be misunderstanding the goal here?
   - I find "This definition aims to:...Build a case for the "qualitative"
   interpretation of Substantial" to be slightly confusing - I *think* that
   what is meant is something like "This guideline attempts to clarify what
   uses would constitute a substantial qualitative use of OSM data" (perhaps
   implying that many important uses are not going to be quantitatively
   substantial?), but I'm really not sure. I would clarify or remove that.

Hope this is helpful-


