[Strategic] Fwd: Subject: Forks and such

Mon Aug 30 19:26:43 BST 2010

On 30/08/10 14:54, Mikel Maron wrote:
> Forking has been well explored on the lists, and here. If someone 
> could give a neutral account in the wiki, that would be a good 
> contribution.

I think this is a good idea but based on the discussion, we have some 
fundamental issues to cover. Once the discussion begins to slow, I 
suggest we can document some of the ideas on the wiki.

On 30/08/10 13:44, Frederik Ramm wrote:
>
> If it is OSM administered then maybe we're not talking about a fork, 
> but about dual licensing?
Yes, that could be worth considering. I did not intend to include this 
possibility in my original idea. I imagined several independent 
databases under different licenses (and yes, there would generally be 
divergence). I am occasionally pro-PD-like licensing but there are 
several licenses to choose from, and offering multiple PD-like licenses 
would seem to be a thorough solution. I can't see much call for dual 
licensing CT/ODbL with 
a 2nd license at the moment - unless it is CC-BY-SA (but the LWG can 
worry about that).

On 30/08/10 10:22, Oliver wrote:
>
> I think it is clear that with the effort put in the license change the 
> idea to handle a fork under the umbrella of the OSMF is not capable of 
> winning a majority within the OSMF. Otherwise it would make more sense 
> to establish an ODbL fork rather than changing the license of the 
> primary database.
Interesting point, but I don't see the need for a vote of OSMF 
membership (yet or possibly at all). We are getting ahead of 
ourselves... (Side note to my thread on "consensus": since when do 
OSMF membership votes determine the direction of OSM?)

On 30/08/10 17:28, Jim Brown wrote:
> At any given time, the total data you can use for anything is limited 
> to a single database.  Having multiple data sets is a binary condition 
> where choosing one excludes using the others.  Let's say we have two 
> data sets A and B, A has 1m POI, B has 750k POI and between them they 
> have 1.25m distinct POI.  The 1.25m number is irrelevant as no one can 
> use it, they can use 1m or 750k. And the management of the 500k 
> overlap data is totally wasted effort detracting from mapping and editing.
I feel like we are both repeating ourselves, but this won't go into an 
infinite loop... I hope. In what follows, I try to be more concise than 
in my previous email. It might come off as rather argumentative, but 
that is not the intent - sorry in advance! For other readers: I attempt 
to pick apart Jim's points, but I don't advance anything new.

So, for a single user, yes. But for different individual users, the 
quantity of data is not the only consideration to make a database the 
preferable or usable one. For example, the license is different. Users 
want or need different licenses. Therefore both databases are utilized. 
From my previous example [1], are you saying only one database fork of 
Australia is "useful"? Specifically answering this point might provide 
me with some insight into your thinking.
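
For readers checking the figures in Jim's example, the 500k overlap 
follows from inclusion-exclusion on the three numbers he gives. A 
minimal sketch in Python; the variable names are purely illustrative 
and the counts are the hypothetical ones from the quoted example, not 
real data:

```python
# Hypothetical POI counts taken from Jim's quoted example.
pois_a = 1_000_000         # POI in data set A
pois_b = 750_000           # POI in data set B
pois_distinct = 1_250_000  # distinct POI across A and B combined

# Inclusion-exclusion: overlap = |A| + |B| - |A union B|
overlap = pois_a + pois_b - pois_distinct

# POI that exist in only one of the two data sets.
only_in_a = pois_a - overlap
only_in_b = pois_b - overlap

print(overlap, only_in_a, only_in_b)
```

So, on those assumed figures, a user of A alone misses the POI that 
are only in B, and a user of B alone misses the POI that are only in A.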

Your argument seems to have the conclusion that only one GIS database is 
ever needed in the whole world for any purpose ("USABLE data is the amount 
in a single database"), which is clearly absurd. (I am taking a literal 
reading, as you suggested). If you admit other databases have their 
uses, for whatever reason, then forks could in principle be useful.

Also, I am an existentialist. This means I think something is valued if 
(and only if) we think it so. Some people think forks are valuable. 
Therefore forks are valuable (at least to those people). You are of 
course entitled to your opinion that they are not, but don't assume 
everyone is like you. Cloudmade wants a global and comprehensive 
database, fair enough but there are other users in the world.

> As soon as someone edits only one, then that is game over for that entry.
That is far from certain to occur.

> The difference is in the scope of impact, not the quality of the impact.
>
[snip]
>
> the decision to use a tool, write a tool, fork a tool, change the 
> source code license of a tool or discontinue the use of a tool has no 
> lasting impact outside the authors/users of the tool.
>
And the decision to change or fork a database has no lasting impact 
outside its authors/users! I don't see a difference between diversity in 
tools and databases, apart from the number of users. Ok, so the OSM 
database has more users than a single tool. So what? (I am not calling 
for reckless action, I am just pointing out I don't agree with 
your/Jim's point.)

> Having a forked database impacts EVERY mapper who wants to map an 
> area by constraining the set of data they see as what is already there 
> (they must choose a data set to map against).  It also impacts every 
> user of the map data who has to choose between datasets to work with.
>
Ok, so a choice of databases would exist. So what?

> Consequently, I think that any forks are permanent divisions of the 
> project and do not add any value to the project.
>
That doesn't follow from the fact that the database has many users 
(large scope), or a choice of databases exists.

>   If the goal is to "create and provide free geographic data" as you 
> say then we do that less well in a forked world as the data is both 
> less complete and less accurate in any single fork than it would be in 
> a unified database.
>
I don't agree with your premise. You need to establish that forking 
results in less completeness (which is far from certain) and less 
accuracy (ditto). And even then, your conclusion doesn't follow from 
that premise, either! (Accuracy and completeness are not the only 
attributes of databases. What about license, format, availability, 
richness, etc?)

> The costs of a fork are pretty extreme.
>
I don't agree that "costs of a fork are pretty extreme". Perhaps you can 
back that up with a concrete example? (You probably think you did, but I 
don't see it.)

> A fork completely divides the project.  This is because the data is no 
> longer common across the forks and I would question the capacity of 
> the community (mappers, coders, admins and everyone) to support 
> multiple distinct projects.
>
This is an exaggeration. A PD-like fork would not be completely 
independent, as data would flow from PD to the other datasets. There 
would be some areas of commonality (and some areas that diverge). The 
tools are the same, too. And many individual mappers are shared. 
Therefore it does not "completely divide the project".

>   It would be like the Wikipedia foundation deciding to host a second 
> Wikipedia site under a different license.
>
And if there were some advantages in doing so, they might consider it. 
Your point?

As you probably can guess from above, I don't think there are any viable 
arguments to be made, using these abstract principles, against forking 
(such as I feel Jim has attempted). I have attempted to address every 
one of Jim's points. However, I can think of many practical problems 
that are far more worrying. Perhaps we should move on to those?

Would anyone else care to wade into this discussion and say whether 
forks could, in principle or in practice, provide value, or whether they 
are always a waste of time? We (or I) could probably do with some 
perspective...

TimSC

[1] http://lists.openstreetmap.org/pipermail/strategic/2010-August/000138.html
