[Strategic] Fwd: Subject: Forks and such

Mon Aug 30 17:28:07 BST 2010

I think you misunderstood me, Tim.

When I said that the amount of "USABLE data is the amount in a single database" I meant it literally.  At any given time, the total data you can use for anything is limited to a single database.  Having multiple data sets is a binary condition where choosing one excludes using the others.  Let's say we have two data sets, A and B: A has 1m POI, B has 750k POI, and between them they have 1.25m distinct POI.  The 1.25m number is irrelevant, as no one can use it; they can use 1m or 750k.  And the management of the 500k overlapping POI is totally wasted effort, detracting from mapping and editing.
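
The arithmetic in this example can be checked with a minimal sketch. Only the three sizes (1m, 750k, 1.25m distinct) come from the example; the integer IDs standing in for POI and all variable names are hypothetical, chosen purely for illustration.

```python
# Hypothetical stand-ins: each integer ID represents one POI.
# Only the set sizes are taken from the example in the text.
dataset_a = set(range(0, 1_000_000))        # A: 1m POI
dataset_b = set(range(500_000, 1_250_000))  # B: 750k POI

distinct = len(dataset_a | dataset_b)  # POI present in A or B
overlap = len(dataset_a & dataset_b)   # POI maintained twice, once per fork

# A user must pick one data set, so the usable amount is at most
# the size of the larger one, never the combined distinct total.
usable = max(len(dataset_a), len(dataset_b))

print(distinct, overlap, usable)
```

Under these assumptions the overlap follows directly from the stated sizes (1m + 750k - 1.25m), and the usable figure stays at the size of the larger set regardless of how many distinct POI exist across both.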

A fork is equivalent to having several Wikipedia data sets where any edit to one has to be added to the others by hand.  The content diverges over time.  Entries are in one and not the other; one entry forks in content compared to the "same" entry in the other, and they never recombine.  As soon as someone edits only one, then that is game over for that entry.

Re:

>I am still not seeing any fundamental differences between software and
>database licensing. Both determine how the tool or data may be used. So
>what? That difference doesn't change how they add value to the project.

The difference is in the scope of impact, not the quality of the impact.  Having a tool that is different impacts the rather small number of people who like to write tools, and they can work out the impact amongst themselves.  The data created by that tool is portable across tools (through the OSM db), and the decision to use a tool, write a tool, fork a tool, change the source code license of a tool or discontinue the use of a tool has no lasting impact outside the authors/users of the tool.

Having a forked database impacts EVERY mapper who wants to map an area by constraining the set of data they see as what is already there (they must choose a data set to map against).  It also impacts every user of the map data, who has to choose between datasets to work with.

Consequently, I think that any forks are permanent divisions of the project and do not add any value to the project.  If the goal is to "create and provide free geographic data" as you say, then we do that less well in a forked world, as the data is both less complete and less accurate in any single fork than it would be in a unified database.

The costs of a fork are pretty extreme.  A fork completely divides the project.  This is because the data is no longer common across the forks, and I would question the capacity of the community (mappers, coders, admins and everyone) to support multiple distinct projects.  It would be like the Wikimedia Foundation deciding to host a second Wikipedia site under a different license.

j

From: strategic-bounces at openstreetmap.org [mailto:strategic-bounces at openstreetmap.org] On Behalf Of TimSC
Sent: 30 August 2010 17:17
To: strategic at openstreetmap.org
Subject: Re: [Strategic] Fwd: Subject: Forks and such

On 30/08/10 09:53, Jim Brown wrote:

TimSC wrote:

We have more map data than we had before. Of course, it is not in a
single database or under a single license. Is this a bad thing?

Actually, the amount of USABLE data can be defined as the amount in a single database I think...

I'd have to disagree with you there. Of course if there was a single user with a single objective, there would be a single database which was the most suited to that user. But this doesn't reflect our current situation. We have different regional situations for mapping contributors, and different users with differing legal demands. For example, some Australian contributors have used a CC-BY-SA import source to create mapping. If we then have an alternative ODbL dataset, which is much more sparse, can you say which is best: the CC-BY-SA densely mapped or the ODbL sparsely mapped database? I am not arguing that the CC-BY-SA database is most appropriate in all cases. For users who operate only in Australia, the CC-BY-SA is legally ok and much more complete. An international user who is legally cautious might prefer the ODbL version. The same differing requirements are also seen among contributors. A big CC-BY-SA import can't go into a CT/ODbL database but it could be usefully added to a CC-BY-SA fork. And perhaps OSMF might negotiate data imports that are only compatible with CT/ODbL and not with CC-BY-SA.

Basically, a "one size fits all" approach doesn't reflect all contributors' or users' needs. If you think "one size does fit all", you need to argue that a fork would add no value to other users (not just yourself), and that might be difficult.

This is because the data in these forked data sets cannot be combined for use.  It is likely that they cannot even be rendered by OSM itself on a map tile.  They are truly islands of data, with the only common attributes being that OSMF hosts them and that the same editors and tools can be pointed to the data set for editing (probably as long as the editing APIs and server logic stay the same over time).

For clarity, I generally agree with this.

Hence, I would still strongly argue that having multiple datasets with different licenses is very different from having multiple tools, and does not add value to the goal of creating the most complete map of the world.  And the reason for this is that having different licenses has a permanent and downstream impact on the data and how it can be used.

Ah, it is interesting that you said the goal of the project is "the most complete map of the world", when OSM's goal is generally held to be that OSM "creates and provides free geographic data". Not all users want or need a global database. Many specialist maps only cover a small area. Don't assume everybody has the same requirements. (As an extreme example, people have discussed mapping fictional worlds and other planets.)

On my original point, different OSM tools are available under different licenses. The code from one can't be shared with another. This is again similar to the situation for map databases. Effectively, the software code was never unified and it is difficult to do an API upgrade on every tool, for example.

Tools and other differences do not have the same impact.  They can come and go, be revised and experimented with and the impact that they have is limited to their users, not to their output.  The data they generate can be used in the same fashion as that generated by other tools.  Datasets with different licenses permanently affect the data contributed to them.

I am still not seeing any fundamental differences between software and database licensing. Both determine how the tool or data may be used. So what? That difference doesn't change how they add value to the project.

TimSC