[OSM-talk] coping with outside data was: "proprietary" keys and values, machine readable vs. humans

Kytömaa Lauri lauri.kytomaa at aalto.fi
Sat Jan 28 21:10:56 GMT 2012


Some datasets outside OSM contain data that OSM users and data
consumers might like to work with. Some have been imported as they
became available, whereas others were not. I don't believe we can
keep making that decision case by case for all eternity, but no
single actor can implement a solution that would resolve the
conflicting needs.

Again, this turned out longer than I had expected, but there's a
thought buried in the lines below...

Fredrik Ramm wrote:
>Jukka Rahkonen wrote:
>> Much of that
>> data is hard or impossible to update by OSM contributors but new 
>> updates will be offered from the original sources.
>That sounds like a perfect reason to not import.

Some of it is definitely good for an import, lots of it isn't:
administrative boundaries, for example, are valuable, even for
mappers, and not acquirable with a survey.

There have been some mentions of the idea of an OpenMetaMap; I
believe it's time to really think about and discuss how the OSM
toolchain could be used with data from different sources - in a
transparent way. It has been mentioned several times that other
datasets can be mixed in at the rendering stage - like the Corine
land cover data. For a GIS pro, the value of OSM data is in the data
contributed by our users - they often have access to the other data
sources and know which sets to look at for the best combination, and
the pro GIS tools have layers they can use to compose a single map.

However, for a newbie who has just learned, for example, that they
can get fresh maps for their car GPS, all that other data they would
like to have on their display (e.g. woods) _does_not_exist_.

"We" can't expect everybody to learn about (eventually) hundreds of 
data sources and how to combine them in-whichever-tool-they-use, but
rather there ought to be (eventually) a way for consumers to select
a repository and be done with it in simple cases.

The first, and at first glance the easiest, solution is to import
everything - the cons have been discussed to death: lack of
maintenance, lack of resolution, death of community, nonexistent
automatic updating tools, and, depending on the implementation, lack
of topological connectivity.

The second solution is "don't import, write a wiki page". Some people
will go digging for the datasets they want, and learn how their tools
can locally convert that data to a compatible format. The cons are
mostly the same, except that the original source might maintain the
data and that the local OSM community keeps working on their own data
as they used to. An added con, though, is that most end users won't
use any of that data, and will consider OSM as "lacking ... the data
that I need".

The third solution is to make the could-be-imported data available
as a layer users can draw on, acting only as an aid and verification
layer. It's like a local expert telling you that the feature you see
in aerial imagery, which resembles a road, really is a road. At this
time, this is possibly the sanest way - but the end users still
mostly lack the data they "need".

I don't think there have been any mentions of other solutions, but:

The fourth solution could be a way for people to distribute datasets
from outside sources in a format that is like the current OSM data -
say, mkgmap would accept it just like any other *.osm file - but
mangled just enough that nobody could upload it to the main DB. In
an ideal world, there might be a server API that delivers these with
conflicting OSM entities subtracted; that is, given a Corine dataset
wood area that partially overlaps a wood already drawn into OSM with
more detail, the returned data would have an incision that _could_ be
topologically connected to the relevant OSM nodes - but only if the
tool wants to know about it. Combined with the third solution, this
might even make mappers more eager to map the corner of a wood they
saw - they couldn't know whether the whole element in some outside
data source is, say, mainly coniferous, but even if they split and
tag just a corner of it, they would know that consumers would see
only added detail - and the rest as it was in the original data.

(Now you shout "glad that you volunteer" :) I don't even think this
is something any single developer could implement, and it might even
have consequences for the API. This message is a request for
feedback.

I don't yet have any data to back this up, but I believe most OSM
data consumers are not traditional GIS experts with GRASS GIS or
QGIS spewing out SVGs or PNGs, but mostly users importing dumps and
running a tile server, or just mkgmap or other convert-to-navigation-
platform software.

I shouldn't go into technical details here; negative ids are already
in use for other things, and most current consumers would choke on
alphanumeric ids for these don't-you-dare-upload entities. All the
technical details would have to be discussed.
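Just to make the "mangled enough" idea concrete: a distribution tool
could, as one of many possible schemes, rewrite every element id and
reference in an external *.osm file into some reserved range before
handing it out. The offset below is purely illustrative - it is not
an established convention, and as said, negative ids already carry
other meanings in existing tools - this is a sketch, not a proposal
of the actual numbers:

```python
# Illustrative sketch only: shift all node/way/relation ids in an
# external .osm file into a hypothetical reserved range, so uploaders
# would reject the data. The OFFSET value is an assumption for
# demonstration, not a real OSM convention.
import xml.etree.ElementTree as ET

OFFSET = -10_000_000_000  # hypothetical reserved id range


def mangle_ids(in_path, out_path, offset=OFFSET):
    tree = ET.parse(in_path)
    for elem in tree.getroot().iter():
        # shift the element's own id
        if elem.tag in ("node", "way", "relation") and "id" in elem.attrib:
            elem.set("id", str(int(elem.get("id")) + offset))
        # shift references so ways/relations stay internally consistent
        if elem.tag in ("nd", "member") and "ref" in elem.attrib:
            elem.set("ref", str(int(elem.get("ref")) + offset))
    tree.write(out_path)
```

References are shifted by the same offset as the ids, so the file
stays topologically consistent within itself while no id collides
with, or can be mistaken for, a real object in the main DB.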

>> Topological data and landuse
>> data are some examples. Corine land cover will be updated this year,
>> 9 gigabytes of topological vector data from the National Land Survey 
>> of  Finland will be free under attribution-only license in May and 
>> so on.
>All that should not be in OSM.

The funny part is that OSM already contains features from most of
the feature classes included - with more (tag) detail than they
have - but with a very uneven distribution. Judging from the file
sizes, roughly half of that is their DEM - ten times the size of the
"highways with addresses" dataset. In very sparsely populated areas
(as an extreme example, the Inari municipality, with 0.45 persons/
sq km), the feasibility of acquiring and maintaining a usable
dataset within OSM, even of just highways, is up to chance, so some
kind of interworking solution would be crucial for a spatially
homogeneous dataset from the end users' point of view.

-- 
Alv

