[OSM-talk] waist deep in the tagpile

Wed May 31 02:34:52 BST 2006

dear OSM-ers,

Chris mentioned that he, Schuyler and I have been experimenting with
mapserver renderings of planet.osm imports, translating the XML dump
into GML and thus via ogr2ogr into postgis/shapefile/etc.
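The first step of that pipeline, pulling tagged segments out of the XML dump, might look something like the Python sketch below. It is not the GML/ogr2ogr conversion itself. It assumes the 2006-era planet.osm layout, in which each <segment> element carries <tag k="..." v="..."/> children. The function name segment_tags is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def segment_tags(source):
    """Yield (segment_id, {key: value}) for each <segment> in an OSM XML file.

    Assumes the 2006-era planet.osm layout, where a segment looks like
    <segment id=".." from=".." to=".."><tag k=".." v=".."/></segment>.
    """
    # iterparse streams the file, so the whole dump need not be held at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "segment":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            yield elem.get("id"), tags
        if elem.tag in ("node", "segment", "way"):
            elem.clear()  # drop children we no longer need

sample = b"""<osm version="0.3">
  <segment id="1" from="10" to="11"><tag k="class" v="primary"/></segment>
  <segment id="2" from="11" to="12"/>
</osm>"""

for seg_id, tags in segment_tags(io.BytesIO(sample)):
    print(seg_id, tags)
```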

This has involved attempting to extract enough meaning from OSM's tag 
system to produce an Osmarender-equivalent view of what's in the latest dump.   

http://london.freemap.in/tagmania.html illustrates the quandary that
i find myself in. The tabs on the left show 2 different tagging
schemes in effect, and what's been marked up with either one. (This is
just the major street types, none of the footways/waterways/etc yet.
This *doesn't* show ways, only annotated segments).

In short, the "controlled vocabulary" suggested at
http://wiki.openstreetmap.org/index.php/Map_Features asks users to
annotate different classes of road with 

'highway = [some road type name]'

By volume, more than twice as much data in May's planet.osm uses 

'class = [some road type name]' 

And in that heavily-annotated part of south-west london (kudos to the
data gardener!) a stretch of the M25 is marked up with both keys.
Planet.osm for May has 6888 segments that have both highway= and
class= keys, and of those, 141 have different values for their keys.
(Does this make sense? Does 'class' have semantics worth keeping?)
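The figures above (segments carrying both keys, and those whose values disagree) come from the May dump. The sketch below only shows one way such a tally could be computed from (segment_id, tags) pairs, such as a planet.osm parser might yield. It does not reproduce the actual numbers, and the name key_overlap is hypothetical.

```python
from collections import Counter

def key_overlap(segments):
    """Tally use of the 'highway' and 'class' keys across segments.

    `segments` is an iterable of (segment_id, tags) pairs, tags being a
    dict of key -> value. Returns (counts, conflicts), where conflicts
    lists (segment_id, highway_value, class_value) for mismatches.
    """
    counts = Counter()
    conflicts = []
    for seg_id, tags in segments:
        has_highway = "highway" in tags
        has_class = "class" in tags
        counts["highway"] += has_highway  # bools add as 0 or 1
        counts["class"] += has_class
        if has_highway and has_class:
            counts["both"] += 1
            if tags["highway"] != tags["class"]:
                conflicts.append((seg_id, tags["highway"], tags["class"]))
    return counts, conflicts

segs = [
    ("1", {"class": "motorway", "highway": "motorway"}),
    ("2", {"class": "primary", "highway": "secondary"}),
    ("3", {"class": "residential"}),
]
counts, conflicts = key_overlap(segs)
print(dict(counts), conflicts)
```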

Now, i've had one eye closed on OSM for a few months, and may well 
be missing a significant shift in people's labelling practices.
I see two ways in which this situation could have arisen:

1/ One set of people/clients is using 'class=foo', the other 'highway=foo'
2/ At some point in time, most everyone agreed to shift from 'class=foo' to
   the more human-compatible 'highway=foo'

Asking onlist looks to be the only way to figure out which is the case;
neither planet.osm nor the API will tell me who tagged a segment, or when
they did it. (I don't think a raw database dump would tell me either.)

So right now, Etienne's wonderful Osmarender isn't picking up on the many
segments that were tagged 'class=foo'. For the mapserver-based
Osmarender knockoff i worked round this by making a duplicate layer for 
each key; that's what is shown in the map link above.
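Instead of one duplicate layer per key, the two keys could be collapsed into a single road-type value at import time. This is a minimal sketch. It assumes highway= should win when both keys are present, which is a judgement call given the 141 conflicting segments. ROAD_KEYS and road_type are hypothetical names.

```python
# Keys to consult, in order of preference. Preferring 'highway' (the
# Map_Features key) over 'class' is an assumption, not an agreed rule.
ROAD_KEYS = ("highway", "class")

def road_type(tags, keys=ROAD_KEYS):
    """Return the first non-empty road type found among `keys`, or None."""
    for key in keys:
        value = tags.get(key)
        if value:
            return value
    return None

print(road_type({"class": "primary"}))                      # from class=
print(road_type({"highway": "motorway", "class": "trunk"}))  # highway= wins
```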

What is to be done?

* I leave things as they are, for others to workaround in the same way.
  [ But this seems super-undesirable - the problem is small now. ]

* I go through all the planet.osm segments that have 'class=' and no
  'highway=' tags and POST additions to the API. I see no way to
  delete tags through the API, which probably makes sense.
  [ But this will lead to unnecessary data duplication + slowdown. ]
  [ same issue arises with automated correction of obvious typos ]

* I try to persuade someone with direct access to the OSM db to
  replace all the class= tags with highway= tags.
  [ But this may break the world of someone who is still annotating 
    with, or rendering with, 'class=foo' tags. ]

* I ignore tagged segments and concentrate on ways
  * But segments seem to have better metadata + presence than ways do
  * Someone has to create the ways before they can get key/values,
    either inherited from the segment's tags (again, do any of the
    clients support this?) or re-created by hand.
  * I worry that my understanding of how ways work is partial. 
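For the second option, the set of additions can be worked out before anything is posted. The sketch below assumes tags arrive as (segment_id, tags) pairs. It copies the class= value into highway= and keeps class=, so nothing is deleted. The upload itself is deliberately left out rather than guessing at API details, and highway_additions is a hypothetical name.

```python
def highway_additions(segments):
    """List segments that have class= but no highway=, paired with the tag
    set they would have after copying the class value into highway.

    Nothing is sent anywhere. The original class= tag is kept, since
    tags apparently cannot be deleted through the API anyway.
    """
    additions = []
    for seg_id, tags in segments:
        if "class" in tags and "highway" not in tags:
            new_tags = dict(tags)  # copy, so the input is left untouched
            new_tags["highway"] = tags["class"]
            additions.append((seg_id, new_tags))
    return additions

segs = [
    ("1", {"class": "primary"}),
    ("2", {"class": "trunk", "highway": "trunk"}),
    ("3", {}),
]
print(highway_additions(segs))
```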

While I'm here:

- Andy's work on the Map_Features vocabulary is spiffing. Is this 'best
practice' to the extent that it's implemented in one or more of the
editing clients? 

On Tue, May 30, 2006 at 07:59:21PM +0100, Tom Carden wrote:
> openstreetmap.org - the main site should always have the best possible
> maps that can be generated from OSM data (it's behind at the moment,
> but catching up thanks to Nick) because it's really the only place
> that can drive traffic back to the editors.  Feedback is golden.

- But not a cent for entropy? The ideal of sensible shared
classification emerging through demonstrated usage is a nice one.
OSM has got to the stage where this has led to an evolved 'standard'
which, though English-only, *is* being used in non-UK places in Europe.
I think that entropy for key/value usage preference is missing now.
"Most recently used popular tags" in an editing client could be a start.
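A "most recently used popular tags" list could, for instance, put the user's own recent key=value pairs first and fill the remaining slots from overall usage counts, such as those tallied from planet.osm. This is one possible ranking, not a feature of any existing client. TagSuggester and its seed counts are hypothetical.

```python
from collections import Counter, deque

class TagSuggester:
    """Suggest key=value pairs: the user's recent tags first (newest first),
    then the most popular tags overall. A hypothetical helper."""

    def __init__(self, popular_counts, recent_size=5):
        self.popular = Counter(popular_counts)   # (key, value) -> usage count
        self.recent = deque(maxlen=recent_size)  # most recent at the right

    def used(self, key, value):
        """Record that the user just applied key=value."""
        pair = (key, value)
        if pair in self.recent:
            self.recent.remove(pair)  # move it to the most-recent end
        self.recent.append(pair)
        self.popular[pair] += 1

    def suggest(self, n=5):
        """Return up to n distinct pairs, recent ones before popular ones."""
        ordered = list(reversed(self.recent))
        ordered += [pair for pair, _count in self.popular.most_common()]
        return list(dict.fromkeys(ordered))[:n]  # dedupe, keep order

s = TagSuggester({("class", "primary"): 25, ("highway", "residential"): 10})
s.used("highway", "footway")
print(s.suggest(3))
```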

- Would a 'key foo is semantically equivalent to key bar, and value X
in the context of key foo is equivalent to etc etc' be overkill? I
mention it in the context of future i18n.  
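Such an equivalence table need not be heavy. The minimal sketch below folds alternative keys and values onto canonical Map_Features ones. Its entries are made up for illustration: the German 'strasse' key and 'autobahn' value are not tags anyone is known to use.

```python
# Hypothetical equivalence tables. Keys map an alternative key to its
# canonical key; values map, per canonical key, alternative -> canonical.
KEY_EQUIVALENTS = {"class": "highway", "strasse": "highway"}
VALUE_EQUIVALENTS = {"highway": {"autobahn": "motorway"}}

def normalise(tags):
    """Rewrite a tag dict onto canonical keys and values.

    Tags already using a canonical key are processed first, so on a
    collision (e.g. both highway= and class=) the canonical key wins.
    """
    out = {}
    # False sorts before True: keys that are not alternatives come first.
    for key, value in sorted(tags.items(), key=lambda kv: kv[0] in KEY_EQUIVALENTS):
        canon_key = KEY_EQUIVALENTS.get(key, key)
        canon_value = VALUE_EQUIVALENTS.get(canon_key, {}).get(value, value)
        out.setdefault(canon_key, canon_value)
    return out

print(normalise({"strasse": "autobahn", "name": "A3"}))
```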

- One thing the three of us are planning is an AJAX editing
client intended *only* for creation of metadata, not for drawing of
shapes. But if something like this didn't run on the OSM systems, then
we'd have to implement our own user authentication and either maintain
locally a model of who-tagged-what-how-when and repost everything to
OSM through one account, or somehow automate OSM account creation for
everyone who came to use a tag-only client. 
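The "maintain locally a model of who-tagged-what-how-when" route amounts to a small edit log kept by the tag-only client. OSM would only ever see the resulting tags, posted from one account. The sketch below assumes each edit sets a single key on a single segment and leaves out the "how" part. TagEdit and TagLog are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class TagEdit:
    """One tagging action made through the hypothetical metadata-only client."""
    user: str         # local (non-OSM) account of the person tagging
    segment_id: str   # OSM object being annotated
    key: str
    value: str
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class TagLog:
    """Local record of who tagged what and when. Uploads to OSM would all
    go through one shared account, so attribution lives only here."""

    def __init__(self):
        self.edits = []

    def record(self, edit):
        self.edits.append(edit)

    def _history(self, segment_id):
        # Stable sort by time, so equal timestamps keep insertion order.
        return [e for e in sorted(self.edits, key=lambda e: e.when)
                if e.segment_id == segment_id]

    def current_tags(self, segment_id):
        """Latest value per key for a segment: what would be posted to OSM."""
        return {e.key: e.value for e in self._history(segment_id)}

    def who_tagged(self, segment_id, key):
        """Users who set `key` on the segment, oldest first."""
        return [e.user for e in self._history(segment_id) if e.key == key]

log = TagLog()
t0 = datetime(2006, 5, 31, tzinfo=timezone.utc)
log.record(TagEdit("jo", "42", "highway", "primary", t0))
log.record(TagEdit("chris", "42", "highway", "secondary", t0.replace(hour=1)))
print(log.current_tags("42"), log.who_tagged("42", "highway"))
```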

The idea of a metadata-only client may sound dull, but i experimented 
with this a lot last year, before 'tagging' was implemented in the OSM 
API, and it had a lot of the same stickiness / 'just one more feature' 
compulsion nature to it as drawing clients do. Honestly ;) 

More than enough for now.

jo