[OSM-dev] Restrict key names in order to retain reusability of OSM

Stefan Keller sfkeller at gmail.com
Wed Feb 13 08:47:59 GMT 2008


As Frederik Ramm got it, it's about this:

> There is a reason why our data format has
> <tag k="keyname" v="value" />
> instead of
> <keyname>value</keyname>
>
> and that reason is allowing non-XML stuff in the key names. Or at
> least it seems like that could have been a reason. I hope nobody's
> going to come forward and tell me it was just decided over a few beers
> ;-)
All the tools/languages I use (xerces/xalan/Java) can cope with UTF-8. As I
tried to explain, GML is *not* the issue; strictly speaking, it's *not even*
about XML.
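
To make the quoted point concrete, here is a minimal Python sketch using only
the standard library. The helper names (parses, as_attribute, as_element) and
the sample keys are made up for illustration; they are not part of any OSM
tool. It shows that a key carried as attribute content survives as long as it
is escaped, while the same key used as an element name must obey the XML name
rules.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def as_attribute(key, value):
    # OSM style: the key is attribute *content*, so it only needs escaping.
    element = ET.Element("tag", {"k": key, "v": value})
    return ET.tostring(element, encoding="unicode")


def as_element(key, value):
    # Alternative style: the key becomes an element *name*.
    # Built by hand because ElementTree does not validate names on output.
    return "<{0}>{1}</{0}>".format(key, value)


if __name__ == "__main__":
    # Hypothetical sample keys: a plain one, one starting with a digit,
    # one containing a space, and one with a non-ASCII letter.
    for key in ["highway", "1st_key", "my key", "straße"]:
        print(repr(key), parses(as_attribute(key, "x")), parses(as_element(key, "x")))
```

Keys that start with a digit or contain a space are rejected as element names
but are unproblematic as attribute values, which is the freedom the k/v form
buys.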

I'm just trying to say that you have given key names an accidental freedom
here. For values UTF-8 is all right, which means you can use street names as
exotic as you wish. For key names, however, the consequences are problematic
for all the more database-schema-oriented applications. It's about databases,
XML and other formats outside the "OSM farm" you described, which don't rely
on this fancy general meta schema, and it goes like this:

TABLE way_tags (
  geometry ST_GEOMETRY,
  id bigint(64) NOT NULL,
  keyname varchar(255) NOT NULL,
  value varchar(255) NOT NULL  -- type
)

which typically becomes this:

TABLE highway (  -- former way keyname in UTF-8???
  geometry ST_GEOMETRY,
  id bigint(64) NOT NULL,
  type varchar(255) NOT NULL,  -- e.g. primary, footway, residential, unclassified
  name varchar(255) NOT NULL
)
TABLE pointsofinterest (  -- former node keyname in UTF-8???
  geometry ST_GEOMETRY,
  id bigint(64) NOT NULL,
  type varchar(255) NOT NULL  -- e.g. level_crossing, rail, station, viaduct
)
TABLE whateveryourkeynamewillbe ( ...
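
The pivot from the generic tag table to one table per key name can be tried
out with a small Python sketch. Everything in it is an assumption made for
illustration: SQLite stands in for whatever database is really used, the
geometry column is left out, and the sample rows and the pivot() helper are
invented. It only shows where a key name ends up as a table name, and what
happens when that name is not a plain SQL identifier.

```python
import sqlite3

# Sample rows in the generic layout: (id, keyname, value). Hypothetical data.
rows = [
    (1, "highway", "primary"),
    (2, "highway", "footway"),
    (3, "name:de", "Hauptstrasse"),
]


def pivot(conn, rows, quote=False):
    """Create one table per key name; return the keys whose DDL was rejected."""
    rejected = []
    for key in sorted({k for _, k, _ in rows}):
        # With quote=True the key is wrapped as a delimited identifier.
        ident = '"' + key.replace('"', '""') + '"' if quote else key
        try:
            conn.execute(
                "CREATE TABLE %s (id INTEGER NOT NULL, type TEXT NOT NULL)" % ident
            )
        except sqlite3.Error:
            # The key name is not usable as a bare table name.
            rejected.append(key)
            continue
        conn.executemany(
            "INSERT INTO %s (id, type) VALUES (?, ?)" % ident,
            [(i, v) for i, k, v in rows if k == key],
        )
    return rejected


if __name__ == "__main__":
    print(pivot(sqlite3.connect(":memory:"), rows))
    print(pivot(sqlite3.connect(":memory:"), rows, quote=True))
```

Quoting every identifier is one way to massage the data, but every consumer of
the resulting schema then has to quote as well, which is the reusability cost
argued about in this thread.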

> So to recap: The current allowable characters in OSM tag names is UTF8 -
> Deal with it, instead of trying to impose limitations into OSM to make
> OSM data comply with YOUR requirements.
It's not 'my' requirement, it's about "best practices". The use cases are
e.g.:
* UMN MapServer (which is at least as capable as all known OSM renderers)
* Almost any other XML format/XML Schema
* Any other database, any GIS

So to recap: It's about a few limits within the scope of OSM on a single meta
attribute (keyname) while retaining the GOOD thing of UTF-8 values, in order
to avoid limited reusability and broken compatibility of OSM data beyond
its current well-known tools. And best of all: users obviously have not
required the kind of freedom you are arguing for so far. It's simply easier
to relax this small constraint afterwards than the other way round!
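
For reference, the key-name rule from my earlier message (quoted below) could
be checked roughly like this in Python. This is only a sketch: the function
name is_valid_key is invented, and the treatment of the part after the colon
(non-empty, same characters, not required to start with a letter) is my
reading of the rule, not something spelled out there.

```python
import re

# First part: an ASCII letter, then letters, digits, '_', '-' or '.'.
# Optional second part after a single colon (assumption: non-empty,
# same character set, no requirement to start with a letter).
_KEY_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*(?::[A-Za-z0-9_.-]+)?")

MAX_KEY_LENGTH = 255


def is_valid_key(key):
    """Return True if key satisfies the proposed key-name restriction."""
    if len(key) > MAX_KEY_LENGTH:
        return False
    return _KEY_PATTERN.fullmatch(key) is not None


if __name__ == "__main__":
    for key in ["highway", "name:de", "a:b:c", "1st_key", "straße"]:
        print(repr(key), is_valid_key(key))
```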

Stefan

2008/2/13, J.D. Schmidt <jdsmobile at gmail.com>:
>
> Stefan Keller skrev:
> > You are right that XML names (= keys/tags) are valid in unicode
> > in which case the encoding of the whole XML document (exchange file)
> > must support this.
> >
> > But you know well that many tools have problems with non-ASCII XML
> > element and attribute names (for content/value UTF-8 is ok since
> > chars can be escaped)!
> >
> > So, my last 20 cents for valid key names before I give up is the following:
> > 'aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ_-.0123456789'
> > whereas such qualified names must begin with a letter and contain at
> > most one colon and have at most a length of 255.
>
> Stefan, if I am coming across in this message as a bit harsh, then
> you're not mistaken - I am a grumpy old man, and damn proud of it. Just
> remember, it's not personal. I try to go after the ball, not the man.
> (No FakeSteveC, it doesn't mean I try to go after a guys ball(s) in THAT
> way..)
>
> Three times you have posted that you want to limit the characters used
> in tag naming, revising your proposal first to include the colon, and
> now to include numbers. Each previous attempt you have been told that
> UTF8 is valid, for good reasons, and yet still you persist.
>
> You have not once given a valid TECHNICAL reason for such a change,
> WITHIN THE SCOPE OF OSM, for limiting the characters allowable in tag
> names.
>
> As far as I can see from your first message on this subject, your idea
> stems from converting OSM data from its XML format to GML.
>
> Your project might need GML, OSM doesn't.
>
> If you are in the need of GML compliant output, then it is your task to
> massage the OSM provided data into a GML compliant output. It is not the
> task of OSM to have the data in GML compliant format, since the XML
> format with UTF8 as allowable just plain works for OSM.
>
> The tools that you state have problems with non-ascii characters should
> be fixed to be able to handle the UTF8 characters. Not the other way
> around, by changing the dataset to comply with the requirements of the
> tools.
>
> You might think it's a hen and egg situation, although in this case, the
> egg definitely is the important part, and has priority.
> The egg (the data) in this case has attributes that can contain
> non-ascii characters, thereby allowing non-latin based nationalities to
> define their own tags in their own language. This is a GOOD thing, which
> should NOT be changed. The hen (tools and programs utilizing OSM data)
> must take this into account. If a tool can't do that, then the farmer
> (the user of that tool) has to either change that tool, or use the egg
> to prepare a dish that the tool can digest (massage the OSM data into a
> format the tool can use). The farmer should not try to persuade the egg
> that it is better off as a watermelon.
>
>
> So to recap: The current allowable characters in OSM tag names is UTF8 -
> Deal with it, instead of trying to impose limitations into OSM to make
> OSM data comply with YOUR requirements.
>
> Dutch
>