[OSM-dev] Restrict key names on order to retain reusability of OSM

Fri Feb 15 10:02:25 GMT 2008

2008/2/14 Stefan Keller <sfkeller at gmail.com>:
> Jochen wrote:
> > Stefan sees keys as a sort of identifier, which makes sense from a
> > programmer point of view, thinking about hashes etc. It is also how many
> > other systems are structured as Stefan points out. So there is some kind
>
> This gets the core of the issue: The OSM key in fact (i.e. in technical
> reality and "best practices") is an *attribute* or a *table name*.
> "Flurstück" could become a value of the attribute "Art" in table
> "Bodenbedeckung".
>
> Now Martjin gave a clever tip:
> > Do what most programs do: if you don't recognise the key, ignore it.
> > Think HTML. If you do recognise the key then it's irrevelant if it's
>
> > UTF-8 or not. Mapnik loads stuff into a database and just ignores
>  > everything it doesn't understand.
> This is the last resort and the only feasible solution as OSM is now. It
> assumes that OSM data will be always stored in the "fixed" key/value-meta
> model and will never evolve or be re-imported from other databases. Users
> will be 'surprised' when they miss their data on the map like with 'Tunnel '
> instead 'Tunnel' or with things like that '¨name'='Südstrasse'. My proposal
> would eliminate that.

OK, Stefan, I'll try to explain again why I disagree because I don't
think you're getting it.

Firstly, your proposal solves nothing here. 'Tunnel ' won't render on
quite a lot of maps at present, and /neither/ will 'Tunnel'. It won't
render because you used a capital 'T' and map features says 'tunnel'.
What you're talking about here is typos for which we need an entirely
different solution (and do in the form of JOSM presets/mappaint, and
potlatch presets/style rendering). This maybe be spell checkers, fuzzy
matching in the rendering code/preprocessors and numerous other ideas.

Secondly, there is absolutely nothing stopping you from writing a
program right now today that takes the current planet file and imports
it to your DB schema. The only consideration you have to make is a
transform scheme to ensure the database will allow the table name, and
as I've already pointed out, in postgres this basically involves
putting "quotes" round it. There's a million schemes you could use
with every one of them being easy to use.

>
> Now obviously it's about "geographic data" as said in the OSM homepage and
> its about databases and it's hopefully not HTML. The model you choose it a
> sort of meta model which is ok for initial capturing but not optimal for
> post-processing and showing it in maps - and the difference (from your point
> of view) is just restricting key characters to some smaller set!
>
> Look e.g. at this typical query: "Select all 'shops' (from key
> 'buildings')". In the current OSM database schema this query would look like
> this:
>
> # select id,geometry from ways natural join way_tags where k like
> 'building%' and v='shop';
>
> So to scale up, we have TWO INDEXES to maintain, ONE JOIN and TWO attribute
> PROJECTIONs. Whereas according to my proposal the same would look like this:
>
> # select id,geometry from building where type='shop';
>
> This requires ONE INDEX, NO JOIN(!!!) and ONE attribute PROJECTION. This is
> because database schema could look like this:
>
> CREATE TABLE "building" (
>   geometry varchar(255) NULL,
>   id int8 NOT NULL,
>   user_id int8 NULL,
>   timestamp timestamp NULL,
>   type varchar(255) NULL,
>   PRIMARY KEY ("id")
> );
> Now when it comes to display this, things become even more complicated in
> your model.
>
> This specific use case shows that because of my proposal the whole thing
> scales up, needs less space, is easier to query and to program, and it's
> lossless(!) when sync'ing back some data. And you get all this for free and
> - as I said before - without compromising the freedom of mappers values.

I'm sure organising the data in this way is optimal for certain
situations, but for others it really really sucks.
Imagine the query for finding out what objects are in a given area?
Or the query for determining which tags osm way N has?
How are you going to handle highways, and how are you going to be able
to efficiently for each highway determine if it's marked as oneway?

I can come up with database schemas that solve all of these problems
more efficiently than yours, but they may well be completely useless
for your task. So we can talk about JOINS, INDEXES and SELECTIONS for
as long as you like and still both be wrong if we're dealing with
different problems.

The thing I hope you'll go away from this e-mail with is this: there
is no technical reason right now that prevents you doing what you want
even with unrestricted keys.
Restricting the keyset gives us nothing except possibly one less
function call in your program to engage in the good practice of
escaping your values before you use them.

Dave