[josm-dev] Checking tags

Jochen Topf jochen at remote.org
Mon Mar 23 09:39:19 UTC 2015


On Mon, Mar 16, 2015 at 07:22:50PM +0100, Jochen Topf wrote:
> Let me give this a try and we can then look at the created list and see how we
> like it.

Okay, I have generated three lists: "good", "bad", and "unknown". The "good"
list contains those keys we can accept, the "bad" list contains keys that are
really problematic, with strange characters and so on. The "unknown" list is
everything else, which is most of the keys. The "good" list only contains about
8000 of the more than 50,000 keys. But those keys together are used over a
billion times, while the "bad" and "unknown" keys are only used about 500,000
times. So while the "good" list is short, the overwhelming majority of tags
actually in use will have a key flagged as "good".

You can download the key lists from here:
http://tmp.jochentopf.com/bdeda93e7dcc24f27e3822d4c95400b6/

You can see the code for the classification here:
https://github.com/joto/taginfo/blob/master/sources/db/post_grades.sql

Of course this is just a test and all open to debate. If we actually want
to use this, I can create those lists every night with the rest of taginfo.

Here are my ideas on how to show this to the user: Everywhere there is a list of
tags, and also in the tag edit dialog, we can either color-code the background of
the fields (green, yellow/gray/neutral, red), or show a colored border, or
show a warning icon next to the field. Just using the color is rather
unintrusive and doesn't take up valuable screen real estate. On the other hand,
with an icon the user could hover over or click on it to get additional info
about what's going on.
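
To make the two display options concrete, here is a rough sketch of how a
grade could be turned into a field decoration. It is written in Python only
for brevity and is not JOSM code; the color values, the tooltip texts, and the
function name field_decoration are all made up for illustration.

```python
# Hypothetical background colors per grade; None means "keep the default".
GRADE_BACKGROUND = {"good": "#d8f0d8", "unknown": None, "bad": "#f0d0d0"}

# Hypothetical tooltip texts shown when the user hovers over the icon.
GRADE_TOOLTIP = {
    "good": "",
    "unknown": "This key is not on the list of well-known keys.",
    "bad": "This key looks problematic (strange characters or similar).",
}


def field_decoration(grade, use_icon=False):
    """Return how a tag field should be decorated for the given grade.

    Color mode: only the background changes, no extra screen space is used.
    Icon mode: the background stays neutral, and an icon with a tooltip is
    shown for everything that is not "good".
    """
    if grade not in GRADE_BACKGROUND:
        raise ValueError("unknown grade: %r" % grade)
    if use_icon:
        show_icon = grade != "good"
        return {
            "background": None,
            "icon": show_icon,
            "tooltip": GRADE_TOOLTIP[grade] if show_icon else "",
        }
    return {"background": GRADE_BACKGROUND[grade], "icon": False, "tooltip": ""}
```

The trade-off described above shows up directly: the color mode carries no
explanation, while the icon mode has room for a tooltip.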

I could imagine marking everything from the "good" list in green. The "bad"
list is not so useful for marking bad keys. Instead, I suggest creating a list of
problematic characters and showing everything containing those characters as
bad. Here is a good starting point for this list:
http://taginfo.openstreetmap.org/reports/characters_in_keys#problem

Of course we can extend this bad list with known bad keys (very common
misspellings) but also with some rules such as "keys with 2 characters or
less are bad" or "keys with several characters like :, _, - in a row or
at start or end of key are bad".
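
The rules above could be sketched roughly like this. This is only an
illustration in Python, not the taginfo or JOSM implementation. The set of
problematic characters and the misspelling list are placeholders I made up
for the example; the real character list would come from the taginfo report
linked above. The sketch also assumes that the "good" list wins over the
rules, which is open to debate.

```python
import re

# Assumption: an illustrative set of problematic characters, not the
# authoritative list from the taginfo report.
PROBLEM_CHARS = set(" =+/&<>;'\"?%#@\\,")

# Separator characters that should not start or end a key or be repeated.
SEPARATORS = ":_-"
REPEATED_SEPARATORS = re.compile(r"[:_\-]{2,}")

# Hypothetical examples of known misspellings.
KNOWN_BAD_KEYS = {"nmae", "adress"}


def classify_key(key, good_keys):
    """Return "good", "bad", or "unknown" for a tag key."""
    # Assumption: keys on the "good" list are accepted before any rule runs.
    if key in good_keys:
        return "good"
    if key in KNOWN_BAD_KEYS:
        return "bad"
    # Rule: keys with 2 characters or less are bad (covers the empty key).
    if len(key) <= 2:
        return "bad"
    # Rule: keys containing problematic characters are bad.
    if any(ch in PROBLEM_CHARS for ch in key):
        return "bad"
    # Rule: separators at the start or end of the key are bad.
    if key[0] in SEPARATORS or key[-1] in SEPARATORS:
        return "bad"
    # Rule: several separators in a row are bad.
    if REPEATED_SEPARATORS.search(key):
        return "bad"
    return "unknown"
```

Checking the "good" list first means a short but widely used key that is on
that list would not be flagged by the length rule.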

I do realize that this can only be a beginning and that those rules and key lists
will have to be tweaked in the future. I think even my "good" list contains many
keys that are not good at all and should eventually be cleaned up. But at
the moment we can't declare very frequently used keys "bad" without
overwhelming the users, which would just train them to ignore
the good/bad classification in the editor.

Jochen
-- 
Jochen Topf  jochen at remote.org  http://www.jochentopf.com/  +49-173-7019282


