[Tagging] Removing name_1 and alt_name_1 from Wiki

moltonel 3x Combo moltonel at gmail.com
Sat Jan 23 22:05:04 UTC 2016


Tapped "send" too early, here's the rest of my email:

On 23 January 2016 15:14:22 GMT+00:00, "Lauri Kytömaa" <lkytomaa at gmail.com>
 wrote:
>I believe this is a good point to make, the origin for many of those
>tags.
>While the number of uses is reason to keep them as-is, if a major slice
>of them comes from an import, the ratio isn't a good reason to
>*recommend*
>entering more of them.

 It's a pity that the US taginfo site is defunct; it would have given an
 interesting approximation of how many name_1 tags come from TIGER. But
 I'm tired of this "most name_1 tags are from an import, they should be
 ignored" argument:
 * coming from an import doesn't make name_1 wrong. It's a valid (IMHO
superior) way of expressing multiple values.
 * the most you can claim is that this import sometimes incorrectly
assigned multiple values when there should be only one.
 * there are plenty of uses of name_1 outside of TIGER. While the
ratio is unknown, a glance at the taginfo map shows that it isn't
negligible.
 * while popularity of a tag can on its own be enough to justify
documenting the tag, it's never a good enough reason on its own to
justify recommending it. Accordingly, the calls to recommend name_1
usage are not just based on the tag's popularity.

>Browsing through this thread I didn't notice one point, the fact that
>with
>alt_name=a;b;.. all the names are/should be in the semicolon separated
>list, i.e. even without any processing that separates the parts/names
>into
>distinct records, searching would indicate that the searched-for name
>is
>within the list of alternative names (in most cases/some countries, not
>doing some sort of wildcard matching gives a bad user experience, esp.
>if the local word or abbreviation for "street" is always at the
>beginning).

That's a good corner-case example where a multivalue-unaware consumer
still gets some benefit of the multivalue if it is encoded using
semicolons. Of course it goes haywire again when trying to display the
value, and could cause other subtle issues, with stemming for example.
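To make that concrete, here is a minimal Python sketch. The tag values
and both helper functions are made up for illustration, and it ignores
any escaping convention for literal semicolons inside a value.

```python
def naive_match(tags, query):
    """Multivalue-unaware search: plain case-insensitive substring test."""
    return query.lower() in tags.get("alt_name", "").lower()


def split_values(raw):
    """Multivalue-aware parsing: split on ';' and drop empty pieces."""
    return [part.strip() for part in raw.split(";") if part.strip()]


# Hypothetical object with two alternative names in one alt_name tag.
tags = {"name": "Main Street", "alt_name": "County Road 12;Old Mill Road"}

# The unaware consumer still finds the second name by substring search...
print(naive_match(tags, "old mill road"))

# ...but if it displays the value, it shows the raw semicolon string.
print(tags["alt_name"])

# An aware consumer splits the value into distinct names first.
print(split_values(tags["alt_name"]))
```

The substring search succeeds without any multivalue support, but only
the consumer that splits the value can show each name on its own.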

>With name_1 and name_2 and name_9 you'd never know how many tags
>you have to look for when indexing the db dump and changes.

I don't get that, or rather I don't get how it's different from never
knowing how many values you're going to get in the semicolon case.
Maybe you're thinking of an implementation that'd look for "name_1",
"name_2", etc. explicitly for each variation? No programmer in their
right mind would do that; they'd regexp-match for "name_[0-9]+"
instead, or (as Nominatim does, for example) just match the beginning
of the string against "name_".
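As a rough sketch of both approaches in Python (the function names and
the sample tags are invented for this example, this is not Nominatim's
actual code):

```python
import re

# Matches "name_1", "name_2", "name_10", ... but not "name" itself.
SUFFIXED_NAME = re.compile(r"name_[0-9]+")


def suffixed_names(tags):
    """Return the values of name_N tags, sorted by their numeric suffix."""
    matches = [
        (int(key[len("name_"):]), value)
        for key, value in tags.items()
        if SUFFIXED_NAME.fullmatch(key)
    ]
    return [value for _, value in sorted(matches)]


def prefixed_names(tags):
    """Cruder variant: accept any key that starts with "name_"."""
    return [value for key, value in tags.items() if key.startswith("name_")]


# Hypothetical object; the consumer never needs to know how many
# suffixed tags exist in advance.
tags = {
    "highway": "residential",
    "name": "Main Street",
    "name_1": "County Road 12",
    "name_2": "Old Mill Road",
    "name_10": "Route 7",
}

print(suffixed_names(tags))
print(prefixed_names(tags))
```

Either way it is a single pass over the object's keys, the same amount
of work as splitting a semicolon-separated value.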

>Also, with name_[n] the original mapper and the next mappers have to
>order the names with reasoning or just how they like them (subjective),
>whereas with name=The Name + alt_name=other names the alternative
>names are then equal with each other (a collection of alternative
>names).

Firstly, there's nothing that says "the order in which the values are
entered is their order of importance in real life". Whether the order
of the values matters or not is something that should be discussed on
a per-tag basis. The only tag I know of where this matters is
lanes=*, but it has its own complicated and well-defined
special-purpose syntax.

Secondly, the ordered_or_not situation is exactly the same with the
semicolon scheme as with the suffix scheme; neither can claim
superiority here.

>What should be in the plain name tag is easier to agree on (especially
>if
>the operator behind the named entity can be asked), than it would be to
>agree on the sorting of the other known names.

Again, there's no difference between the two schemes regarding the
tagging of the "default" value. It goes, on its own, in the "name" tag
and that's it. How you encode the default value does not influence how
you choose the default value. And yes, we should really discourage the
omission of a default value, whether it's by omitting the plain "name"
tag or by putting semicolon-separated values in it instead of in
alt_name.
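
A QA tool could flag both cases with a check along these lines. This is
only a sketch: the function name and the messages are hypothetical, and
it treats any semicolon in "name" as a multivalue, which will misfire
on names that legitimately contain one.

```python
def default_name_problems(tags):
    """Return a list of problems with the default "name" of one object."""
    problems = []
    # Alternatives can come from either scheme: alt_name or name_N.
    has_alternatives = "alt_name" in tags or any(
        key.startswith("name_") for key in tags
    )
    name = tags.get("name")
    if name is None:
        if has_alternatives:
            problems.append("alternative names present but no plain name tag")
    elif ";" in name:
        problems.append("plain name tag holds semicolon-separated values")
    return problems


# Hypothetical objects covering the good case and the two bad ones.
print(default_name_problems({"name": "Main Street", "name_1": "Route 7"}))
print(default_name_problems({"name_1": "Route 7"}))
print(default_name_problems({"name": "Main Street;Route 7"}))
```

The check is identical for both schemes, since neither changes where
the default value goes.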


