[Tagging] The actual use of the level tag (redux)

Minh Nguyen minh at nguyen.cincinnati.oh.us
Tue Mar 15 07:05:54 UTC 2022


Yesterday, I rewrote much of the level=* documentation on the wiki to 
clear up some ambiguous language and remove some ideas that hadn't 
caught on. [1] Normally, I wouldn't bother this list with my wiki 
editing activity. However, my revision contradicts a previous thread on 
this list: namely, it states a clear preference for level=0 to represent 
the ground floor, even in regions where the ground floor is normally 
called the first floor.

Background
==========

This list previously discussed the level=* numbering scheme back in 
2019. [2] The tag's documentation on the wiki was subsequently revised 
to account for the claims made in that thread [3], which I recently came 
across. As someone who maps in two regions that put the first floor on 
the ground (the United States and northern Vietnam), I was surprised 
that the documentation strongly suggested level=1 as the tag for the 
ground floor in these regions.

Anecdotally, if such tagging ever came to the attention of mappers on 
talk-us or OSMUS Slack, I'm pretty sure it would be quickly identified 
as an error. It's not that anyone necessarily _likes_ the zero-based 
system, but it's pretty widely recognized that this is a 
machine-readable key. Otherwise, level=-1 would be as nonsensical as 
grades=-1 on a school for prekindergarten students.

Rebuttal
========

The 2019 post argued for a regional definition of level=* and justified 
it with some statistics about actual usage in the database. I agree with 
this pragmatic approach; human factors can confound any attempt at 
standardization. However, I think the analysis was too narrowly focused 
on malls, which represent only a small portion of what gets tagged with 
level=*.

For example, strip malls and main street retail buildings are often two 
stories tall. They're more abundant and much easier to survey than 
malls, so more users have mapped them. Although these buildings are 
rarely mapped according to the full Simple Indoor Tagging scheme, people 
still tag level=* on shops in them, in part because iD offers an 
optional Level field on every non-building preset. [4]

Some time has passed since the 2019 post, so I thought it would help to 
revisit those statistics. I queried level=* tags inside shop=mall areas 
in the same U.S. metropolitan areas as in the original post. [5] Where 
necessary, I cross-referenced POIs with the malls' official store 
directories to determine the numbering scheme with more certainty.

Contrary to the original analysis, I found a clear preference for the 
zero-based scheme:

Region                        | 0+ | 1+ | ? | Overpass query
------------------------------+----+----+---+---------------------------------
Los Angeles MSA               | 10 |  7 | 2 | https://overpass-turbo.eu/s/1gSu
New York MSA                  | 13 |  2 | 1 | https://overpass-turbo.eu/s/1gSs
Philadelphia MSA              |  3 |  1 | 0 | https://overpass-turbo.eu/s/1gSm
San Jose MSA / Silicon Valley |  4 |  0 | 0 | https://overpass-turbo.eu/s/1gSi
Washington MSA                |  5 |  2 | 1 | https://overpass-turbo.eu/s/1gSj

(Counts are malls per numbering scheme: 0+ = zero-based, 1+ = one-based, 
? = could not be determined.)
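
To make the overall preference easier to see, here is a small sketch that 
totals the table's columns. The numbers are copied from the table; the 
variable and function names are only illustrative, and I'm assuming each 
count is a number of malls.

```python
# Counts copied from the table: (zero-based, one-based, undetermined).
counts = {
    "Los Angeles MSA": (10, 7, 2),
    "New York MSA": (13, 2, 1),
    "Philadelphia MSA": (3, 1, 0),
    "San Jose MSA / Silicon Valley": (4, 0, 0),
    "Washington MSA": (5, 2, 1),
}


def totals(table):
    """Sum each column across all regions."""
    zero = sum(row[0] for row in table.values())
    one = sum(row[1] for row in table.values())
    unknown = sum(row[2] for row in table.values())
    return zero, one, unknown


zero_based, one_based, undetermined = totals(counts)

# Share of zero-based malls among those whose scheme could be determined.
share = zero_based / (zero_based + one_based)

print(zero_based, one_based, undetermined)
print(f"zero-based share of determined malls: {share:.0%}")
```

Across the five metro areas, that comes to 35 zero-based malls, 12 
one-based, and 4 undetermined.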

Most of the malls following the one-based scheme were mapped over the 
past year by a single user, who confirmed with me that they were trying 
to follow the guidance on the level=* wiki page but didn't have a strong 
opinion on the matter.

How has the U.S. achieved such consistency despite real-world floor 
numbering and the wiki's guidance? I think it's partly thanks to iD's 
Level field: the placeholder text and dropdown menu list the most common 
values in taginfo, starting with 0, 1, and -1. Many people intuitively 
understand that the presence of 0 means the system is zero-based. Even 
though the basement is sometimes called "floor 0", they wouldn't 
interpret level=0 as the basement if -1 is also present.

Alternative analysis
====================

For a more holistic analysis, I consulted Geofabrik's regional taginfo 
instances to determine the most popular level=* values in 19 one-based 
countries, regardless of POI type:

* If level=0 predominates over level=1, and especially if level=-1 is 
more common than level=1, then zero-based numbering is likely the 
dominant scheme.

* If level=1 predominates over level=0, or if mnemonics like level=B or 
G are common, then one-based numbering is likely the dominant scheme.
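
The two rules above can be sketched roughly as follows. This is only an 
illustration, not the exact procedure I followed: the function name, the 
labels it returns, and the 5% threshold for "common" mnemonics are all 
assumptions, and the sample counts are made up rather than taken from 
taginfo.

```python
# Assumed threshold: the rules say mnemonics must be "common" but give no number.
MNEMONIC_SHARE = 0.05


def classify_numbering(value_counts, mnemonics=("B", "G")):
    """Guess the dominant scheme from a mapping of level=* value -> count."""
    n0 = value_counts.get("0", 0)
    n1 = value_counts.get("1", 0)
    n_neg1 = value_counts.get("-1", 0)
    total = sum(value_counts.values())
    n_mnemonic = sum(value_counts.get(m, 0) for m in mnemonics)
    mnemonics_common = total > 0 and n_mnemonic / total >= MNEMONIC_SHARE

    # Second rule: level=1 predominates, or mnemonic values are common.
    if n1 > n0 or mnemonics_common:
        return "one-based"
    # First rule: level=0 predominates; stronger evidence if -1 also beats 1.
    if n0 > n1:
        return "zero-based (strong)" if n_neg1 > n1 else "zero-based"
    return "undetermined"


# Hypothetical counts, for illustration only.
print(classify_numbering({"0": 500, "1": 300, "-1": 350}))
print(classify_numbering({"1": 900, "0": 50, "2": 400}))
```

A real analysis would still need a human to look at the numbers, for the 
reasons given in the caveats further down.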

I found that level=* is indeed one-based in four countries: Kazakhstan, 
Mongolia, North Korea, and South Korea. In these countries, level=1 is 
overwhelmingly the most common value. Of them, only South Korea has 
significant level=* usage. But in most other countries, including the 
U.S., the value distribution is much closer to the overall global 
distribution.

I included the detailed results directly in the level=* article. [1] If 
I've made any errors in this analysis, feel free to correct it and 
please accept my apologies.

There are some important caveats to this broad approach:

* Mappers often omit level=* from POIs on the ground floor, assuming 
that level=0 is the default. However, this is indistinguishable from a 
POI on an unknown level and also doesn't help us determine the floor 
numbering system in use.

* It's possible for a mall to be consistently mapped according to the 
one-based scheme, but then a mapper comes along and tags level=0, or 
vice versa.

* In localities where soft story buildings are common, many shops could 
be on the floor above the ground floor, which would be tagged level=1 
under the zero-based scheme. Similarly, some malls place a parking 
garage on the ground floor and all the shops above it.

The four countries above are so skewed towards level=1 that none of 
these caveats can explain the skew. But these caveats could explain the 
more ambiguous numbers in countries such as Ecuador and Russia.

In South Korea, level=1 has always been more prevalent than level=0. But 
the value distribution became much more lopsided [6] after a February 
2021 import. [7] By that point, the level=* documentation had already 
been updated to prefer one-based numbering. [3]

Conclusion
==========

I think the 2019 post's main point stands: data consumers must be 
mindful about inconsistent usage of level=*. However, it simply is not 
the case that mapping communities in all the one-based countries 
intentionally differ on the key's definition. It would probably be 
feasible to treat one-based numbering as a temporary tagging error to be 
fixed, as opposed to a long-term internationalization problem to 
accommodate.

By creating a feedback loop, the wiki page may have done more to promote 
one-based usage than the organic predisposition it was trying to 
describe. I hope the new revision will have a more harmonizing effect.

[1] 
https://wiki.openstreetmap.org/wiki/Special:Diff/2167347/2288785#Ground_floor_number
[2] 
https://lists.openstreetmap.org/pipermail/tagging/2019-January/042330.html
[3] https://wiki.openstreetmap.org/wiki/Special:Diff/1792464
[4] 
https://github.com/openstreetmap/id-tagging-schema/blob/6beccb2acf38c5a09778a23b295d42497056485b/data/fields/level.json
[5] I used the U.S. federal government's metropolitan statistical areas 
because the original post didn't define the metro areas in its analysis.
[6] https://ohsome.org/apps/dashboard/ -- count of level=* on N/W/R in 
South Korea, grouping by level=0,1,2
[7] 
https://lists.openstreetmap.org/pipermail/imports/2021-January/006476.html

-- 
minh at nguyen.cincinnati.oh.us




