[Tagging] The actual use of the level tag (redux)
Minh Nguyen
minh at nguyen.cincinnati.oh.us
Tue Mar 15 07:05:54 UTC 2022
Yesterday, I rewrote much of the level=* documentation on the wiki to
clear up some ambiguous language and remove some ideas that hadn't
caught on. [1] Normally, I wouldn't bother this list with my wiki
editing activity. However, my revision contradicts a previous thread on
this list: namely, it states a clear preference for level=0 to represent
the ground floor, even in regions where the ground floor is normally
called the first floor.
Background
==========
This list previously discussed the level=* numbering scheme back in
2019. [2] The tag's documentation on the wiki was subsequently revised
to account for the claims made in that thread [3], which I recently came
across. As someone who maps in two regions that put the first floor on
the ground (the United States and northern Vietnam), I was surprised
that the documentation strongly suggested level=1 as the tag for the
ground floor in these regions.
Anecdotally, if such tagging ever came to the attention of mappers on
talk-us or OSMUS Slack, I'm pretty sure it would be quickly identified
as an error. It's not that anyone necessarily _likes_ the zero-based
system, but it's pretty widely recognized that this is a
machine-readable key. Otherwise, level=-1 would be as nonsensical as
grades=-1 on a school for prekindergarten students.
Rebuttal
========
The 2019 post argued for a regional definition of level=* and justified
it with some statistics about actual usage in the database. I agree with
this pragmatic approach; human factors can confound any attempt at
standardization. However, I think the analysis was too narrowly focused
on malls, which represent only a small portion of what gets tagged with
level=*.
For example, strip malls and main street retail buildings are often two
stories tall. They're more abundant and much easier to survey than
malls, so more users have mapped them. Although these buildings are
rarely mapped according to the full Simple Indoor Tagging scheme, people
still tag level=* on shops in them, in part because iD offers an
optional Level field on every non-building preset. [4]
Some time has passed since the 2019 post, so I thought it would help to
revisit those statistics. I queried level=* tags inside shop=mall areas
in the same U.S. metropolitan areas as in the original post. [5] Where
necessary, I cross-referenced POIs with the malls' official store
directories to determine the numbering scheme with more certainty.
Contrary to the original analysis, I found a clear preference for the
zero-based scheme:
Region           | 0+ | 1+ | ? | Overpass query
-----------------+----+----+---+---------------------------------
Los Angeles MSA  | 10 |  7 | 2 | https://overpass-turbo.eu/s/1gSu
New York MSA     | 13 |  2 | 1 | https://overpass-turbo.eu/s/1gSs
Philadelphia MSA |  3 |  1 | 0 | https://overpass-turbo.eu/s/1gSm
San Jose MSA /   |  4 |  0 | 0 | https://overpass-turbo.eu/s/1gSi
  Silicon Valley |    |    |   |
Washington MSA   |  5 |  2 | 1 | https://overpass-turbo.eu/s/1gSj
(0+ = zero-based, 1+ = one-based, ? = undetermined)
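
As a quick check on the table above, the per-region counts can be
totaled. This is only a sketch: the variable names are mine, the numbers
are copied directly from the table, and I am assuming each count is a
number of malls, as the following paragraph suggests.

```python
# Counts copied from the table: (zero-based, one-based, undetermined).
mall_counts = {
    "Los Angeles MSA": (10, 7, 2),
    "New York MSA": (13, 2, 1),
    "Philadelphia MSA": (3, 1, 0),
    "San Jose MSA / Silicon Valley": (4, 0, 0),
    "Washington MSA": (5, 2, 1),
}

zero_based = sum(z for z, _, _ in mall_counts.values())
one_based = sum(o for _, o, _ in mall_counts.values())
unknown = sum(u for _, _, u in mall_counts.values())

# Share of zero-based malls among those whose scheme could be determined.
zero_share = zero_based / (zero_based + one_based)

print(zero_based, one_based, unknown)  # 35 12 4
print(f"{zero_share:.0%}")             # 74%
```
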
Most of the malls following the one-based scheme were mapped over the
past year by a single user, who confirmed with me that they were trying
to follow the guidance on the level=* wiki page but didn't have a strong
opinion on the matter.
How has the U.S. achieved such consistency despite real-world floor
naming and the wiki's guidance? I think it's partly thanks to iD's
Level field: the
placeholder text and dropdown menu list the most common values in
taginfo, starting with 0, 1, and -1. Many people intuitively understand
that the presence of 0 means the system is zero-based. Even though the
basement is sometimes called "floor 0", they wouldn't interpret level=0
as the basement if -1 is also present.
Alternative analysis
====================
For a more holistic analysis, I consulted Geofabrik's regional taginfo
instances to determine the most popular level=* values in 19 one-based
countries, regardless of POI type:
* If level=0 predominates over level=1, and especially if level=-1 is
more common than level=1, then zero-based numbering is likely the
dominant scheme.
* If level=1 predominates over level=0, or if mnemonics like level=B or
G are common, then one-based numbering is likely the dominant scheme.
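
The two rules above can be sketched as a small function. This is a
minimal illustration, not the procedure actually used for the analysis:
the function name, the 10% threshold for "common" mnemonics, the
"(strong)" label, and the order in which the rules are checked are all
assumptions of mine.

```python
from collections import Counter


def guess_numbering(value_counts, mnemonic_share=0.10):
    """Guess the dominant level=* scheme from a {value: count} mapping.

    mnemonic_share is an arbitrary cutoff for "common"; the rules in the
    text do not quantify it.
    """
    counts = Counter(value_counts)  # missing values count as 0
    total = sum(counts.values())
    if total == 0:
        return "undetermined"

    mnemonics = counts["B"] + counts["G"]
    if counts["1"] > counts["0"] or mnemonics / total >= mnemonic_share:
        return "one-based"
    if counts["0"] > counts["1"]:
        # level=-1 outnumbering level=1 is the stronger signal.
        if counts["-1"] > counts["1"]:
            return "zero-based (strong)"
        return "zero-based"
    return "undetermined"


# Hypothetical counts, made up for illustration only.
print(guess_numbering({"0": 500, "1": 300, "-1": 350}))  # zero-based (strong)
print(guess_numbering({"1": 900, "2": 400, "0": 50}))    # one-based
```
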
I found that level=* is indeed one-based in four countries: Kazakhstan,
Mongolia, North Korea, and South Korea. In these countries, level=1 is
overwhelmingly the most common value. Of them, only South Korea has
significant level=* usage. But in most other countries, including the
U.S., the value distribution is much closer to the overall global
distribution.
I included the detailed results directly in the level=* article. [1] If
I've made any errors in this analysis, feel free to correct it and
please accept my apologies.
There are some important caveats to this broad approach:
* Mappers often omit level=* from POIs on the ground floor, assuming
that level=0 is the default. However, a POI without level=* is
indistinguishable from one on an unknown level, so the omission doesn't
help us determine the floor numbering system in use.
* It's possible for a mall to be consistently mapped according to the
one-based scheme, but then a mapper comes along and tags level=0, or
vice versa.
* In localities where soft story buildings are common, many shops could
be on the floor above the ground floor, which would be tagged level=1
under the zero-based scheme. Similarly, some malls place a parking
garage on the ground floor and all the shops above it.
The four countries above are so skewed towards level=1 that none of
these caveats can account for the skew. However, the caveats could
explain the more ambiguous numbers in countries such as Ecuador and
Russia.
In South Korea, level=1 has always been more prevalent than level=0. But
the value distribution became much more lopsided [7] after a February
2021 import. [6] By that point, the level=* documentation had already
been updated to prefer one-based numbering. [3]
Conclusion
==========
I think the 2019 post's main point stands: data consumers must be
mindful about inconsistent usage of level=*. However, it simply is not
the case that mapping communities in all the one-based countries
intentionally differ on the key's definition. It would probably be
feasible to treat one-based numbering as a temporary tagging error to be
fixed, as opposed to a long-term internationalization problem to
accommodate.
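
Until such fixes happen, a data consumer that knows a feature was mapped
one-based could normalize its level on the fly. The sketch below is
hypothetical: the function name and scheme labels are mine, and it
assumes that one-based numbering has a ground floor of 1, a first
basement of -1, and no floor 0. It also handles only plain integer
levels.

```python
def to_zero_based(level, scheme):
    """Convert an integer level to zero-based numbering.

    Assumes one-based numbering runs ..., -2, -1, 1, 2, ... with no 0.
    """
    if scheme == "zero-based":
        return level
    if scheme != "one-based":
        raise ValueError(f"unknown scheme: {scheme!r}")
    if level == 0:
        # Under the stated assumption, 0 signals inconsistent tagging.
        raise ValueError("level=0 is not expected under one-based numbering")
    # Floors at or above ground shift down by one; basements stay as-is.
    return level - 1 if level > 0 else level


print(to_zero_based(1, "one-based"))   # 0
print(to_zero_based(3, "one-based"))   # 2
print(to_zero_based(-1, "one-based"))  # -1
```
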
By creating a feedback loop, the wiki page may have done more to promote
one-based usage than the organic predisposition it was trying to
describe. I hope the new revision will have a more harmonizing effect.
[1]
https://wiki.openstreetmap.org/wiki/Special:Diff/2167347/2288785#Ground_floor_number
[2]
https://lists.openstreetmap.org/pipermail/tagging/2019-January/042330.html
[3] https://wiki.openstreetmap.org/wiki/Special:Diff/1792464
[4]
https://github.com/openstreetmap/id-tagging-schema/blob/6beccb2acf38c5a09778a23b295d42497056485b/data/fields/level.json
[5] I used the U.S. federal government's metropolitan statistical areas
because the original post didn't define the metro areas in its analysis.
[6]
https://lists.openstreetmap.org/pipermail/imports/2021-January/006476.html
[7] https://ohsome.org/apps/dashboard/ -- count of level=* on N/W/R in
South Korea, grouping by level=0,1,2
--
minh at nguyen.cincinnati.oh.us