[Talk-ca] What do I put in the name tag of a road with a suffix?

Mon Dec 12 13:24:41 UTC 2022

On Dec 12, 2022, at 3:11 AM, Minh Nguyen <minh at nguyen.cincinnati.oh.us> wrote:
> As an engineer, I'm trained in the belief that there are many possibilities, but there are tradeoffs to every choice. The tradeoff to "deal with it" is that some probably won't, to the detriment of ordinary end users -- or, as I like to think of them, prospective fellow OSM contributors.

I’m also trained as an engineer, and I also believe “there are many possibilities,” though perhaps what you mean here is more clearly stated as “there are many possible solutions to any given problem.”  Yes, there are tradeoffs to choices when faced with solving a problem with many solutions.  Faced with a “deal with it,” the tradeoffs seem to be “deal with it” (successfully, maybe cleverly, maybe sloppily, but it gets the job done) or “don’t deal with it” (fail).  How this relates to “probably won’t, to the detriment of ordinary end users…” completely lost me.  I read that a number of times, but I really couldn’t follow it (a “parse failure” on my part, it seems).  I am curious as to what you mean here; perhaps you might clarify.

> It's true that some amount of special-casing by geography is unavoidable. For example, I've heard that U-turns are generally prohibited in British Columbia, so routers are expected to just apply that rule instead of relying on explicit turn restrictions. Inconvenient for data consumers, perhaps, but workable.

Yes, such special-casing by geography is found widely in our map data.  I can’t say that what I’m about to type is the reason every time, but “we have tagging schemes, reality doesn’t always fit into these, people tag reality as best they can, even as this ‘drifts’ from our tagging schemes.”  I think this is because many, likely most, perhaps even the vast, vast majority of people who edit OSM have never “formally mapped” before (I mean that with wide latitude) and there are simply going to be “I gave it my best, but it didn’t hit the mark of perfection” occurrences.  (And largely speaking, excluding vandalism, OSM is OK with that).  Moreover, there really are geographic areas which are QUITE different from “how the rest of the world maps these sorts of things in OSM.”  Those instances might initially be special-cased and later get improved by “re-fitting” some tagging sub-flavors (sub-name-spacing kinds of approaches, for example), or they might not.  They might stay geographically special-cased for a long time.  How “workable” any of these end up being rather strongly depends on what the special-case(s) is/are, what use-case parses them, and more.  (They are “special,” after all, and it’s hard to generalize these).

> On the other hand, we no longer expect renderers to add lots of special cases to choose route shields based on ref=* on ways. Instead, route relations have more structured tags like network=CA:AB:primary, so that renderers such as OsmAnd and OSM Americana [1] can choose the right shield with a minimum of guesswork. OSM tried guesswork before and it didn't work well.

That’s an excellent example of how, over time, we 1) identified a problem, 2) identified a solution, 3) deployed the solution, and (a bit unclearly as to where the fuzzy boundary of doing so became “firm”) we 4) began to act as though the “old” method of doing things is not or should not be “supported” (as correct data) any longer.  While slow, this reduces guesswork until guessing is no longer necessary:  the “residual” problems become such a small set that they can be ignored, and eventually the community takes a “well, there’s a better way to do this now, fix your (area’s, for example) data or remain mired in the past doing things in an old, deprecated method which doesn’t work with most or all things which expect such data to be another way” attitude.  I think if we keep the need to perform this dance to a minimum, while realizing there really are times and places to do so, we’re hitting a sweet spot.

I don’t know what to call this process, but we should identify that it works, and it is at least one method to improve our map data.  But it is slow, and requires that people “re-learn,” which is seldom easy and never frictionless.
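To make the shield example concrete, here is a minimal sketch (Python) of the difference between reading a structured network=* tag from a route relation and guessing from a bare ref=* on a way.  Only the value CA:AB:primary comes from this thread; the function names, the shield file name and the per-region guessing rule are all hypothetical, invented just to show the shape of the two approaches.

```python
import re

# Hypothetical lookup table: network=* value on a route relation -> shield artwork.
# Only "CA:AB:primary" comes from the thread; the file name is made up.
SHIELDS_BY_NETWORK = {
    "CA:AB:primary": "shield_ab_primary.svg",
}


def shield_from_relation(tags):
    """Structured approach: look up network=* directly, no guessing."""
    return SHIELDS_BY_NETWORK.get(tags.get("network"))


def shield_from_way_ref(ref, region):
    """Older approach: guess from ref=* on a way plus a per-region rule.

    The rule here (any purely numeric ref in a region labelled "AB" is a
    primary highway) is invented to show how brittle such guessing is.
    """
    if region == "AB" and re.fullmatch(r"\d+", ref or ""):
        return "shield_ab_primary.svg"
    return None


relation_tags = {"type": "route", "route": "road", "network": "CA:AB:primary", "ref": "2"}
print(shield_from_relation(relation_tags))
print(shield_from_way_ref("2", "AB"))
```

Both calls land on the same artwork here, but the second one only works because the renderer carries a rule for that one region; every additional region means another special case, which is the guesswork the structured tag removes.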

> If the messiness and lack of structure in OSM name=* tags globally somehow leads to a reliable, open-source, AI-powered TTS engine, then who could complain? But that possibility doesn't strictly need to prevent us from considering alternative approaches that provide more context when necessary, such as (speaking of Salt Lake City) the name:full=* key. [2]

SLC’s name:full=* key is yet another solution seemingly coined to solve a specific problem.  And while there is nothing strictly wrong with such special-casing cleverness, it is better in the long run to harmonize such approaches so that many different flavors need not be invented and deployed when one (ideally) or a very small number (a few, say three or four at most) could suffice.  This takes time (gobs of it).  And yes, who could complain?  But that’s not the usual question when faced with “a human must do some rather clever mental parsing (when confronted with, say, ‘a sign’s text to pronounce’).”  Rather, it’s “how do we realistically get a machine to do this?”  The usual question isn’t about complaining, as complaining is easy with mess and lack of structure.  Engineers like you (us, others…) can toil and special-case into the next century (“dealing with it” in certain ways, using special-casing and cleverness alike), or OSM can spend effort “on the other side of the stack” and attempt to regularize / harmonize our data into “less mess” and “more structure.”  The latter requires wide education and sometimes re-training, but the efforts can be worth it.

That’s a longer-term goal, certainly (compared to “bang out this parser by the end of next quarter”), but well-written dictionaries and grammars go a long way towards people who speak the same natural language understanding each other much, much better.  Yes, pronunciation shifts happen (rather commonly), words are coined and new turns of phrase appear (commonly), and even shifts in grammar take place (less commonly, but they happen as natural languages evolve), but in widely spoken languages, “there are rules.”  If OSM wants to encourage growth of the sort that is directly opposite “a Tower of Babel,” we should strive to regularize and harmonize “the messiness and lack of structure” not only in our name=* tags, but any and all tags.  That’s ambitious, but we are still a young project with much room to grow into these endeavors.  And I say it a lot, but we’re doing OK, maybe even “fine” here, while we can always improve.  The thing is, these are much longer-term solutions that take significant time (years).  I have years to invest in this project, and I believe many others do, too.  I do so expressing a preference for “regularity in data,” rather than “cleverness in parsers.”  (An oversimplified way to say it, but you get my gist).
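As a rough illustration of “regularity in data” versus “cleverness in parsers,” here is a minimal sketch (Python) of how a data consumer might pick the text to hand to a TTS engine when a key like name:full=* is available.  The function name and the example street values are hypothetical; I am assuming only that name:full=*, where present, carries the spelled-out form and that name=* is the fallback.

```python
def text_to_speak(tags):
    """Prefer the spelled-out name:full=* when present, else fall back to name=*.

    Returns None when neither key is set.
    """
    return tags.get("name:full") or tags.get("name")


# Hypothetical tag sets; the street names are invented for illustration.
with_full = {"highway": "residential", "name": "900 S", "name:full": "900 South"}
without_full = {"highway": "residential", "name": "900 S"}

print(text_to_speak(with_full))
print(text_to_speak(without_full))
```

With the structured key the consumer needs one line; without it, the same consumer must decide for itself whether “S” means “South,” “Street” or something else, and that is exactly the parser cleverness I would rather we not depend on.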

Albertans in OSM, this isn’t going to be solved in one day (or one parser); it is a “roll up our sleeves, we’re going to have to figure this out and do something about it in the wider OSM context” kind of problem.  There are well-known strategies to do this, including things like identifying specific problems and refactoring.  But it isn’t code refactoring, it’s data refactoring:  similar strategies apply.  I think this is doable, given many other somewhat-similar data improvement thrusts I’ve seen in our project.

Hey, it’s a “talk” page, so I talk.  I know it seems like a small question to ask how to suffix a road name, but when you have what might be characterized as a mess, you don’t want to sweep it under the rug, you want to fix it.  I hope to offer some perspective to do so and I thank you for reading.