[OSM-talk] Good practice, and should we rely on defaults?
stevea
steveaOSM at softworkers.com
Sun Apr 10 09:28:03 UTC 2022
On Apr 9, 2022, at 7:28 PM, Minh Nguyen <minh at nguyen.cincinnati.oh.us> wrote:
> At 03:12 2022-04-08, Richard Fairhurst wrote:
>> A machine-readable GeoJSON file with a few defaults per polygon (LHD vs RHD, bikes allowed on trunk, etc.) would not necessarily be a bad thing. If it was kept relatively simple then it might get widespread adoption. But anything as stupidly complex as the opening hours syntax, say, is not going to fly.
>
> In case anyone is curious, some editors and data consumers do maintain GeoJSON files of defaults, for example:
>
> https://github.com/Project-OSRM/osrm-backend/tree/41dda32546399f1dc12af1de41668993de44c7dc/data/
> https://github.com/ideditor/country-coder/blob/main/src/data/borders.json
> https://github.com/openstreetmap/iD/blob/7b0127f5e7b98214782574698c5fc8c970ed8caa/data/territory_languages.json
>
> For every one of these files in open-source repos, you can imagine there are a dozen in closed-source projects, often compiled by quickly consulting Wikipedia. One hopes they're at least reasonably consistent with each other.
The futility of "chasing defaults" reminds me of how some languages have really complex phonemes/morphology (sounds and words) and some languages have really complex grammar/syntax (word order, verb conjugation, tense/mood, sometimes noun case...), but seldom BOTH. Roughly speaking, languages evolve to be as complex as they need to be, regardless of whether that's an ending on a word or a change in sentence structure, with the result that "what needs to be said can be said." In other words, "the work needs to happen somewhere."
In OSM, "files of defaults" relate similarly to the real world (its data) and to how closely you want to "say what you want to say" (express in, say, a rendering or routing). That is, you might push part of the parsing work into a defaults file (perhaps as a good first step in a heuristic, even knowing it might be obsolete) and use "other data" to "finish" the parsing that yields the best rendering / routing / end-use case. Those other data might be real-world knowledge that outweighs the defaults; they might be more data, or different data. But something might also be "missed" along the way, especially because "tables of defaults" go stale in some cases. Not all cases, but enough that some judgement must be made about how best to balance the effort involved (finding errors, tuning tables) against the benefit gained. In my experience, such heuristics often lead to sensible strategies for solving problems: the effort to tune them gets smaller and smaller (since the majority of the work has already been expended) while the value gained keeps growing. However, a poor initial design can allow too much fiddling to collapse the whole scheme.
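To make the "defaults first, other data finishes the job" idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the `DEFAULTS` table, its layout, and the `resolve` function are hypothetical and do not correspond to any of the files linked above. The one thing it shows is the ordering: an explicit tag on the object outweighs the table, and the result records where the value came from so a stale table entry can be spotted later.

```python
# Hypothetical defaults table keyed by country code (illustration only;
# not the format of any real OSM data consumer).
DEFAULTS = {
    "US": {"driving_side": "right"},
    "GB": {"driving_side": "left"},
}


def resolve(tags, country, key):
    """Return (value, source).

    An explicit tag on the object wins; otherwise fall back to the
    defaults table; otherwise report that nothing is known.
    """
    if key in tags:
        return tags[key], "tag"
    default = DEFAULTS.get(country, {}).get(key)
    if default is not None:
        return default, "default"
    return None, "unknown"
```

Returning the source alongside the value is a design choice, not a requirement: it makes it cheap to count how often the table is actually relied upon, which feeds the cost-benefit judgement described above.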
> And then there's the wiki's attempts at achieving machine readability. I'm pretty sure the most recent change to the default speed limit table, an entry for the City of Minneapolis [1], bucks osm-default-speeds's expected syntax. But what is to be done about individual municipalities setting their own default speed limits, default U-turn restrictions [2], or default street parking restrictions [3]?
All three of those seem like they'll need their respective schemes to fit into wider schemes of "defaults" (and "defaults processing"). Actually, I think that for any given case this is technically doable. I'd guess that in most or even all cases the effort is worth it, but I could be proven wrong by some difficult corner cases.
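One way municipal defaults could fit into a wider scheme is a most-specific-first lookup, sketched below in Python. Everything here is an assumption for illustration: the jurisdiction names ("XX", "StateB", "CityC"), the keys, and the values are placeholders, and `lookup_default` is a hypothetical helper, not the syntax osm-default-speeds or the wiki table expects.

```python
# Hypothetical nested defaults. Every jurisdiction name and value below is a
# placeholder, not real-world data.
DEFAULTS = {
    ("XX",): {"maxspeed": "50", "u_turn": "allowed"},
    ("XX", "StateB"): {"maxspeed": "40"},
    ("XX", "StateB", "CityC"): {"u_turn": "prohibited"},
}


def lookup_default(path, key):
    """Walk from the most specific jurisdiction outward.

    `path` lists jurisdictions from widest to narrowest, for example
    ["XX", "StateB", "CityC"]. Returns (value, level) for the first level
    that defines `key`, or (None, None) if no level does.
    """
    for depth in range(len(path), 0, -1):
        level = tuple(path[:depth])
        value = DEFAULTS.get(level, {}).get(key)
        if value is not None:
            return value, level
    return None, None
```

With these placeholder entries, a way in "CityC" picks up the city's own U-turn default but inherits its speed default from "StateB", since the city does not set one.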
In short: shortcuts (like "use the defaults table") often work, especially when they are well-documented, understandable/understood and adhered to. Until they don't, that is, but then they can be tuned, sometimes with only small additional effort. Do a quickie cost-benefit analysis early, design well, not sloppily, document (and show) your work and let a thousand flowers bloom. Planting seeds like this works in OSM, or at least it can work.