[Tagging] Reviving the conditions debate

Thu Jun 14 23:24:18 BST 2012

Martin, if it walks like a duck and quacks like a duck, then it had 
better be a duck... What I mean by this is that if the grammar is so 
English-like that people are tempted to use constructions which are 
not (or not quite) supported by the grammar, or if the way it works is 
contrary to how the English language would interpret it, then "errors 
will occur". Add to that the fact that the majority of users will not 
have English as their first language. We have to make this accessible 
to the man-in-the-street and not allow it to become so obfuscated that 
only PhDs need apply.

Bottom line: I question whether making it kind of pseudo-English-like is 
the right goal to aim at. A simple grammar which is (mis)understood 
equally over the whole world might be better. Your post below is full of 
examples supporting my point. The grammar should be derived from what 
you are trying to model, just as a (descriptive) grammar for English is 
reverse-engineered from the way the language is used. If you start with 
the premise that the answer must be expressed in ANTLR and shouldn't 
include brackets, that's putting the cart before the horse. Please feel 
free to carry on with your experimentation to see if you can make a 
grammar on this basis, but remember that humans have to read and write 
this stuff (which does not detract from my earlier assertion that 
machine-readability is a slightly higher priority than 
human-readability) and they often need clear boundaries to make the most 
of their creativity. If you put a child in the middle of an "infinitely 
large" field with no boundaries obvious to the child, they won't move 
far from where you put them. If you put the same child in a large fenced 
enclosure, they will explore every inch. Give a child a paintbrush in 
front of a huge blank wall and you will get a small picture. Mark out a 
"frame" on the wall and say "paint in this" and it will all get used. 
Give a mapper no limits on tagging, and many things will not get tagged 
(because of inner doubts about how to do it). Give the same mapper a 
menu of 100 tags which can be used, and he will use many more of them.

 > Human language is sadly not very precise: "except taxis AND bicycles" 
 > does not mean you must be in a vehicle that is both (it means if not 
 > taxis AND if not bicycles),

The human language here is extremely precise to any fluent 
English-speaker. It means what it says. It's the IT-based interpretation 
of the word AND which leads to the grammar misinterpreting the 
intention. Think of the expression:

     a * b + c * d

To the "untrained eye" this may appear ambiguous or be interpreted 
differently to how a compiler will interpret it. Nonetheless it's valid 
code and no compiler will complain. However style-wise there is a school 
of thought that such constructions are unsafe because a "bug" caused by 
precedence problems is difficult to find by a quick inspection. 
Mandating the use of parentheses to make the programmer's intentions 
clear to a code-reviewer helps to detect bugs early, and has the 
desirable side-effect of making the programmer think just a little bit 
harder about how it's going to work out. Prevention is better than cure: 
anything making it less likely that "coding errors" make it into the 
database is most definitely a Good Thing To Have. Grammars which allow 
"just about everything" are a pain because they frustrate this error 
checking and delay error detection considerably, often relying on a user 
to report an anomaly, triggering all kinds of incident management and 
problem management processes and costing thousands of times more to fix 
than if the input validation had stopped the error occurring in the 
first place.
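
As a minimal sketch of both points (Python is used purely for 
illustration; the numbers and the two helper functions are made up for 
this example): the first part shows that parentheses leave the result 
unchanged while ruling out a misreading, and the second contrasts the 
English reading of "except taxis AND bicycles" with a literal boolean 
AND.

```python
# Part 1: operator precedence. Python, like most languages, binds *
# more tightly than +, so the first two lines compute the same value.
a, b, c, d = 2, 3, 4, 5

implicit = a * b + c * d        # relies on the reader knowing precedence
explicit = (a * b) + (c * d)    # same value, intention spelled out
misread = a * (b + c) * d       # what a hasty reader might assume instead

# Part 2: "except taxis AND bicycles" (hypothetical helpers).
EXEMPT = {"taxi", "bicycle"}

def exempt_english_reading(vehicle_classes):
    """Exempt if the vehicle belongs to at least one listed class."""
    return bool(set(vehicle_classes) & EXEMPT)

def exempt_boolean_and(vehicle_classes):
    """Exempt only if the vehicle belongs to every listed class at once."""
    return EXEMPT <= set(vehicle_classes)
```

With these definitions a plain taxi is exempt under the English reading 
but not under the literal boolean one, which is exactly the kind of 
silent disagreement that input validation should catch.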

"If I were king" I would be looking for a system that:
* makes common cases easy
* makes complex cases possible
* makes each rule as standalone as possible (one sign -> one rule)
* does not rely on the user's fluency in English grammar (knowledge of a 
set of specific words, e.g. tags and functions, is fine)
* uses grammatical constructions which are familiar to most people, or 
easily learnt
* has a grammar allowing for a user-extensible function repertoire
* allows user-defined functions to be stored in an external file 
(accessible at entry and run time)
* fits comfortably with the key-value-pair paradigm of OSM
* can produce a result in a variety of data types including at least 
boolean, number, string
* can use the value of other tags as variables
* can use other variables supplied at run time (e.g. weather, time, 
vehicle type)
* supports the usual comparison and numeric operations
* supports a string concatenation operation
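
To make the wish list above a little more concrete, here is a rough 
sketch, not a proposal for the actual syntax. It assumes a Python-like 
expression syntax (brackets included) and borrows Python's standard 
`ast` module as the parser. The names `FUNCTIONS`, `evaluate`, 
`is_motor_vehicle` and `context`, and all variable names, are 
hypothetical. Real OSM keys containing ':' would need some mapping to 
identifiers, which is not shown.

```python
import ast
import operator

# Hypothetical function repertoire; users can register more entries.
FUNCTIONS = {
    "concat": lambda *parts: "".join(str(p) for p in parts),
    "min": min,
}

BINARY_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}
COMPARE_OPS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}

def evaluate(expression, variables):
    """Evaluate an expression against tag values and run-time variables."""
    def walk(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in COMPARE_OPS):
            compare = COMPARE_OPS[type(node.ops[0])]
            return compare(walk(node.left), walk(node.comparators[0]))
        if isinstance(node, ast.BoolOp):
            # Note: all operands are evaluated (no short-circuiting).
            values = [bool(walk(v)) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not walk(node.operand)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCTIONS and not node.keywords):
            return FUNCTIONS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError("unsupported construct: " + ast.dump(node))
    return walk(ast.parse(expression, mode="eval").body)

# A user-defined function added to the repertoire (illustrative only).
FUNCTIONS["is_motor_vehicle"] = lambda v: v in {"hgv", "motorcar", "taxi"}

# Tag values and run-time variables share one lookup table here.
context = {"maxspeed": 100, "weather": "wet", "weekday": "Su",
           "vehicle_type": "hgv"}

limit = evaluate("min(maxspeed, 80)", context)                    # number
applies = evaluate("(weather == 'wet') or (weekday == 'Su')", context)  # boolean
label = evaluate("concat('limit ', maxspeed)", context)           # string
motor = evaluate("is_motor_vehicle(vehicle_type)", context)
```

Anything outside the whitelisted constructs is rejected with an error at 
entry time rather than being silently accepted, which is the point of 
having a strict grammar.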

Colin

On 14/06/2012 22:19, martinq wrote:
> Hi,
>
> sadly this discussion was restarted before I could establish a 
> reference implementation for a less technical way of tagging 
> conditional values (for those who are interested: it is an ANTLR 
> grammar, hopefully with built-in evaluation code). The reference 
> implementation is for me a key for acceptance, because the less 
> technical it is to tag, the more difficult it is to parse. And we all 
> agree that it should be possible to parse it - but not necessarily 
> easy.
>
> Objective of my proposal: As few rules as possible - as close as 
> possible to the sign-posted information.
>
> The proposal page does not contain a lot of information, because I 
> adapt the "grammar" based on what is feasible. Sadly I cannot spend a 
> lot of time on continuing with my reference implementation at the 
> moment.
>
>
> I will comment based on what I have already figured out:
>
> > - "Or" operators. "Maxspeed is 80 if it's wet or Sunday" can be
> > rephrased as "Maxspeed is 80 if it's wet. Maxspeed is 80 if it's
> > Sunday." IOW, these can be modelled by using two tags instead of one.
>
> This is in fact the biggest challenge in the current state of my 
> parser (in combination with fuzzy & context related precedence). Human 
> language is sadly not very precise:
> "except taxis AND bicycles" does not mean you must be in a vehicle 
> that is both (it means if not taxis AND if not bicycles), whereas 
> "except HGV AND weight>7.5t" is interpreted differently by most 
> humans (if not (HGV AND weight>7.5t)). There are a lot of other 
> examples of ambiguities, e.g. "except Fr, 10:00-18:00" does not mean 
> the whole of Friday and the other days, etc.
>
> However, in this case the preliminary parser has no problem 
> understanding the following different expressions:
>
> maxspeed:cond = 80 if wet or Sunday
>
> Easy tagging, isn't it?
>
> But the grammar is flexible. Instead of 'if' I also support 'when' and 
> '[' ']' (I am not sure about the brackets yet - they are clearly not 
> as intuitive as 'if' or 'when').
> maxspeed:cond = 80 [wet]
> maxspeed:cond2 = 80 [Su]
>
> Also understood by the parser:
> maxspeed:cond = 80 when weekday is Sunday and condition is wet
> or
> maxspeed:cond = 80 weekday=Su or condition=wet
> or
> maxspeed:cond = 80 [wet]; 80 on Sundays
>
> and many other variants. It is almost impossible to tag it wrong.
>
>
> > - Brackets for expressions. If we limit ourselves to just "and"
> > operators, evaluation order doesn't matter.
>
>
> This is something I really want to avoid. Brackets for precedence 
> purposes are a purely technical artefact and I have not seen them on 
> the signs with the information we want to tag...
>
> However, the *precedence* is the major problem in the current parser.
> Thus I don't think I can write a parser without any rule helping the 
> parser and restricting the mappers. But brackets will just be 
> introduced if I have no other option.
>
>
> >> Pseudo-Javascript: (!is_motor_vehicle(vehicle_type)) ||
> >> ((vehicle_type='hgv')&&  (time<'10:00' || time>'20:00')&&
> >> intention='loading')
>
> == side note ==
> I assume this is access related.
> The current access tags are IMO just abbreviations; 
> nowadays we would write instead of
>
> hgv=yes --> access:hgv=yes.
>
> With the conditional value proposal it could be tagged as
>
> access=yes if hgv
>
> maxweight=x is an example for (vehicle) access = no if weight>x, even 
> though it can also be a non-conditional property of an object (e.g. a 
> bridge may have an intrinsic maximum acceptable weight, but we don't 
> have to go into details now).
> == side note end ==
>
>
> I interpreted your code example as
>
> access=yes
> access:cond1 = no if motorized
>   or "no if motor vehicle" or "no if vehicle is motorized"
> access:cond2 = delivery if hgv from 10:00-20:00
>   or "delivery if vehicle is a hgv and time is 10:00-20:00"
>   or many variants more...
>
> If the evaluation part of my parser works (I am still working on the 
> grammar), I may also be able to create a kind of "normalized" 
> JavaScript expression out of the "fuzzy" human tags [but I have not 
> implemented a normalized attributed tree yet].
>
>
> > - defining a syntax for time intervals (opening_hours)
>
> By using on/off, this is already the first tag which moved the 
> condition into the value. By using off/on, it reads like
>
> off if ...
> on if ...
>
> However, the author struggled with the same basic problem, e.g. there 
> is a (non-intuitive) difference between using ',' and ';' now.
>
> Also, except for basic time restrictions, it is impossible to tag and 
> also difficult to read these expressions. Clearly powerful, but too 
> compressed for casual mappers. Can you read this?
>
> Dec 25-Su-28 days - Mar Su[-1]: 22:00-23:00
>
> In this case I would stick to human readable expressions like "last 
> Sunday in March" and put the burden of evaluating it onto the 
> parser/application.
>
>
> > - defining a subset hierarchy of vehicle categories (such as
> > "motor_vehicle" including "hgv" as a subset)
>
> Applications must know which vehicle you drive to evaluate certain 
> conditional values (mainly access, speed limits and also parking 
> conditions). Unlike the current access tags we don't need a hierarchy.
>
> motorized is an attribute of the vehicle, the application must know 
> about it. Not every taxi is motorized [rickshaws or bicycle taxis], 
> the attributes for the circumstances of use should be interdependently 
> determined by the application. The application may use a tree 
> internally, but I don't think it's the mapper's job.
>
> For a standard taxi an application may work with the following attributes:
> taxi = yes
> weight = 2t
> motorized = yes
> width = 2m
> length = 5m
> height = ...
> permits = A38, A39
> public service = yes
> [however, if the taxi is empty, this attribute may be no]
> passengers = 2
> sex = male
> age = 55
> engine power = 22PS
> vehicle maxspeed = 180
> colour = black
> etc.
> [I hope the concept is clear]
>
> Independently of that, the application also requires knowledge 
> of date & time for certain conditions - and the application may 
> need to know the weather, holidays, purpose of travel (or intention), 
> destination of travel (especially in my area several access 
> restrictions depend on the area you are driving to).
>
> The mappers no longer have to worry about interdependencies between 
> such attributes, e.g. is a rickshaw a motor vehicle or a bicycle or 
> just a pedestrian with a trailer? (It can be all.)
>
> Now, whenever some properties are not known, the application simply 
> cannot evaluate all conditional values. Either the application can 
> accept this and continue with the safety value (e.g. no for access, 
> lowest speed for maxspeed, etc.) or it may have to ask the user (e.g. 
> the user can select whether the bicycle map or the motor vehicle map 
> should be shown).
>
> > So how do the existing proposals fit in here? Well, the primary
> > difference between the example above and "extended conditions" is that
> > the latter aims for short, manually editable strings by letting
> > literals for vehicle classes, weather conditions etc., as well as time
> > intervals, stand for themselves - based on the assumption that a parser
> > will be able to unambiguously identify them.
>
> 1) I dislike proposals which try to solve only the situation for access.
> access is a tag like any other - and we shouldn't re-invent the wheel:
> access:cond = yes if time is 10:00-18:00  (or simply yes [10:00-18:00])
> maxspeed:cond = 80 if time is 10:00-18:00
> parking:lane:left:cond = parallel time is 10:00-18:00
> open:cond = yes time is 10:00-18:00
>
> or - simply to emphasize the concept -
> access:cond = yes if female
> maxspeed:cond = 80 if female
> parking:lane:left:cond = parallel if female
> open:cond = yes if female
>
> The extended access proposal can be used for any tags, thus no issue 
> here:
> access:hgv:(20:00-10:00) = ...
> maxspeed:hgv:(20:00-10:00) = ...
> parking:lane:left:hgv:(20:00-10:00) = ...
> ...
>
>
> 2) Value vs. key
> I think value side conditions would be more intuitive, because the 
> value depends on the condition.
>
> Also, it is easier to filter the things in the database, especially if 
> left/right & forward/backward is also mixed into the conditions (or 
> should we simply go the next step and see them as conditions too?).
>
> Disadvantage is that values can contain any characters. This makes it 
> hard to identify the start of the condition in a parser.
>
> 3) The extended access proposal:
>
> > motor_vehicle = no
> > hgv:(20:00-10:00):loading = yes
>
> Normal form is nice to parse - but do you think everybody can map it?
> Non-normal form is not so nice for machines - thus I cannot promise 
> that I will manage to parse it - and the discussion is theoretical 
> until I can prove it (with a reference implementation).
>
> I also see no reason why an application may not be able to treat this 
> as equivalent:
> hgv = yes    (shortcut for:)
> access:hgv = yes   (which is a valid expression also in the proposed 
> extension)
> access:cond = yes [hgv]
>
> Then backward compatibility, extended condition tags or value side 
> conditions could be used. If applications need a parser anyway...
>
> martinq