Parsing challenge: Old logician's dot notation
Asked Answered
P

3

8

In J. Barkley Rosser's "Logic for Mathematicians" he uses a notation to avoid too many parentheses. Although I don't know when logicians start using this notation, but I know that book first published in 1957, and J. G. P. Nicod's paper which was published in 1916 is also using this notation. So obviously it has a pretty long history, although nowadays this is not favoured by modern logicians.

In the programming world, in LISP alike programming languages there is a big challenge for programmers to keep track of right (huge!) amount of parentheses. Haskell provides an operator $ that provide part of the functionalities, but since you cannot say 2 * $ 3 + 4 it is not as powerful as the dots (see examples below). The C language serials use a conventional operation precedence, but in some cases deep nested parentheses are still required. So I wonder why no actual languages uses this strategy? I tried, but I found not even able to write a grammar for it!

Let me show some example of a toy calculator language with only two operators + and *, and all terms are integers.

With this notation, a translater shall pass the following test cases:

1 + 3 .* 2
= (1 + 3) * 2
1 *. 3 + 2
= 1 * (3 + 2)
1 *. 2 +. 2
= (1 * 2) + 2
2 *: 2 + 3 .* 4
= 2 * ((2 + 3) * 4)

I cann't explain all detail of this notation, it costs almost 5 pages in Rosser's book. But in genaral (and short), a dot . before or after an operator represents a "separator", to push the two sides away. A colon : is a stronger separator, three dots .: or :. is even stronger, but less then ::, and so on.

I wonder how can we write a grammar for the above language and then parse it? Also although this notation has been obsoleted I found it appears very clear to the programmer's eye. So what is its pros and cons for it?

Pettifogging answered 29/8, 2013 at 12:7 Comment(2)
You can say (2 *) $ 3 + 4 in Haskell -- but indeed, that introduces parens as part of the function slice.Spatola
@larsmans I know, that is exactaly what I said "not as powerful as the dots".Pettifogging
H
11

Dot notation was most famously used by Russell and Whitehead in Principia Mathematica (1910-1913) but the notation was borrowed from Guiseppe Peano. It was also used by Alonzo Church, Willard Van Orman Quine, and other influential logicians (see this entry in the Stanford Encylopedia of Philosophy).

With a bit of practice, it's not difficult to read formulae in dot-notation, but it is not quite as elegant as it first appears. To start with, Russell and Whitehead still found it useful to use parentheses with the negation operator ~:

*3·01. p.q .=. ~(~p v ~q)

As the above example show, the dot is used both as the conjunction operator and to express precedence. Consequently, a stronger conjunction might be written as : or even :..

Finally, in order to reduce visual clutter (I suppose), Russell and Whitehead also use precedence relations, in which the set of operators is divided into three precedence groups, such that the dots for an operator of higher precedence dominate an equal number of dots for an operator of lower precedence. Between operators of equal precedence, it's not legal for the number of dots to be equal, but Russell and Whitehead also defined some ternary operators such as p v q v r in order to be able to avoid having to specify unimportant precedence. (The precise parsing rules for such expressions were never spelled out formally, as far as I know, but the definitions appear in PM.)

Having said all that, it is not terribly difficult to parse dot-notation using a variant of the shunting-yard algorithm. Unfortunately, it cannot be parsed with a context free grammar, which makes it less useful for grammars generated with automated tools. Even GLR parsers are restricted to CFGs. (The fact that dot-notation is not context-free can be proven with the pumping lemma; it makes a rather more interesting exercise than the usual pumping lemma applications, at least imho.)

If you allow the CFG to have an infinite number of (dot) symbols and a corresponding infinite number of rules, then it is quite straightforward to write a grammar (or, more precisely, a grammar template since most of the rules are parameterized by dot-count). Thus, in theory you could parse a dot-expression by first scanning to find the maximum number of dots used, and then generating the corresponding finite CFG from the template. (The resulting is LALR(1) as long as you define a separate terminal for every sequence of dots.) In practice, this would be done by using predicates in the CFG, so you could create a parser with a parser-generator which allows predicates (ANTLR does, for example, but I'd personally use a bottom-up generator to avoid fiddling around with left-recursion elimination.)

It's important to note that dot-notation has its own variant of "redundant parentheses", since you could, at least in theory, use more dots than necessary. When I was playing around with the (theoretical but not really useful) algorithm above -- automatically generating CFGs -- I found it easier to require dot-minimality, because otherwise you end up creating a lot more unit rules. I wasn't able to find a machine-readable copy of PM to test with, but in all the searching I did, I never found an expression which was not dot-minimal. I don't know whether that was the result of compulsive tidiness or the idea that only dot-minimal expressions were legal.

Hesione answered 29/8, 2013 at 15:22 Comment(4)
Thanks for you brilliant answer. I want to use dot notation because it frees me from counting the right number of parentheses to finish an expression, and it makes things more readable. Some programming styles suggest when writing deep nested expressions to use spacing for grouping things visually, that is equivalent to the use of dots.Pettifogging
@EarthEngine: yes, the spacing conventions are similar to dot-notation, but of course they are less precise. On the other hand, the notation is complicated by the three (or possibly four, depending on which reference you read) levels of operator precedence, the possibility of defining multi-operator associative sequences, and the uncertain definition of what constitutes a dotting error. Having said all that, if you want to build a parser, I'd go with the shunting-yard algorithm. Hint: the precedence is basically the inverse of the number of dots (or reverse the comparison order).Hesione
So you mean the top-down operator precedence (I think it is based on shunting-yard) will work? I'll try that.Pettifogging
@EarthEngine: It's really bottom-up, not top-down. Top-down operator precedence parsing is what is usually called "Pratt parsing"; Dijkstra's shunting-yard algorithm is, as Wikipedia says, "a simple shift-reduce parser that is capable of parsing a subset of LR(1) grammars." (en.wikipedia.org/wiki/Operator-precedence_parser). But yes, that's what I meant.Hesione
L
3

Here's an interesting application. Perl6 allows you to extend the language, adding new operators and defining their precedence relative to existing operators. The follow code sample defines the operators*. .* etc. The tests below pass.

use v6;
use Test;

sub infix:<*.>  ($a, $b) is looser(&infix:<+>)  { $a * $b }
sub infix:<.*>  ($a, $b) is looser(&infix:<+>)  { $a * $b }
sub infix:<*:>  ($a, $b) is looser(&infix:<*.>) { $a * $b }
sub infix:<:*>  ($a, $b) is looser(&infix:<.*>) { $a * $b }

sub infix:<+.>  ($a, $b) is looser(&infix:<*.>) { $a + $b }
sub infix:<.+>  ($a, $b) is looser(&infix:<.*>) { $a + $b }
sub infix:<+:>  ($a, $b) is looser(&infix:<*:>) { $a + $b }
sub infix:<:+>  ($a, $b) is looser(&infix:<:*>) { $a + $b }

# Tests

is 1 + 3 .* 2,
    (1 + 3) * 2;

is 1 *. 3 + 2,
    1 * (3 + 2);

is 1 *. 2 +. 2,
    (1 * 2) + 2;

is 2 *: 2 + 3 .* 4,
    2 * ((2 + 3) * 4);
Lupine answered 6/9, 2013 at 7:42 Comment(1)
It's greate to know about this. But so you have to add infinite lines to allow .::::::::+ ?Pettifogging
N
-1

ISO core standard Prolog would not allow directly parsing old Logicians dot notation. Although it can form symbols including a dot, a prominent one being the univ operator

(=..)/2. Different symbols used as operator, lead to different functors in the parsed term. To mitigated the problem I have introduced the sys_alias/1 operator

property in my Prolog system. Here is an example definition:

:- op(750, xfx, .=.).
:- set_oper_property(infix(.=.), sys_alias(=)).

In the above we use second variant of (=)/2. The usual (=)/2 has operator level 700, and the second variant has a higher operator level 750, leading to the desired parenthesis.

And here is an example run:

Jekejeke Prolog 4, Runtime Library 1.4.5

?- write(p = q .=. q = p), nl.
(p = q) = (q = p)
Yes

Could be helpful to parse this beast:

On the single axioms of protothetic. II.
Sobociński, Bolesław - 1961

https://projecteuclid.org/euclid.ndjfl/1093956834

Norland answered 9/8, 2020 at 0:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.