ANTLR4 : ordering problem of parser rules for a keyword used in several rules (AND, BETWEEN AND)
Asked Answered
M

2

6

I am having a problem while parsing some SQL typed string with ANTLR4. The parsed string is :

WHERE a <> 17106
AND b BETWEEN c AND d
AND e BTW(f, g)

Here is a snippet of my grammar :

where_clause
    : WHERE element
    ;

element
    : element NOT_EQUAL_INFERIOR element
    | element BETWEEN element AND element
    | element BTW LEFT_PARENTHESIS element COMMA_CHAR element RIGHT_PARENTHESIS
    | element AND element
    | WORD
    ;

NOT_EQUAL_INFERIOR:     '<>';
LEFT_PARENTHESIS:       '(';
RIGHT_PARENTHESIS:      ')';
COMMA_CHAR:         ',';

BETWEEN:            B E T W E E N;
BTW:                B T W;

WORD ... //can be anything ... it doesn't matter for the problem.

This grammar generates a tree on that string (image)
(source: hostpic.xyz)

But as you can see on that same picture, the tree is not the "correct one".

ANTLR4 being greedy, it englobes everything that follows the BETWEEN in a single "element", but we want it to only take "c" and "d".

Naturally, since it englobes everything in the element rule, it is missing the second AND of the BETWEEN, so it fails.

I have tried changing order of the rules (putting AND before BETWEEN), I tried changing association to right to those rules (< assoc=right >), but those didn't work. They change the tree but don't make it the way I want it to be.

I feel like the error is a mix of greediness, association, recursivity ... Makes it quite difficult to look for the same kind of issue, but maybe I'm just missing the correct words.

Thanks, have a nice day !

Mesa answered 27/8, 2019 at 9:45 Comment(0)
B
1

I think you misuse the rule element. I don't think SQL allows you to put anything as left and right limits of BETWEEN.

Not tested, but I'd try this:

expression
    : expression NOT_EQUAL_INFERIOR expression
    | term BETWEEN term AND term
    | term BTW LEFT_PARENTHESIS term COMMA_CHAR term RIGHT_PARENTHESIS
    | expression AND expression
    | term
    ;

term
    : WORD
    ;

Here your element becomes expression in most places, but in others it becomes term. The latter is a dummy rule for now, but I'm pretty sure you'd want to also add e.g. literals to it.

Boogie answered 27/8, 2019 at 10:51 Comment(5)
Hi, thank you for answering. My grammar is built to be a "global SQL" grammar; i.e: it works for all SQL, basic, transact, oracle, and others. And, as a matter of fact, you can put anything as left and right limits of between, at least on Sqlite3. I just tested it right now, and you can have booleans on both side of the between. Example : SELECT * FROM test WHERE bool BETWEEN 0 > 2 AND true This runs.Mesa
OT: btw there is already ISO SQL ANTLR grammar available. Implementing all SQL dialects in the same grammar is imho impossible.Durkheim
@LouisC.: I meant "anything" as in for example 1 BTW (2, 3), which would also be a valid element according to your grammar. Is 0 BETWEEN 1 BTW (2, 3) AND 4 <> 5 valid SQL?Boogie
@doublep: Yes, both of these are indeed valid SQL (they run in some SQL dialects). Hence why I am making the rule element this way. To ibre5041: Thanks for the tip. I am fully aware that having all SQL dialects is at the very least very difficult. But in this grammar, it won't matter if the grammar allows SQL impossibilities (like having a Teradata clause in a PLSQL context for example). Because the point of this grammar will be to run smoothly on all SQL dialects and not to be perfectly fitted to one. For this reason I can't use the ISO SQL ANTLR grammar.Mesa
@LouisC In Standard SQL a lot of grammar is type-dependent. There is no point allowing nonesense such as 0 BETWEEN 1 BTW (2, 3) AND 4 <> 5 that may be accepted by Sqlite due to its definient parsing. (if BETWEEN were to be allowed with booleans, it would always return TRUE, or NULL if any arg is NULL).Mons
H
0

Disclaimer: I don't actually use ANTLR (I use my own), and I haven't worked with the (rather hairy) SQL grammar in a while, so this may be off the mark, but I think to get what you want you'll have to do something along the lines of:

    ... 
    where_clause
         : WHERE disjunction
         ;

    disjunction
         : conjunction OR disjunction
         | conjunction
         ;

     conjunction
         : element AND conjunction
         | element
         ;

     element
         : element NOT_EQUAL_INFERIOR element
         | element BETWEEN element AND element
         | element BTW LEFT_PARENTHESIS element COMMA_CHAR element RIGHT_PARENTHESIS
         | WORD
         ; 
    ...

This is not the complete refactoring needed but illustrates the first steps.

Horace answered 17/9, 2019 at 10:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.