PL/SQL ANTLR grammar fails for some PL/SQL files?
Asked Answered
T

1

9

I'am using ANTLR4 to generate Abstract Syntax Tree (AST) for PL/SQL codes. For some queries it works fine, but for some codes it generates an AST with only one node which is not right.

For example:

DECLARE
   a   RAW;            -- migrate to BLOB
   b   LONG RAW;       -- migrate to BLOB
   c   LONG VARCHAR;   -- migrate to LOB
   d   LONG;           -- migrate to LOB
   x   VARCHAR;
   CURSOR mycur RETURN LONG;     -- should flag
   FUNCTION myfunc RETURN LONG RAW    -- should flag
   IS
      z  LONG RAW;    -- should flag
   BEGIN 
      RETURN z;
   END;  
BEGIN
   SELECT mycol, CAST(col2 AS RAW)      -- should flag
     INTO a 
     FROM mytab 
    WHERE (b IS OF TYPE(LONG RAW, RAW, VARCHAR)); -- should flag
END;
CREATE TABLE tab (
   a   RAW,            -- should flag
   b   LONG RAW,       -- should flag
   c   LONG VARCHAR,   -- should flag
   d   LONG,           -- should flag
   x   VARCHAR
);

For this code it generates this AST:

(compilation_unit DECLARE a RAW ; b LONG RAW ; c LONG VARCHAR ; d LONG ; x VARCHAR ; CURSOR mycur RETURN LONG ; FUNCTION myfunc RETURN LONG RAW IS z LONG RAW ; BEGIN RETURN z ; END ; BEGIN SELECT mycol , CAST ( col2 AS RAW ) INTO a FROM mytab WHERE ( b IS OF TYPE ( LONG RAW , RAW , VARCHAR ) ) ; END ; CREATE TABLE tab ( a RAW , b LONG RAW , c LONG VARCHAR , d LONG , x VARCHAR ) ;)

which is merely the given code within compilation_unit node.

But if i give same code without declare part it gives good AST.

For this code:

FUNCTION myfunc RETURN LONG RAW    -- should flag
   IS
      z  LONG RAW;    -- should flag
   BEGIN 
      RETURN z;
   END;  
BEGIN
   SELECT mycol, CAST(col2 AS RAW)      -- should flag
     INTO a 
     FROM mytab 
    WHERE (b IS OF TYPE(LONG RAW, RAW, VARCHAR)); -- should flag
END;
CREATE TABLE tab (
   a   RAW,            -- should flag
   b   LONG RAW,       -- should flag
   c   LONG VARCHAR,   -- should flag
   d   LONG,           -- should flag
   x   VARCHAR
);

it gives this AST:

(compilation_unit (unit_statement (create_function_body FUNCTION (function_name (id (id_expression (regular_id myfunc)))) RETURN (type_spec (datatype (native_datatype_element LONG RAW))) IS (declare_spec (variable_declaration (variable_name (id_expression (regular_id z))) (type_spec (datatype (native_datatype_element LONG RAW))) ;)) (body BEGIN (seq_of_statements (statement (return_statement RETURN (condition (expression (logical_and_expression (negated_expression (equality_expression (multiset_expression (relational_expression (compound_expression (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (atom (general_element (general_element_part (id_expression (regular_id z))))))))))))))))))))) ;) END) ;)) BEGIN (unit_statement (data_manipulation_language_statements (select_statement (subquery (subquery_basic_elements (query_block SELECT (selected_element (select_list_elements (expression (logical_and_expression (negated_expression (equality_expression (multiset_expression (relational_expression (compound_expression (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (atom (general_element (general_element_part (id_expression (regular_id mycol)))))))))))))))))))) , (selected_element (select_list_elements (expression (logical_and_expression (negated_expression (equality_expression (multiset_expression (relational_expression (compound_expression (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (standard_function CAST ( (concatenation_wrapper (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (atom (general_element (general_element_part (id_expression (regular_id col2)))))))))))) AS (type_spec (datatype (native_datatype_element RAW))) ))))))))))))))))) (into_clause INTO (variable_name (id_expression (regular_id a)))) (from_clause FROM (table_ref_list (table_ref (table_ref_aux (dml_table_expression_clause (tableview_name (id (id_expression (regular_id mytab))))))))) (where_clause WHERE (condition_wrapper (expression (logical_and_expression (negated_expression (equality_expression (multiset_expression (relational_expression (compound_expression (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (atom ( (expression_or_vector (expression (logical_and_expression (negated_expression (equality_expression (multiset_expression (relational_expression (compound_expression (concatenation (additive_expression (multiply_expression (datetime_expression (model_expression (unary_expression (atom (general_element (general_element_part (id_expression (regular_id b)))))))))))))) IS OF TYPE ( (type_spec (datatype (native_datatype_element LONG RAW))) , (type_spec (datatype (native_datatype_element RAW))) , (type_spec (datatype (native_datatype_element VARCHAR))) )))))) ))))) ; END ;)))))))))))))))))) unit_statement (unit_statement CREATE TABLE tab) (unit_statement (data_manipulation_language_statements (select_statement (subquery (subquery_basic_elements ( (subquery (subquery_basic_elements a RAW , b LONG RAW , c LONG VARCHAR , d LONG , x VARCHAR)) )) ;)))) <EOF>)

So it seems like some keywords(rules) like declare are missing in the PL/SQL grammar file. I used this plsql.g4 file with Antlr which is in the Antlr site.

Is there any way I can find an updated plsql.g4 file or do I need to add those rules myself?

Tully answered 17/12, 2015 at 12:21 Comment(6)
That first "AST" (compilation-unit DECLARE a...) is really an AST? It looks to me like it just a string.Cleaves
Those grammars are open source, which is a funny way to say "you fix it if you don't like what it does".Cleaves
Yes. This is the String representation of the AST returned from toStringTree() method given by ANTLR4. And I'am asking whether i can find another plsql.g4 file(an updated one with more rules completed)? BTW thanks for your response.Tully
Both example ASTs are printed using toStringTree()? I agree they look inconsistent, and you seem to claim the first one is bad by virtue of exclaiming claiming the second one is good. (I don't know of alternative grammars for ANTLR.)Cleaves
@ChathuraWijeweera There is a more up-to-date fork of the ANTLR 3.3 PL/SQL grammar at github.com/ibre5041/plsql-parser It has extra fixes, including some for DECLARE.Sediment
On this late inspection, I'd guess the grammar picks up DECLAREs as a string, and the lexical defintion used for this doesn't detect the end of it correctly. Now its your problem to fix that if you don't like it.Cleaves
E
1

The problem seems to actually be that plsql antlr-grammar doesn't support LONG VARCHAR datatype, in this part: "c LONG VARCHAR;"

It leads to declaration being misinterpreted as a procedure call messing up rest of the parsing.

I have opened an issue in the grammar github: https://github.com/antlr/grammars-v4/issues/2158

Escorial answered 24/4, 2021 at 9:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.