LL(1) parser implemented with stack: how to build AST?

Asked 22/11, 2013 at 19:42 Answered 8/8, 2015 at 8:2

parsing implementation context-free-grammar ll-grammar

I am currently building a parser by hand. It is a LL(1) parser. At the moment, it is a great recognizer: its function parse(List tokens) decides whether or not tokens is a member of the language or not.

Now, I want to build the corresponding AST for that input. However, I know how to implement it in a recursive descent way (already did it). That is, for the challenge, I implement my stack using a stack with the classical algorithm:

next <- first token of the input
stack <- START_SYMBOL
do {
    top <- stack.pop()
    if (top is a terminal and top == next) {
        next <- next token of the input
    } else if (top is a non terminal and PARSING_TABLE[top, next] exists) {
        stack.push(PARSING_TABLE[top, next]);
    } else {
         return invalid input;
    }
} while (stack is not empty);
return valid input;

where the PARSING_TABLE is the LL(1) table. However, I wonder how to implement the part which build the AST in such a configuration. I do not ask for complete implementation, more for implementation idea.

Thanks !

Mom answered 22/11, 2013 at 19:42 Comment(0)

Your stack can be annotated so that it contains the AST entry reference (i.e. rule number + position in rule + target data where to store) + (terminal/non terminal)

Your initial stack <- START_SYMBOL is annotated to store its result in the AST root.

Basically, your pop() selects the current AST construct. Then the next <- next token saves the value in your AST. The stack.push(PARSING_TABLE[top, next]); opens a new AST list and writes it in the construct corresponding to top, and generates in each entry of the stack the 'rule number + position + target list'.

When you parsing is finished, you have the entire tree.

In a precise AST, you might want to ignore some tokens. This can be done via appropriate annotations in the stack set during the push() part. The typical way is to attach to each of your rules (A -> B C) some meta information, for example, what is to be kept and what is the nature of the result.

Vegetation answered 26/11, 2013 at 10:59 Comment(2)

Do you know some simple case implementation of this somewhere on the web?. I have the same OP question, but i dont get your point. If you can elaborate more on the answer it would be of help for me. – Enthalpy 1/3, 2014 at 1:15

I just searched for nearly half an hour... and did not find anything with AST building, so basically no :( The following: ag-kastens.uni-paderborn.de/lehre/material/uebi/parsdemo/… is pretty good demonstrating LL(1) alone but I could not find anything with AST. – Vegetation 1/3, 2014 at 21:18

The difficulty arises because the common method of replacing a nonterminal on the stack with the rhs of its matched-rule effectively forgets the grammatical structure at the moment it's predicted. But to generate an AST you need that structure later when a rule-parse is completed.

Rather than replacing a nonterminal with the rhs symbols of its matching rule, leave it in place and push the matched symbols as a list object. This way the stack retains the hierarchial structure of the grammar.

Parsing consumes symbols in the topmost list. The exhaustion of a list corresponds to the completion of a rule. A nonterminal is removed from the stack when its rule is completed, not when it is predicted.

As the stack is consumed, build a corollary AST structure that remembers the relevant rule and stores the parsed tokens. Thus the stack acts like a predicted AST that flows into the parsed AST.

You can think of this as emulating the call hierarchy of a recursive-descent parser with the stack of symbol-lists as a stack of call-frames.

In ruby:

# g is the grammar; a list of rules
# s is a terminal sequence to parse
# note, this code does not tokenize EOF

def parse(g, s)

 tab = gen_table(g)
 stack = [[g.start_sym]]
 # intermediate ast node format: [rule-action, symbols...]
 ast = [[->(rhs){[:_S, rhs[0]]}]]

 loop do
  puts "PARSE\n #{s}\n #{stack}\n #{ast}"

  if stack.first.empty?
   raise "extraneous input" if not s.empty?
   break
  end

  if stack.last.empty? # rule complete
   stack.pop
   node = ast.pop
   # transform the node (eg to a class) using the associated rule-action
   node = node.first.(node.drop(1))
   ast.last.push(node)
   stack.last.shift # rm sym from stack after completing it
   next
  end

  raise "incomplete input" if s.empty?
  tok = s.first
  topsym = stack.last.first

  if topsym.is_a? String # terminal
   raise "mismatch #{tok} != #{topsym}" if tok != topsym
   stack.last.shift
   ast.last.push(s.shift)

  elsif topsym.is_a? Symbol # nonterminal
   ri = tab[topsym][tok]
   raise "no rule for #{topsym}, #{tok}" if ri.nil?
   stack.push(g[ri].rhs.clone)
   ast.push([g[ri].action])
  end

 end

 node = ast.first
 node.first.(node.drop(1))
end

Possie answered 8/8, 2015 at 8:2 Comment(0)

Recommended topics

Hot tags