Simplest of parsers in go tool yacc

Using this command:

go tool yacc -p Verb -o verb.go boilerplate.y

Trying to build this yacc file:

// boilerplate.y
%{

package main

import (
    "bufio"
    "fmt"
    "os"
    "unicode"
)

%}

%% 

.|\n   ECHO;

%%

func main() {
    fi := bufio.NewReader(os.NewFile(0, "stdin"))
    s, err := fi.ReadString('\n')
    if err != nil {
        fmt.Println("error", err)
    }

    VerbParse(&VerbLex{s: s})
}

Error: bad syntax on first rule: boilerplate.y:16

Successfully got this example to work:

https://github.com/golang-samples/yacc/blob/master/simple/calc.y

I'm trying to build my own and work through the lex & yacc book. Resources seem limited to nonexistent.

Biisk answered 28/7, 2016 at 19:48

You have an incorrect rule in your specification.

A specification file has the following layout:

declarations
%%
rules
%%
programs

Where a rule is defined as:

A  :  BODY  ;

Where A is a non-terminal symbol, while BODY is made up of tokens (terminal symbols), non-terminals and literals. The : and ; are required components of rule declaration syntax.

Hence the rule:

.|\n   ECHO;

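is syntactically incorrect. It is in fact a lex (flex) pattern/action rule: ECHO is a lex macro that copies the matched text to the output. go tool yacc only generates the parser, not a lexer, so lex rules cannot go in a .y file; the lexer has to be written by hand in Go.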
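In lex, ECHO copies the matched text to the output, so the rule above just copies every character of the input, newline included. For comparison, a plain-Go sketch of that behaviour with no yacc involved (echo and echoString are hypothetical names, not part of any tool):

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// echo mimics the lex rule ".|\n ECHO;": the input is read one byte at a
// time (lex's "." also matches a single byte) and copied unchanged to w.
func echo(r io.Reader, w io.Writer) error {
	in := bufio.NewReader(r)
	out := bufio.NewWriter(w)
	for {
		b, err := in.ReadByte()
		if err != nil {
			// flush whatever is buffered before reporting the outcome
			if ferr := out.Flush(); ferr != nil {
				return ferr
			}
			if err == io.EOF {
				return nil
			}
			return err
		}
		if err := out.WriteByte(b); err != nil {
			return err
		}
	}
}

// echoString runs echo over an in-memory string, handy for quick checks.
func echoString(s string) (string, error) {
	var sb strings.Builder
	err := echo(strings.NewReader(s), &sb)
	return sb.String(), err
}

func main() {
	if err := echo(os.Stdin, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```

Copying byte by byte mirrors the one-character pattern; a line-oriented echo would more usually use bufio.Scanner.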

Since you are simply trying to echo the input, a very simple implementation based on calc.y would be the following (file echo.y; the declarations section is omitted here, see the full working code linked below):

rules

%%

in : /* empty */
  | in input '\n'
     { fmt.Printf("Read character: %s\n", $2) }
  ;

input : CHARACTER
  | input CHARACTER
      { $$ = $1 + $2 }
  ;

programs

%%

type InputLex struct {
    // contains one complete input string (with the trailing \n)
    s string
    // used to keep track of the parser position along the above input string
    pos int
}

func (l *InputLex) Lex(lval *InputSymType) int {
    var c rune = ' '

    // skip through all the spaces, both at the ends and in between
    for c == ' ' {
        if l.pos == len(l.s) {
            return 0
        }
        c = rune(l.s[l.pos])
        l.pos += 1
    }

    // only look for input characters that are either digits or lower case
    // to do more specific parsing, you'll define more tokens and have a 
    // more complex parsing logic here, choosing which token to return
    // based on parsed input
    if unicode.IsDigit(c) || unicode.IsLower(c) {
        lval.val = string(c)
        return CHARACTER
    }

    // any other character is returned as a literal token (its own code):
    // '\n' matches the grammar's '\n' literal, anything else is a syntax error
    return int(c)
}

func (l *InputLex) Error(s string) {
    fmt.Printf("syntax error: %s\n", s)
}

func main() {
    // same as in calc.y
}

func readline(fi *bufio.Reader) (string, bool) {
    // same as in calc.y
}

To compile and run this program, do the following at the command prompt:

go tool yacc -o echo.go -p Input echo.y
go run echo.go

As you can see, you have to write your own lexing rules (which characters become which tokens) in the Lex method. The InputLex struct holds the input string and the current position while it is being parsed. InputSymType is auto-generated: its fields come from the %union in the declarations section of the specification, and its Input prefix comes from the -p Input flag (the default prefix is yy).

As far as I can tell, go tool yacc has no equivalent of JISON's lex rules: there is no way to declare regex token patterns in the .y file itself, so any matching has to be coded by hand in Lex. For a generated lexer you may have to look at other libraries.
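Hand-written does not have to mean character-by-character, though: Lex is ordinary Go, so it can use the standard regexp package. A minimal sketch, assuming an ordered rule table in the spirit of JISON's lex rules; the token constants NUMBER and IDENT, the rule type and the next method are all hypothetical names. In this sketch the first rule that matches wins (flex, by contrast, prefers the longest match). A yacc Lex method could call next and copy the returned text into lval.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical token codes; in a real grammar these constants come from
// yacc's %token declarations.
const (
	NUMBER = 57346 + iota
	IDENT
)

// rule pairs an anchored pattern with the token it produces; skip marks
// patterns (such as whitespace) that are consumed without a token.
// Patterns must not match the empty string.
type rule struct {
	re    *regexp.Regexp
	token int
	skip  bool
}

// rules are tried in order; the first one that matches at the current
// position wins.
var rules = []rule{
	{re: regexp.MustCompile(`^[ \t]+`), skip: true},
	{re: regexp.MustCompile(`^[0-9]+`), token: NUMBER},
	{re: regexp.MustCompile(`^[a-z]+`), token: IDENT},
}

type regexLex struct {
	s   string
	pos int
}

// next returns the next token and its text, or 0 at end of input.
func (l *regexLex) next() (int, string) {
	for l.pos < len(l.s) {
		rest := l.s[l.pos:]
		skipped := false
		for _, r := range rules {
			m := r.re.FindString(rest)
			if m == "" {
				continue // this rule does not match here
			}
			l.pos += len(m)
			if !r.skip {
				return r.token, m
			}
			skipped = true
			break
		}
		if !skipped {
			// no rule matched: hand back one byte as a literal token
			l.pos++
			return int(rest[0]), rest[:1]
		}
	}
	return 0, "" // end of input
}

func main() {
	l := &regexLex{s: "abc 42 x\n"}
	for {
		tok, text := l.next()
		if tok == 0 {
			break
		}
		fmt.Printf("%d %q\n", tok, text)
	}
}
```

Keeping the patterns in a table makes it easy to add token kinds one at a time, much like adding lines to a JISON lex section.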

More details can be found here: http://dinosaur.compilertools.net/yacc/

Full working code here: https://play.golang.org/p/u1QxwRKLCl

Wearing answered 1/8, 2016 at 21:0 Comment(5)
I just want a basic file to echo standard input. I only kind of know JISON, and in that there are lex rules and grammar rules. I wanted to see the two broken out using Go in a .y file that I can work on to rebuild the simple calc file, then maybe a JSON parser or something. – Biisk
Here is JISON using lex rules in the same file: zaa.ch/jison/demos/calc. I don't really grok how the lex function you linked works. I see the lex stage as breaking the input into tokens using regexes. – Biisk
Added a basic echo parser. As far as I can tell, JISON might not be usable with go tool's yacc. – Wearing
You are right about one thing, though: there is a severe lack of proper documentation. – Wearing
TIL: Just use ANTLR :) Someone recently got a Go target mostly working. – Biisk

Every yacc grammar file should have three parts:

  • declarations
  • rules
  • programs

In the declarations section you can use %token, %type and %start to declare the token list, the nonterminal list and the start symbol of the grammar.

A rule has the form A : Body { action code }.

The programs section needs to implement the lexer or getToken function.

See this example: https://github.com/acekingke/yaccgo/blob/main/examples/e.y

It is easy to understand: it just parses the string "nnn...".

Manifestation answered 7/12, 2021 at 5:30
