What is the shortest way to write a parser for my language?

PS. Where can I read about parsing theory?

Epifocal asked 4/10, 2009 at 13:25 Comment(0)
11

It's always a good idea to read the Dragon Book. But be aware that if your language is not trivial, there's not really a "short" way to do it.

Bipinnate answered 4/10, 2009 at 13:35 Comment(0)
12

Summary: the shortest is probably Antlr.

It's tempting to go to the Dragon Book to learn about parsing theory. But I don't think the Dragon Book and you have the same idea of what "theory" means. The Dragon Book describes how to build hand-written parsers, parser generators, and so on, but you almost certainly want to use a parser-generation tool instead.

A few people have suggested Bison and Flex (or their older versions, Yacc and Lex). Those are the old stalwarts, but they are not very usable tools. Their documentation is not poor per se; it just doesn't quite help with the accidental complexity of using them. Their internal data is not well encapsulated, and it is very hard to do anything advanced with them. As an example, in phc we still do not have correct line numbers because it is so difficult. Things got better when we modified our grammar to include no-op statements, but that is an incredible hack which should not be necessary.

Ostensibly, Bison and Flex work together, but the interface is awkward. Worse, there are many versions of each, which only play nicely with some specific versions of the other. And, last I checked at least, the documentation of which versions went with which was pretty poor.

Writing a recursive descent parser is straightforward, but can be tedious. Antlr can do that for you, and it seems to be a pretty good toolset, with the benefit that what you learn on this project can be applied to lots of other languages and platforms (Antlr is very portable). There are also lots of existing grammars to learn from.

It's not clear what language you're working in, but some languages have excellent parsing frameworks. In particular, the Haskell Parsec library seems very elegant. If you use C++ you might be tempted to use Spirit. I found it very easy to get started with, and difficult (but still possible) to do advanced things with. This matches my experience of C++ in general. I say I found it easy to start with, but then I had already written a couple of parsers and studied parsing in a compilers class.

Long story short: Antlr, unless you have a very good reason not to.

Mag answered 10/10, 2009 at 0:0 Comment(2)
I don't agree with you. Bison and Flex have good documentation.Holcomb
@Kinopiko: fair enough. I guess that's not exactly what I meant. Hope it's better/fairer now.Mag
5

It rather depends on your language. Some very simple languages need so little parsing that they can be hand-coded; other languages use PEG generators such as Rats! (a PEG, or parsing expression grammar, sits between a regex and an LR parser) or conventional parser generators such as Antlr and Yacc. Less formal languages require probabilistic techniques such as link grammars.

Trabue answered 4/10, 2009 at 13:38 Comment(0)
4

Write a Recursive Descent Parser. This is sometimes easier than YACC/BISON, and usually more intuitive.
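
For a sense of scale, here is a minimal hand-written recursive descent parser for arithmetic expressions, written in Python since the asker's language is not specified. It is an illustrative sketch only; the grammar and class names are made up, not taken from the question.

# A minimal recursive descent parser for expressions such as 1+2*(3+4).
# Grammar (illustrative): expr   -> term (('+'|'-') term)*
#                         term   -> factor (('*'|'/') factor)*
#                         factor -> NUMBER | '(' expr ')'
class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def eat(self, ch):
        if self.peek() != ch:
            raise SyntaxError("expected %r at position %d" % (ch, self.pos))
        self.pos += 1

    def parse_expr(self):                      # one method per nonterminal
        node = self.parse_term()
        while self.peek() in ('+', '-'):
            op = self.peek(); self.eat(op)
            node = (op, node, self.parse_term())
        return node

    def parse_term(self):
        node = self.parse_factor()
        while self.peek() in ('*', '/'):
            op = self.peek(); self.eat(op)
            node = (op, node, self.parse_factor())
        return node

    def parse_factor(self):
        if self.peek() == '(':
            self.eat('('); node = self.parse_expr(); self.eat(')')
            return node
        start = self.pos
        while self.peek() is not None and self.peek().isdigit():
            self.pos += 1
        if start == self.pos:
            raise SyntaxError("expected a number at position %d" % start)
        return int(self.text[start:self.pos])

print(Parser("1+2*(3+4)").parse_expr())        # ('+', 1, ('*', 2, ('+', 3, 4)))

Each nonterminal becomes one method, which is why this style is easy to write by hand but gets tedious for large grammars.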

Marduk answered 4/10, 2009 at 13:52 Comment(0)
3

Douglas Crockford has an approachable example of a parser written in JavaScript.

Nealon answered 10/10, 2009 at 0:19 Comment(0)
2

YACC; there are various implementations of it for different languages.

Good luck with your language ;-)

Whoso answered 4/10, 2009 at 13:28 Comment(0)
2

I used the GOLD Parsing System because it seemed easier to use than ANTLR for a novice like me, while still being sufficiently full-featured for my needs. The web site includes documentation (including instructions on Writing Grammars, which is half the work) as well as software.

Ammons answered 4/10, 2009 at 13:44 Comment(0)
2

Try Bison for parsing and Flex for lexing.

The Bison definition of your language takes the form of a context-free grammar. The Wikipedia article on this topic is quite good, and is probably a good place to start.
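
If the implementation language happens to be Python rather than C, the same lex/yacc workflow is available through PLY (Python Lex-Yacc), which is not mentioned in this answer but mirrors the Flex/Bison split. A rough sketch, assuming PLY is installed and using a made-up toy grammar:

import ply.lex as lex
import ply.yacc as yacc

# Lexer: declare the token names, then give a regex for each token.
tokens = ('NUMBER', 'PLUS', 'TIMES')
t_PLUS = r'\+'
t_TIMES = r'\*'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()

# Parser: one function per grammar production; the docstring is the rule.
def p_expr_plus(p):
    'expr : expr PLUS term'
    p[0] = p[1] + p[3]

def p_expr_term(p):
    'expr : term'
    p[0] = p[1]

def p_term_times(p):
    'term : term TIMES NUMBER'
    p[0] = p[1] * p[3]

def p_term_number(p):
    'term : NUMBER'
    p[0] = p[1]

def p_error(p):
    print("syntax error")

parser = yacc.yacc(write_tables=False, debug=False)
print(parser.parse('1+2*3', lexer=lexer))   # 7

The grammar here evaluates expressions directly, but the rule actions could just as easily build an AST.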

Chimb answered 4/10, 2009 at 13:46 Comment(0)
1

Using a parser generator for your host language is the fastest way, combined with parsing theory from a book such as the Dragon Book or the Modern Compiler Implementation in {C, ML} series.

If you use C, yacc and the GNU version bison are the standard generators. Antlr is widely used in many languages, supporting Java, C#, and C++ as far as I know. There are also many others in almost any language.

My personal favorite at present is Menhir, an excellent parser generator for OCaml. ML-style languages (OCaml, Standard ML, etc.) are in general very good for building compilers and interpreters.

Potence answered 10/10, 2009 at 0:32 Comment(0)
1

ANTLR is the easiest for someone without a compiler theory background because of:

  • ANTLRWORKS (visual parsing and AST debugging)

  • The ANTLR book (no compiler theory background required)

  • A single syntax for both the lexer and the parser.

Tarra answered 10/10, 2009 at 8:24 Comment(0)
1

If you are happy with parsing expression grammars, writing your own parser can be incredibly short. Here is a simple packrat parser that accepts a reasonable subset of PEG:

import functools

class peg_parse:
    def __init__(self, grammar):
        # Store each alternative as a tuple so the rules are immutable.
        self.grammar = {k: [tuple(l) for l in rules] for k, rules in grammar.items()}

    @functools.lru_cache(maxsize=None)   # memoization is what makes this a packrat parser
    def unify_key(self, key, text, at=0):
        # A key that is not a nonterminal is treated as a literal terminal.
        if key not in self.grammar:
            return (at + len(key), (key, [])) if text[at:].startswith(key) \
                else (at, None)
        rules = self.grammar[key]
        # Ordered choice: the first alternative that matches wins.
        for rule in rules:
            l, res = self.unify_rule(rule, text, at)
            if res is not None: return l, (key, res)
        return (0, None)   # no alternative matched

    def unify_rule(self, parts, text, tfrom):
        # Match every part of one alternative in sequence.
        results = []
        for part in parts:
            tfrom, res = self.unify_key(part, text, tfrom)
            if res is None: return tfrom, None
            results.append(res)
        return tfrom, results

It accepts grammars in the form of a Python dictionary, with nonterminals as keys; the value for each key is a list of alternatives, and each alternative is a sequence of symbols (nonterminals or terminal strings). Below is an example grammar.

term_grammar = {
    'expr': [
        ['term', 'add_op', 'expr'],
        ['term']],
    'term': [
        ['fact', 'mul_op', 'term'],
        ['fact']],
    'fact': [
        ['digits'],
        ['(','expr',')']],
    'digits': [
        ['digit','digits'],
        ['digit']],
    'digit': [[str(i)] for i in list(range(10))],
    'add_op': [['+'], ['-']],
    'mul_op': [['*'], ['/']]
}

Here is the driver:

import sys
def main(to_parse):
    result = peg_parse(term_grammar).unify_key('expr', to_parse)
    # Make sure the whole input was consumed.
    assert (len(to_parse) - result[0]) == 0
    print(result[1])

if __name__ == '__main__': main(sys.argv[1])

It can be invoked like this:

python3 parser.py '1+2'
('expr', 
    [('term', 
       [('fact', 
           [('digits', [('digit', [('1', [])])])])]), 
     ('add_op', [('+', [])]), 
     ('expr', 
       [('term', [('fact', [('digits', [('digit', [('2', [])])])])])])])

Parsing expression grammars take some care to write: the ordering of alternatives is important. Unlike in a context-free grammar, the alternatives form an ordered choice: the first alternative is tried first, and the second is tried only if the first did not match. They can, however, represent all known context-free grammars.
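
For example, using the peg_parse class above with a tiny made-up grammar, swapping the order of the alternatives changes how much input is matched:

p1 = peg_parse({'S': [['a'], ['ab']]})   # 'a' is tried first
p2 = peg_parse({'S': [['ab'], ['a']]})   # 'ab' is tried first

print(p1.unify_key('S', 'ab'))   # (1, ('S', [('a', [])]))  -- only one character consumed
print(p2.unify_key('S', 'ab'))   # (2, ('S', [('ab', [])])) -- the whole input consumed

As context-free grammars the two would describe the same language; as PEGs they behave differently, because the first match wins.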

If, on the other hand, you decide to go with a context-free grammar, the Earley parser is one of the simplest to implement.
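
For comparison, here is a minimal Earley recognizer. It is a sketch that is not part of the original answer: it reuses the same grammar-as-a-dictionary format as term_grammar above, assumes single-character terminals and no empty productions, and only answers whether the input parses; it does not build a tree.

def earley_recognize(grammar, start, text):
    # Each state is (lhs, rhs, dot, origin): a partially matched rule.
    chart = [set() for _ in range(len(text) + 1)]
    for alt in grammar[start]:
        chart[0].add((start, tuple(alt), 0, 0))
    for i in range(len(text) + 1):
        changed = True
        while changed:                                   # iterate chart[i] to a fixpoint
            changed = False
            for (lhs, rhs, dot, origin) in list(chart[i]):
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in grammar:                   # predictor: expand a nonterminal
                        for alt in grammar[sym]:
                            new = (sym, tuple(alt), 0, i)
                            if new not in chart[i]:
                                chart[i].add(new); changed = True
                    elif i < len(text) and text[i] == sym:   # scanner: match one terminal character
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:                                    # completer: advance the parents of a finished rule
                    for (l2, r2, d2, o2) in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            new = (l2, r2, d2 + 1, o2)
                            if new not in chart[i]:
                                chart[i].add(new); changed = True
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[len(text)])

print(earley_recognize(term_grammar, 'expr', '1+2'))   # True
print(earley_recognize(term_grammar, 'expr', '1+'))    # False

Unlike the PEG parser, the order of alternatives in the grammar does not matter here, and ambiguous grammars are handled as well.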

Oech answered 7/12, 2018 at 10:38 Comment(0)
