Parser Generators and Ragel... Making my own D Parser
Asked Answered
F

3

10

I'm new to the world of compilers, and I recently heard about something called a parser generator. From what I (think) I've understood, parser generators take in a syntax file and output a source code file that can parse files with the given syntax.

A few questions:

  1. Did I understand that correctly?

  2. If so, is Ragel such a tool?

  3. If it is, can Ragel output a D parser into D source code?

Thank you!

Fiske answered 18/1, 2011 at 0:38 Comment(1)
"new to the world of compilers" Welcome.Furtherance
V
18
  1. That's basically it. Parser generators transform a grammar into a source file that can be used to recognize strings that are members of the language defined by the grammar. Often, but not always, a parser generator requires a lexical analyzer to break text down into tokens before it does its work. Lex and Yacc are classic examples of a paired lexical analyzer and parser generator.

    Modern parser generators offer additional features. For instance, ANTLR can generate code for lexical analysis, grammatical analysis, and even walk the generated abstract syntax tree. Elkhound generates a parser that uses the GLR parsing algorithm. This allows it to recognize a wider range of languages than non-generalized parsing algorithms. PEG Parsers don't require a separate lexical analyzer.

  2. Ragel actually generates a lexical analyzer in the form of a finite state machine. It can recognize a regular language but not a context-free language. This means it can't recognize most programming languages, including D.

  3. Ragel does generate D code if you need a fast lexical analyzer.

To fully understand what a parser generator does for you, you'll need some formal language and parsing theory. There are worse places to start than the The Dragon Book. See also: Learning to write a compiler.

If you're feeling brave, be sure to check out the lexing and parsing code distributed with the DMD compiler - /dmd2/src/dmd/ - lexer.c and parse.c.

Ventose answered 18/1, 2011 at 5:31 Comment(0)
C
13

While Ragel is based on regular expressions, it's not just a regex FSM generator. It allows recursion using an additional call/return syntax, as well as other features which allow parsing non-regular languages. So while Ragel does generate FSMs, it allows generating multiple different FSMs and provides mechanisms for jumping between them at arbitrary points, or using a special machine transition syntax. It also allows executing arbitrary code at state transitions.

Another thing that makes Ragel unique is that it's online. In other words, it's easy to use to scan data from an asynchronous source, such as a non-blocking socket. It also uses no dynamic resources, except that for call/return you can use either static, automatic, or dynamic memory for the stack; however you want. There's no global state, either.

Ragel is quite unique. Unlike most (all?) traditional generators, it was made for network programming.

Capitate answered 27/3, 2011 at 3:52 Comment(1)
All good, but why do you mention neywork &non-blocking stuff? Ragel operates on bytes, it has no notion of kjowing that a data comes from a socket or from outer space... origin of the dara seems irrelevant here..Quasimodo
F
1

Could be:

MySourceCode --> (Scanner) --> MyScannerDataFile MyScannerDataFile --> (Parser) --> MyParserDataFile MyParserDataFile --> (CodeGenerator) --> MyExecutableFile

or:

MySourceCode --> (ScannerAndParser) --> MyScannerAndParserDataFile MyScannerAndParserDataFile --> (CodeGenerator) --> MyExecutableFile

Furtherance answered 8/4, 2011 at 15:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.