How to consume W3C EBNF-Notation and produce a parser generator?

Asked 8/5, 2019 at 19:18 Answered 15/5, 2019 at 20:50

Throughout the RDF specs, the W3C EBNF notation (the one defined in the XML specification) is used to specify the grammar of a document. Is there a way to make ANTLR, bison, or yacc (maybe with some flag I don't know how to search for), or some other tool I don't know about yet, consume these specifications and generate a parser for me? I want to use it to check that my RDF is well-formed before trying to load it.

An example grammar for my specific use case is: https://www.w3.org/TR/n-quads/#sec-grammar

I have already converted this grammar into an ANTLR4 grammar and generated a parser with that tool, and I also attempted to write my own recursive descent parser. It was time-consuming, and I'd rather not repeat the exercise if I have to do this again.

I don't really have any code; this is just a request for information.

What I want to do is basically copy/paste the grammars specified in this EBNF notation into a tool and get a generated parser, similar to what ANTLR provides.

Trefler answered 8/5, 2019 at 19:18 Comment(2)

If your goal is just to see if your RDF is well-formed, you could just use an existing parser, e.g., Apache Jena comes with an nquads command line tool that does syntax checking for N-Quads. – Darcidarcia 8/5, 2019 at 19:46

In the case of the SPARQL spec, the grammar section in the spec document is actually generated from the JavaCC file that is used in Jena to generate the Jena SPARQL parser. The tooling for that is here. Jena is sort of the reference implementation for SPARQL, Turtle and N-Quads. I don’t think that process was used for N-Quads but you might find the tooling interesting. – Darcidarcia 8/5, 2019 at 20:4

REx Parser Generator works from grammars in W3C-style EBNF, and Railroad Diagram Generator can extract grammars directly from W3C documents.

Here is how to create a working parser from the example grammar (in Java - some other target languages are supported, too):

browse to Railroad Diagram Generator
on the Get Grammar tab, enter the example URL https://www.w3.org/TR/n-quads
proceed to Edit Grammar
add a whitespace rule to the end of the grammar: WHITESPACE ::= [ #x9]+ /* ws: definition */
save grammar to local file n-quads.ebnf
browse to REx Parser Generator
use input file n-quads.ebnf and command line -java -tree -main
save the resulting parser n_quads.java and compile it
run the parser on a sample file: java n_quads -i a-sample-file
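
If you would rather script the "add a whitespace rule" and "save grammar" steps than do them in the browser, something like the following Python sketch could work. It assumes you have already copied the grammar text out of the Edit Grammar tab; the function names `with_whitespace_rule` and `save_grammar` are made up for illustration and are not part of either tool.

```python
from pathlib import Path

# Rule text copied verbatim from the step list above.
WHITESPACE_RULE = "WHITESPACE ::= [ #x9]+ /* ws: definition */"


def with_whitespace_rule(grammar_text: str) -> str:
    """Return the grammar with the whitespace rule appended at the end.

    The text is returned unchanged if the rule is already present, so
    calling this twice does not duplicate the rule.
    """
    if WHITESPACE_RULE in grammar_text:
        return grammar_text
    return grammar_text.rstrip("\n") + "\n\n" + WHITESPACE_RULE + "\n"


def save_grammar(grammar_text: str, path: str = "n-quads.ebnf") -> Path:
    """Write the grammar (with the whitespace rule) to a local file."""
    target = Path(path)
    target.write_text(with_whitespace_rule(grammar_text), encoding="utf-8")
    return target
```

The resulting file is what you would then upload to REx Parser Generator in the next step.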

Full disclosure: I’m the creator and maintainer of REx Parser Generator.

Saiga answered 15/5, 2019 at 20:50 Comment(1)

Railroad Diagram link is broken; the new link is rr.red-dove.com/ui – Bagasse 12/9 at 16:2

It might be easier to use a tool to transform the EBNF into a parser generator spec for your parser generator of choice.

To do that, you need a tool that can be taught to read the EBNF; actually, you can probably teach most parser generators to do that by writing a grammar for the EBNF notation itself.

That tool also has to build some kind of syntax tree representing the EBNF, which you can walk over and transform into the target grammar notation. This is classic code generation, along with the usual issue that you have to specify the shape of the tree, build it, and then write all the ad hoc tree traversals needed to generate the target grammar.
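
As a rough illustration of that read/build/emit pipeline, here is a minimal Python sketch that tokenises W3C-style EBNF, groups the tokens into rules, and emits ANTLR4-flavoured text. It is not a complete translator. The function name `w3c_ebnf_to_antlr4` and the token names are made up for illustration. It assumes rule numbers such as `[1]` have been stripped, that the source grammar already uses lower-case names for parser rules and upper-case names for terminals, and that ANTLR 4.7 or later is the target (for the `\u{...}` escape). It rejects the `A - B` exception operator and does not escape a literal `-` inside character classes.

```python
import re

# One alternative per token kind in W3C EBNF notation.
TOKEN = re.compile(r"""
    (?P<comment>/\*.*?\*/)
  | (?P<cls>\[[^\]]*\])
  | (?P<dq>"[^"]*")
  | (?P<sq>'[^']*')
  | (?P<hex>\#x[0-9A-Fa-f]+)
  | (?P<define>::=)
  | (?P<name>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<op>[()|?*+-])
  | (?P<ws>\s+)
""", re.VERBOSE | re.DOTALL)

HEX = re.compile(r"#x([0-9A-Fa-f]+)")


def _escape_hex(match):
    """Turn a W3C '#xN' code point into an ANTLR4 unicode escape."""
    code = int(match.group(1), 16)
    return f"\\u{code:04X}" if code <= 0xFFFF else f"\\u{{{code:X}}}"


def _convert(kind, text):
    """Translate a single W3C EBNF token into its ANTLR4 spelling."""
    if kind == "hex":
        return "'" + HEX.sub(_escape_hex, text) + "'"
    if kind == "cls":
        negated = text.startswith("[^")
        inner = text[2:-1] if negated else text[1:-1]
        inner = inner.replace("\\", "\\\\")   # backslash is literal in W3C EBNF
        inner = HEX.sub(_escape_hex, inner)
        return ("~[" if negated else "[") + inner + "]"
    if kind in ("dq", "sq"):
        inner = text[1:-1].replace("\\", "\\\\").replace("'", "\\'")
        return "'" + inner + "'"
    if text == "-":
        raise ValueError("the 'A - B' exception operator is not handled here")
    return text


def w3c_ebnf_to_antlr4(source, grammar_name):
    """Return ANTLR4-style grammar text for a W3C EBNF grammar (sketch)."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = m.end()
        if m.lastgroup not in ("ws", "comment"):
            tokens.append((m.lastgroup, m.group()))

    # A rule starts wherever a name is directly followed by '::='.
    rules = []
    for i, (kind, text) in enumerate(tokens):
        next_kind = tokens[i + 1][0] if i + 1 < len(tokens) else None
        if kind == "name" and next_kind == "define":
            rules.append((text, []))
        elif kind != "define":
            if not rules:
                raise ValueError("expected a rule name followed by '::='")
            rules[-1][1].append(_convert(kind, text))

    lines = [f"grammar {grammar_name};", ""]
    lines += [f"{name} : {' '.join(body)} ;" for name, body in rules]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = """
    statement ::= subject predicate object graphLabel? '.'
    EOL ::= [#xD#xA]+
    """
    print(w3c_ebnf_to_antlr4(sample, "NQuads"))
```

The list of `(name, body)` pairs stands in for the syntax tree here; a real translator would build a proper tree so that constructs like `A - B` can be rewritten rather than rejected.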

You can get all this machinery packaged into a bundle as a program transformation system (PTS). A PTS usually includes parser generation, tree building, and pattern-directed code transformation. Then you can focus on writing the EBNF grammar and the source-to-source translation rules.

Our DMS Software Reengineering Toolkit can be used for this. We've done something similar with DMS: namely, reading XML DTD descriptions and synthesizing high-performance XML readers in Java.

Waterworks answered 8/5, 2019 at 19:31 Comment(0)
