Xtext: grammar for language with significant/semantic whitespace

How can I use Xtext to parse languages with semantic whitespace? I'm trying to write a grammar for CoffeeScript and I can't find any good documentation on this.

Arraign answered 23/8, 2011 at 21:33

AFAIK, you can't.

In case of parsing Python-like languages, you'd need the lexer to emit INDENT and DEDENT tokens. For that to happen, you'd need semantic predicates to be supported inside lexer rules (Xtext's terminal rules) that would first check whether the next character in the input is at the beginning of a line (column 0) and is a ' ' or '\t'.

But browsing through the documentation, I don't see that this is supported by Xtext at the moment. Since Xtext 2.0, support has been added for predicates in production rules (see: 6.2.8. Syntactic Predicates), but not in terminal rules.

The only way to do this with Xtext would be to let the lexer produce terminal spaces and line-breaks, but this would make an utter mess of your production rules.

If you want to parse such a language using Java (and a Java-oriented parser generator), I'd recommend ANTLR, in which you can emit such INDENT and DEDENT tokens quite easily. But if you're keen on Eclipse integration, then I don't see how you'd be able to do this using Xtext, sorry.
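To make the INDENT/DEDENT idea concrete, here is a minimal, standalone Java sketch of the bookkeeping such a lexer has to do: measure the whitespace at the start of each line and compare it against a stack of currently open indentation levels. It is not Xtext or ANTLR API. The class name `IndentTokenizer`, the string token names, and the simplifications (a tab counts as one column, blank lines are skipped, each line's content becomes a single `LINE(...)` token) are assumptions for illustration only. In an ANTLR lexer, the same logic would typically sit in an overridden `nextToken()` that queues the extra tokens.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: turns Python-style indentation into explicit
 * INDENT/DEDENT marker tokens, line by line. Not Xtext or ANTLR API.
 */
public class IndentTokenizer {

    public static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Deque<Integer> indents = new ArrayDeque<>();
        indents.push(0); // column of the outermost block

        for (String line : source.split("\n", -1)) {
            // Measure leading whitespace; a tab counts as one column (simplification).
            int col = 0;
            while (col < line.length() && (line.charAt(col) == ' ' || line.charAt(col) == '\t')) {
                col++;
            }
            String content = line.substring(col);
            if (content.isEmpty()) {
                continue; // blank lines do not change indentation
            }
            if (col > indents.peek()) {
                // Deeper than the enclosing block: open a new one.
                indents.push(col);
                tokens.add("INDENT");
            } else {
                // Shallower: close every block deeper than this column.
                while (col < indents.peek()) {
                    indents.pop();
                    tokens.add("DEDENT");
                }
                if (col != indents.peek()) {
                    throw new IllegalArgumentException("inconsistent dedent: " + line);
                }
            }
            tokens.add("LINE(" + content + ")");
            tokens.add("NEWLINE");
        }
        // Close any blocks still open at end of input.
        while (indents.peek() > 0) {
            indents.pop();
            tokens.add("DEDENT");
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "if x\n  a\n  if y\n    b\nc";
        // prints [LINE(if x), NEWLINE, INDENT, LINE(a), NEWLINE, LINE(if y), NEWLINE, INDENT, LINE(b), NEWLINE, DEDENT, DEDENT, LINE(c), NEWLINE]
        System.out.println(tokenize(src));
    }
}
```

The stack is what makes this awkward to express as a plain terminal rule: whether a line produces INDENT, one DEDENT, or several DEDENTs depends on every line before it, not just on the characters at the current position.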

Hyperbaton answered 26/8, 2011 at 19:22

Here's an example of a whitespace-sensitive language in Xtext.

Mcdevitt answered 13/9, 2011 at 8:59 Comment(3)
Cool (I'll have to delve into Xtext a bit, and not answer questions in this tag yet... :))! I can't remove my answer, though, since it is the accepted answer. @Arraign needs to accept yours before I can remove mine. - Hyperbaton
Seems to be hosted on GitHub now: github.com/ralfebert/org.eclipselabs.todotext - Cracker
Unfortunately, I can't find this sample anywhere. - Keyser

Version 2.8 of Xtext comes with support for Whitespace-Aware Languages. This version ships with the "Home Automation Example" that you can use as a template.

Balfore answered 12/2, 2016 at 9:35

For people interested in CoffeeScript, Adam Schmideg has an Eclipse plugin that uses Xtext.

For people interested in parsing Python-like DSLs in Xtext, Ralf Ebert's code for Todotext mentioned above is no longer available from GitHub, but you can find it in the Eclipse test repository. See the original thread about this work and the Eclipse issue that was raised about it.

I have been playing with this code today, and my conclusion is that it no longer works in the current version of Xtext. When Xtext is used in Eclipse, I think it does "partial parsing". This is not compatible with the stateful lexer you need to process indentation-sensitive languages, so I suspect that even if you patch the lexer, the Eclipse editor will not work. In the issue, it looks like Ralf proposed patches to address these problems, but looking into the Xtext source, these changes seem to be long gone. If I am wrong and someone can get it to work, I would be very interested.
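The following simplified Java sketch illustrates why a stateful, indentation-aware lexer clashes with re-lexing only part of a document. It assumes, for illustration, that an editor restarts lexing at some line with fresh lexer state; it is not Xtext's actual partial-parsing code, and `RelexDemo` and `dedentsBeforeLastLine` are hypothetical names. The number of DEDENT tokens emitted for a line depends on the indentation stack built from all earlier lines, so restarting mid-document without restoring that stack yields different tokens.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch (not Xtext code): the DEDENT count for a line
 * depends on indentation state accumulated from earlier lines.
 */
public class RelexDemo {

    /** Lexes lines[start..] with a fresh indentation stack and returns
     *  how many DEDENTs are emitted for the final line. */
    static int dedentsBeforeLastLine(String[] lines, int start) {
        Deque<Integer> indents = new ArrayDeque<>();
        indents.push(0); // fresh state, as if lexing began at 'start'
        int dedents = 0;
        for (int i = start; i < lines.length; i++) {
            int col = 0;
            while (col < lines[i].length() && lines[i].charAt(col) == ' ') {
                col++;
            }
            dedents = 0; // keep only the count for the current line
            if (col > indents.peek()) {
                indents.push(col);
            } else {
                while (col < indents.peek()) {
                    indents.pop();
                    dedents++;
                }
            }
        }
        return dedents;
    }

    public static void main(String[] args) {
        String[] lines = {"if x", "  a", "  if y", "    b", "c"};
        // Lexing the whole document: "c" closes two blocks.
        System.out.println(dedentsBeforeLastLine(lines, 0)); // prints 2
        // Restarting at line index 3 without the saved stack: only one DEDENT.
        System.out.println(dedentsBeforeLastLine(lines, 3)); // prints 1
    }
}
```

Incremental re-lexing of such a language therefore needs the indentation stack to be saved and restored at each restart point, or else lexing has to restart from the top of the file.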

There is a different implementation here, but I cannot get that to work with the current version of Xtext either.

Instead, I have switched to parboiled, which supports indentation-based grammars out of the box.

Multiplex answered 30/4, 2013 at 3:30 Comment(1)
Others experiencing the same problem: post on the Xtext list, March 2013 - Multiplex

© 2022 - 2024 — McMap. All rights reserved.