PEG.js: Get any text between ( and );

I'm trying to catch some text between parentheses with a semicolon at the end.

Example: (in here there can be 'anything' !"#¤);); any character is possible);

I've tried this:

Text
 = "(" text:(.*) ");" { return text.join(""); }

But it seems that (.*) consumes the rest of the input, including the last );, so ");" has nothing left to match and I get the error:

Expected ");" or any character but end of input found

The problem is that the text itself can contain ");", so I want the outermost ); to decide where the line ends.

This regex \((.*)\); does what I want, but how can I do the same in PEG.js? I don't want to include the outer parentheses and semicolon in the result.

This seems like it should be quite easy if you know what you're doing =P

Dwarfism asked 20/9, 2016 at 21:44
I have. Couldn't find what I was looking for, or maybe I didn't understand it. If you know where in the docs this is specified, I'd appreciate it if you could tell me. – Dwarfism

So, the point is that a PEG does not backtrack the way a regex does: its repetition operators are greedy, and once .* has consumed some input, the parser never gives any of it back to let the rest of the sequence match. We can still simulate the semantics you want, though. Since you say the regex \((.*)\); does what you want, we might translate it into a PEG.
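
To make this concrete, here is a minimal JavaScript sketch of how a PEG evaluates the grammar from the question. It is a hand-written simulation, not the code PEG.js generates, and parseOriginal is a hypothetical name. Because .* always runs to the end of the input and never gives anything back, the following ");" can never match, for any input; the question's regex, by contrast, succeeds on the same example.

```javascript
// Hypothetical hand-written simulation (not PEG.js output) of the grammar
//   Text = "(" text:(.*) ");"
// under PEG semantics: .* is greedy and never gives input back.
function parseOriginal(input) {
  let pos = 0;
  if (!input.startsWith("(", pos)) return null; // "(" fails
  pos += 1;

  // (.*): consume every remaining character; no backtracking afterwards.
  const start = pos;
  while (pos < input.length) pos += 1;
  const text = input.slice(start, pos);

  // ");" must now match where .* stopped, i.e. at the end of the input,
  // which is impossible, hence "... but end of input found".
  if (!input.startsWith(");", pos)) return null;
  return text;
}

const sample = "(in here there can be 'anything' !\"#¤);); any character is possible);";

console.log(parseOriginal(sample));       // null
console.log(/\((.*)\);/.exec(sample)[1]); // the regex backtracks and succeeds
```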

What does this regex do? Its greedy .* first consumes every character up to the end of the input, then the engine backtracks one character at a time until \); can match, i.e., the match ends at the last ); in the input.
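
The following JavaScript sketch illustrates that behavior with the regex from the question, plus a hypothetical helper, captureByLastIndex, that states the same result directly: everything between the opening ( and the start of the last );. The helper assumes a single line that begins with (, since the regex's . does not match line breaks.

```javascript
// The regex from the question: greedy .* runs to the end of the line,
// then the engine backtracks until \); can match.
const re = /\((.*)\);/;

function captureWithRegex(input) {
  const m = re.exec(input);
  return m ? m[1] : null;
}

// Hypothetical equivalent for a single line starting with "(":
// the text between the first "(" and the start of the LAST ");".
function captureByLastIndex(input) {
  const end = input.lastIndexOf(");");
  if (!input.startsWith("(") || end === -1) return null;
  return input.slice(1, end);
}

const line = "(a); b);";
console.log(captureWithRegex(line));   // a); b
console.log(captureByLastIndex(line)); // a); b
```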

To make this work with a PEG, we can use a lookahead: keep consuming a character only if (iff) there is still a ); somewhere ahead of it.

So, a solution is:

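Text
 = "(" text:TextUntilTerminator ");" { return text.join(""); }

// Consume one character at a time, but only while a ");" still lies ahead.
TextUntilTerminator
 = x:(&HaveTerminatorAhead .)* { return x.map(y => y[1]); }

// Matches one character followed, somewhere after it, by a ");".
// Used with & above, it only checks and consumes nothing.
HaveTerminatorAhead
 = . (!");" .)* ");"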
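
To see how these rules behave without running PEG.js, here is a hand translation of the three rules into plain JavaScript. It is a sketch that mimics PEG semantics (each function mirrors one rule, and the function names are hypothetical lowercase versions of the rule names), not the parser PEG.js would generate.

```javascript
// HaveTerminatorAhead = . (!");" .)* ");"
// Only inspects the input; the caller decides whether to consume anything.
function haveTerminatorAhead(input, pos) {
  if (pos >= input.length) return false; // "." needs one character
  let p = pos + 1;
  // (!");" .)* : skip characters until the first ");" (or end of input)
  while (p < input.length && !input.startsWith(");", p)) p += 1;
  return input.startsWith(");", p); // ");"
}

// TextUntilTerminator = (&HaveTerminatorAhead .)*
function textUntilTerminator(input, pos) {
  let p = pos;
  // &HaveTerminatorAhead checks without consuming; "." then consumes one char.
  while (haveTerminatorAhead(input, p)) p += 1;
  return { text: input.slice(pos, p), end: p };
}

// Text = "(" TextUntilTerminator ");"
// PEG.js also requires the start rule to consume the whole input.
function parseText(input) {
  if (!input.startsWith("(")) return null;
  const { text, end } = textUntilTerminator(input, 1);
  if (!input.startsWith(");", end)) return null;
  if (end + 2 !== input.length) return null; // trailing input left over
  return text;
}

const sample = "(in here there can be 'anything' !\"#¤);); any character is possible);";
console.log(parseText(sample)); // in here there can be 'anything' !"#¤);); any character is possible
```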

The TextUntilTerminator rule first checks that HaveTerminatorAhead matches, without consuming any input (that is what the & lookahead operator does), and then consumes a single character. It repeats this until no ); is left ahead, i.e., until it has reached the final ); in the input.

The HaveTerminatorAhead rule is simple: it checks that there is at least one character ahead and, if so, guarantees that at least one ); follows it. It uses the negative lookahead ! to stop at the first ); it sees, rather than consuming it (which would reproduce your original problem).
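
Another way to read HaveTerminatorAhead: at position p it succeeds exactly when some ); starts at position p + 1 or later. The sketch below (plain JavaScript, a hypothetical hand translation of the rule) evaluates it at every position of a short input.

```javascript
// Hypothetical hand translation of  HaveTerminatorAhead = . (!");" .)* ");"
function haveTerminatorAhead(input, pos) {
  if (pos >= input.length) return false; // "." needs one character
  let p = pos + 1;
  // (!");" .)* : stop at the first ");" without consuming it
  while (p < input.length && !input.startsWith(");", p)) p += 1;
  return input.startsWith(");", p); // ");" must be there
}

// Indices: ( a ) ; b ) ;  ->  0 1 2 3 4 5 6
const input = "(a);b);";
const okPositions = [];
for (let p = 0; p <= input.length; p++) {
  if (haveTerminatorAhead(input, p)) okPositions.push(p);
}
console.log(okPositions); // [ 0, 1, 2, 3, 4 ]
```

Starting right after the (, TextUntilTerminator therefore consumes a);b and stops at position 5, where the last ); begins.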

This PEG, then, reproduces the behavior of the regex you suggested.

Skip answered 21/9, 2016 at 12:55
Nice, it works! And I learned some things too =) Thanks a lot! – Dwarfism
Note that if you're using this in a bigger grammar, you should probably add another kind of lookahead to limit the checking. The code above assumes that you want the last ); in the input, which might not be the desired behavior if you want to match things beyond a Text. – Skip
I will read files with multiple lines of this kind, but maybe I could just add a \n to the grammar to take this into account? – Dwarfism
I have trouble understanding how HaveTerminatorAhead works. Could you try explaining it in some other way? – Maier
How can we modify this so that it matches up to the first delimiter and not the last one? – Monoculture
@TornikeShavishvili: since TextUntilTerminator simply keeps consuming while there is at least one delimiter ahead, you could change the Text rule to use HaveTerminatorAhead directly instead, so it will consume up to the first delimiter found. Minor changes will be needed, of course (such as removing ");" from the HaveTerminatorAhead rule). – Skip
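
One reading of the suggestion above is that the lookahead loop is replaced by the plain loop (!");" .)*, which stops at the first ); instead of the last. Below is a JavaScript sketch of that behavior (hand-written, not generated by PEG.js; the function name parseUpToFirstTerminator and its rest field are hypothetical), alongside the lazy regex \((.*?)\); that behaves the same way on a single line. Returning the unconsumed rest also hints at how a bigger grammar could go on matching things after a Text.

```javascript
// Hypothetical first-delimiter variant: stop at the FIRST ");".
// Mirrors (!");" .)* : consume a character only while ");" does not
// start at the current position.
function parseUpToFirstTerminator(input) {
  if (!input.startsWith("(")) return null;
  let p = 1;
  while (p < input.length && !input.startsWith(");", p)) p += 1;
  if (!input.startsWith(");", p)) return null; // no terminator at all
  return { text: input.slice(1, p), rest: input.slice(p + 2) };
}

// The regex counterpart uses a lazy quantifier.
const lazy = /\((.*?)\);/;

const line = "(a); b);";
console.log(parseUpToFirstTerminator(line)); // { text: 'a', rest: ' b);' }
console.log(lazy.exec(line)[1]);             // a
```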
