How do I parse this type of expressions?
Asked Answered
H

1

6

I don't have a compilers background so I am not sure if this is a commmon thing in that area. Are there any standard techniques to parse expressions like this? (Say, tab indicates the depth)

And
    A + B = 1
    C + D = 1
    Or
       P + Q = 1
       K = 1
    And
       Q = 1
       R = 2

Should be parsed as:

((A+B=1) AND (C+D=1) AND ((P+Q=1) OR (K=1)) AND ((Q=1) AND (R=2)))

I am not sure if I should resort to a stack based evaluation? I am currently trying out one and I'll post a working code if I can get it running.

Any suggestions on a simple way to achieve this?

Hessler answered 29/7, 2013 at 18:42 Comment(7)
What's the context? Does it have to be "safe"? Or could you change your syntax slightly and use Python with eval() or similar? For example, ((A+B==1) and (C+D==1)) is Python syntax.Turncoat
Unfortunately, I cannot change the input. I am parsing some XML files and managed to parse the expressions themselves into a string. How I format the string is upto me but the order of evaluation and all still needs care. Also, I don't want to evaluate anything but want to say, get a string for printing.Hessler
are you parsing the expression into trees? Then evaluating them by subbing in numbers for those variables?Steffens
+1 for an interesting question, but I'm slightly confused -- are you only asking about parsing, or about how to do evaluation as well?Keffer
@MattFenwick: Sorry what I meant was I am not too concerned about evaluating the expressions for now. I am only trying to obtain a string representation of the input.Hessler
Isn't ((A+B=1) AND (C+D=1)... a "string representation of the input" already?Turncoat
@BenHoyt: Note that I am talking about the first multi-line string that does not have any parenthetical text and not the second one (which is the one I want from the first) which contains proper order of evaluation.Hessler
K
4

Assuming you are asking about how to parse expressions built out of operators with varying precedences and associativities -- absolutely.

One effective approach is called "top down operator precedence", and maybe also "operator precedence", and "precedence climbing" parsing. Here are some nice sources explaining the approach in detail:

The really neat thing is how little code it actually takes.

Key concepts are:

  • prefix vs infix vs mixfix

  • precedence: is 3 + 4 * 5 parsed as (3 + 4) * 5 or 3 + (4 * 5)?

  • associativity: is x - y - z parsed as x - (y - z) or (x - y) - z?

Coincidentally, I have just been learning this stuff recently and ended up writing a an article on my blog about a similar approach to operator parsing, which you can find here. In my approach, I deal with infix, prefix, postfix, and mixfix operators (i.e. ? :); precedences and associativities are all specified in tables; I use a stack to keep track of operators whose operands haven't yet been found. The parser then builds a parse tree, where each node is a subexpression.

Keffer answered 29/7, 2013 at 19:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.