Check if two "simple" 'if statements' in C are equivalent
Asked Answered
M

2

9

I have 'if statements' from two different sources, which try to implement the same condition possibly in a different way. The 'if statements' are C. If at all possible I need a python script that can decide whether pairs of conditions are equivalent or not. A basic example:

source1: ((op1 != v1) || ((op2 != v2) || (op3 != v3)))

source2: ((op2 != v2) || (op1 != v1) || (op3 != v3))

Of course any operator is allowed, function calls and of course parentheses.

Any ideas are welcome.

Edit 1: The function calls have no side effects.

Mancunian answered 6/3, 2013 at 21:50 Comment(14)
This is one of those problems which seems easy at first, and then you look at it for a minute and can't help but saying "eeiiwww".Beelzebub
Write a parser in PLY that turns it into a truth table and compare the truth tables. dabeaz.com/plyDan
If you are considering function calls as op1 and op2, , you should note that ((op1 != v1) || ((op2 != v2) || (op3 != v3))) is not necessarily equivalent to ((op2 != v2) || (op1 != v1) || (op3 != v3)). For example suppose op1 != v1 and op2 != v2, in the first statement op1 is only called, in the second statement only op2. This is due to short circuiting.Illegitimate
I think @Dan is right, you'll need to build an all-out parser for this. Just curious, though, what is the use case? Maybe there is an easier way?Beelzebub
Unless I'm mistaken, this problem is NP-hard, as it's equivalent to Boolean satisfiability. (There's a good chance I'm mistaken, though...) So if I'm not mistaken, the computational requirements grow very quickly (i.e. exponentially) with the number of terms.Unciform
If function calls are allowed, then I can think of at least one case that would require solving the halting problem. You're probably going to need to narrow things down a bit. Something like "Functions can't have side effects and always return a result"Calash
@user23127,@PeteBaughman: yes you're right, it is assumed that the function calls don't have any side effects. I'll update the question.Mancunian
Every one of you contributed something meaningful until now. Thank you for your contribution.Mancunian
@Beelzebub 1. What is the use case? There are hundreds of pseudo-code-likish specification fragments (not formalized but neither free text) and the corresponding C implementation. Using layers upon layers of transformations and a crude parser I transform the specification to C code. At this stage I already attempt to "reorder" commutative operations but I have to consider existing parenthesis. The same goes for the actual C code and then they are compared (as strings).Mancunian
@Beelzebub 2. This sounds most likely a bit futuristic or barbaric, but it works quite well. The problem remains: there are many situations where the code conforms to the specification but it is written "too" differently, hence a direct match can't be confirmed.Mancunian
@Mancunian Even after Edit 1 there's a problem: Consider the Function Forever() which enters an infinite loop: Is (TRUE || Forever()) equivalent to (Forever() || TRUE) ? Alternatively, substitute "Forever()" with "CrashProgram()". The latter example may be considered a side-effect, but certainly not the first.Calash
@PeteBaughman, you're of course right, but I don't need or want to evaluate any of the expressions. There is a somewhat short description two posts above, where I try to answer the question: What is the use case? Lacking a better word, I would like to "transform" (not evaluate) source1 and source2 until I can say, that for the exact same inputs (any combination thereof) both expressions would evaluate to the same result (or not). build the entire truth table! as suggested by Hooked one post below is not practicable, because any 'if condition' may have from 1 to about 30 sub-expressions.Mancunian
@Mancunian So would you consider (TRUE || Function()) and (Function() || TRUE) to be equivalent then? If so, the problem becomes a lot easier.Calash
@PeteBaughman I would consider ((a==1)||(Function()==2)) to be equivalent with ((2==Function())||(1==a)). These functions that may or may not be part of a condition, have no side effects, they always would return with the same value given the same environment, they don't return booleans, but their return value is always tested against something (Function() != something). I know that the problem is not easy, but I also don't pretend to know or need an academic solution to it, where every complicated situation is covered. I need a solution to a very specific/small subset of the whole problem.Mancunian
M
5

Here's the thing, the problem may (or may not) be NP-complete but unless this is within the inner-loop of something important (and the number of variable are small), build the entire truth table! It's extremely easy to do. It obviously grows as 2^n, but for small n this is completely feasible. Like the comments suggest, I'm assuming that the function calls have no side effects and simply output True or False.

I've posted a mockup example that solves your stated problem, adapt as needed. I rely on pythons parser to handle the evaluation of the expression.

import pyparsing as pypar
import itertools

def python_equiv(s):
    return s.replace('||',' or ').replace('&&',' and ')

def substitute(s,truth_table, VARS):
    for v,t in zip(VARS,truth_table):
        s = s.replace(v,t)
    return s

def check_statements(A1,A2):  
    VARS = set()
    maths    = pypar.oneOf("( ! = | & )")
    keywords = pypar.oneOf("and or")
    variable = pypar.Word(pypar.alphanums)
    variable.setParseAction(lambda x: VARS.add(x[0]))
    grammar  = pypar.OneOrMore(maths | keywords | variable)

    # Determine the variable names
    grammar.parseString(A1)
    grammar.parseString(A2)

    A1 = python_equiv(A1)
    A2 = python_equiv(A2)

    bool_vals = map(str, [False,True])

    for table in itertools.product(bool_vals,repeat=len(VARS)):
        T1 = substitute(A1,table,VARS)
        T2 = substitute(A2,table,VARS)
        if eval(T1) != eval(T2):
            print "FAIL AT ", table,
            return False

    print "Statements equiv:",

    return True


# Original example
A1 = '''((op1 != v1) || ((op2 != v2) || (op3 != v3)))'''
A2 = '''((op2 != v2) ||  (op1 != v1) || (op3 != v3))'''
print check_statements(A1,A2)

# Example with a failure
A1 = '''((op1 != v1) || ((op2 != v2) || (op3 != v3)))'''
A2 = '''((op2 != v2) ||  (op1 != v1) && (op3 != v3))'''
print check_statements(A1,A2)

Gives as output:

Statements equiv: True
FAIL AT  ('False', 'False', 'False', 'False', 'False', 'True') False
Mash answered 7/3, 2013 at 5:3 Comment(7)
My personal opinion: in example code and such, do not use from foo import *, that makes it hard for others to know which used symbols are from which import.Greybeard
@Greybeard Generally I agree, though with pyparsing it gets a bit much. For clarity I've added in the import paths.Mash
This helps you decide if the boolean formulas are equivalent, but not if they compute the same thing. Consider: x = P>Q ; if (x && y) ... ; x = sin(Z)<3; if (x && y) ... The two formulas are algebraically identical, but they do NOT compute the same thing. You need to prove the two conditionals are evaluated with the same control dependence.Opprobrium
@IraBaxter I see, I think I interpreted the question at a level that was too shallow. The OP stated "[Check if] statements ... are equivalent", and I saw that as check if "the boolean formulas are equivalent". I'll leave my posting up since I think it still has some value as a subset of the original problem.Mash
@Hooked, this is a cool idea, and thank you for bringing it up, but there can be up to 30 subexpressions in a condition and as you mentioned this approach would be way to slow.Mancunian
@Mancunian Like I said in regards to IraBaxter, I don't think this fully answers the problem. I do want to point out it's not the number of subexpressions, but the number of unique variables, as that determines the size of the truth table. Small is relative too, I can easily run a simulation with 20 variables 2^20 ~= 100K in no time, but I would get stuck at 30!Mash
@Hooked, thank you for your help, with a few changes the code you've provided worked out pretty well (much better than previous attempts).Mancunian
O
2

To do this, you need control flow anlaysis to determine if the two conditions have the same control dependence (otherwise they don't execute in the same data context), full data flow analysis including points-to analysis of the C code, a side-effect analysis of functions, the ability to backslice from the root of the condition to the leaves of the expression across function calls, and then a boolean equivalence matcher that accounts for C semantics (e.g. short-circuiting, overflows, ...)

This is far beyond what you get from a C parser.

Opprobrium answered 6/3, 2013 at 23:27 Comment(3)
Well, question specifies "simple", which could be interpreted to mean subexpressions do not have side-effects etc...Greybeard
If you leave out side effects, you still have to determine what functions are called. This requires points-to analysis, unless you are going to eliminate this too. If you simplify it enough, yes he can get a simple answer. My point: won't be much left.Opprobrium
Well yeah, this rules out a large portion of common C conditions, for example (fp && -1 != (ch = fgetc(fp))), or anything where order matters. But if functions called in the condition expression have no side-effects (which is actually a reasonable self-imposed rule in C, for those who prefer functional programming), then it doesn't matter which get called.Greybeard

© 2022 - 2024 — McMap. All rights reserved.