Why is determining if a function is pure difficult?

Asked 22/10, 2009 at 18:59 Answered 23/10, 2009 at 3:49

Solved algorithm functional-programming static-analysis

I was at the StackOverflow Dev Days convention yesterday, and one of the speakers was talking about Python. He showed a Memoize function, and I asked if there was any way to keep it from being used on a non-pure function. He said no, that's basically impossible, and if someone could figure out a way to do it it would make a great PhD thesis.

That sort of confused me, because it doesn't seem all that difficult for a compiler/interpreter to solve recursively. In pseudocode:

function isPure(functionMetadata): boolean;
begin
   result = true;
   for each variable in functionMetadata.variablesModified
      result = result and variable.isLocalToThisFunction;
   for each dependency in functionMetadata.functionsCalled
      result = result and isPure(dependency);
end;

That's the basic idea. Obviously you'd need some sort of check to prevent infinite recursion on mutually-dependent functions, but that's not too difficult to set up.

Higher-order functions that take function pointers might be problematic, since they can't be verified statically, but my original question presupposes that the compiler has some sort of language constraint to designate that only a pure function pointer can be passed to a certain parameter. If one existed, that could be used to satisfy the condition.

Obviously this would be easier in a compiled language than an interpreted one, since all this number-crunching would be done before the program is executed and so not slow anything down, but I don't really see any fundamental problems that would make it impossible to evaluate.

Does anyone with a bit more knowledge in this area know what I'm missing?

Connatural answered 22/10, 2009 at 18:59 Comment(2)

It would be any variables accessed that must be local, not just modified. A function whose result depends on the current value of a global, even if it doesn't modify that global, is clearly not pure. – Milone 22/10, 2009 at 22:58

The obvious question: does logging affect purity ? I would say that on the contrary, actually MODIFYING a global value (if this cannot throw), does not affect the result of your method, and thus allow for memoization even if it is clearly not pure! – Loveliesbleeding 23/10, 2009 at 9:35

It is particularly hard in Python. Since anObject.aFunc can be changed arbitrarily at runtime, you cannot determine at compile time which function will anObject.aFunc() call or even if it will be a function at all.

Vienna answered 22/10, 2009 at 19:13 Comment(2)

Ouch! All right, I can see how that would make it more difficult. – Connatural 22/10, 2009 at 19:24

There's more - eval, setattr(), gettattr() doing weird things, etc. Features like this make languages hard to analyze statically. – Houphouetboigny 22/10, 2009 at 21:20

You also need to annotate every system call, every FFI, ...

And furthermore the tiniest 'leak' tends to leak into the whole code base.

It is not a theoretically intractable problem, but in practice it is very very difficult to do in a fashion that the whole system does not feel brittle.

As an aside, I don't think this makes a good PhD thesis; Haskell effectively already has (a version of) this, with the IO monad.

And I am sure lots of people continue to look at this 'in practice'. (wild speculation) In 20 years we may have this.

Ugrian answered 22/10, 2009 at 19:9 Comment(6)

Computer AI though is only 4-6 years away and sure it will be able to solve this problem for us :) – Aarhus 22/10, 2009 at 19:47

In Haskell every function is pure. IO doesn't change that. Well, maybe except of those ones that do unsafePerformIO. – Spence 22/10, 2009 at 19:59

Anootating all the system class is that .NET is doing for their Code Contracts. When in doubt, it assumes the method is not pure. – Galcha 22/10, 2009 at 20:8

Jared: Hasn't computer AI been "only 4-6 years away" for the last 30 years or so? :P – Connatural 22/10, 2009 at 20:17

@Mason or maybe even 50 years or so %) – Delrosario 22/10, 2009 at 20:35

@Mason - I think Jared is joking. – Kramatorsk 22/10, 2009 at 22:7

Vienna answered 22/10, 2009 at 19:13 Comment(2)

Ouch! All right, I can see how that would make it more difficult. – Connatural 22/10, 2009 at 19:24

There's more - eval, setattr(), gettattr() doing weird things, etc. Features like this make languages hard to analyze statically. – Houphouetboigny 22/10, 2009 at 21:20

In addition to the other excellent answers here: Your pseudocode looks only at whether a function modifies variables. But that's not really what "pure" means. "Pure" typically means something closer to "referentially transparent." In other words, the output is completely dependent on the input. So something as simple as reading the current time and making that a factor in the result (or reading from input, or reading the state of the machine, or...) makes the function non-pure without modifying any variables.

Also, you could write a "pure" function that did modify variables.

Vallee answered 22/10, 2009 at 19:28 Comment(1)

+1 Being side-effect free doesn't mean using only immutable values. – Spence 22/10, 2009 at 20:2

Here's the first thing that popped into my mind when I read your question.

Class Hierarchies

Determining if a variable is modified includes the act of digging through every single method which is called on the variable to determine if it's mutating. This is ... somewhat straight forward for a sealed type with a non-virtual method.

But consider virtual methods. You must find every single derived type and verify that every single override of that method does not mutate state. Determining this is simply not possible in any language / framework which allows for dynamic code generation or is simply dynamic (if it's possible, it's extremely difficult). The reason why is that the set of derived types is not fixed because a new one can be generated at runtime.

Take C# as an example. There is nothing stopping me from generating a derived class at runtime which overrides that virtual method and modifies state. A static verified would not be able to detect this type of modification and hence could not validate the method was pure or not.

Aarhus answered 22/10, 2009 at 19:7 Comment(1)

Oh, good point. Polymorphism would definitely complicate things. Although that could be fixed by putting a "pure constraint" on the base virtual function. – Connatural 22/10, 2009 at 19:21

I think the main problem would be doing it efficiently.

D-language has pure functions but you have to specify them yourself, so the compiler would know to check them. I think if you manually specify them then it would be easier to do.

Cuirass answered 22/10, 2009 at 21:58 Comment(0)

Deciding whether a given function is pure, in general, is reducible to deciding whether any given program will halt - and it is well known that the Halting Problem is the kind of problem that cannot be solved efficiently.

Clayson answered 22/10, 2009 at 22:4 Comment(4)

I know about the halting problem, but why do you say it's equivalent? – Connatural 22/10, 2009 at 22:25

Suppose I write a program P which simulates the execution of some input function F (i.e. P is an interpreter). Suppose P is written such that it halts upon complete evaluation of F and immediately after executing any impure step of F (with an output indicating why it halted). Because some F might have the form f a = f a - Haskell syntax for a function with one argument that simply calls itself with the same argument recursively ad infinitum - there is some F for which P executing F will not halt. Thus, the OP's question is reducible to the halting problem. – Clayson 23/10, 2009 at 1:32

Note that in Haskell, if we get rid of the language's loopholes permitting certain impure code inside otherwise pure code, the compiler can easily determine what code is for sure pure and what code is not for sure pure simply by looking at the types. Haskell represents impurity at the type level. – Clayson 23/10, 2009 at 1:35

Generally speaking, pure functions in computing don't exclude non-terminating (or exceptionally terminating) functions for this very reason. Haskell actually defines nontermination to be a special kind of return value (the 'bottom') which effectively poisons any operation that attempts to inspect it. – Encumber 23/10, 2009 at 3:52

Note that the complexity depends on the language, too. For the more dynamic languages, it's possible to redefine anything at any time. For example, in Tcl

proc myproc {a b} {
    if { $a > $b } {
        return $a
    } else {
        return $b
    }
}

Every single piece of that could be modified at any time. For example:

the "if" command could be rewritten to use and update global variables
the "return" command, along the same lines, could do the same thing
the could be an execution trace on the if command that, when "if" is used, the return command is redefined based on the inputs to the if command

Admittedly, Tcl is an extreme case; one of the most dynamic languages there is. That being said, it highlights the problem that it can be difficult to determine the purity of a function even once you've entered it.

Hinz answered 23/10, 2009 at 3:49 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags