Can a pure function have free variables?

Asked 7/4, 2014 at 14:9 Answered 8/4, 2014 at 12:50

haskell functional-programming pure-function free-variable

For example, a referentially transparent function with no free variables:

g op x y = x `op` y

A now now a function with the free (from the point-of-view of f) variables op and x:

x = 1
op = (+)
f y = x `op` y

f is also referentially transparent. But is it a pure function?

If it's not a pure function, what is the name for a function that is referentially transparent, but makes use of 1 or more variables bound in an enclosing scope?

Motivation for this question:

It's not clear to me from Wikipedia's article:

The result value need not depend on all (or any) of the argument values. However, it must depend on nothing other than the argument values.

(emphasis mine)

nor from Google searches whether a pure function can depend on free (in the sense of being bound in an enclosing scope, and not being bound in the scope of the function) variables.

Also, this book says:

If functions without free variables are pure, are closures impure?

The function function (y) { return x } is interesting. It contains a free variable, x. A free variable is one that is not bound within the function. Up to now, we’ve only seen one way to “bind” a variable, namely by passing in an argument with the same name. Since the function function (y) { return x } doesn’t have an argument named x, the variable x isn’t bound in this function, which makes it “free.”

Now that we know that variables used in a function are either bound or free, we can bifurcate functions into those with free variables and those without:

Functions containing no free variables are called pure functions.

Functions containing one or more free variables are called closures.

So what is the definition of a "pure function"?

Busse answered 7/4, 2014 at 14:9 Comment(23)

Is this a question about pure functions in general, or in Haskell? I'm not sure you would get the same answers. – Coherent 7/4, 2014 at 14:13

I'm not sure if "pure" is well-defined here. It's certainly "pure" (as in, no side effects) in the haskell sense. – Quincuncial 7/4, 2014 at 14:18

@Quincuncial thanks for pointing that out, I've edited to use "pure function"! – Busse 7/4, 2014 at 14:20

@Riccardo good point, I hadn't thought about that. Would "pure function" mean something different in Haskell than in, say, Javascript (with the obvious caveat that Javascript doesn't enforce purity)? – Busse 7/4, 2014 at 14:21

@MattFenwick I thought more about haskell vs theoretical computer science. – Coherent 7/4, 2014 at 14:22

@Riccardo another good point; I wasn't aware that there would be a difference in the meaning of "pure function" between those two! – Busse 7/4, 2014 at 14:25

It's not that, it's that you might or might not respect the requirements to call a function "pure" (i.e. respecting referential transparency) depending on the semantics of the language you are using. – Coherent 7/4, 2014 at 14:26

in mathematics, and in haskell, there are no "variables" (in the sense of imperative programming languages). There are constants (e.g., the Haskell name snd has a constant meaning), and local constants (as in f x = 3*x + 1, here x is constant in each call of f). In mathematics, or constraint programming, sometimes names denote unknowns (as in "find x with 3*x + 1 == 7" - here x is still a constant, but the value of which we do not immediately know) – Prothalamion 7/4, 2014 at 14:29

@Quincuncial I'm not really sure that "absence of side effects" is a real definition of "pure" even in Haskell. It's more just a simple way of getting the intuition across. – Mutation 7/4, 2014 at 14:29

@Prothalamion yes, we all know that. That's not really what the question is asking about. – Busse 7/4, 2014 at 14:30

Matt Fenwick is asking a question about the definition of pure functions??. Pure functions don't hold state, and they don't cause side-effects. That's it, AFAIK. That would allow internal "free" variables, but not external, global, static, or closed-over variables that are mutated. – Phenosafranine 7/4, 2014 at 14:30

@RobertHarvey that's what I thought, too. However, I've recently come across sources (such as the Wikipedia article) that seem to imply something different, and so I attempted to find a more authoritative reference for a definition of "pure function". Unfortunately, I haven't been able to find one yet. – Busse 7/4, 2014 at 14:33

The Wikipedia definition says that you can refer to external variables all day long, so long as you don't modify them. As soon as something external is mutated, or state is held in the function that changes the output for subsequent calls, it is rendered impure. – Phenosafranine 7/4, 2014 at 14:34

@J.Abrahamson What I meant to say was that I've never come across an actual definition of pure; I've always thought about it as a way of emphasizing that the function is referentially transparent. – Quincuncial 7/4, 2014 at 14:42

@RobertHarvey That's a pretty hand-wavy definition, I believe. It certainly doesn't link except tenuously to the Quinian definition of referentially transparent. There's something deeper at play, at the very least a level mixup between syntactic purity and semantic purity. An expression with a free variable doesn't even have a semantics until it's located in an expression which binds that variable. Side-effects are inherently semantic. – Mutation 7/4, 2014 at 14:43

@Quincuncial I agree that that's how it's used, and I've been hunting for a real definition of the term for a little while as well. Supposedly there's a good one in Filinksi's "Declarative Continuations and Categorical Duality" which I might repeat as an answer, but I haven't honestly digested that paper. – Mutation 7/4, 2014 at 14:44

@Riccardo I guess what I'm asking is "what are the requirements to call a function 'pure'?" – Busse 7/4, 2014 at 15:1

@J.Abrahamson: There's nothing at all hand-wavy about it. A pure function is one that accepts one or more inputs and produces one or more outputs, produces the same output result every time (given the same input), and doesn't affect anything in its external environment. It seems too simple because we're programmers and we love to complicate things, but that's really all there is to it. – Phenosafranine 7/4, 2014 at 15:45

@RobertHarvey I think when you're talking about things that this level you end up having to define really annoying things like "same", "accept", "produce", and (especially) "external environment". Practically, I agree with you, but I don't think that notion meshes well with a question that asks about open lambda terms. I've given my attempt to explain all the pieces I know about as an answer. – Mutation 7/4, 2014 at 16:30

Also I'll note that the people who've spent the most time being dissatisfied with such answers were more analytic philosophers, linguists, and theoretical computer scientists over programmers. – Mutation 7/4, 2014 at 16:31

@RobertHarvey might I ask for a reference? As I said before, I don't disagree at all, I just would like to find an authoritative source and do some more background reading. Unfortunately, I don't feel that Wikipedia is the best source for precise, authoritative definitions ... :( – Busse 7/4, 2014 at 23:56

@MattFenwick: Oddly enough, your evil twin posted a nearly identical question today here: programmers.stackexchange.com/q/235175/1204... although if you're looking for an answer with some mathematical rigor, I fear that you may still be disappointed. – Phenosafranine 8/4, 2014 at 0:4

Short answers: In a pure functional programming language: Yes, as long as the type of the free variable doesn't feature IO in a positive position. In a non-pure functional programming language, no, since the free variable can have side effects. – Bryson 13/4, 2014 at 23:0

To the best of my understanding "purity" is defined at the level of semantics while "referentially transparent" can take meaning both syntactically and embedded in lambda calculus substitution rules. Defining either one also leads to a bit of a challenge in that we need to have a robust notion of equality of programs which can be challenging. Finally, it's important to note that the idea of a free variable is entirely syntactic—once you've gone to a value domain you can no longer have expressions with free variables—they must be bound else that's a syntax error.

But let's dive in and see if this becomes more clear.

Quinian Referential Transparency

We can define referential transparency very broadly as a property of a syntactic context. Per the original definition, this would be built from a sentence like

New York is an American city.

of which we've poked a hole

_ is an American city.

Such a holey-sentence, a "context", is said to be referentially transparent if, given two sentence fragments which both "refer" to the same thing, filling the context with either of those two does not change its meaning.

To be clear, two fragments with the same reference we can pick would be "New York" and "The Big Apple". Injecting those fragments we write

New York is an American city.
The Big Apple is an American city.

suggesting that

_ is an American city.

is referentially transparent. To demonstrate the quintessential counterexample, we might write

"The Big Apple" is an apple-themed epithet referring to New York.

and consider the context

"_" is an apple-themed epithet referring to New York.

and now when we inject the two referentially identical phrases we get one valid and one invalid sentence

"The Big Apple" is an apple-themed epithet referring to New York.
"New York" is an apple-themed epithet referring to New York.

In other words, quotations break referential transparency. We can see how this occurs by causing the sentence to refer to a syntactic construct instead of purely the meaning of that construct. This notion will return later.

Syntax v Semantics

There's something confusing going on in that this definition of referential transparency above applies directly to English sentences of which we build contexts by literally stripping words out. While we can do that in a programming language and consider whether such a context is referentially transparent, we also might recognize that this idea of "substitution" is critical to the very notion of a computer language.

So, let's be clear: there are two kinds of referential transparency we can consider over lambda calculus—the syntactic one and the semantic one. The syntactic one requires we define "contexts" as holes in the literal words written in a programming language. That lets us consider holes like

let x = 3 in _

and fill it in with things like "x". We'll leave the analysis of that replacement for later. At the semantic level we use lambda terms to denote contexts

\x -> x + 3         --    similar to the context "_ + 3"

and are restricted to filling in the hole not with syntax fragments but instead only valid semantic values, the action of that being performed by application

(\x -> x + 3) 5 
==> 
5 + 3
==> 
8

So, when someone refers to referential transparency in Haskell it's important to figure out what kind of referential transparency they're referring to.

Which kind is being referred to in this question? Since it's about the notion of an expression containing a free variable, I'm going to suggest that it's syntactic. There are two major thrusts for my reasoning here. Firstly, in order to convert a syntax to a semantics requires that the syntax be valid. In the case of Haskell this means both syntactic validity and a successfully type check. However, we'll note that a program fragment like

x + 3

is actually a syntax error since x is simply unknown, unbound leaving us unable to consider the semantics of it as a Haskell program. Secondly, the very notion of a variable such as one that can be let-bound (and consider the difference between "variable" as it refers to a "slot" such as an IORef) is entirely a syntactic construct—there's no way to even talk about them from inside the semantics of a Haskell program.

So let's refine the question to be:

Can an expression containing free variables be (syntactically) referentially transparent?

and the answer is, uninterestingly, no. Referential transparency is a property of "contexts", not expressions. So let's explore the notion of free variables in contexts instead.

Free variable contexts

How can a context meaningfully have a free variable? It could be beside the hole

E1 ... x ... _ ... E2

and so long as we cannot insert something into that syntactic hole which "reaches over" and affects x syntactically then we're fine. So, for instance, if we fill that hole with something like

E1 ... x ... let x = 3 in E ... E2

then we haven't "captured" the x and thus can perhaps consider that syntactic hole to be referentially transparent. However, we're being nice to our syntax. Let's consider a more dangerous example

do x <- foo
   let x = 3
   _
   return x

Now we see that the hole we've provided in some sense has dominion over the later phrase "return x". In fact, if we inject a fragment like "let x = 4" then it indeed changes the meaning of the whole. In that sense, the syntax here is no referentially transparent.

Another interesting interaction between referential transparency and free variables is the notion of an assigning context like

let x = 3 in _

where, from an outside perspective, both phrases "x" and "y" are reference the same thing, some named variable, but

let x = 3 in x   ==/==  let x = 3 in y

Progression from thorniness around equality and context

Now, hopefully the previous section explained a few ways for referential transparency to break under various kinds of syntactic contexts. It's worth asking harder questions about what kinds of contexts are valid and what kinds of expressions are equivalent. For instance, we might desugar our do notation in a previous example and end up noticing that we weren't working with a genuine context, but instead sort of a higher-order context

foo >>= \x -> (let x = 3 in ____(return x)_____)

Is this a valid notion of context? It depends a lot on what kind of meaning we're giving the program. The notion of desugaring the syntax already implies that the syntax must be well-defined enough to allow for such desugaring.

As a general rule, we must be very careful with defining both contexts and notions of equality. Further, the more meaning we demand the fragments of our language to take on the greater the ways they can be equal and the fewer the valid contexts we can build.

Ultimately, this leads us all the way to what I called "semantic referential transparency" earlier where we can only substitute proper values into a proper, closed lambda expression and we take the resulting equality to be "equality as programs".

What this ends up meaning is that as we impute more and more meaning on our language, as we begin to accept fewer and fewer things as valid, we get stronger and stronger guarantees about referential transparency.

Purity

And so this finally leads to the notion of a pure function. My understanding here is (even) less complete, but it's worth noting that purity, as a concept, does not much exist until we've moved to a very rich semantic space—that of Haskell semantics as a category over lifted Complete Partial Orders.

If that doesn't make much sense, then just imagine purity is a concept that only exists when talking about Haskell values as functions and equality of programs. In particular, we examine the collection of Haskell functions

trivial :: a -> ()
trivial x = x `seq` ()

where we have a trivial function for every choice of a. We'll notate the specific choice using an underscore

trivial_Int :: Int -> ()
trivial_Int x = x `seq` ()

Now we can define a (very strictly) pure function to be a function f :: a -> b such that

trivial_b . f = trivial_a

In other words, if we throw out the result of computing our function, the b, then we may as well have never computed it in the first place.

Again, there's no notion of purity without having Haskell values and no notion of Haskell values when your expressions contain free variables (since it's a syntax error).

So what's the answer?

Ultimately, the answer is that you can't talk about purity around free variables and you can break referential transparency in lots of ways whenever you are talking about syntax. At some point as you convert your syntactic representation to its semantic denotation you must forget the notion and names of free variables in order to have them represent the reduction semantics of lambda terms and by this point we've begun to have referential transparency.

Finally, purity is something even more stringent than referential transparency having to do with even the reduction characteristics of your (referentially transparent) lambda terms.

By the definition of purity given above, most of Haskell isn't pure itself as Haskell may represent non-termination. Many feel that this is a better definition of purity, however, as non-termination can be considered a side effect of computation instead of a meaningful resultant value.

Mutation answered 7/4, 2014 at 16:20 Comment(8)

Wow, awesome answer! I'm still digesting it, but I feel that I am already more clear about what exactly my question is -- as I've edited into the OP, I'm trying to figure out whether a function that makes use of variables bound in an enclosing scope is considered to be a pure function. Again, thanks for the response, and I'm going to read it a couple more times and hopefully I'll then be able to answer my own question! – Busse 7/4, 2014 at 16:42

With the revisions in the OP I'll suggest that those functions are referentially transparent, but I'll note that the answer to that question depends crucially on what it means to be a variable in your language. The distinction between a "mutable slot" and a "mathematical variable" is really vital. – Mutation 7/4, 2014 at 17:0

Okay, I see -- for Haskell, it would be a "mathematical variable", right? – Busse 7/4, 2014 at 17:14

Yep, absolutely! Variables in lambda calculus and Haskell act more as "unknowns" or just targets for substitution/reduction—they don't vary. That lets us consider them as constant values when they're found bound in expressions and thus we cannot use them for anything besides their value (their reference) making them referentially transparent. – Mutation 7/4, 2014 at 18:59

BTW, in case you're wondering, what precipitated the OP was this book, specifically, this passage: "Now that we know that variables used in a function are either bound or free, we can bifurcate functions into those with free variables and those without: 1) Functions containing no free variables are called pure functions. 2) Functions containing one or more free variables are called closures." – Busse 9/4, 2014 at 11:25

Ah! His usage of "bound" and "free" is different from mine. Also, Javascript has different semantics about "closures". I think I spoke to this issue directly in another answer: programmers.stackexchange.com/a/235320/55186 – Mutation 9/4, 2014 at 12:31

@J.Abrahamson another great answer! So for a Javascript function which makes use of variables bound in an enclosing scope which aren't mutated, you would consider that a "pure function" -- do I have that right? – Busse 9/4, 2014 at 12:56

I would say that using them completely as constants is pure, yep. If your function can observe global mutation then it breaks referential transparency (and probably purity as well), so beware of global scope. I think that arises in Javascript sort of subtly, but I'm not an expert there. – Mutation 9/4, 2014 at 13:43

The Wikipedia definition is incomplete, insofar a pure function may use constants to compute its answer.

When we look at

increment n = 1+n

this is obvious. Perhaps it was not mentioned because it is that obvious.

Now the trick in Haskell is that not only top level values and functions are constants, but inside a closure also the variables(!) closed over:

add x = (\y -> x+y)

Here x stands for the value we applied add to - we call it variable not because it could change within the right hand side of add, but because it can be different each time we apply add. And yet, from the point of view of the lambda, x is a constant.

It follows that free variables always name constant values at the point where they are used and hence do not impact purity.

Silvester answered 8/4, 2014 at 12:50 Comment(0)

Short answer is YES f is pure

In Haskell map is defined with foldr. Would you agree that map is functional? If so did it matter that it had global function foldr that wasn't supplied to map as an argument?

In map foldr is a free variable. It's not doubt about it. It makes no difference that it's a function or something that evaluates to a value. It's the same.

Free variables, like the functions foldl and +, are essential for functional languages to exist. Without it you wouldn't have abstraction and the languages would be worse off than the Fortran.

Farny answered 7/4, 2014 at 18:17 Comment(7)

I agree -- I think it's a pure function as well. However, I can't back that up with a reference and so that's why I'm looking for the precise definition of "pure function". – Busse 7/4, 2014 at 18:57

@MattFenwick the root of all functional languages are lambda calculus and it has free variables. Perhaps you'll find the theory close to that subject. – Farny 7/4, 2014 at 19:35

unfortunately, none of the resources I've found on lambda calculus don't give a precise definition of "pure function". :( – Busse 7/4, 2014 at 23:54

It's sadly really dodgy, pure. It's clear that it means something broadly—Haskell execution is very pure compared to most other languages—but it's hard to pin it down. I'm going to read Filinski's "Declarative Continuations and Categorical Duality" in earnest someday and really try to pin it down. A lot of the challenge has to do with defining what it means to have two programs be equivalent, too. If you're careful with the answer I gave you'll see I skirt around that issue a bit (a lot). – Mutation 8/4, 2014 at 19:26

@Farny I disagree that foldr is a free variable in the definition of map—the very meaning of map is intimately tied to the context it was defined in. If they were genuinely free then we could rename them without issue, but \f -> foldr ((:) . f) [] is not the same as \f -> q (z . f) r unless we establish those equalities in context. – Mutation 8/4, 2014 at 19:29

We can talk about map = \f -> foldr ((:) . f) [] as an isolated fragment of lambda calculus and determine that foldr, (:), and [] are free, but that fragment isn't the function map yet, just some open lambda expression we happened to name map. – Mutation 8/4, 2014 at 19:30

@J.Abrahamson I'm learning a ton from this discussion! I definitely was using the term "free variable" casually and I'm glad that you've pointed out the difference between free meaning "bound in an enclosing scope" and free meaning "not bound at all". I'm not sure, but I believe the author of the Javascript book that I linked to was using "free" in the "bound in an enclosing scope" sense. – Busse 9/4, 2014 at 11:34

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++