Algorithmically identifying pure functions in javascript
Asked Answered
T

3

8

Is it possible to determine whether or not a javascript function is "pure", using javascript?

A function is said to be pure when it behaves in a predictable way, in the sense that for each x, the function will always return the same associated y value (i.e a single-valued map).

For example, a pure function:

function pure(x) {
  return x * x;
}

And impure:

var x = 0;
function impure(y) {
  x = x + y;
  return x++;
}

While it's easy to tell here that impure(0) !== impure(0), it isn't as apparent with a function such as:

function weird(x) {
  if (x === "specificThing") {
    return false;
  } else {
    return true;
  }
}

or

var count = 0;
function surprise(x) {
  count++;
  if (count === 10e10 && x === 0) {
    return true;
  } else {
    return false;
  }
}

Another way of asking this is, is it possible to determine whether or not a javascript function is "impure", using javascript?

Theoretically it may be possible or impossible, but practically what steps can one take to start to determine this, maybe given a set of constraints or assumptions?

Another definition of purity includes the caveat that non-local variables may not be changed, but I'd like to consider that a separate problem. In this case we are considering functions that map consistent inputs to consistent outputs.

Tanta answered 24/4, 2015 at 22:40 Comment(17)
Theoretically? Sure. Practically? No idea. Someone may have written a parser to do such a thing. Build a parse tree, mark certain things known to be impure (e.g. Math.random()), then work your way up. It would certainly be interesting to see if someone has written a tool to do this.Corroboree
Are you asking if this can be determined without looking at the source code for the function? Or, what inputs are you allowed? Certainly a person can study the code and determine if there are any conditions where it might have impure behaviors. So, are you asking if a computer could study the code to tell you this?Carbonado
Related: programmers.stackexchange.com/questions/176761/… - nutshell version, you'd have to solve the halting problem to do it.Oletta
Purity can be determined recursively: a function is pure if it does not refer to any free variables, and doesn't use any impure functions or operators. But I don't know how you would determine this IN Javascript.Hydrastinine
@Carbonado Looking at the source code is certainly allowed. Exactly right in asking if a computer could study the code (using javascript) and determine its purity.Tanta
That Programmers question has a link to this SO question: #1609803Hydrastinine
@Hydrastinine In JS it's especially interesting to me because of the ability to change functions on the fly. Also, what's a "free" variable?Tanta
A free variable a non-local variable. E.g. count in your surprise function.Hydrastinine
@Hydrastinine But a function can still be pure even if it refers to a non-local variable. For instance, if count was in an essentially no-op line of code. Or for example if count++; was still there, but it was removed from the if statement.Tanta
As explained in the Programmers question, you can't really determine whether a function is truly impure, just whether it's potentially impure. You have to solve the halting problem to determine if it's truly impure.Hydrastinine
It's not a "theoretically or not" type of issue. Consider a function n() that sometime during it's execution called m(), and n() is redefined to a different function. n() purity is mutable in this case, and there's plenty of use-cases for functions calling other functions.Oletta
2¢ - My understanding is a function can only be pure if there are no "side-effects" - or that a call to it could be considered a thing in itself. So, that excludes functions containing global variables and calls to other functions.Poser
@Oletta Could you then raise the question of "current purity" of a function? Any pure function would be unable to make itself impure, but an impure function could become pure through a redefinition.Tanta
@Hydrastinine Thinking about the halting requirement more, the user's answer is coming from the perspective that the isPure function needs to return something in all cases. Practically, I'm happy with taking a function that takes too long and simply saying it can't determine the purity of a sufficiently complex function. Also from a practical point of view, I'm interested in knowing what criteria are easiest/most important for determining if a function is definitively impure.Tanta
Imagine a function with an if statement, where one branch is pure, the other is impure, and the if condition is whether we're in the first second of a century. Would you consider it pure or impure?Hydrastinine
@Hydrastinine I would consider it impure. What I'm interested in is how would a checker easily understand that it's possible to go down an impure code path? In this case, could a regex recognize a check for time, and recognize that the time looked for will happen in the future?Tanta
Wow, of all the things that isn't a job for regexp, this is near the top. Like I said earlier, this needs to be a recursive process. You need to go through the code, and for every function it calls you need to perform an isPure test on that function. The bottom of the recursion would be the set of built-in operators and functions, which are statically declared pure and impure.Hydrastinine
B
5

JavaScript is Turing complete, so it's just as able to parse and analyse JavaScript as any other programming language. So, the question is really: "can JavaScript functions be automatically tested for "purity", at all?"

Short answer

Only sometimes.

Long answer

For some functions, for which the AST is straight forward and all the symbols are contained, definitely. Something like function(X) { return X * X; } is provably pure (for primitive input) because the only variables used in the function body are variables that are passed in as function arguments. This function does not rely on any additional API calls, but pure arithmetics. We can most definitely show it's pure.

Things change when we allow arbitrary content, because JavaScript has no explicit types, and instead will happily type-coerce from complex to primitive data type (or even from primitive to primitive data type) when it needs to perform an operation that can't have that operation applied. Calling our above function with an object, instead of a number, performs further function calls underwater, and those functions are not guaranteed pure at all (see Andreas's answer on this, too)

The vast majority of JS functions are nothing like our simple function. For most functions, we need to prove not only that they're pure, but that all the functions they call internally are pure, too. Now we're running into the halting problem. Let's take a ridiculously simple example:

function(x) {
  if (typeof window !== "undefined") {
    return x * x;
  }
  return x * x * x;
}

Is this pure? Well, if we run this in the browser, then in the browser it's pure, since window is always defined. But in something like Node.js, it might be pure, but it might not be: we can't prove it is, nor can we prove it is not, because we cannot prove that this mysterious window variable exists when the function runs. While Node.js has no global window variable, we can trivially introduce one whenever we like, and the function's behaviour will change. Now we're suddenly faced with proving whether or not the entirety of our code will ever introduce a window variable (and that may be very creatively done, like global["win" + _abc] = true where _abc is the string "dow"). This is a lost cause.

The rabbit hole runs deep, and reading about the halting problem will make you appreciate how many difference faces the halting problem has.

Butterflies answered 25/4, 2015 at 5:9 Comment(1)
Unfortunately, the first function is, in fact, not pure either. See my reply.Canea
C
5

Even your first function isn't pure, because in JavaScript * may invoke a ToNumber conversion, which may invoke arbitrary user code, if x happens to be an object with a user-defined toString or valueOf method, or somebody happened to monkey-patch Object.prototype.

The sad truth is that in JS almost nothing can be proved pure. The only operations that can never have side effects are ===, !==, !, &&, ||, and typeof. This is a huge problem for optimisations in compilers, btw.

Canea answered 25/4, 2015 at 15:50 Comment(0)
S
2

sample code. limitation: can only *guess* if some code is pure = can only give a hint, but no guarantee

/* programmatically guess if some javascript code is pure or impure
npm install acorn acorn-walk
license is CC0-1.0 */

const acorn_parse = require("acorn").parse;
const acorn_walk = require("acorn-walk");



// the code to analyze
const content = `
  ['k1'].map(key => {
    const val = data[key];
    let local1;
    var local2 = 2;
    var local3, local4;
    global1[key] = val; // find me
    global2.key = val; // find me
    global3.push(val); // find me
    global4.pop(); // find me
    global5.shift(); // find me
    global6 = 'impure'; // find me
    const local7 = global7[1234];
    var local8 = global8.map(x => 2*x);
    var local9 = global9.filter(Boolean);
    const local10 = global10.pop(); // find me
    local1 = 'ok';
    local2.push('ok');
    return [key, val];
  })
`;



// method names for our educated guess
const write_method_set = new Set(['push', 'pop', 'shift']);
const read_method_set = new Set(['map', 'filter', 'reduce', 'forEach']);

const is_write_method = method_name => write_method_set.has(method_name);
const is_read_method = method_name => read_method_set.has(method_name);
const is_local = name => (name != undefined && local_var_set.has(name));
const get_src = node => content.substring(node.start, node.end);

function test_assign(node, left_name) {
  if (left_name == undefined) {
    console.log(`TODO implement: detect write access in:`);
    console.dir(node);
    return;
  }
  if (!is_local(left_name)) console.log(`impure write access to global ${left_name}: ${get_src(node)}`);
  else console.log(`pure? write access to local ${left_name}: ${get_src(node)}`);
}

function test_call(node, left_name, method_name) {
  if (left_name == undefined) {
    console.log(`TODO implement: detect write access in:`)
    console.dir(node);
    return;
  }
  if (is_read_method(method_name)) return console.log(`pure? access to ${left_name}: ${get_src(node)}`);
  if (!is_local(left_name)) {
    if (is_write_method(method_name)) console.log(`impure write access to global ${left_name}: ${get_src(node)}`);
    else console.log(`pure? access to global ${left_name}: ${get_src(node)}`);
  }
  else console.log(`pure? write access to local ${left_name}: ${get_src(node)}`)
}



const local_var_set = new Set();

// throws on syntax error
let ast = acorn_parse(content, { ecmaVersion: 2020, sourceType: "module" });

acorn_walk.full(ast, (node, state, type) => {
  if (node.type == 'VariableDeclaration') {
    node.declarations.forEach(d => {
      local_var_set.add(d.id.name);
      console.log(`declare local: ${d.id.name}`);
    });
  }
  else if (node.type == 'AssignmentExpression') {
    const left_name =
      node.left.type == 'Identifier' ? node.left.name :
      node.left.type == 'MemberExpression' ? node.left.object.name :
      undefined
    ;
    test_assign(node, left_name);
  }
  else if (node.type == 'CallExpression') {
    if (node.callee.object.type == 'ArrayExpression') return; // simply ignore
    const left_name =
      node.callee.type == 'MemberExpression' ? node.callee.object.name :
      undefined
    ;
    const method_name =
      node.callee.type == 'MemberExpression' ? node.callee.property.name :
      undefined
    ;
    test_call(node, left_name, method_name);
  }
  //else console.dir(node);
});

sample output

$ node test.js | grep impure
impure write access to global global1: global1[key] = val
impure write access to global global2: global2.key = val
impure write access to global global3: global3.push(val)
impure write access to global global4: global4.pop()
impure write access to global global5: global5.shift()
impure write access to global global6: global6 = 'impure'
impure write access to global global10: global10.pop()
Silber answered 25/2, 2021 at 16:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.