Exactly what rules must a function abide before we can call it "idempotent"?
Asked Answered
T

4

9

A post from another thread says that a function is said to be idempotent if it can be called multiple times without changing the result.

However the terms used (like no-side-effects and return-same-results) are relatively ambiguous. Consider this piece of code:

public class test {

    int x = 0;
    java.util.Random r = new java.util.Random();

    public int F() {
        return x + 1;
    }

    public void F2() {
        x = r.nextInt();
    }
}

Can we say that F() is idempotent because successive calls to F() returns the same value?

Or is it not idempotent since successive calls to F() does not return the same value if F2() is called inbetween?

PS: "idempotent" as defined in computer science, not maths.

Tetragon answered 2/2, 2012 at 11:32 Comment(13)
I'm sorry, what is ambiguous about "returns the same result" or "no side effects"?Regularize
@Regularize My function F() always returns the same result and has no side effects no matter how many times you call it. Unless you do something else like call F2(), then it returns a different result from the previous result but even so, calling it multiple times will give you the same result thereafter and still has no side effects.Tetragon
Why the downvote on this question? It's useful, clear, and shows some research effort.Neuromuscular
@Pacerier: It does not always return the same result, because there is a circumstance in which two calls could return a different result. There is no ambiguity here: that is the ordinary meaning of "always".Regularize
@Tetragon - Your F() method is non-deterministic because it relies on an internal variable that can be modified in at least one way in your program.Neuromuscular
F() is not even a function (i.e., where is the argument?) And its return value depends on when (in relation to other code) it is called. Contrast this with the sqrt() function, for example. Do you believe it's output changes just because you called sin() in between?Wraf
@Wraf I know it isn't a maths function but surely it's a CS function ?Tetragon
@Ingo: sqrt's output can change if I call fesetround.Choong
@Steve - how do you know this?Wraf
@Ingo: Um. The C standard is actually pretty lax about the accuracy of floating-point functions, and fesetround isn't even guaranteed to actually do anything, so it's not guaranteed that it does actually change the results. But assuming an implementation that returns the correctly-rounded result from sqrt, which is permitted by the standard, then I should think it's obvious that the current rounding mode affects the results. That's what I mean when I say the output "can change". Admittedly only by 1 ulp, but that's still a different result.Choong
I think I've learned more from this question than any other question on SE.Adara
@Steve - ahh, I see, you assume I use C and the FPU in my sqrt function. :)Wraf
@Ingo: sorry, you're right, I did just start talking about the standard C function sqrt without saying so. Which is a bit naughty of me in a Java question.Choong
A
10

I'm going to disagree (actually, I now see that I agree with them!) with the other answers and say that, in the most common computer science use (function calls with side-effects, not functional programming), a function is idempotent if you can safely replace any invocation of the function with calling the function twice and keeping only the second result.

For example, consider two functions for deleting a file by name:

1) A function that returns success if the file does not exist. (Since the purpose of a delete operation is to make the file not exist.)

2) A function that returns a "file does not exist" error if the file does not exist. (Since the file could not be deleted.)

The first one is idempotent. If you call it and ignore the result, you can call it again and still get correct information. The file does not exist when you are done.

The second one is not idempotent. If you call it once and ignore the result, your second call will fail, making you think you did not delete the file.

A function to get the current time is idempotent under this definition, even though the result may be different. There is no harm to calling the function twice.

The concept is important in client-server protocols. Say you send a command and get no reply, maybe the connection breaks, maybe the server crashed. So you send the command again and get a reply. But on the first command, was the command lost or the reply? If the command is idempotent, it doesn't matter. You can just use the result.

If a protocol guarantees that all operations are idempotent, lower-level code can retry failed operations, switch servers, and otherwise try to "make things work" without breaking the semantics of the operations.

It takes some doing to make an idempotent protocol. For example, you might wonder how you make a sensible "delete file" operation. One way is to assign each file a unique ID that changes when the file is deleted and recreated. Then you split a delete into two halves. The first, "get ID from name" is idempotent and fails if the file does not exist. The second, "delete ID if it exists" is idempotent and succeeds if you, or anyone else, deleted the file. (The one quirk is this doesn't tell you for sure that you were the one to delete the file.) The combination of the two idempotent operations provides the desired non-idempotent delete operation.

Adara answered 2/2, 2012 at 11:40 Comment(19)
Yeah, I think that answers the question that was asked. I hate it when computer scientists don't have the imagination to invent their own words for things and muddy the waters by stealing words from mathematicians! Anyway, +1.Expediency
So to confirm, you are saying that Steve's answer is incorrect?Tetragon
@Tetragon Incorrect in the sense that it probably doesn't explain the use of idempotent that the OP was asking about.Adara
@DavidSchwartz Ok thanks, I'll wait for a tie-breaker before accepting an answer because I really don't know who is rightTetragon
Well, the answer to who is right hinges on the answer to this question: Why did you care if the function is idempotent? (If you're just trying to learn how the word is used, all the answers are right. See David's comment to Steve's answer about meaning different things.)Adara
@DavidHeffernan: it's not lack of imagination, it's that an imperative "function" doesn't just operate on its arguments, it operates on the whole state of the execution environment. So the delete that succeeds if the file doesn't exist is idempotent because the state of the machine is the same after calling it twice as it was after the first call. That is, Del(Del(x)) == Del(x), where x is the initial state of the machine. But the "ENOENT" version is not idempotent because Del(x) is a state in which the return register is 0, whereas Del(Del(x)) is a state in which it's ENOENT.Choong
@SteveJessop: What if the states are different but the difference has no semantic relevance? Is a function not idempotent just because it keeps an internal counter of how many times it is called? Or, to make it more interesting, is a function that returns a random number within a specified range idempotent? It can return different results, but all equally semantically valid.Adara
In case it isn't obvious, Del here is the operation, "call the delete function with some particular arguments", and is a function mapping one machine state to another. So the abuse of terminology, if any, is to describe the "delete function" as idempotent in cases where what we actually mean is that calling the delete function is idempotent.Choong
@SteveJessop I've no idea what you mean here. The original meaning of the word was for functions or operators that satisfy f(f(x))=f(x). The CS meanings are surely very useful concepts, they just bear absolutely no relation whatsoever to the original meaning of the word. Therefore it's confusing to re-use the word. I'd say lack of imagination is exactly what it is.Expediency
@DavidSchwartz: if your internal counter is never actually used then I would say that the function is not idempotent, but that most languages would permit, as an optimization, that it be transformed to a function that is (by stripping the counter away entirely). But if I have omniscience (or a debugger), then nothing is "never actually used" because the counter can be read at any time. If your random function uses a true random source then we're no longer on a deterministic machine, so computation models have to adjust. I don't know the answer, I'm not a CompSci.Choong
@DavidHeffernan Steve's right, they're the same thing. f(f(x))=f(x) means calling the function and then calling it again has the same effect as just calling it once.Adara
@DavidHeffernan: there's a complete resemblence. Check my second comment, the confusion is that considering side-effects as part of "idempotence" is a result of a particular computation model, in which any statement (including a subroutine call) is considered as a function whose domain and codomain are the set of all machine states. So it's the call to the subroutine that's being described as idempotent, not the subroutine (aka "function") itself. This goes without saying (or is said once very early), since in any case you can't even call unlink(unlink("readme.txt")), the types are wrong.Choong
@DavidSchwartz: Thinking about it more, I still don't claim to have the theory to answer this but I suspect we can consider non-deterministic computation in two ways. In model 1, machine state is all the memory and registers and stuff, and the operation Rand puts the machine into one of 2^32 possible distinct states. It is not idempotent. In model 2, machine state is a probability distribution of configurations of stuff, and the operation Rand puts the machine in a state equal to its initial state except the return register is now uniformly distributed on [0,2^32-1]. It is idempotent.Choong
@SteveJessop I agree with you. The = really means something squishy like "just as good as" as "the same for the purpose you care about".Adara
Oh, and in model 3, it's not even a probability distribution, the state of the machine is just the set of all configurations of stuff that are possible. In which case again Rand is idempotent, and int i = true_rnd(); is exactly the same operation as int i = 0; std::cout >> i; (assuming RAND_MAX == INT_MAX). I think model 3 probably is a useful one, where we don't care about likelihood, all we care about is that we have non-deterministic inputs.Choong
Nuts, I meant std::cin >> i;.Choong
@DavidSchwartz f(f(x))=f(x) is completely different to CS idempotence. For a start the mathematical definition requires domain to be a subset of range.Expediency
@DavidHeffernan Which it always is in comp sci. The domain and range are both the set of possible machine (or world) states. A function operates on the state of the entire machine (or world) when you call it and returns the machine (or world) in some state. The notation is slightly non-standard, but the concept is the same. (If you don't see why, consider f(x)=y -- is this idempotent?)Adara
@DavidSchwartz OK, I'm on the same page as you now.Expediency
L
5

Your function is not idempotent. It can return different results. Given the same input, an idempotent function always returns the same output.

More explicitly, if f() is idempotent (according to the computer science definition) then:

int result1 = f(x);
// ... some arbitrary code
int result2 = f(x);
assert(result1 == result2);
Laser answered 2/2, 2012 at 11:36 Comment(19)
So would it be correct to say that public int f(int x) { return x+1; } is idempotent?Watkin
@Marcelo, Yes, that is idempotentLaser
This is the functional programming definition of idempotent. I don't think it's the most common computer science definition.Adara
@david I don't think that's quite what Steve means. I think he means that given y1 = F(x); y2 = F(x), y1 and y2 are always equal.Expediency
@DavidHeffernan: That is the functional programming definition of idempotent. As I said, I don't think that's the most common definition. (It may be for you if you are a mathematician or do a lot of functional programming.)Adara
@DavidSchwartz No, the functional programming defn is the same as the mathematical defn, f(f(x))=f(x) for all x.Expediency
@DavidHeffernan Wikipedia seems to agree with you.Adara
@DavidSchwartz This is what happens when everyone tries to use the same word to mean different things. It kind of defeats the purpose of language and notation. Hey ho.Expediency
I think Steve Jessop is right, these two statements are equivalent. They just use different notation to say "calling the function more than once is functionally equivalent to calling it once".Adara
@DavidSchwartz But that would depend on how do we define functionally equivalent wouldn't it?Tetragon
@Pacerier: yes, because the definition of "idempotence" involves equality, we have to define what we mean by equal. So as in the example in the comments below David Schwartz's answer, there are cases where an operation is idempotent as far as the externally-observable state/behavior of the program is concerned, but is not idempotent if we attach a debugger. You could even say that in practice no operation on a real machine is idempotent, each CPU cycle increments some cycle-counter in the hardware somewhere. If we don't care about that, we can exclude it from consideration.Choong
Probably what we care about in practice is, "is this function call idempotent in terms of the behavior that the language defines for it?".Choong
@SteveJessop So is McLeod's answer correct or Schwartz answer correct to the question, if the scope of the question is as stated in the question?Tetragon
@Pacerier: I think David Schwartz's. In the terms used by your question, the definition of "idempotent" you should use is that an operation (including a function call) is idempotent if multiple consecutive operations with nothing else in between produce the same result as a single operation. Reading globals doesn't disqualify a function from being idempotent AFAIK, although it does disqualify it from being pure.Choong
Another case to consider is the function int f(int i) { global_value += 1; return 0; }. By the definition "f(f(x)) == f(x) for all x", the function itself obviously is idempotent. But the operation of calling the function is not idempotent, since the value of global_value is different according to how many times you made the call. So, unless you can be precise which you mean there's no point saying whether a function "is idempotent" or not, and for almost all practical purposes it's the entire effect of calling the function that we care about, not just the return value.Choong
@SteveJessop But isn't it true that although the function int f(int i) { global_value += 1; return 0; } is mathematically f(f(x)) == f(x) (mathematically idempotent) it is not CS-ly f(f(x)) == f(x) (CS-ly idempotent)? So in the CS-sense the function int f(int i) { global_value += 1; return 0; } itself is not an idempotent function regardless of whether or not the operation of calling it is idempotent. Is it true/false ?Tetragon
@Pacierier: f isn't a "mathematical" function, so it's trouble if you talk about whether or not it is "mathematically idempotent". A "mathematical function" cannot access globals, there's no such thing in mathematics. To assess whether f is "mathematically" idempotent you would have to define f in mathematical terms, and to preserve the whole meaning of f you would have to add an extra input (the prior value of global_value) and an extra output (the new value). Then clearly it is not idempotent. It's unfortunate that programming languages call non-pure routines "functions".Choong
... because it makes people think that they're mathematical functions of their listed parameters, which they're not. People seem pretty comfortable with the idea that in OO languages, "this" is a secret extra parameter to methods. In exactly the same way, the state of the execution environment is a secret extra parameter to every function in an imperative language, and the state at the point the function returns is an extra output. If you don't incorporate this into your model then simple definitions like f(f(x))==f(x) copy-pasted from mathematics will give silly results.Choong
.. so f isn't "mathematically" idempotent, it's "simplistically and inappropriately applied definition-ly" idempotent. But if global_value weren't a counter being incremented, but in fact was a results cache used to memoize the function, that was not globally accessible, then it might be appropriate in certain contexts to treat f as though it were "pure", even though it isn't pure to someone who can observe the cache.Choong
C
3

Trying to summarize things that have come up in other answers and in comments:

There is only one definition of "idempotent". A function f is idempotent if and only if f(f(x)) equals f(x) for all x in the domain of f.

There is more than one definition of "equals". In a lot of contexts, we have an idea of "equivalence" which stands in for equality, and the definition of "equivalent" might be different in different contexts.

There is more than one definition of "function". In mathematics (with a conventional set-theoretic construction), a function is a set of pairs. The "domain" of the function is the set of all elements that appear in the first position of a pair. No element of the domain appears in the first position of more than one pair in the function. The "range" of the function is the set of all elements that appear in the second position of a pair. Elements of the range may appear more than once. We say that a function "maps" each element of its domain to a particular element of its range, and we write f(x) to mean "the second element of the pair in f that has x as its first element".

So, it is clear that for a function to be idempotent, its range must be a subset of its domain. Otherwise, f(f(x)) is meaningless.

In computing, and particularly in imperative languages, a function is often defined as a sequence of statements/instructions, together with some named inputs and output(s) (in most languages only one output). "Calling" a function is an imperative operation that means to execute the instructions. But instructions in an imperative language can have side-effects: they can change things other than their outputs. This concept is absent from mathematics, as well as from pure functional programming.

These imperative "functions", which I will refer to from now on as "routines", can be reconciled with the mathematical definition of a function in two ways:

  1. ignore the side-effects, and say that the routine is a function whose domain is the set of all possible combinations of argument values, and which maps those to the outputs of the routine. This stands on weak theoretical ground if the function is not "pure", that is to say if its output depends on mutable state beyond its arguments, or if it modifies state beyond its outputs. The reason is that a mathematical function by definition does not map its inputs to different outputs at different times. Nor does a mathematical function "change" things when "called", because mathematical functions aren't "called" a particular number of times. They simply "are".

  2. incorporate the side-effects into a mathematical function that describes the effect that calling the routine has on the complete state of the machine, including the outputs from the routine but also including all global state and whatnot. This is a standard trick in CS, and it means that for every statement, instruction, call to a routine, or whatever, there is a corresponding function that maps the state of the machine before the call, to the state of the machine afterwards.

Now, if we apply the definition of "idempotent" in case 1, we are assessing whether or not the mathematical function that a particular routine was designed to implement is idempotent. If the routine does anything other than implement a mathematical function, for example if it has any side-effects, then we are on very shaky ground here, and will come up with misleading results. For example, the function int f(int i) { puts("hello!"); return i; } could be regarded as being idempotent on the basis that "it's an implementation of the identity function!". And that's true if you ignore the side-effects, but that means the definition is useless for any practical purposes, because once side effects are taken into account, executing the expression f(f(0)) is a different thing from executing the expression f(0). f(f(0)) is not equivalent to f(0) even though their return values are equal, and we could only replace one with the other if we didn't care about (that part of) the output of the program.

If we apply the definition of "idempotent" to the functions of machine states in case 2, we are assessing whether or not a call to the function (with particular arguments) is an idempotent operation on the state of the machine. Then my function f above clearly is not idempotent -- the state of a machine with "hello!\n" written to its output device is not the same as the state of a machine with "hello!\nhello!\n" written to its output device. I think it's also clear that in this case, your function F is idempotent (although it isn't "pure", since its return value depends on state other than its formal parameters, and so it is not simply an implementation of a mathematical function), and your function F2 is not idempotent. If test were immutable, then we could reasonably start describing F as pure. F2 then would be invalid.

As far as I know, when compscis talk about idempotence in imperative languages, they are usually talking about whether the functions of machine states defined by case 2, are or are not idempotent. But usage can vary -- if the routine is pure then they might talk about whether the mathematical function it represents is idempotent. In a pure functional language there is no machine state to talk about, so case 2 would be inappropriate, and any use of the term "idempotent" in relation to a function must be case 1. Functions in pure functional languages are always like mathematical functions.

Choong answered 2/2, 2012 at 14:28 Comment(1)
'There is only one definition of "idempotent"': patently false, I'm afraid!Laser
A
2

I have also been trying to figuring out what idempotency actually means and I realized that there are multiple definitions of idempotency floating around. There are two camps that definitions follow, either the definition for mathematical and function programming functions or the computer science definition.

Mathematical definition: f(f(x)) = f(x) for any value x. In other words, a function is idempotent if the effect of the function is invariant under composition.

Computer Science definition: A function is idempotent if "the side effect of N > 0 identical requests is the same as for a single request". In other words, a function is idempotent if the effect is invariant over the number of calls.

For example, take an increment function defined as int f(int x) { return x+1; }. This function will fail the mathematical definition as it is not invariant under composition because f(f(x)) != f(x). On the other hand, it fits the computer science definition because, as Steve McLeod mentioned,

int result1 = f(x);
// ... some arbitrary code
int result2 = f(x);
assert(result1 == result2);

Now, back to your question, is F() in your example idempotent? I am saying yes F() is idempotent but a sequence of calls may not be. According to the HTTP/1.1 protocol definition of idempotency, "it is possible a sequence of several requests is not idempotent even if all methods executed in that sequence are idempotent".

This is possible because you have to consider the state of the program as a hidden parameter to the function F(). For example, consider an example sequence of requests of F(), F(), F2(), F(). The last F() request will not produce the same result as the first two but that is ok because the request was not identical. You must consider the state of the program as a hidden parameter to the function and in the last request, the state was that x was equal to a new random value, but in the first requests the state of x was initially zero.

Sources:

Anadromous answered 30/1, 2014 at 6:40 Comment(1)
Best answer in my opinion. I will add that what Steve McLeod is saying is not exact for the CS definition, if you call f(x) twice in a row, it is possible they return different result, but as long as the intended effect results in the same thing as if it was only called once, than it is still idempotent in the CS sense. Imagine the result of f(x) for example contains the requestID, that would be different each time, but the intended effect, like setting the value of some row in a DB to x would be the same, and that would qualify as idempotent.Huffish

© 2022 - 2024 — McMap. All rights reserved.