Benefits of pure function
Asked Answered
K

6

82

Today i was reading about pure function, got confused with its use:

A function is said to be pure if it returns same set of values for same set of inputs and does not have any observable side effects.

e.g. strlen() is a pure function while rand() is an impure one.

__attribute__ ((pure)) int fun(int i)
{
    return i*i;
}

int main()
{
    int i=10;
    printf("%d",fun(i));//outputs 100
    return 0;
}

http://ideone.com/33XJU

The above program behaves in the same way as in the absence of pure declaration.

What are the benefits of declaring a function as pure[if there is no change in output]?

Kathrynekathy answered 22/6, 2012 at 9:42 Comment(20)
There are more optimisations possible for pure functions. If the compiler can figure out the purity by itself, the pragma makes no difference. If the compiler can't figure it out, telling it that the function is pure may cause it to be better optimised. If you lie about the purity, bad things will happen, probably.Vermifuge
Is it possible to see the optimizations made by compiler?Kathrynekathy
Yes - look at the generated assembly.Expansive
@PhilipKendall, in this particular case, there is no difference in assembly whether __attribute__((pure)) is present or not. (gcc 4.4.5, with or without -O3). I suspect that gcc can figure out himself that fun() is pure. (Note that I also tried with your example).Disjunctive
Additional info for function attributes can be found here: gcc.gnu.org/onlinedocs/gcc/Function-Attributes.htmlDevonna
I don't think this definition of purity is correct - printf, for example, would qualify (calling it twice with the same arguments yields the same return value), but it is not pure.Skyros
@tdammers: Indeed, it lacks the ...and no side-effects... part.Embrey
@FrerichRaabe: I thought the "no side-effects" part implied "same output for same input"... if you can't have side effects, what are you going to base the output on, if not the inputs?Skyros
@tdammers: In theory, a true random function has no side effect, yet it doesn't always output the same result.Disjunctive
@Ben: where does the entropy come from? We're dealing with (theoretically) deterministic machines here, the only way of getting true entropy into them is from external sources, which means side effects. Of course, we could allow programming languages to define nondeterministic functions, pretending the technical side effects aren't there and the functions really are nondeterministic; but if we do that, most of the practical benefits of tracking purity are lost.Skyros
@tdammers: You wrote that printf gives the same output for the same input, yet it is not pure. I agree with this - hence I wrote that the definition of purity given in the question should mention that a function must not have side-effects. With that addition, printf would no longer be pure.Embrey
tdammers is correct - the definition of pure given above is incorrect. Pure means that the output depends only on the inputs to the function; also, there must be no observable side-effects. "Same output for same input" is a very inaccurate summary of those requirements. en.wikipedia.org/wiki/Pure_functionElma
@FrerichRaabe: Absolutely correct. I'm just suggesting that the 'no side effects' rule supersedes the 'same output for same inputs' rule, making it redundant.Skyros
strlen() isn't a pure function. I can modify the perceived length of a string at runtime, yet pass the same pointer.Faris
@ChristianMann: strlen() is a pure function. Modifying the string at runtime is changing the input, therefore it does not need to have the same output. Also, I fixed the definition of pure in the question, but it needs review.Outworn
@RobertMason But the only input that strlen gets is a pointer, not the string itself. It can't be memoized, for instance.Faris
@ChristianMann: True. Though if the pointer is const and qualified with restrict (I know it's not standard, but many compilers support it) then it can be memoized. And I guess that we could say that an overload of strlen() for std::string could be considered pure.Outworn
@Skyros - Isn't the printing itself the side effect?Paucity
@Christian In fact, GCC does memoise strlen’s result, for instance when used in a loop (for (i = 0; i < strlen(str); ++i)) but not because it’s pure (it isn’t – you’re right), but because it’s a builtin that gets special treatment from the compiler (-fno-builtin disables this). I’m assuming that the compiler does this only for string literals where it can be sure that the string isn’t modified (that’s UB), otherwise it’d have to trace aliases to the memory location to prove that it’s not modified through any pointer.Module
@tdammers: Well, printf("Hello world\n"); has the result of displaying x + "Hello world\n", where x was the previous content of the terminal, and + is string concatenation. So the result value does not depend on the input alone but also on the context in which printf is executed. In this sense printf is not pure.Pyromagnetic
E
146

pure lets the compiler know that it can make certain optimisations about the function: imagine a bit of code like

for (int i = 0; i < 1000; i++)
{
    printf("%d", fun(10));
}

With a pure function, the compiler can know that it needs to evaluate fun(10) once and once only, rather than 1000 times. For a complex function, that's a big win.

Expansive answered 22/6, 2012 at 9:48 Comment(5)
Ie, you can safely use memoizationDarken
@Jonathonjonati What do you mean? Why not?Module
Because you can modify the string (sequence of chars starting from some address) without modifying the input (the pointer to the address where the string starts), i.e., you can't memoize it. It would only be a pure function in a language with immutable strings (Java, say).Jonathonjonati
@KonradRudolph: Imagine a length 1000 string. Call strlen on it. Then again. Same thing yes? Now modify the second character to be \0. Does strlen still return 1000 now? The starting address is the same (== input is same) but the function now returns a different value.Burial
@Jonathonjonati That’s a good objection, obviously you are right. I was misled by the fact that even books claim that strlen (in GCC / glibc) is in fact pure. But a look at the glibc implementation showed this to be wrong.Module
W
34

When you say a function is 'pure' you are guaranteeing that it has no externally visible side-effects (and as a comment says, if you lie, bad things can happen). Knowing that a function is 'pure' has benefits for the compiler, which can use this knowledge to do certain optimizations.

Here is what the GCC documentation says about the pure attribute:

pure

Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure. For example,

          int square (int) __attribute__ ((pure));

Philip's answer already shows how knowing a function is 'pure' can help with loop optimizations.

Here is one for common sub-expression elimination (given foo is pure):

a = foo (99) * x + y;
b = foo (99) * x + z;

Can become:

_tmp = foo (99) * x;
a = _tmp + y;
b = _tmp + z;
Weiss answered 22/6, 2012 at 9:49 Comment(5)
I am not sure if any do this, but pure functions also allow the compiler to reorder when the function gets called, should it deem reordering would be beneficial. When there is the possibility for side effects, the compiler needs to be more conservative.Puggree
@MPD - Yes, that sounds reasonable. And since a call instruction is a bottleneck for superscalar CPUs, some help from the compiler can help.Weiss
I vaguely recall using a DSP compiler a number of years ago that would use this technique to get return values sooner/later. This allowed it to minimize pipeline stalls.Puggree
Could "foo(99)" be precalculated since 99 is a const and foo will always return the same result? Maybe in some sort of two-stage compile?Dramatic
@Dramatic - I am not sure. There can be cases when it is simply not possible. e.g. if foo is part of another compilation unit (another C file), or in a pre-compiled library. In both cases, the compiler wont know what foo does, and cannot pre-calculate.Weiss
E
29

In addition to possible run-time benefits, a pure function is much easier to reason about when reading code. Furthermore, it's much easier to test a pure function since you know that the return value only depends on the values of the parameters.

Embrey answered 22/6, 2012 at 9:56 Comment(1)
+1, your point about testing is an interesting one. No set-up and tear-down required.Weiss
P
15

A non-pure function

int foo(int x, int y) // possible side-effects

is like an extension of a pure function

int bar(int x, int y) // guaranteed no side-effects

in which you have, besides the explicit function arguments x, y, the rest of the universe (or anything your computer can communicate with) as an implicit potential input. Likewise, besides the explicit integer return value, anything your computer can write to is implicitly part of the return value.

It should be clear why it is much easier to reason about a pure function than a non-pure one.

Pyromagnetic answered 22/6, 2012 at 15:8 Comment(2)
+1: Using the universe as a potential input is a very nice way of explaining the difference between pure and not-pure.Weiss
indeed, this is the idea behind monads.Saire
O
7

Just as an add-on, I would like to mention that C++11 codifies things somewhat using the constexpr keyword. Example:

#include <iostream>
#include <cstring>

constexpr unsigned static_strlen(const char * str, unsigned offset = 0) {
        return (*str == '\0') ? offset : static_strlen(str + 1, offset + 1);
}

constexpr const char * str = "asdfjkl;";

constexpr unsigned len = static_strlen(str); //MUST be evaluated at compile time
//so, for example, this: int arr[len]; is legal, as len is a constant.

int main() {
    std::cout << len << std::endl << std::strlen(str) << std::endl;
    return 0;
}

The restrictions on the usage of constexpr make it so that the function is provably pure. This way, the compiler can more aggressively optimize (just make sure you use tail recursion, please!) and evaluate the function at compile time instead of run time.

So, to answer your question, is that if you're using C++ (I know you said C, but they are related), writing a pure function in the correct style allows the compiler to do all sorts of cool things with the function :-)

Outworn answered 22/6, 2012 at 15:40 Comment(0)
T
4

In general, Pure functions has 3 advantages over impure functions that the compiler can take advantage of:

Caching

Lets say that you have pure function f that is being called 100000 times, since it is deterministic and depends only on its parameters, the compiler can calculate its value once and use it when necessary

Parallelism

Pure functions don't read or write to any shared memory, and therefore can run in separate threads without any unexpected consequence

Passing By Reference

A function f(struct t) gets its argument t by value, and on the other hand, the compiler can pass t by reference to f if it is declared as pure while guaranteeing that the value of t will not change and have performance gains


In addition to the compile time considerations, pure functions can be tested fairly easy: just call them.

No need to construct objects or mock connections to DBs / file system.

Tessy answered 13/5, 2015 at 18:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.