What exactly does the PHI instruction do, and how do I use it in LLVM?

Asked 14/7, 2012 at 16:42 Answered 29/1, 2022 at 2:17

126

LLVM has a phi instruction with a rather odd explanation:

The 'phi' instruction is used to implement the φ node in the SSA graph representing the function.

Typically it is used to implement branching. If I understood correctly, it is needed to make dependency analysis possible, and in some cases it could help to avoid unnecessary loads. However, it's still hard to understand what exactly it does.

The Kaleidoscope example explains it fairly nicely for the if case. However, it's not that clear how to implement logical operations like && and ||. If I type the following into the online LLVM compiler:

void main1(bool r, bool y) {
    bool l = y || r;
}

The last several lines completely confuse me:

; <label>:10                                      ; preds = %7, %0
%11 = phi i1 [ true, %0 ], [ %9, %7 ]
%12 = zext i1 %11 to i8

It looks like the phi node produces a result that can be used. I was under the impression that a phi node just defines which paths values come from.

Could someone explain what a phi node is, and how to implement || with it?

Jetty answered 14/7, 2012 at 16:42 Comment(1)

The phi node solves a problem compilers face when converting the IR into "static single assignment" form. To better understand the solution, I would suggest first understanding the problem. So I will one-up you: "Why is there a phi node?" – Wellheeled 30/7, 2018 at 23:28

109

A phi node is an instruction used to select a value depending on the predecessor of the current block (look here to see the full hierarchy; it is also used as a value, which is one of the classes it inherits from).

Phi nodes are necessary because of the SSA (static single assignment) structure of LLVM code. For example, the following C++ function

void m(bool r, bool y){
    bool l = y || r ;
}

gets translated into the following IR (created with clang -c -emit-llvm file.cpp -o out.bc and then viewed with llvm-dis):

define void @_Z1mbb(i1 zeroext %r, i1 zeroext %y) nounwind {
entry:
  %r.addr = alloca i8, align 1
  %y.addr = alloca i8, align 1
  %l = alloca i8, align 1
  %frombool = zext i1 %r to i8
  store i8 %frombool, i8* %r.addr, align 1
  %frombool1 = zext i1 %y to i8
  store i8 %frombool1, i8* %y.addr, align 1
  %0 = load i8* %y.addr, align 1
  %tobool = trunc i8 %0 to i1
  br i1 %tobool, label %lor.end, label %lor.rhs

lor.rhs:                                          ; preds = %entry
  %1 = load i8* %r.addr, align 1
  %tobool2 = trunc i8 %1 to i1
  br label %lor.end

lor.end:                                          ; preds = %lor.rhs, %entry
  %2 = phi i1 [ true, %entry ], [ %tobool2, %lor.rhs ]
  %frombool3 = zext i1 %2 to i8
  store i8 %frombool3, i8* %l, align 1
  ret void
}

So what happens here? In the C++ code the variable l can simply be assigned whichever value the expression produces, but in LLVM IR's SSA form every value has to be defined exactly once. So we check whether %tobool (the value of y) is true: if it is, we jump straight to lor.end; otherwise we go to lor.rhs to evaluate r.

In lor.end we finally have the value of the || operator. If we arrived from the entry block, it is just true. Otherwise, it is equal to the value of %tobool2, and that's exactly what we get from the following IR line:

%2 = phi i1 [ true, %entry ], [ %tobool2, %lor.rhs ]
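As a rough illustration (plain C++, not LLVM API code), the control flow above can be mimicked by remembering which block was left last and letting a helper pick the value accordingly. The names Block, phi, and logical_or are hypothetical and exist only for this sketch:

```cpp
// Hypothetical labels for the two predecessors of lor.end in the IR above.
enum class Block { Entry, LorRhs };

// Models: %2 = phi i1 [ true, %entry ], [ %tobool2, %lor.rhs ]
// The result depends only on which block control flow arrived from.
bool phi(Block pred, bool tobool2) {
    return pred == Block::Entry ? true : tobool2;
}

// Models the lowering of `y || r` with short-circuit evaluation.
bool logical_or(bool r, bool y) {
    Block pred = Block::Entry;  // the block we most recently left
    bool tobool2 = false;       // only meaningful if lor.rhs actually ran
    if (!y) {                   // br i1 %tobool, label %lor.end, label %lor.rhs
        tobool2 = r;            // lor.rhs: evaluate the right-hand side
        pred = Block::LorRhs;
    }
    return phi(pred, tobool2);  // lor.end: the phi yields the final value
}
```

Note that the phi itself never looks at y; it only sees the predecessor, which is why r is not even evaluated when y is true.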

Strawn answered 14/7, 2012 at 17:41 Comment(3)

TL;DR: a φ node is a ternary expression. One might argue that it doesn't contain the condition, but when lowering to the final code you can't otherwise determine which of the arguments is live, so φ has to have the condition too. – Millpond 5/6, 2017 at 11:47

You can also use clang -S -emit-llvm -o - (instead of bc and llvm-dis) – Wouldbe 27/5, 2021 at 0:49

Use clang --target=aarch64-linux-gnu -S -emit-llvm -fno-discard-value-names -O3 .\sourceFile.cpp to view variable names – Selfjustifying 13/10, 2021 at 10:25

You don't need to use phi at all. Just create a bunch of temporary variables on the stack (alloca) and load from and store to them. LLVM optimization passes will take care of optimizing the temporary variables away and will insert phi nodes for you automatically.

For example, if you want to do this:

x = 4;
if (something) x = x + 2;
print(x);

You can use a phi node for that (in pseudocode; branch targets are step numbers):

assign 4 to x1
if (!something) branch to 4
calculate x2 from x1 by adding 2
assign x3 phi from x1 and x2
call print with x3

But you can do it without a phi node (in pseudocode):

allocate local variable on stack called x
load into temp x1 value 4
store x1 to x
if (!something) branch to 8
load x to temp x2
add 2 to x2, giving temp x3
store x3 to x
load x to temp x4
call print with x4

After running LLVM's optimization passes, the second version gets optimized into the first.
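To make the equivalence concrete, here is a small plain-C++ sketch of both shapes. It only imitates the idea and does not use the LLVM API; the function names with_stack_slot and with_phi are made up for illustration, and the conditional expression merely stands in for a phi at the join point:

```cpp
// Source being modelled:  x = 4; if (something) x = x + 2; print(x);

// Memory form: one mutable slot, like an alloca with loads and stores.
int with_stack_slot(bool something) {
    int x;                // allocate local variable on the stack
    x = 4;                // store 4 to x
    if (something) {
        int x2 = x;       // load x
        int x3 = x2 + 2;  // add 2
        x = x3;           // store the result back to x
    }
    int x4 = x;           // load x again after the join
    return x4;            // the value that would be printed
}

// SSA form: every name is assigned exactly once.
int with_phi(bool something) {
    const int x1 = 4;
    const int x2 = x1 + 2;               // in real IR this lives only in the "then" block
    const int x3 = something ? x2 : x1;  // stands in for the phi of x1 and x2
    return x3;
}
```

Both functions return the same value for either input, which is the property the optimizer relies on when it rewrites the first shape into the second.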

Bernardina answered 14/7, 2012 at 21:40 Comment(2)

From what I have read it sounds like there are a few restrictions to keep in mind here. mem2reg is the optimization pass in question, and it has a few limitations that are pointed out in the Kaleidoscope example. It sounds like this is, however, the preferred way to handle the problem and is used by Clang. – Wayfaring 18/12, 2015 at 16:21

That raises the question: how do you create temporary variables in LLVM? If you try to assign to e.g. %x twice, it will reject it. – Stott 11/3, 2024 at 23:37

The existing answers are good, but I want to make it even simpler and shorter.

block3:
    %result = phi i32 [%a, %block1], [%b, %block2]

This means that if the previous block was block1, the value %a is chosen. If the previous block was block2, the value %b is chosen.

Why do we write it like this? It avoids assigning result in two different blocks, such as an if block and an else block, which would violate the SSA principle. SSA enables compilers to apply a variety of optimizations, and it is a de facto standard for intermediate code. For more information, refer to this reference.
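If it helps, the selection rule can be modelled in a few lines of plain C++. This is only a toy model with a hypothetical Phi struct, not how LLVM represents the instruction internally:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model of a phi node: a list of (value, predecessor block) pairs.
struct Phi {
    std::vector<std::pair<int, std::string>> incoming;

    // Return the value paired with the block that control flow arrived from.
    int select(const std::string& pred) const {
        for (const auto& entry : incoming) {
            if (entry.second == pred) {
                return entry.first;
            }
        }
        // A well-formed phi lists every predecessor, so this should not happen.
        assert(false && "predecessor not listed in phi");
        return 0;
    }
};
```

With Phi result{{{10, "block1"}, {20, "block2"}}}, calling result.select("block1") gives 10 and result.select("block2") gives 20, where 10 and 20 play the roles of %a and %b.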

Coyne answered 29/1, 2022 at 2:17 Comment(0)
