Cons of virtual methods in cuda

Asked 13/11, 2015 at 10:6 Answered 13/11, 2015 at 10:30

Solved c++ cuda nvcc virtual-functions

As far as I understand, virtual method calls are late-bound and thus cannot be inlined by the compiler. Apparently, nvcc relies heavily on inlining. I'm wondering whether virtual methods have any serious disadvantages when used in a CUDA kernel. Are there situations where they should be avoided? Can they affect performance?

Oberland answered 13/11, 2015 at 10:6 Comment(3)

Unless devirtualized at compile time, they are a performance hit (they cost a vtable lookup + an indirect branch). And if threads in a warp don't resolve to the same virtual method (for example when processing an array of objects with different concrete types), you'll get warp divergence. Avoid them as much as you can. Out of curiosity, what kind of application are you writing that requires virtual methods in CUDA code? – Tello 13/11, 2015 at 10:11

It's not the method that "is late binding", it is the method call that's late binding. Sometimes. – Wrong 13/11, 2015 at 10:17

I'm working on an ODE solver. Long story short, I have a method called solve which has two different implementations. I wrote a base class with a pure virtual function and two subclasses that override this method. This solution is easy to maintain, although it might not be optimal. Still, I'm interested to know more about this topic. – Oberland 13/11, 2015 at 10:23

If the compiler can devirtualize the call, it may be able to transform it into a regular method call or even inline it. NVCC's device-code compiler is built on LLVM (through NVVM), which is capable of doing this in some cases, as an optimization. You will have to check the generated code to know whether this is the case.
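As a rough illustration of when devirtualization is possible, here is a minimal sketch. All names (`Solver`, `Euler`, `step_via_base`, `step_via_final`) and the test equation y' = -y are made up for the example, and the macro shim at the top is only there so the same file also builds with a plain host compiler. The point is the difference in what the compiler knows at each call site, not any guarantee about what nvcc will emit.

```cpp
// Shim (assumption for this sketch): lets the code build without nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical base class with one virtual method.
struct Solver {
    __host__ __device__ virtual float step(float y, float h) const = 0;
    __host__ __device__ virtual ~Solver() {}
};

// `final` tells the compiler that no class can override step() any further.
struct Euler final : Solver {
    __host__ __device__ float step(float y, float h) const override {
        return y + h * (-y);  // one explicit Euler step for y' = -y
    }
};

// The dynamic type is unknown here, so this is normally a true virtual call
// unless the compiler can see the concrete type after inlining.
__host__ __device__ inline float step_via_base(const Solver& s, float y, float h) {
    return s.step(y, h);
}

// The static type is a final class, so the compiler is allowed to bind the
// call directly to Euler::step and may then inline it.
__host__ __device__ inline float step_via_final(const Euler& s, float y, float h) {
    return s.step(y, h);
}
```

Both functions return the same value; only the generated code can differ. Inspecting the PTX (`nvcc -ptx`) or the SASS (`cuobjdump -sass`) is the way to see whether an indirect call is still there.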

If the compiler cannot devirtualize the call, then it may have an impact on performance, particularly if that call is on a hot path. A virtual call requires:

a vtable lookup;
an indirect branch.

The vtable lookup costs a memory access, which is slow (and may "waste" cache lines that could be put to better use), and indirect branches are expensive in general. Moreover, if not all threads within a warp resolve the virtual method to the same address (for example, when processing an array of objects with different concrete types), the warp diverges: the hardware executes each distinct target in turn with the other threads masked off, which is yet another performance hit.
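To make the two steps concrete, a virtual call can be modelled by hand in plain C++. This is only a host-side illustration of what compilers typically generate, not nvcc's actual object layout; `Object`, `VTable`, `call_step` and the two step functions are hypothetical names.

```cpp
struct Object;                                   // forward declaration
using StepFn = float (*)(const Object*, float);  // type of one vtable slot

// One table per concrete "type", holding the addresses of its methods.
struct VTable {
    StepFn step;
};

// Each object carries a pointer to the table of its concrete type.
struct Object {
    const VTable* vptr;
    float rate;
};

inline float decay_step(const Object* o, float y)  { return y - o->rate * y; }
inline float growth_step(const Object* o, float y) { return y + o->rate * y; }

const VTable decay_vtable  = { decay_step };
const VTable growth_vtable = { growth_step };

// What `obj->step(y)` roughly turns into when it cannot be devirtualized.
inline float call_step(const Object* o, float y) {
    const VTable* table = o->vptr;  // vtable lookup: load the table pointer
    StepFn target = table->step;    // load the function address from the table
    return target(o, y);            // indirect branch to an address known only now
}
```

If the threads of a warp hold objects whose `vptr` values differ, `target` differs from thread to thread, which is exactly the divergence case described above.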

That being said, if you are not calling the virtual method on a hot path, the impact should be negligible. Without further code, it's impossible to tell.
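If the call does turn out to be on a hot path and the implementation is chosen once per run rather than per object (as seems to be the case for the solver with two `solve` implementations mentioned in the comments), one possible alternative is to select the method with a template parameter, so that the call is resolved at compile time. The sketch below is an assumption about how that could look, not the asker's code: `EulerStep`, `MidpointStep`, `integrate`, `solve_kernel` and the equation y' = -y are all invented for the example.

```cpp
// Shim (assumption for this sketch): lets the code build without nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Two hypothetical stepping schemes for y' = -y, with no virtual functions.
struct EulerStep {
    __host__ __device__ static float step(float y, float h) {
        return y + h * (-y);
    }
};

struct MidpointStep {
    __host__ __device__ static float step(float y, float h) {
        float y_mid = y + 0.5f * h * (-y);  // half step to the midpoint
        return y + h * (-y_mid);            // full step using the midpoint slope
    }
};

// The scheme is a compile-time parameter, so Method::step is a direct call
// that the compiler is free to inline.
template <typename Method>
__host__ __device__ float integrate(float y0, float h, int steps) {
    float y = y0;
    for (int i = 0; i < steps; ++i) {
        y = Method::step(y, h);
    }
    return y;
}

#ifdef __CUDACC__
// One kernel instantiation per scheme; every thread runs the same code path.
template <typename Method>
__global__ void solve_kernel(float* y, int count, float h, int steps) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        y[i] = integrate<Method>(y[i], h, steps);
    }
}
#endif
```

The host code would then pick the instantiation, for example `solve_kernel<EulerStep><<<blocks, threads>>>(d_y, count, h, steps);`. The trade-off is that the choice can no longer vary per object at run time, which may or may not matter for the application.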

Tello answered 13/11, 2015 at 10:30 Comment(2)

I'm not familiar with the concept of indirect branches, could you explain what that is? – Oberland 18/11, 2015 at 8:15

@Oberland A direct branch stores the destination address right in the instruction, so it is easy to continue fetching at the target: the fetch stage can peek at the instructions as they are fetched and read the target address. An indirect branch must fetch a value from memory to find its target address, so the instruction fetch stage won't know where to continue until a potentially long memory access completes. – Intension 5/5, 2016 at 10:42
