Sorry to disagree with the current accepted answer. This is the year 2021. Modern compilers and their optimizers shouldn't differentiate between switch and an equivalent if-chain anymore. If they still do, and create poorly optimized code for either variant, then write to the compiler vendor (or make it public here, which has a higher chance of being respected), but don't let micro-optimizations influence your coding style.
So, if you use:
switch (numError) { case ERROR_A: case ERROR_B: ... }
or:
if(numError == ERROR_A || numError == ERROR_B || ...) { ... }
or:
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
    return std::find(cont.begin(), cont.end(), el) != cont.end();
}
constexpr std::array errList = { ERROR_A, ERROR_B, ... };
if(has(errList, numError)) { ... }
it shouldn't make a difference with respect to execution speed. But depending on the project you are working on, the choice might make a big difference in code clarity and maintainability. For example, if you have to check for a certain error list in many places of the code, the templated has() might be much easier to maintain, as the errList needs to be updated in only one place.
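To illustrate the maintenance argument, here is a minimal sketch. Only has() is taken from above; the enum ErrorCode with its values, the list retryableErrors, and the two functions shouldRetry() and shouldLogAsTransient() are hypothetical names invented for this example.

```cpp
#include <algorithm>
#include <array>

// Hypothetical error codes, invented for this sketch.
enum ErrorCode { ERROR_NONE = 0, ERROR_A = 3, ERROR_B = 7, ERROR_C = 12, ERROR_D = 20 };

template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
    return std::find(cont.begin(), cont.end(), el) != cont.end();
}

// The single place to edit when the set of errors changes.
constexpr std::array<ErrorCode, 3> retryableErrors = { ERROR_A, ERROR_B, ERROR_D };

// Two independent call sites (hypothetical) that share the same list.
bool shouldRetry(ErrorCode e) {
    return has(retryableErrors, e);
}

bool shouldLogAsTransient(ErrorCode e) {
    return e != ERROR_NONE && has(retryableErrors, e);
}
```

Adding a new code to retryableErrors changes the behavior of both call sites at once, whereas a switch or if-chain would have to be edited in each place separately.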
Talking about current compilers, I have compiled the test code quoted below with both clang++ -O3 -std=c++1z (version 10 and 11) and g++ -O3 -std=c++1z. Both clang versions gave similar compiled code and execution times, so I am talking only about version 11 from now on. Most notably, functionA() (which uses if) and functionB() (which uses switch) produce exactly the same assembler output with clang! And functionC() uses a jump table, even though many other posters deemed jump tables to be an exclusive feature of switch. However, despite many people considering jump tables to be optimal, that was actually the slowest solution on clang: functionC() needs around 15 percent more execution time than functionA() or functionB().
The hand-optimized version functionH() was by far the fastest on clang. The compiler even unrolled the loop partially, doing two iterations per pass.
Actually, clang calculated the bitfield, which is explicitly supplied in functionH(), also in functionA() and functionB(). However, it used conditional branches in functionA() and functionB(), which made these slow because branch prediction fails regularly, while it used the much more efficient adc ("add with carry") in functionH(). Why it failed to apply this obvious optimization in the other variants as well is unknown to me.
The code produced by g++ looks much more complicated than that of clang, but actually runs a bit faster for functionA() and quite a lot faster for functionC(). Of the non-hand-optimized functions, functionC() is the fastest on g++ and faster than any of the non-hand-optimized functions on clang. On the other hand, functionH() requires nearly twice the execution time when compiled with g++ instead of clang, mostly because g++ doesn't do the loop unrolling.
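Here are the detailed results (first number: the count returned by the function; second number: the minimum execution time in microseconds over 100 runs):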
clang:
functionA: 109877 3627
functionB: 109877 3626
functionC: 109877 4192
functionH: 109877 524
g++:
functionA: 109877 3337
functionB: 109877 4668
functionC: 109877 2890
functionH: 109877 982
The performance changes drastically if the constant 32 is changed to 63 in the whole code:
clang:
functionA: 106943 1435
functionB: 106943 1436
functionC: 106943 4191
functionH: 106943 524
g++:
functionA: 106943 1265
functionB: 106943 4481
functionC: 106943 2804
functionH: 106943 1038
The reason for the speedup is that, when the highest tested value is 63, the compilers remove some unnecessary bound checks, because the value of rnd is bound to 63 anyway. Note that with that bound check removed, the non-optimized functionA() using simple if() on g++ performs almost as fast as the hand-optimized functionH(), and it also produces rather similar assembler output.
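The bound check can be sketched in source form as follows. This is not the compilers' actual output, just an illustration of the idea. The function names are made up, and the value set ending in 32 is my assumption about what the first measurement looked like. Note that with a 64-bit mask the explicit range check in inSet32() does not change the result; it is written out only to mirror the kind of check a compiler may emit before a bit test when it cannot prove the value is in range.

```cpp
// Assumed variant: the highest tested value is 32 instead of 63.
constexpr unsigned long long mask32 =
    (1ULL << 1) | (1ULL << 7) | (1ULL << 10) | (1ULL << 16) |
    (1ULL << 21) | (1ULL << 22) | (1ULL << 32);

// Variant from the test code: the highest tested value is 63.
constexpr unsigned long long mask63 =
    (1ULL << 1) | (1ULL << 7) | (1ULL << 10) | (1ULL << 16) |
    (1ULL << 21) | (1ULL << 22) | (1ULL << 63);

// Values above the highest tested one are rejected first, then the bit is tested.
constexpr bool inSet32(unsigned char rnd) {
    return rnd <= 32 && ((mask32 >> rnd) & 1ULL) != 0;
}

// rnd is assumed to be already reduced with "& 63", so every possible value
// is a valid bit index and no range check is needed.
constexpr bool inSet63(unsigned char rnd) {
    return ((mask63 >> rnd) & 1ULL) != 0;
}
```

In the second form the extra comparison is gone, which matches the explanation above of why the 63 variant runs faster.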
What is the conclusion? If you hand-optimize and test compilers a lot, you will get the fastest solution. Any assumption about whether switch or if is better is void: they are the same on clang. And the easy-to-code solution of checking against an array of values is actually the fastest case on g++ (leaving aside hand-optimization and the coincidental case in which the last value of the list matches the upper bound of rnd).
Future compiler versions will optimize your code better and better and get closer to your hand optimization. So don't waste your time on it, unless cycles are REALLY crucial in your case.
Here is the test code:
#include <iostream>
#include <chrono>
#include <limits>
#include <array>
#include <algorithm>
// Variant A: plain if with a chain of comparisons.
unsigned long long functionA() {
    unsigned long long cnt = 0;
    for(unsigned long long i = 0; i < 1000000; i++) {
        unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
        if(rnd == 1 || rnd == 7 || rnd == 10 || rnd == 16 ||
           rnd == 21 || rnd == 22 || rnd == 63)
        {
            cnt += 1;
        }
    }
    return cnt;
}
// Variant B: switch with fall-through case labels.
unsigned long long functionB() {
    unsigned long long cnt = 0;
    for(unsigned long long i = 0; i < 1000000; i++) {
        unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
        switch(rnd) {
            case 1:
            case 7:
            case 10:
            case 16:
            case 21:
            case 22:
            case 63:
                cnt++;
                break;
        }
    }
    return cnt;
}
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
    return std::find(cont.begin(), cont.end(), el) != cont.end();
}
// Variant C: lookup in a constexpr array via has().
unsigned long long functionC() {
    unsigned long long cnt = 0;
    constexpr std::array errList { 1, 7, 10, 16, 21, 22, 63 };
    for(unsigned long long i = 0; i < 1000000; i++) {
        unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
        cnt += has(errList, rnd);
    }
    return cnt;
}
// Hand optimized version (manually created bitfield):
unsigned long long functionH() {
    unsigned long long cnt = 0;
    const unsigned long long bitfield =
        (1ULL << 1) +
        (1ULL << 7) +
        (1ULL << 10) +
        (1ULL << 16) +
        (1ULL << 21) +
        (1ULL << 22) +
        (1ULL << 63);
    for(unsigned long long i = 0; i < 1000000; i++) {
        unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
        if(bitfield & (1ULL << rnd)) {
            cnt += 1;
        }
    }
    return cnt;
}
// Runs the function 100 times and prints its result and the fastest run in microseconds.
void timeit(unsigned long long (*function)(), const char* message)
{
    unsigned long long mintime = std::numeric_limits<unsigned long long>::max();
    unsigned long long fres = 0;
    for(int i = 0; i < 100; i++) {
        auto t1 = std::chrono::high_resolution_clock::now();
        fres = function();
        auto t2 = std::chrono::high_resolution_clock::now();
        // The cast avoids a signed/unsigned comparison with mintime below.
        auto duration = static_cast<unsigned long long>(
            std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count());
        if(duration < mintime) {
            mintime = duration;
        }
    }
    std::cout << message << fres << " " << mintime << std::endl;
}
int main() {
    // Three rounds over all variants, in the same order as before.
    for(int round = 0; round < 3; round++) {
        timeit(functionA, "functionA: ");
        timeit(functionB, "functionB: ");
        timeit(functionC, "functionC: ");
        timeit(functionH, "functionH: ");
    }
    return 0;
}