I'm wondering about std::variant
performance. When should I not use it? It seems like virtual functions are still much better than using std::visit
which surprised me!
In "A Tour of C++" Bjarne Stroustrup says this about pattern checking
after explaining std::holds_alternatives
and the overloaded
methods:
This is basically equivalent to a virtual function call, but potentially faster. As with all claims of performance, this ‘‘potentially faster’’ should be verified by measurements when performance is critical. For most uses, the difference in performance is insignificant.
I've benchmark some methods that came in my mind and these are the results:
http://quick-bench.com/N35RRw_IFO74ZihFbtMu4BIKCJg
You'll get a different result if you turn on the optimization:
http://quick-bench.com/p6KIUtRxZdHJeiFiGI8gjbOumoc
Here's the code I've used for benchmarks; I'm sure there's better way to implement and use variants for using them instead of virtual keywords (inheritance vs. std::variant):
removed the old code; look at the updates
Can anyone explain what is the best way to implement this use case for std::variant
that got me to testing and benchmarking:
I'm currently implementing RFC 3986 which is 'URI' and for my use case this class will be used more as a const and probably won't be changed a lot and it's more likely for the user to use this class to find each specific portion of the URI rather than making a URI; so it made sense to make use of std::string_view
and not separating each segment of the URI in its own std::string
. The problem was I needed to implement two classes for it; one for when I only need a const version; and another one for when the user wants to create the URI rather than providing one and searching through it.
So I used a template
to fix that which had its own problems; but then I realized I could use std::variant<std::string, std::string_view>
(or maybe std::variant<CustomStructHoldingAllThePieces, std::string_view>
); so I started researching to see if it actually helps to use variants or not. From these results, it seems like using inheritance and virtual
is my best bet if I don't want to implement two different const_uri
and uri
classes.
What do you think should I do?
Update (2)
Thanks for @gan_ for mentioning and fixing the hoisting problem in my benchmark code. http://quick-bench.com/Mcclomh03nu8nDCgT3T302xKnXY
I was surprised with the result of try-catch hell but thanks to this comment that makes sense now.
Update (3)
I removed the try-catch
method as it was really bad; and also randomly changed the selected value and by the looks of it, I see more realistic benchmark. It seems like virtual
is not the correct answer after all.
http://quick-bench.com/o92Yrt0tmqTdcvufmIpu_fIfHt0
http://quick-bench.com/FFbe3bsIpdFsmgKfm94xGNFKVKs (without the memory leak lol)
Update (4)
I removed the overhead of generating random numbers (I've already did that in the last update but it seems like I had grabbed the wrong URL for benchmark) and added an EmptyRandom for understanding the baseline of generating random numbers. And also made some small changes in Virtual but I don't think it affected anything. http://quick-bench.com/EmhM-S-xoA0LABYK6yrMyBb8UeI
http://quick-bench.com/5hBZprSRIRGuDaBZ_wj0cOwnNhw (removed the Virtual so you could compare the rest of them better)
Update (5)
as Jorge Bellon said in the comments, I wasn't thinking about the cost of allocation; so I converted every benchmark to use pointers. This indirection has an impact on performance of course but it's more fair now. So right now there's no allocation in the loops.
Here's the code:
removed the old code; look at the updates
I ran some benchmarks so far. It seems like g++ does a better job of optimizing the code:
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
EmptyRandom 0.756 ns 0.748 ns 746067433
TradeSpaceForPerformance 2.87 ns 2.86 ns 243756914
Virtual 12.5 ns 12.4 ns 60757698
Index 7.85 ns 7.81 ns 99243512
GetIf 8.20 ns 8.18 ns 92393200
HoldsAlternative 7.08 ns 7.07 ns 96959764
ConstexprVisitor 11.3 ns 11.2 ns 60152725
StructVisitor 10.7 ns 10.6 ns 60254088
Overload 10.3 ns 10.3 ns 58591608
And for clang:
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
EmptyRandom 1.99 ns 1.99 ns 310094223
TradeSpaceForPerformance 8.82 ns 8.79 ns 87695977
Virtual 12.9 ns 12.8 ns 51913962
Index 13.9 ns 13.8 ns 52987698
GetIf 15.1 ns 15.0 ns 48578587
HoldsAlternative 13.1 ns 13.1 ns 51711783
ConstexprVisitor 13.8 ns 13.8 ns 49120024
StructVisitor 14.5 ns 14.5 ns 52679532
Overload 17.1 ns 17.1 ns 42553366
Right now, for clang, it's better to use virtual inheritance but for g++ it's better to use holds_alternative
or get_if
but in overall, std::visit
seems to be not a good choice for almost all of my benchmarks so far.
I'm thinking it'll be a good idea if pattern matching (switch statements capable of checking more stuff than just integer literals) would be added to the c++, we would be writing cleaner and more maintainable code.
I'm wondering about the package.index()
results. Shouldn't it be faster? what does it do?
Clang version: http://quick-bench.com/cl0HFmUes2GCSE1w04qt4Rqj6aI
The version that uses One one
instead of auto one = new One
based on Maxim Egorushkin's comment: http://quick-bench.com/KAeT00__i2zbmpmUHDutAfiD6-Q (not changing the outcome much)
Update (6)
I made some changes and the results are very different from compiler to compiler now. But it seems like std::get_if
and std::holds_alternatives
are the best solutions. virtual
seems to work best for unknown reasons with clang now. That really surprises me there because I remember virtual
being better in gcc. And also std::visit
is totally out of competition; in this last benchmark it's even worse than vtable lookup.
Here's the benchmark (run it with GCC/Clang and also with libstdc++ and libc++):
http://quick-bench.com/LhdP-9y6CqwGxB-WtDlbG27o_5Y
#include <benchmark/benchmark.h>
#include <array>
#include <variant>
#include <random>
#include <functional>
#include <algorithm>
using namespace std;
struct One {
auto get () const { return 1; }
};
struct Two {
auto get() const { return 2; }
};
struct Three {
auto get() const { return 3; }
};
struct Four {
auto get() const { return 4; }
};
template<class... Ts> struct overload : Ts... { using Ts::operator()...; };
template<class... Ts> overload(Ts...) -> overload<Ts...>;
std::random_device dev;
std::mt19937 rng(dev());
std::uniform_int_distribution<std::mt19937::result_type> random_pick(0,3); // distribution in range [1, 6]
template <std::size_t N>
std::array<int, N> get_random_array() {
std::array<int, N> item;
for (int i = 0 ; i < N; i++)
item[i] = random_pick(rng);
return item;
}
template <typename T, std::size_t N>
std::array<T, N> get_random_objects(std::function<T(decltype(random_pick(rng)))> func) {
std::array<T, N> a;
std::generate(a.begin(), a.end(), [&] {
return func(random_pick(rng));
});
return a;
}
static void TradeSpaceForPerformance(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
int index = 0;
auto ran_arr = get_random_array<50>();
int r = 0;
auto pick_randomly = [&] () {
index = ran_arr[r++ % ran_arr.size()];
};
pick_randomly();
for (auto _ : state) {
int res;
switch (index) {
case 0:
res = one.get();
break;
case 1:
res = two.get();
break;
case 2:
res = three.get();
break;
case 3:
res = four.get();
break;
}
benchmark::DoNotOptimize(index);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
// Register the function as a benchmark
BENCHMARK(TradeSpaceForPerformance);
static void Virtual(benchmark::State& state) {
struct Base {
virtual int get() const noexcept = 0;
virtual ~Base() {}
};
struct A final: public Base {
int get() const noexcept override { return 1; }
};
struct B final : public Base {
int get() const noexcept override { return 2; }
};
struct C final : public Base {
int get() const noexcept override { return 3; }
};
struct D final : public Base {
int get() const noexcept override { return 4; }
};
Base* package = nullptr;
int r = 0;
auto packages = get_random_objects<Base*, 50>([&] (auto r) -> Base* {
switch(r) {
case 0: return new A;
case 1: return new B;
case 3: return new C;
case 4: return new D;
default: return new C;
}
});
auto pick_randomly = [&] () {
package = packages[r++ % packages.size()];
};
pick_randomly();
for (auto _ : state) {
int res = package->get();
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
for (auto &i : packages)
delete i;
}
BENCHMARK(Virtual);
static void FunctionPointerList(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::function<int()>;
std::size_t index;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return std::bind(&One::get, one);
case 1: return std::bind(&Two::get, two);
case 2: return std::bind(&Three::get, three);
case 3: return std::bind(&Four::get, four);
default: return std::bind(&Three::get, three);
}
});
int r = 0;
auto pick_randomly = [&] () {
index = r++ % packages.size();
};
pick_randomly();
for (auto _ : state) {
int res = packages[index]();
benchmark::DoNotOptimize(index);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(FunctionPointerList);
static void Index(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
for (auto _ : state) {
int res;
switch (package->index()) {
case 0:
res = std::get<One>(*package).get();
break;
case 1:
res = std::get<Two>(*package).get();
break;
case 2:
res = std::get<Three>(*package).get();
break;
case 3:
res = std::get<Four>(*package).get();
break;
}
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(Index);
static void GetIf(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
for (auto _ : state) {
int res;
if (auto item = std::get_if<One>(package)) {
res = item->get();
} else if (auto item = std::get_if<Two>(package)) {
res = item->get();
} else if (auto item = std::get_if<Three>(package)) {
res = item->get();
} else if (auto item = std::get_if<Four>(package)) {
res = item->get();
}
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(GetIf);
static void HoldsAlternative(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
for (auto _ : state) {
int res;
if (std::holds_alternative<One>(*package)) {
res = std::get<One>(*package).get();
} else if (std::holds_alternative<Two>(*package)) {
res = std::get<Two>(*package).get();
} else if (std::holds_alternative<Three>(*package)) {
res = std::get<Three>(*package).get();
} else if (std::holds_alternative<Four>(*package)) {
res = std::get<Four>(*package).get();
}
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(HoldsAlternative);
static void ConstexprVisitor(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
auto func = [] (auto const& ref) {
using type = std::decay_t<decltype(ref)>;
if constexpr (std::is_same<type, One>::value) {
return ref.get();
} else if constexpr (std::is_same<type, Two>::value) {
return ref.get();
} else if constexpr (std::is_same<type, Three>::value) {
return ref.get();
} else if constexpr (std::is_same<type, Four>::value) {
return ref.get();
} else {
return 0;
}
};
for (auto _ : state) {
auto res = std::visit(func, *package);
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(ConstexprVisitor);
static void StructVisitor(benchmark::State& state) {
struct VisitPackage
{
auto operator()(One const& r) { return r.get(); }
auto operator()(Two const& r) { return r.get(); }
auto operator()(Three const& r) { return r.get(); }
auto operator()(Four const& r) { return r.get(); }
};
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
auto vs = VisitPackage();
for (auto _ : state) {
auto res = std::visit(vs, *package);
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(StructVisitor);
static void Overload(benchmark::State& state) {
One one;
Two two;
Three three;
Four four;
using type = std::variant<One, Two, Three, Four>;
type* package = nullptr;
auto packages = get_random_objects<type, 50>([&] (auto r) -> type {
switch(r) {
case 0: return one;
case 1: return two;
case 2: return three;
case 3: return four;
default: return three;
}
});
int r = 0;
auto pick_randomly = [&] () {
package = &packages[r++ % packages.size()];
};
pick_randomly();
auto ov = overload {
[] (One const& r) { return r.get(); },
[] (Two const& r) { return r.get(); },
[] (Three const& r) { return r.get(); },
[] (Four const& r) { return r.get(); }
};
for (auto _ : state) {
auto res = std::visit(ov, *package);
benchmark::DoNotOptimize(package);
benchmark::DoNotOptimize(res);
pick_randomly();
}
}
BENCHMARK(Overload);
// BENCHMARK_MAIN();
Results for GCC compiler:
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
TradeSpaceForPerformance 3.71 ns 3.61 ns 170515835
Virtual 12.20 ns 12.10 ns 55911685
FunctionPointerList 13.00 ns 12.90 ns 50763964
Index 7.40 ns 7.38 ns 136228156
GetIf 4.04 ns 4.02 ns 205214632
HoldsAlternative 3.74 ns 3.73 ns 200278724
ConstexprVisitor 12.50 ns 12.40 ns 56373704
StructVisitor 12.00 ns 12.00 ns 60866510
Overload 13.20 ns 13.20 ns 56128558
Results for clang compiler (which I'm surprised by it):
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
TradeSpaceForPerformance 8.07 ns 7.99 ns 77530258
Virtual 7.80 ns 7.77 ns 77301370
FunctionPointerList 12.1 ns 12.1 ns 56363372
Index 11.1 ns 11.1 ns 69582297
GetIf 10.4 ns 10.4 ns 80923874
HoldsAlternative 9.98 ns 9.96 ns 71313572
ConstexprVisitor 11.4 ns 11.3 ns 63267967
StructVisitor 10.8 ns 10.7 ns 65477522
Overload 11.4 ns 11.4 ns 64880956
Best benchmark so far (will be updated): http://quick-bench.com/LhdP-9y6CqwGxB-WtDlbG27o_5Y (also check out the GCC)
DoNotOptimize
on the variant itself if you want to bench the unboxing. – ArguelloVirtual
benchmark may not actually use dynamic dispatch as the compiler may be able to trace the call graph and devirtualize the call toget
. A nice explanation is given in this answer. With gcc you can provide the flag-fno-devirtualize
to prevent this specific optimization while still not giving up on optimizations entirely. – CollectorOne
). The try/catch alternative is basically made to be very fast when the first alternative is chosen. If I change your code to hold the third alternative (Three
), the try/catch becomes thousands time slower than any other versions. quick-bench.com/yIDclCuxfzjzBlCjbVCDNrp13aU – Weekdayvariant
is explicitly disallowed from allocating memory. Your benchmark doesn't test the cost of creating such objects. And of course, no benchmark can test the cost of having to track the ownership of such heap-allocated resources. – Retailstd::in_place
and relatedemplace
for completeness. – Pinskynew
anddelete
out of the loop, the results are a bit more realistic. See quick-bench.com/aHzAIn3xRZfiTSWbuLu6ZhoyRoo – Geopoliticsauto one = new A;
withA one;
and just take a pointer with&one
. – Fiberstd::visit
in libstdc++ changed not that long ago. And clang could be using libc++ or libstdc++. – Cacophonyclass URI { std::string_view scheme_; std::vector<std::string> pieces_; void set_scheme(std::string scheme) { scheme_ = pieces_.emplace_back(std::move(scheme)); scheme_ = pieces_.back(); } };
All of your get functions and reassembly functions would end up being the same. Biggest downside is someone constantly resetting a field and constantly appending strings. – Detroitstd::variant
is to provide access to distinct classes which do not share a common interface. For example:std::variant<car, pear, fox>
. Why are you trying to replace relatively fastvirtual
calls withstd::variant
? Only because it could be faster? – Juststd::variant
is designed to solve this very issue. Why not use it when it's the right thing to do? – Pinskyvirtual
faster than usingstd::variant
? so far I've gotten different results which don't imply nor denies the performance benefits of using each methods. – Pinskystd::variant
is very weird! I think you are at the boundaries of the implementation there. Astd::variant
of pointers is not how it's supposed to be used... but have fun! – JustVirtual
. Why do you use a switch? The whole point of virtual functions is that you don't need toswitch
. I modified your benchmark, and nowVirtual
is as fast asEmptyRandom
(quick-bench.com/1Ty62gg2lpeeaTmx_sHLC6sLfjs). Furthermore, your benchmark is a little bit flawed, as it has a%
operation. It distorts the results, as it adds a constant calculation to every benchmark. You should use&
instead, it is faster. Or use a power-of-2 random array size. – RumpleEmptyRandom
was so I could know how much that function is actually is taking. Your way is just not fair to the others. I might do the same thing for the other ones as well and then the results would be more fair. – Pinskyswitch
inEmptyRandom
as well, if you just want to measure the cost of virtual dispatch. For virtual functions to work, aswitch
doesn't needed. That's the whole point of virtual functions. – Rumplestd::function
instead of c-style version. I was sure it wouldn't be faster even before I made it. But of course c-style version would beat the other ones but that is not the point of this benchmark. – Pinskystd::function
is a different thing than a function pointer. I'm glad that you mentioned the point of the benchmark. What is the point? I don't really get it. It seems that you intentionally make versions worse (like adding unneededswitch
to the virtual version), and you don't measure a possible good option. Btw., you should check out the resulting assembly. Compilers are pretty smart nowadays, maybe they optimize several of your versions to a point, where you don't measure what you wanted to measure. – Rumplestd::variant
over virtual inheritance and if it is, then which way of accessingstd::variant
is the best. My initial intention was about making a URI class that holdsstd::string_view
and when the user needed, parses the string and makes the struct so we wouldn't need to parse something that user doesn't need and most certainly uses less memory. I've updated the question since you last checked the code. – Pinskystd::variant
where performance is concerned. Having to wrap code intry
/catch
adds a performance penalty - especially when you know thatcatch
is likely to be called. Then again, I prefer old fashionedC
, so I might be biased here (I would return NULL / pointer for URI data). – Sheehanstd::variant
andtry-cache-hell
was just one of them that I had to remove from benchmarks so I could compare the charts more clearly. Read the comments of this question. – Pinskystd::visit
's implementation is not the best (libstdc++ and libc++, I think the Microsoft STL uses the optimizations mentioned). See Michael Park's blog post: mpark.github.io/programming/2019/01/22/variant-visitation-v2 . I know that there's an implementation kicking around somewhere too (I think Michael did it along with someone else). – Hyperparathyroidismstd::optional
,std::any
, andstd::variant
in c++17 are good opportunities to introduce new programming techniques and deprecate or at least use them when they make more sense to use and I'm looking for these kinda techniques so I could migrate from usingvirtual
keyword to a more constant evaluated, faster, templated techniques. And finding out the performance benefits of each is essential to understand which one should be used where. – PinskyAndyG::heterogeneous_container
that works like a visitor pattern, but groups the data by type into vectors. My own benchmarks had it much faster thanstd::any
at least, and I imagine it may be faster than a variant as well. gieseanw.wordpress.com/2017/05/03/… – Gimmickget
methods and now your way is not so much better than the others. I wonder if it has an impact if you use function pointers instead ofstd::function
. quick-bench.com/bgrIxAbc-Wd9G1UghwKSzf5UunQ – Pinskystd::
anyway, so why not removeusing namespace std;
? – Realityusing namespace std
in my codes, just in benchmarks and test codes that I just want to test something. It's not important here. There are way more important issues in the benchmark above than usingusing
statement. – Pinskystd::get_if
that still allows for the visitor syntax for cases where you don't care about handling all the types or returning a value from visit that is faster for clang too, just for reference. – Pegu