Fastest `finally` for C++ [closed]

C++ so far (unfortunately) doesn't support a finally clause for a try statement. This leads to speculation about how best to release resources. After researching the question on the internet, I found some solutions, but nothing clear about their performance (and I would use Java if performance didn't matter that much). So I had to benchmark.

The options are:

  1. Functor-based finally class proposed at CodeProject. It's powerful, but slow. And the disassembly suggests that outer function local variables are captured very inefficiently: pushed to the stack one by one, rather than passing just the frame pointer to the inner (lambda) function.

  2. RAII: a manual cleaner object on the stack. The disadvantage is that it must be written by hand and tailored to each place it is used. Another disadvantage is that all the variables needed to release the resource must be copied into it.

  3. MSVC++ specific __try / __finally statement. The disadvantage is that it's obviously not portable.

I created this small benchmark to compare the runtime performance of these approaches:

#include <chrono>
#include <cstdint>     // int64_t
#include <cstdio>      // printf
#include <functional>  // std::function
#include <utility>     // std::forward

class Finally1 {
  std::function<void(void)> _functor;
public:
  Finally1(const std::function<void(void)> &functor) : _functor(functor) {}
  ~Finally1() {
    _functor();
  }
};

void BenchmarkFunctor() {
  volatile int64_t var = 0;
  const int64_t nIterations = 234567890;
  auto start = std::chrono::high_resolution_clock::now();
  for (int64_t i = 0; i < nIterations; i++) {
    Finally1 doFinally([&] {
      var++;
    });
  }
  auto elapsed = std::chrono::high_resolution_clock::now() - start;
  double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
  printf("Functor: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
}

void BenchmarkObject() {
  volatile int64_t var = 0;
  const int64_t nIterations = 234567890;
  auto start = std::chrono::high_resolution_clock::now();
  for (int64_t i = 0; i < nIterations; i++) {
      class Cleaner {
        volatile int64_t* _pVar;
      public:
        Cleaner(volatile int64_t& var) : _pVar(&var) { }
        ~Cleaner() { (*_pVar)++; }
      } c(var);
  }
  auto elapsed = std::chrono::high_resolution_clock::now() - start;
  double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
  printf("Object: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
}

void BenchmarkMSVCpp() {
  volatile int64_t var = 0;
  const int64_t nIterations = 234567890;
  auto start = std::chrono::high_resolution_clock::now();
  for (int64_t i = 0; i < nIterations; i++) {
    __try {
    }
    __finally {
      var++;
    }
  }
  auto elapsed = std::chrono::high_resolution_clock::now() - start;
  double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
  printf("__finally: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
}

template <typename Func> class Finally4 {
  Func f;
public:
  Finally4(Func&& func) : f(std::forward<Func>(func)) {}
  ~Finally4() { f(); }
};

template <typename F> Finally4<F> MakeFinally4(F&& f) {
  return Finally4<F>(std::forward<F>(f));
}

void BenchmarkTemplate() {
  volatile int64_t var = 0;
  const int64_t nIterations = 234567890;
  auto start = std::chrono::high_resolution_clock::now();
  for (int64_t i = 0; i < nIterations; i++) {
    auto doFinally = MakeFinally4([&] { var++; });
    //Finally4 doFinally{ [&] { var++; } };
  }
  auto elapsed = std::chrono::high_resolution_clock::now() - start;
  double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
  printf("Template: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
}

void BenchmarkEmpty() {
  volatile int64_t var = 0;
  const int64_t nIterations = 234567890;
  auto start = std::chrono::high_resolution_clock::now();
  for (int64_t i = 0; i < nIterations; i++) {
    var++;
  }
  auto elapsed = std::chrono::high_resolution_clock::now() - start;
  double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
  printf("Empty: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
}

int __cdecl main() {
  BenchmarkFunctor();
  BenchmarkObject();
  BenchmarkMSVCpp();
  BenchmarkTemplate();
  BenchmarkEmpty();
  return 0;
}

The results on my Ryzen 1800X @ 3.9 GHz with DDR4 @ 2.6 GHz CL13 were:

Functor: 175148825.946 Ops/sec, var=234567890
Object: 553446751.181 Ops/sec, var=234567890
__finally: 553832236.221 Ops/sec, var=234567890
Template: 554964345.876 Ops/sec, var=234567890
Empty: 554468478.903 Ops/sec, var=234567890

Apparently, all the options except the functor-based one (#1) are as fast as an empty loop.

So is there a fast and powerful C++ alternative to finally, which is portable and requires minimum copying from the stack of the outer function?

UPDATE: I've benchmarked @Jarod42's solution, so the code and output in the question have been updated. Though, as mentioned by @Sopel, it may break if copy elision is not performed.

UPDATE2: To clarify, what I'm asking for is a convenient, fast way in C++ to execute a block of code even if an exception is thrown. For the reasons mentioned in the question, some ways are slow or inconvenient.

Witchery answered 13/6, 2017 at 11:54 Comment(11)
Sure, RAII. Use types that clean up themselves and no matter how the scope is exited the resources are cleaned up.Idalia
@NathanOliver, RAII is in option #2, function BenchmarkObject(). I've listed its disadvantages: mainly that it takes substantial memory on the stack and requires copying from the stack of the outer function.Witchery
This is just pure speculation, but one of the reasons that C++ doesn't have a finally clause could be that exceptions in C++ are expensive when thrown, and therefore should only be used for truly exceptional cases. That of course leads to try-catch blocks being uncommon, and mostly used to do some error reporting and then rethrowing the exception so the application terminates. Which means there's really no use for a finally clause. This is unlike other languages where exceptions are the normal error-handling function.Laforge
@SergeRogatch RAII-compliant objects don't have to take much memory on the stack (cf. smart pointers), and the copy issue can be resolved via copy elision and move semantics.Dildo
Wait, I don't get it. If your benchmark showed that "all the operations except functor-base are as fast as an empty loop", then this disproved your assumption that RAII would be slow because of some kind of "copying overhead", which means that there is no question here. Use RAII like everyone else who programs in C++. You don't need to be on the lookout for an "alternative to finally".Goldy
@SergeRogatch I don't mean make finally RAII, but make all the types you use RAII. Say you're using int* foo = new int[some_num]; int* bar = new int[some_num]; we can replace those with std::unique_ptr<int[]>, and if an exception is raised they will be cleaned up automatically. You don't have to do anything. All it costs is a destructor call, which is often minimal, if anything.Idalia
I think Serge uses the finally-functor as RAII option. With RAII it's meant that you don't have to care about resource cleanup at all. So do not use the functor but use self cleaning types.Badenpowell
As an addendum about the talk about RAII, with proper use of it and the standard containers and smart pointers etc., one could also follow the rule of zero which will make life much easier in general and also have the effect that variables with automatic storage duration (a.k.a. local variables) will simply be cleaned up nicely when they get out of scope. Which means you can use blocks and scoping as a way of handling finally, as in { SomeType a; try { ... } catch { ... } /* automatic "finally" of variable a */ }Laforge
You can find some info in this video from Andrei Alexandrescu (CppCon 2015). It explains how to create callbacks that are called when you go out of scope, when an exception is raised, or when no exception is raised.Dildo
@CodyGray, the stack and copying overhead of option #2 is not acute in my example because the Cleaner here does not need a lot of variables from the outer function. But in practice it needs several: at least the array pointer and the number of objects in the array, so as to call their destructors explicitly before returning the memory as void* to a memory pool. And I need multiple cleaners for multiple resources allocated at different stages within a function.Witchery
@Someprogrammerdude: Not every finally really handles a resource; some might restore state, and creating an RAII class for each such case would just repeat a pattern that can be factored out by finally.Reduction
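The RAII suggestion from the comments above, applied to the pool case described there (an array pointer plus an element count), might look roughly like the sketch below. Widget, pool_alloc, pool_free, PoolArrayDeleter and make_pool_array are hypothetical names invented for illustration; the pool is simulated with malloc/free, so this is only an outline under those assumptions, not anyone's actual allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <stdexcept>

// Hypothetical element type that counts live instances.
struct Widget {
    static int live;
    Widget() { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Hypothetical stand-ins for a memory pool; here they simply wrap malloc/free.
inline void* pool_alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void pool_free(void* p) { std::free(p); }

// Deleter that carries the element count, destroys the elements in reverse
// order, then hands the raw memory back to the pool.
struct PoolArrayDeleter {
    std::size_t count;
    void operator()(Widget* p) const noexcept {
        for (std::size_t i = count; i > 0; --i) p[i - 1].~Widget();
        pool_free(p);
    }
};

using PoolArray = std::unique_ptr<Widget, PoolArrayDeleter>;

PoolArray make_pool_array(std::size_t n) {
    void* raw = pool_alloc(n * sizeof(Widget));
    if (!raw) throw std::bad_alloc();
    Widget* first = static_cast<Widget*>(raw);
    std::size_t built = 0;
    try {
        for (; built < n; ++built) new (first + built) Widget();
    } catch (...) {
        // Undo partially constructed elements before propagating.
        while (built > 0) first[--built].~Widget();
        pool_free(raw);
        throw;
    }
    return PoolArray(first, PoolArrayDeleter{n});
}

// Returns the number of live Widgets after an exception unwinds the scope.
int live_after_throw() {
    try {
        PoolArray a = make_pool_array(3);
        PoolArray b = make_pool_array(2);
        assert(Widget::live == 5);
        throw std::runtime_error("boom");
    } catch (const std::runtime_error&) {
        // Both arrays were released during stack unwinding.
    }
    return Widget::live;
}
```

Each resource acquired at a different stage of a function gets its own PoolArray, and the only per-resource state on the stack is the pointer plus the count stored in the deleter.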
R
12

You can implement Finally without type erasure and overhead of std::function:

template <typename F>
class Finally {
    F f;
public:
    template <typename Func>
    Finally(Func&& func) : f(std::forward<Func>(func)) {}
    ~Finally() { f(); }

    Finally(const Finally&) = delete;
    Finally(Finally&&) = delete;
    Finally& operator =(const Finally&) = delete;
    Finally& operator =(Finally&&) = delete;
};

template <typename F>
Finally<F> make_finally(F&& f)
{
    return { std::forward<F>(f) };
}

And use it like:

auto&& doFinally = make_finally([&] { var++; });
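As a minimal, self-contained sketch of that usage, the following repeats the class from above and checks that the cleanup runs both on normal scope exit and when an exception leaves the scope. The function name count_cleanups is made up for this illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

template <typename F>
class Finally {
    F f;
public:
    template <typename Func>
    Finally(Func&& func) : f(std::forward<Func>(func)) {}
    ~Finally() { f(); }

    Finally(const Finally&) = delete;
    Finally(Finally&&) = delete;
    Finally& operator=(const Finally&) = delete;
    Finally& operator=(Finally&&) = delete;
};

template <typename F>
Finally<F> make_finally(F&& f)
{
    // Copy-list-initialization builds the return object in place,
    // so no copy or move constructor is required.
    return { std::forward<F>(f) };
}

// Hypothetical demo: leaves one scope normally and one via an exception,
// and returns how many times the cleanup lambda ran.
int count_cleanups() {
    int cleanups = 0;
    {
        auto&& guard = make_finally([&] { ++cleanups; });
        (void)guard;  // silence unused-variable warnings
    }
    assert(cleanups == 1);
    try {
        auto&& guard = make_finally([&] { ++cleanups; });
        (void)guard;
        throw std::runtime_error("boom");
    } catch (const std::runtime_error&) {
        // The guard's destructor already ran during unwinding.
    }
    return cleanups;
}
```

Binding the result to auto&& extends the temporary's lifetime to the end of the enclosing scope, which is why the cleanup fires at the closing brace rather than immediately.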

Demo

Reduction answered 13/6, 2017 at 12:10 Comment(14)
make sure to wrap the destructor in try/catch as f may throw, or do SFINAE magic to choose the destructor overload based on the noexcept specifierZoraidazorana
@DavidHaim: In fact it is more complicated; see finally-scopeexit and its variants ScopeFailed and ScopeSuccess.Reduction
Or C++17 style Finally do_finally { [&]{++var;} }.Inapposite
Neither auto doFinally = make_finally([&] { var++; });, nor Finally do_finally { [&]{++var;} } compiles in MSVC++2017 for me. Did you mean F and Func to be the same in class Finally?Witchery
won't this break if copy elision is not performed (pre c++17)?Chianti
Sopel Yes! In this case, Finally should have a boolean flag that allows it to be disabled, and moving from an object should disable it.Shluh
@Sopel, yes, this involves copy constructor: adding Finally(const Finally&) = delete; breaks compilation. But without this, it may break at run time.Witchery
I implemented a type erasing function wrapper that works with move-only function objects (very similar to std::function). The main overhead of std::function is probably from memory allocation because it uses the heap for storage. The standard allows for small object optimization though and even seems to mandate it for function pointers and reference wrappers.Shluh
Serge Rogatch, it's not about copying vs. moving though. The implicitly generated move constructor would not be better. E.g. Finally(Finally&&) = default; fixes the compile error but not the behavior.Shluh
@SergeRogatch: Fixed the sample so that it has the expected behavior even without C++17's guaranteed copy elision.Reduction
@Reduction, this still does not compile for me. Please also check template <typename F> class Finally and template <typename Func> Finally(Func&& func). I guessed that you meant to just use F for the constructor too.Witchery
Serge Rogatch: You have to use auto&& or const auto & with make_finally (at least pre-17).Shluh
I've found that whether it compiles depends on whether return { std::forward<F>(f) }; or return Finally<F>{ std::forward<F>(f) }; is used. So isn't it a compiler bug?Witchery
There is indeed a subtle difference: return { std::forward<F>(f) }; doesn't make a copy/move but constructs the returned object "in place".Reduction
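The boolean-flag idea raised in the comments above could be sketched as follows. This is a different design from the non-movable Finally in the answer: the guard is movable, and moving from it disarms the source so the cleanup runs exactly once. ScopeGuard, make_scope_guard and dismiss are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <typename F>
class ScopeGuard {
    F f_;
    bool active_ = true;
public:
    explicit ScopeGuard(F f) : f_(std::move(f)) {}

    // Moving transfers responsibility for the cleanup and disarms the source.
    ScopeGuard(ScopeGuard&& other)
        : f_(std::move(other.f_)), active_(other.active_) {
        other.active_ = false;
    }

    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
    ScopeGuard& operator=(ScopeGuard&&) = delete;

    ~ScopeGuard() { if (active_) f_(); }

    // Cancels the cleanup, e.g. after a successful commit.
    void dismiss() noexcept { active_ = false; }
};

template <typename F>
ScopeGuard<typename std::decay<F>::type> make_scope_guard(F&& f) {
    return ScopeGuard<typename std::decay<F>::type>(std::forward<F>(f));
}

// The cleanup must run once even though the guard is explicitly moved.
int cleanups_after_move() {
    int cleanups = 0;
    {
        auto a = make_scope_guard([&] { ++cleanups; });
        auto b = std::move(a);  // a is disarmed, b now owns the cleanup
        (void)b;
    }
    return cleanups;
}

// A dismissed guard must not run its cleanup at all.
int cleanups_after_dismiss() {
    int cleanups = 0;
    {
        auto g = make_scope_guard([&] { ++cleanups; });
        g.dismiss();
    }
    return cleanups;
}
```

The cost compared with the non-movable version is one bool per guard and a branch in the destructor, which an optimizer can typically fold away when the guard never escapes the scope.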
P
0

Well, it's your benchmark that's broken: It does not actually throw, so you only see the non-exception path. This is quite bad as the optimizer can prove that you don't throw, so it can throw away all code that actually handles performing cleanup with an exception in flight.

I think you should repeat your benchmark, putting a call to exceptionThrower() or nonthrowingThrower() into your try{} block. These two functions should be compiled in a separate translation unit and only linked together with the benchmark code. That will force the compiler to actually generate exception handling code irrespective of whether you call exceptionThrower() or nonthrowingThrower(). (Make sure that you don't switch on link-time optimization, which could spoil the effect.)

This will also allow you to easily compare the performance impacts between the exception and the non-throwing execution paths.
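A rough single-file approximation of that setup is sketched below. Instead of a separate translation unit, it calls the function through a volatile function pointer so the compiler cannot assume the callee never throws; that substitution is an assumption made to keep the example self-contained, and the separate-file approach described above remains the more faithful test. Cleaner, Result and run are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <stdexcept>

// In the setup described above, these two would live in their own .cpp file.
void exceptionThrower() { throw std::runtime_error("boom"); }
void nonthrowingThrower() {}

using Callee = void (*)();

// Stand-in for whichever finally implementation is being measured.
class Cleaner {
    std::int64_t& counter_;
public:
    explicit Cleaner(std::int64_t& counter) : counter_(counter) {}
    ~Cleaner() { ++counter_; }
};

struct Result {
    std::int64_t cleanups;  // how many times the cleanup ran
    double seconds;         // wall-clock time for the whole loop
};

Result run(Callee callee, std::int64_t iterations) {
    // Reading the pointer through a volatile object keeps the call opaque.
    Callee volatile opaque = callee;
    std::int64_t cleanups = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        try {
            Cleaner c(cleanups);
            opaque();
        } catch (const std::runtime_error&) {
            // Swallow the exception so the loop can continue.
        }
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return Result{cleanups, elapsed.count()};
}
```

Calling run(nonthrowingThrower, n) and run(exceptionThrower, n) and dividing n by the returned seconds gives an ops/sec figure for each path; the throwing run will likely need a much smaller n to finish in reasonable time.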


Apart from the benchmark issues, exceptions in C++ are slow. You'll never get hundreds of millions of exceptions thrown within a second. It's more around single digit millions at best, likely less. I expect that any performance differences between different finally implementations are entirely irrelevant in the throwing case. What you can optimize is the non-throwing path, where your cost is simply the construction/destruction of your finally implementation object, whatever that is.

Priory answered 14/6, 2017 at 7:50 Comment(3)
How well the optimizer handles such a finally is very important: if it decides not to copy variables and to inline the releasing function, that's very good for performance. And of course the non-exceptional scenario is much more important for performance than the exceptional one. My benchmark may not be that good, but it's also not totally bad: because the variable is volatile, the compiler can't throw away its increment, which I do in the releasing function.Witchery
@SergeRogatch Ah, I didn't see that volatile. That does indeed take care of the issue with the finally body quite nicely. However, the issue with exception generation remains: There is a difference between compiling code that is known not to throw, and compiling code that is not known not to throw. I'll now edit my answer to reflect the volatile correctly.Priory
@SergeRogatch I've now updated the answer.Priory
