What is a loop with an ID-dependent backward branch?

I am trying to understand this clang-tidy warning: altera-id-dependent-backward-branch that seems to be triggered by this loop.

        for(; first != current; ++first)

The example I have is this code, which looks almost exactly like a perfectly reasonable implementation of std::uninitialized_fill_n.

The static analyzer complains that:

error: backward branch (for loop) is ID-dependent due to variable reference to 'current' and may cause performance degradation [altera-id-dependent-backward-branch,-warnings-as-errors]
                        for(; current != first; ++first) {

Here is the full function:

template<class ForwardIt, class Size, class T>
ForwardIt uninitialized_fill_n(ForwardIt first, Size n, T const& v) {
    ForwardIt current = first;
    try {
        for(; n > 0; ++current, --n) {
            new (std::addressof(*current)) T(v);
        }
        return current;
    } catch(...) {
        for(; first != current; ++first) {  // clang-tidy error here
            std::addressof(*first)->~T();
        }
        throw;
    }
}

I tried different ways to rewrite the code (for example, making the loop run backward) to suppress this warning, but I couldn't. Is there a standard way to rewrite the fallback (catch) loop so that it is acceptable to altera-id-dependent-backward-branch?

I am using clang-tidy 13.0.0.


I also found another instance of the same warning in my code base (this is an implementation of a hash function, by Howard Hinnant):

auto fnv1a(void const* key, std::size_t len, std::size_t h) noexcept {
    auto const *p = static_cast<unsigned char const*>(key);
    unsigned char const* const e = p + len;
    for(; p < e; ++p) {  // warning here
        h = (h ^ *p) * 1099511628211U; // prime
    }
    return h;
} 
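
For the record, I know I could silence the check locally with a NOLINT comment (clang-tidy's NOLINT/NOLINTNEXTLINE comments accept a check name) or disable it entirely in the configuration, but I would prefer an actual rewrite. A sketch of the local suppression, just to show what I mean:

// NOLINTNEXTLINE(altera-id-dependent-backward-branch)
for(; first != current; ++first) {
    std::addressof(*first)->~T();
}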
Slouch answered 17/12, 2021 at 9:27 Comment(5)
May be interesting: the initial LLVM code review for this check: reviews.llvm.org/D70094/new – Neurology
I don't get it, the examples seem to imply variables that depend on the thread id? – Slouch
Me neither. It seems to me that this is a highly OpenCL-specific diagnostic; I have no idea why it a) made its way into clang-tidy, and b) is triggered on your code. I don't see anything that resembles an access to a thread id. – Neurology
I give up, I will disable this diagnostic. Even a "triangular loop" for(int i = 0; i != N; ++i) for(int j = 0; j != i; ++j) produces this warning. – Slouch
I've read some LLVM code and it looks like a bug. – Neurology

The altera-id-dependent-backward-branch warning originates from clang-tidy's FPGA-specific rules. Specifically, when targeting FPGAs, the compiler transforms C++ into hardware, and hardware has different performance characteristics compared to traditional CPUs.

The warning you're seeing is because the branch in the loop depends on an induction variable (current in your first case and p in the second case). When synthesizing loops into hardware, branches that depend on induction variables may result in longer pipeline stalls. The exact impact is hardware-specific, but for high-performance loops, the impact can be non-trivial.

To address the warning, the loop's termination condition needs to be made independent of the induction variable.

Let's start with the first example:

for(; first != current; ++first) {  // clang-tidy error here
    std::addressof(*first)->~T();
}

You can determine the number of iterations in advance and then use a fixed loop bound:

auto distance = std::distance(first, current);  // from <iterator>
for(decltype(distance) i = 0; i < distance; ++i, ++first) {
    std::addressof(*first)->~T();
}
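
In context, the rewritten cleanup handler would look roughly like this (just a sketch; keep in mind that std::distance is itself O(n), i.e. a loop of its own, for anything weaker than random-access iterators, so this mainly moves the data-dependent work out of the loop condition):

    } catch(...) {
        // number of elements constructed before the exception was thrown
        auto const constructed = std::distance(first, current);
        for(decltype(constructed) i = 0; i != constructed; ++i, ++first) {
            std::addressof(*first)->~T();
        }
        throw;
    }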

For the second example:

for(; p < e; ++p) {  // warning here
    h = (h ^ *p) * 1099511628211U; // prime
}

You can rewrite it as:

auto distance = e - p;  // a std::ptrdiff_t
for(std::ptrdiff_t i = 0; i < distance; ++i, ++p) {
    h = (h ^ *p) * 1099511628211U; // prime
}
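
Put together, the whole hash function with a counted loop could look something like this (the same transformation; since e - p is simply len, the trip count is available directly and is known before the loop starts):

auto fnv1a(void const* key, std::size_t len, std::size_t h) noexcept {
    auto const* p = static_cast<unsigned char const*>(key);
    for(std::size_t i = 0; i != len; ++i) {  // fixed trip count: len
        h = (h ^ p[i]) * 1099511628211U; // prime
    }
    return h;
}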

This transformation ensures that the loop bounds are fixed and do not depend on an induction variable. As a result, the FPGA synthesizer should have an easier time creating an efficient pipeline.

That said, you should always test the synthesized results to verify that the performance meets your expectations. Depending on the specific hardware and the loop's complexity, there may still be performance trade-offs.

Lenee answered 20/9, 2023 at 23:59 Comment(2)
Thank you for the explanation. In practice, I interpret it as: the compiler would like to know, before entering the loop, how many times the loop is going to execute. That is only possible sometimes, and in the other cases I guess the compiler could deduce it, but it isn't doing that. In the first example it is not possible (even manually) to know the count in advance: except in a specialization for random-access iterators, for a general ForwardIterator the distance cannot be computed in O(1), and std::distance will effectively be a loop itself. – Slouch
For the second example, the code transformation is simple (so simple that the compiler might be doing this optimization automatically anyway). I will take that into account next time. – Slouch

Altera checks target hardware accelerators (which unroll loops), so they do not want a non-trivial S::end() to be recomputed on every iteration of a loop (such as in for(auto x = s->begin(); s->end() != x; ++x) {}).

std::for_each (or a range-based for loop such as for(auto &x : s) {}) should solve this: "Is the ranged based for loop beneficial to performance?" shows the template internals of this.

Note: for(auto x : s) {} can trigger other problems (such as performance-for-range-copy or stack-use-after-scope) unless you take the element by reference (for(auto &x : s) {}).
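
For reference, this is roughly what a range-based for loop such as for(auto &x : s) { ... } expands to (simplified from the standard's wording; the real expansion prefers member begin()/end() and then falls back to ADL), which is why end() is evaluated only once:

{
    auto&& range = s;
    auto it  = std::begin(range);
    auto end = std::end(range);   // computed a single time, before the loop
    for(; it != end; ++it) {
        auto& x = *it;
        // loop body goes here
    }
}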

Rettke answered 18/6 at 14:27 Comment(1)
mmm, I see the problem across the board, not only on recomputation of end(). – Slouch
