Way to effectively call _BitScanReverse or __builtin_clz in constexpr functions?
Asked Answered
S

3

10

It seems that _BitScanReverse, despite being an intrinsic and not a real function, can't be called in a constexpr function in Visual C++. I'm aware that I can implement this operation myself in a much slower way, which would be fine for the case where it's evaluated at compile-time but it's unfortunate that it wouldn't just be the 1-clock-cycle, single CPU instruction (BSR) for the case where it's evaluated at run-time. I haven't tried __builtin_clz in GCC/Clang yet, but it might or might not have the same sort of issue, and I want this code to work across the major compilers, (with a slow fallback for non-GCC, non-Clang, non-VC compilers).

Ideas/Questions:

Is there an easy way to have a function use one block of code when evaluating at compile-time, so that it can be constexpr-safe, and a different block of code for run-time, so that it can be fast? (If so, this would also be relevant to a few other questions I have.)

Alternatively, is there a way to trick the compiler into being able to evaluate _BitScanReverse for constexpr code?

Side question:

Is there any word of plans to eventually add this to the C++ standard? They've added std::log2 and std::ilogb, but those both go through floating-point numbers, instead of just doing one BSR, (or CLZ and a subtraction on ARM chips).

Sunk answered 9/10, 2017 at 20:55 Comment(11)
I think with GCC-compatible compilers, you can use __builtin_constant_p. Unfortunately, I don't know whether msvc has a similar construct.Gottfried
@Gottfried Yeah, when searching, I found a commit in the Qt codebase where they removed constexpr from some functions, citing _BitScanForward as the reason, and they had equivalent functions using __builtin_clz on GCC, so I figured they're probably okay for constexpr. However, it wasn't explicitly stated there, and they removed constexpr from those functions as well, but possibly just for self-consistency across compilers.Sunk
This is why languages need to get new standardized portable ways to express things that hardware can do efficiently. Rust has foo.leading_zeros() for primitive types like i32. It's disappointing that C and C++ haven't kept up and still don't have standard bit-scan or popcnt functions. There's POSIX ffs() find first set (man7.org/linux/man-pages/man3/ffs.3.html), but it's just POSIX not ISO C or C++. Some compilers do recognize certain idioms, so you can sometimes write what looks like a dumb loop and have it compile to a popcnt instruction, but IDK about bit-scan.Alaniz
Yes, __builtin_clz is accepted in a constexpr function in gcc, with no warning even with -Wextra -Wpedantic. godbolt.org/g/6cai3A.Alaniz
@PeterCordes Thanks for checking __builtin_clz! I might try out that website you linked to for other things, if it's free to try. Regarding the C/C++ standards, I've been there so many times. I'm particularly disappointed that the Javascript standard is going to add SSE-like support before C or C++, though at least GCC added support for the Intel standard intrinsics for SSE instructions a few years ago, so that Visual C++ and GCC can use the same intrinsics there.Sunk
Matt Godbolt makes his compiler-explorer site available to everyone free of charge. It's pretty useful because he has MSVC, ICC, and nightly builds of the latest gcc and clang for x86-64. He also has MIPS, ARM, ARM64, and some other non-x86 gcc set up. Also, the code behind that site is open source; see the github link in the top right.Alaniz
re: SSE. Most simple stuff can auto-vectorize fairly decently in C and C++, because ahead-of-time compilers can take the time to do that. Javascript is always JIT-compiled or interpreted, so it makes more sense to need manual vectorization even for simple stuff. If you need to vectorize manually in C++, often you need some shuffling or other stuff that isn't universally available across different SIMD architectures (SSE2 doesn't even have any variable-control shuffles). Or you didn't use __restrict__, which ISO C++ doesn't have, even though C has had it since C99.Alaniz
It would make some sense for ISO C++ to standardize some basic SIMD support, like GNU C vector stuff (int __attribute__(( vector_size(16) ))). But there is already OpenMP (#pragma omp simd before a loop) to ask explicitly for auto-vectorization (without having to use -ffast-math).Alaniz
(Sorry for going off-topic.) Some simple to vectorize cases are slowly getting supported, thankfully, but it seems most cases with conditions have to be explicitly vectorized, and float/int bit tricks just confuse the compiler. I don't blame compiler writers for the tough stuff, though, since it's a bit like expecting the compiler to change bubble sort to quicksort; it's unlikely to ever happen, and if it does, it's questionable whether it should override the programmer's choice in that way.Sunk
@PeterCordes C/C++ does not even have a way method to provide the programmer add with carry adc which has been available since before 1980 for x86.Consumerism
@Zboson At least that can easily be emulated.Genethlialogy
O
3

Is there an easy way to have a function use one block of code when evaluating at compile-time, so that it can be constexpr-safe, and a different block of code for run-time, so that it can be fast? (If so, this would also be relevant to a few other questions I have.)

As of C++20, yes! std::is_constant_evaluated serves that exact purpose.

So you can now do:

#include <type_traits>

constexpr bool my_bitscan_reverse(unsigned long& index, unsigned long mask) {
    if (std::is_constant_evaluated()) {
        // do it in a constexpr-friendly manner
        // who cares if it's not as fast as possible
    } else {
        return _BitScanReverse(&index, mask);
    }
}

Olethea answered 17/7, 2021 at 14:58 Comment(0)
A
2

For C++20 you can use std::bit_width and others form <bit> header

template< class T >
constexpr T bit_width(T x) noexcept;
#include <bit>
#include <bitset>
#include <iostream>
#include <array>

using BitArray = std::array<unsigned, 8>;

constexpr BitArray Bits() {
    BitArray bits;
    for (unsigned x{0}; x != 8; ++x)
      bits[x] = std::bit_width(x);
    return bits;
}

auto main() -> int {
    constexpr BitArray bits = Bits();

    for (unsigned x{0}; x != 8; ++x) {
        std::cout
            << "bit_width( "
            << std::bitset<4>{x} << " ) = "
            << bits[x] << " "
            << std::bit_width(x) << '\n';
    }
}

Try it on Godbolt

Aggappe answered 17/7, 2021 at 13:59 Comment(0)
N
1

This function can compute reverse bit scan at compile time:

constexpr int bit_scan_reverse_const(uint64_t const n) {
    if (n == 0) return -1;
    uint64_t a = n, b = 0, j = 64, k = 0;
    do {
        j >>= 1;
        k = (uint64_t)1 << j;
        if (a >= k) {
            a >>= j;
            b += j;
        }
    } while (j > 0);
    return int(b);
}

(from vector class library https://github.com/vectorclass/version2/blob/master/instrset.h )

Neolamarckism answered 21/7, 2022 at 6:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.