How to use MSVC intrinsics to get the equivalent of this GCC code?
Asked Answered
B

6

20

The following code calls the builtin functions for clz/ctz in GCC and, on other systems, has C versions. Obviously, the C versions are a bit suboptimal if the system has a builtin clz/ctz instruction, like x86 and ARM.

#ifdef __GNUC__
#define clz(x) __builtin_clz(x)
#define ctz(x) __builtin_ctz(x)
#else
static uint32_t ALWAYS_INLINE popcnt( uint32_t x )
{
    x -= ((x >> 1) & 0x55555555);
    x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
    x = (((x >> 4) + x) & 0x0f0f0f0f);
    x += (x >> 8);
    x += (x >> 16);
    return x & 0x0000003f;
}
static uint32_t ALWAYS_INLINE clz( uint32_t x )
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    return 32 - popcnt(x);
}
static uint32_t ALWAYS_INLINE ctz( uint32_t x )
{
    return popcnt((x & -x) - 1);
}

#endif

What functions do I need to call, which headers do I need to include, etc to add a proper ifdef for MSVC here? I've already looked at this page, but I'm not entirely sure what the #pragma is for (is it required?) and what restrictions it puts on MSVC version requirements for compilation. As someone who doesn't really use MSVC, I also don't know whether these intrinsics have C equivalents on other architectures, or whether I have to #ifdef x86/x86_64 as well when #defining them.

Brade answered 10/12, 2008 at 13:0 Comment(3)
The page you refer to above refers to a function that is part of the .NET runtime, are you trying to build your program for .NET or as a native Windows executable?Hondo
It's a native Windows executable--part of the reason I'm asking is that I've found it rather difficult to find Microsoft documentation pages that actually talk about C these days.Brade
Libcxx implementation github.com/llvm-mirror/libcxx/blob/…Electrum
S
30

Bouncing from sh0dan code, the implementation should be corrected like this :

#ifdef _MSC_VER
#include <intrin.h>

uint32_t __inline ctz( uint32_t value )
{
    DWORD trailing_zero = 0;

    if ( _BitScanForward( &trailing_zero, value ) )
    {
        return trailing_zero;
    }
    else
    {
        // This is undefined, I better choose 32 than 0
        return 32;
    }
}

uint32_t __inline clz( uint32_t value )
{
    DWORD leading_zero = 0;

    if ( _BitScanReverse( &leading_zero, value ) )
    {
       return 31 - leading_zero;
    }
    else
    {
         // Same remarks as above
         return 32;
    }
}
#endif

As commented in the code, both ctz and clz are undefined if value is 0. In our abstraction, we fixed __builtin_clz(value) as (value?__builtin_clz(value):32) but it's a choice

Schematic answered 9/12, 2013 at 10:23 Comment(5)
An almost 1-to-1 replacement for __builtin_clz() in MSVC is __lzcnt(). The hardware must support SSE4 though. More info.Formation
My hardware supports SSE4, but not BMI1, so __lzcnt() compiles but doesn't do what I'd expect, rather working as a BSR.Motteo
31 ^__builtin_clz is equal to _BitScanReverseElectrum
Note that GNU C __builtin_ctz and clz also have undefined behaviour when the input value is 0 (gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html); this allows them to inline as a single bsf instruction (or 31 ^ bsr which works for the range of defined outputs). If you need to handle possibly-zero inputs, then you'd want similar wrappers for the GNU C builtins, so the appropriate thing would to have a portability layer around BSF / 31^BSR, and then zero-handling on top of that... And using lzcnt #ifdef __BMI1__.Boneyard
Related: VS: unexpected optimization behavior with _BitScanReverse64 intrinsic - the MSVC doesn't expose the destination-unmodified behaviour of the asm instruction, even though it has an API that could do that. (So you don't need to initialize the index output arg; it doesn't hurt though, the compiler knows it's an output-only operand of the intrinsic.)Boneyard
D
3
  1. The equivalent function for int __builtin_ctz (unsigned int x) in MSVC is unsigned int _tzcnt_u32 (unsigned int a) for 32 bit integer and returns count of trailing zeros. For 64 bit use unsigned __int64 _tzcnt_u64 (unsigned __int64 a) 1.

  2. The equivalent function for int __builtin_clz (unsigned int x) in MSVC is unsigned int _lzcnt_u32 (unsigned int a) for 32 bit integer and returns count of leading zeros. For 64 bit use unsigned __int64 _lzcnt_u64 (unsigned __int64 a) 2

C++ Header: immintrin.h

Drab answered 3/5, 2021 at 16:35 Comment(8)
Not all computers have BMI1, so lzcnt may decode as bsr and give 31-clz instead of the clz you're expecting. There is an MSVC intrinsic for bsr, specifically _BitScanReverse. They're only equivalent if you would / could have compiled with gcc -mbmi (or of course gcc -march=haswell or something that includes BMI1). See Does x64 support imply BMI1 support? for old-hardware compat issues (and current-gen Pentium / Celeron low-end CPUs, thanks Intel) with these intrinsics. This is a useful answer, but only with an edit to mention that.Boneyard
All processor may not support BMI1 instruction set. However, based on detection of BMI1 instructions availability, lzcnt can be used. Otherwise bsr can be used.Drab
Yes, exactly. You have to check your CPU and manually use clz = 31-bsr(x); to exactly emulate __builtin_clz on CPUs without BMI1. But your answer doesn't say that. It wrongly implies that _lzcnt_u32 will give you identical results to __builtin_clz in general. But unlike GCC, MSVC lets you use intrinsics without needing to compile with any -march=haswell equivalent to "promise" that you'll only run the binary on CPUs that support some instruction-set extensions.Boneyard
(And BTW, to exactly emulate _lzcnt_u32 without BMI1, you'd want lzcnt = x==0 ? 32 : 31-bsr(x);. __builtin_clz has undefined behaviour if run with an input of 0, allowing it to compile to just a bsr(x) ^ 31, but lzcnt has well-defined behaviour for input == 0.)Boneyard
I didn't get the sense of what is "manually use clz = 31-bsr(x)" and why do you want to get leading / trailing zero count for input == 0?Drab
BSR doesn't handle input==0, but _lzcnt_u32 is well-defined to return 32 in that case. If you want to emulate lzcnt in terms of BSR, you need to special-case input == 0. As far as "manually", I mean if you can't assume that your code will always run on a machine with lzcnt, you should use _BitScanReverse(&tmp, x); return 31 - tmp; to get the same result (for all non-zero x). felixcloutier.com/x86/lzcnt explains how it differs from BSR.Boneyard
Ok... So, you mean there is no way to detect if BMI1 is supported or not and we need to assume always before usage of lzcnt, is it the concern? As per GCC documentation, __builtin_clz / ctz result is undefined for input == 0 as well. So, you wanted to use clz/ctz intrinsics to count total number of bits by providing 0, is it correct?Drab
Well you could write 2 versions of your function that uses clz, and do runtime dispatching (e.g. via a function pointer that you set once based on cpuid). But like most of the rest of BMI1/BMI2, it's usually not worth it to actually dispatch for, just use one version that does work everywhere, using BSR. If input==0 is possible for your use-case, then yeah you need some kind of workaround. Like bsr(x|1) so there's always a bit for it to find, instead of actually branching. So if x was 0 or 1, you get bsr(x|1) == 0.Boneyard
C
2

I find it in a korean website https://torbjorn.tistory.com/317 In msvc compiler, you can use __lzcnt(unsigned int) to replace __builtin_clz(unsigned int) in gcc compiler.

Charbonnier answered 17/10, 2020 at 2:7 Comment(1)
Note that the lzcnt instruction requires BMI1. On older CPUs, it executes as bsr, giving 31-lzcnt (and leaving the destination register unmodified for input=0). GCC will only expand __builtin_clz as lznct if you compile with -march=haswell or similar options.Boneyard
K
0

If MSVC has a compiler intrinsic for this, it'll be here:

Compiler Intrinsics on MSDN

Otherwise, you'll have to write it using __asm

Kluge answered 10/12, 2008 at 17:33 Comment(0)
S
-3

Tested on linux and windows (x86) :

#ifdef WIN32
    #include <intrin.h>
    static uint32_t __inline __builtin_clz(uint32_t x) {
        unsigned long r = 0;
        _BitScanReverse(&r, x);
        return (31-r);
    }
#endif

uint32_t clz64(const uint64_t x)
{
    uint32_t u32 = (x >> 32);
    uint32_t result = u32 ? __builtin_clz(u32) : 32;
    if (result == 32) {
        u32 = x & 0xFFFFFFFFUL;
        result += (u32 ? __builtin_clz(u32) : 32);
    }
    return result;
}
Seaddon answered 13/9, 2014 at 12:17 Comment(2)
Have you tested performance of your clz64? I wouldn't be surprised that all this branching makes it slower than the OP's implementation.Gladsome
Use __builtin_clzll like a normal person if you want 64-bit integer support on GNU C. Writing it this way will probably stop GCC from using a single 64-bit bsr or lzcnt in 64-bit builds. (But then you'd have a 64-bit MSVC intrinsic available, too.)Boneyard
C
-4

There are two intrinsics "_BitScanForward" and "_BitScanReverse", which suits the same purpose for MSVC. Include . The functions are:

#ifdef _MSC_VER
#include <intrin.h>

static uint32_t __inline ctz( uint32_t x )
{
   int r = 0;
   _BitScanReverse(&r, x);
   return r;
}

static uint32_t __inline clz( uint32_t x )
{
   int r = 0;
   _BitScanForward(&r, x);
   return r;
}
#endif

There are equivalent 64bit versions "_BitScanForward64" and "_BitScanReverse64".

Read more here:

x86 Intrinsics on MSDN

Coricoriaceous answered 29/3, 2011 at 6:44 Comment(1)
ctz & clz call the wrong functions (they should be using _BitScanForward & BitScanReverse respectively, not BitScanReverse/BitScanForward) & clz is wrong since it returns the offset of the bit set instead of the number of leading zeroes.Betrothed

© 2022 - 2024 — McMap. All rights reserved.