How to use MSVC intrinsics to get the equivalent of this GCC code?

Asked 10/12, 2008 at 13:0 Answered 3/5, 2021 at 16:35

The following code calls the builtin functions for clz/ctz in GCC and, on other systems, has C versions. Obviously, the C versions are a bit suboptimal if the system has a builtin clz/ctz instruction, like x86 and ARM.

#ifdef __GNUC__
#define clz(x) __builtin_clz(x)
#define ctz(x) __builtin_ctz(x)
#else
static uint32_t ALWAYS_INLINE popcnt( uint32_t x )
{
    x -= ((x >> 1) & 0x55555555);
    x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
    x = (((x >> 4) + x) & 0x0f0f0f0f);
    x += (x >> 8);
    x += (x >> 16);
    return x & 0x0000003f;
}
static uint32_t ALWAYS_INLINE clz( uint32_t x )
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    return 32 - popcnt(x);
}
static uint32_t ALWAYS_INLINE ctz( uint32_t x )
{
    return popcnt((x & -x) - 1);
}

#endif

What functions do I need to call, which headers do I need to include, etc to add a proper ifdef for MSVC here? I've already looked at this page, but I'm not entirely sure what the #pragma is for (is it required?) and what restrictions it puts on MSVC version requirements for compilation. As someone who doesn't really use MSVC, I also don't know whether these intrinsics have C equivalents on other architectures, or whether I have to #ifdef x86/x86_64 as well when #defining them.

Brade answered 10/12, 2008 at 13:0 Comment(3)

The page you refer to above refers to a function that is part of the .NET runtime, are you trying to build your program for .NET or as a native Windows executable? – Hondo 10/12, 2008 at 13:49

It's a native Windows executable--part of the reason I'm asking is that I've found it rather difficult to find Microsoft documentation pages that actually talk about C these days. – Brade 10/12, 2008 at 18:4

Libcxx implementation github.com/llvm-mirror/libcxx/blob/… – Electrum 3/4, 2017 at 10:35

Bouncing from sh0dan code, the implementation should be corrected like this :

#ifdef _MSC_VER
#include <intrin.h>

uint32_t __inline ctz( uint32_t value )
{
    DWORD trailing_zero = 0;

    if ( _BitScanForward( &trailing_zero, value ) )
    {
        return trailing_zero;
    }
    else
    {
        // This is undefined, I better choose 32 than 0
        return 32;
    }
}

uint32_t __inline clz( uint32_t value )
{
    DWORD leading_zero = 0;

    if ( _BitScanReverse( &leading_zero, value ) )
    {
       return 31 - leading_zero;
    }
    else
    {
         // Same remarks as above
         return 32;
    }
}
#endif

As commented in the code, both ctz and clz are undefined if value is 0. In our abstraction, we fixed __builtin_clz(value) as (value?__builtin_clz(value):32) but it's a choice

Schematic answered 9/12, 2013 at 10:23 Comment(5)

An almost 1-to-1 replacement for __builtin_clz() in MSVC is __lzcnt(). The hardware must support SSE4 though. More info. – Formation 27/1, 2016 at 10:23

My hardware supports SSE4, but not BMI1, so __lzcnt() compiles but doesn't do what I'd expect, rather working as a BSR. – Motteo 14/3, 2017 at 15:36

31 ^__builtin_clz is equal to _BitScanReverse – Electrum 16/2, 2018 at 13:16

Note that GNU C __builtin_ctz and clz also have undefined behaviour when the input value is 0 (gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html); this allows them to inline as a single bsf instruction (or 31 ^ bsr which works for the range of defined outputs). If you need to handle possibly-zero inputs, then you'd want similar wrappers for the GNU C builtins, so the appropriate thing would to have a portability layer around BSF / 31^BSR, and then zero-handling on top of that... And using lzcnt #ifdef __BMI1__. – Boneyard 17/10, 2020 at 10:4

Related: VS: unexpected optimization behavior with _BitScanReverse64 intrinsic - the MSVC doesn't expose the destination-unmodified behaviour of the asm instruction, even though it has an API that could do that. (So you don't need to initialize the index output arg; it doesn't hurt though, the compiler knows it's an output-only operand of the intrinsic.) – Boneyard 17/10, 2020 at 10:8

The equivalent function for int __builtin_ctz (unsigned int x) in MSVC is unsigned int _tzcnt_u32 (unsigned int a) for 32 bit integer and returns count of trailing zeros. For 64 bit use unsigned __int64 _tzcnt_u64 (unsigned __int64 a) 1.
The equivalent function for int __builtin_clz (unsigned int x) in MSVC is unsigned int _lzcnt_u32 (unsigned int a) for 32 bit integer and returns count of leading zeros. For 64 bit use unsigned __int64 _lzcnt_u64 (unsigned __int64 a) 2

C++ Header: immintrin.h

Drab answered 3/5, 2021 at 16:35 Comment(8)

Not all computers have BMI1, so lzcnt may decode as bsr and give 31-clz instead of the clz you're expecting. There is an MSVC intrinsic for bsr, specifically _BitScanReverse. They're only equivalent if you would / could have compiled with gcc -mbmi (or of course gcc -march=haswell or something that includes BMI1). See Does x64 support imply BMI1 support? for old-hardware compat issues (and current-gen Pentium / Celeron low-end CPUs, thanks Intel) with these intrinsics. This is a useful answer, but only with an edit to mention that. – Boneyard 3/5, 2021 at 16:42

All processor may not support BMI1 instruction set. However, based on detection of BMI1 instructions availability, lzcnt can be used. Otherwise bsr can be used. – Drab 3/5, 2021 at 19:35

Yes, exactly. You have to check your CPU and manually use clz = 31-bsr(x); to exactly emulate __builtin_clz on CPUs without BMI1. But your answer doesn't say that. It wrongly implies that _lzcnt_u32 will give you identical results to __builtin_clz in general. But unlike GCC, MSVC lets you use intrinsics without needing to compile with any -march=haswell equivalent to "promise" that you'll only run the binary on CPUs that support some instruction-set extensions. – Boneyard 4/5, 2021 at 4:38

(And BTW, to exactly emulate _lzcnt_u32 without BMI1, you'd want lzcnt = x==0 ? 32 : 31-bsr(x);. __builtin_clz has undefined behaviour if run with an input of 0, allowing it to compile to just a bsr(x) ^ 31, but lzcnt has well-defined behaviour for input == 0.) – Boneyard 4/5, 2021 at 4:40

I didn't get the sense of what is "manually use clz = 31-bsr(x)" and why do you want to get leading / trailing zero count for input == 0? – Drab 4/5, 2021 at 7:1

BSR doesn't handle input==0, but _lzcnt_u32 is well-defined to return 32 in that case. If you want to emulate lzcnt in terms of BSR, you need to special-case input == 0. As far as "manually", I mean if you can't assume that your code will always run on a machine with lzcnt, you should use _BitScanReverse(&tmp, x); return 31 - tmp; to get the same result (for all non-zero x). felixcloutier.com/x86/lzcnt explains how it differs from BSR. – Boneyard 4/5, 2021 at 7:5

Ok... So, you mean there is no way to detect if BMI1 is supported or not and we need to assume always before usage of lzcnt, is it the concern? As per GCC documentation, __builtin_clz / ctz result is undefined for input == 0 as well. So, you wanted to use clz/ctz intrinsics to count total number of bits by providing 0, is it correct? – Drab 4/5, 2021 at 7:26

Well you could write 2 versions of your function that uses clz, and do runtime dispatching (e.g. via a function pointer that you set once based on cpuid). But like most of the rest of BMI1/BMI2, it's usually not worth it to actually dispatch for, just use one version that does work everywhere, using BSR. If input==0 is possible for your use-case, then yeah you need some kind of workaround. Like bsr(x|1) so there's always a bit for it to find, instead of actually branching. So if x was 0 or 1, you get bsr(x|1) == 0. – Boneyard 4/5, 2021 at 7:39

I find it in a korean website https://torbjorn.tistory.com/317 In msvc compiler, you can use __lzcnt(unsigned int) to replace __builtin_clz(unsigned int) in gcc compiler.

Charbonnier answered 17/10, 2020 at 2:7 Comment(1)

Note that the lzcnt instruction requires BMI1. On older CPUs, it executes as bsr, giving 31-lzcnt (and leaving the destination register unmodified for input=0). GCC will only expand __builtin_clz as lznct if you compile with -march=haswell or similar options. – Boneyard 17/10, 2020 at 9:56

If MSVC has a compiler intrinsic for this, it'll be here:

Compiler Intrinsics on MSDN

Otherwise, you'll have to write it using __asm

Kluge answered 10/12, 2008 at 17:33 Comment(0)

-3

Tested on linux and windows (x86) :

#ifdef WIN32
    #include <intrin.h>
    static uint32_t __inline __builtin_clz(uint32_t x) {
        unsigned long r = 0;
        _BitScanReverse(&r, x);
        return (31-r);
    }
#endif

uint32_t clz64(const uint64_t x)
{
    uint32_t u32 = (x >> 32);
    uint32_t result = u32 ? __builtin_clz(u32) : 32;
    if (result == 32) {
        u32 = x & 0xFFFFFFFFUL;
        result += (u32 ? __builtin_clz(u32) : 32);
    }
    return result;
}

Seaddon answered 13/9, 2014 at 12:17 Comment(2)

Have you tested performance of your clz64? I wouldn't be surprised that all this branching makes it slower than the OP's implementation. – Gladsome 24/9, 2016 at 13:13

Use __builtin_clzll like a normal person if you want 64-bit integer support on GNU C. Writing it this way will probably stop GCC from using a single 64-bit bsr or lzcnt in 64-bit builds. (But then you'd have a 64-bit MSVC intrinsic available, too.) – Boneyard 17/10, 2020 at 9:59

-4

There are two intrinsics "_BitScanForward" and "_BitScanReverse", which suits the same purpose for MSVC. Include . The functions are:

#ifdef _MSC_VER
#include <intrin.h>

static uint32_t __inline ctz( uint32_t x )
{
   int r = 0;
   _BitScanReverse(&r, x);
   return r;
}

static uint32_t __inline clz( uint32_t x )
{
   int r = 0;
   _BitScanForward(&r, x);
   return r;
}
#endif

There are equivalent 64bit versions "_BitScanForward64" and "_BitScanReverse64".

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags