Rounding up to next power of 2
Asked Answered
E

31

296

I want to write a function that returns the nearest next power of 2 number. For example if my input is 789, the output should be 1024. Is there any way of achieving this without using any loops but just using some bitwise operators?


Related: Algorithm for finding the smallest power of two that's greater or equal to a given value is a C++ question. C++20 introduced std:bit_ceil which lets the compiler do whatever's optimal for the target system, but nothing equivalent is yet available in portable ISO C for bit-scan, popcount or other common bit operations that most CPUs have. Portable C code has to be less efficient and/or more complicated.

Given an integer, how do I find the next largest power of two using bit-twiddling? is a language-agnostic version of the question with some C++11 and 17 constexpr using GNU extensions.

Answers to this question don't need to be portable; fast versions for various platforms are useful.

Eastwood answered 21/1, 2009 at 17:26 Comment(8)
See here for possible solutions: http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2FloatVegetal
By way of clarification, do you need the nearest power of 2 (ie. 65 would give you 64, but 100 would give you 128) or the nearest above (ie. 65 would give you 128, and so would 100)?Olin
They're multiple questions matching this one. For example: stackoverflow.com/questions/364985/…Lindemann
possible duplicate of Given an integer, how do I find the next largest power of two using bit-twiddling?Radioisotope
@Radioisotope Your link is 8 months later than this question.Gothurd
Or convert to Rust and use doc.rust-lang.org/stable/std/… ;-) (Supposed to be in Hackers Delight section 3.2 too)Broomstick
@Nathan's linked thread is indeed posted later than the one here, but John Feminella's answer in that thread is superb. Readers may want to take a look.Duda
Heh. All the answers and comments here (and all the answers and comments at stackoverflow.com/questions/1322510/… ) make the potentially wrong assumption that the integer is unsigned and provide example code that is very broken for negative integers.Stochastic
W
222

Check the Bit Twiddling Hacks. You need to get the base 2 logarithm, then add 1 to that. Example for a 32-bit value:

Round up to the next highest power of 2

unsigned int v; // compute the next highest power of 2 of 32-bit v

v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v++;

The extension to other widths should be obvious.

An answer on Given an integer, how do I find the next largest power of two using bit-twiddling? presents some explanation of how it works, and examples of the bit-patterns for a couple inputs.

Wyoming answered 21/1, 2009 at 17:30 Comment(20)
This is not the most efficient solution because many processors have special instruction for counting leading zeros which can be used to compute log2 very efficiently. See en.wikipedia.org/wiki/Find_first_setMauldon
@Simon: it's the portable solution. There's no common efficient algorithm for all architecturesOriginate
What if the number itself is a power of two?Crossbones
@Crossbones did you read the codes at bit twidding hacks? The power-of-2 case has already treated specificallyOriginate
@rishta the ^ operator in C isn't power. And that's the slowest solutionOriginate
Internet Archive to the rescue: web.archive.org/web/20160703165415/https://… This is still a lazy answer so I downvoted.Roderica
Why there is 5 times v |= v >> repeated? I tried it with 2 time and it also works, eg:v--; console.log(v); v |= v >> 1; console.log(v); v |= v >> 2; console.log(v); v++; Using latest Chrome on Debian 9 64bitNabonidus
Any chance we could get a macro version of this?Autunite
This thread is still well referenced but this answer (and most others) are highly outdated. CPUs have an instruction to help this (actually already at that time?). From : jameshfisher.com/2018/03/30/round-up-power-2.html uint64_t next_pow2(uint64_t x) { return x == 1 ? 1 : 1<<(64-__builtin_clzl(x-1)); } And for 32 bit : uint32_t next_pow2(uint32_t x) { return x == 1 ? 1 : 1<<(32-__builtin_clz(x-1)); } That is if you use GCC (and Clang I think?), but it would be wise to take the time to find the call to CLZ instead of copy-pasting all options around.Lepsy
@Lepsy This answer is still highly relevant and the best portable way to do it. Your 64-bit version has undefined behaviour if x > UINT32_MAX and isn't branchless. Also, GCC and Clang use -mtune=generic by default (as do most distros), so your code will NOT expand to the lzcnt instruction on x86_64 -- it'll actually expand to something MUCH slower (a libgcc routine) unless you use something like -march=native. So your proposed replacement is non-portable, buggy and (typically) slower.Singband
I'm not saying this is a full and definitive answer, or I would not have answered in comments. But a consideration for speed would be nice. On ubuntu, it works fine with default. Note the _clzl and _clz so it works for >UINT32_MAX too.Lepsy
@Lepsy that sounds like a case of "works on my machine". For me, next_pow2(1ULL << 34) returns 0 and the reason has nothing to do with _clzl vs. _clz. While your function is trivial to fix, the point I'm trying to make is simple -- if you're going to call other people's code "highly outdated", you should first make sure your own code is tested and working and not just a buggy copypasta.Singband
just a slightest note. No need to used post increment, decrement.Caswell
@Wyoming I feel like it would be more accurate to quote Bit Twiddling Hacks here and say that what needs to be found is the "result that may be expressed by the formula 1U << (lg(v - 1) + 1)", which is pow(2, (lg(v - 1) + 1). Because the snippet in the answer doesn't really compute base 2 logarithm, this does.Keening
Waity-minty, what if v is unity? I trow this solution needs a guard: if( v == 1 ) return 2;Sectionalism
@Ant_222 but 1 is a power of 2, it's just the zeroth power, i.e. pow(2, 0) == 1. special casing this isn't correct in the general case, even if it might be useful for your use caseMalagasy
@CraigBarnes x86 had bsr to calculate floor log2 efficiently since 386. GCC's __builtin_clz expands to bsr with -mtune=generic. You should use __builtin_clz to do this portably and efficiently.Keilakeily
@Keilakeily Fair point regarding bsr, but __builtin_clz is not portable. Of course it should be used when available, but that's tangential to the point I was making.Singband
Also has the advantage of working in higher-level languages, which some other answers do not.Typhus
@Lepsy C++20 actually added std::countl_zero, which would be the portable version of __builtin_clzUtterance
K
96

Mathematically speaking

next = pow(2, ceil(log(x)/log(2)));

This works by finding the number you'd have raise 2 by to get x (take the log of the number, and divide by the log of the desired base, see wikipedia for more). Then round that up with ceil to get the nearest whole number power.

This is a more general purpose (i.e. slower!) method than the bitwise methods linked elsewhere, but seem comments for caveats on actually implementing it this way.

Kuebbing answered 21/1, 2009 at 17:35 Comment(10)
From C99, you can also just use log2 if supported by your tools. GCC and VS don't seem to :(Trader
You're missing a bracket... next=pow(2, ceil(log(x)/log(2)));Thevenot
Be careful about float accuracy, though. log(pow(2,29))/log(2) = 29.000000000000004, so the result is 230 instead of returning 229. I think this is why log2 functions exist?Grimsby
Faster is powf(2, ceilf(log2f(val)))Mesomorphic
The cost of this is probably at least 200 cycles and it's not even correct. Why does this have so many upvotes?Apospory
@SuperflyJon But it mentions bitwise operators and I assume correctness is implied by any question unless noted otherwise.Chrysostom
Please do update the answer & use log2(). I made a mistake by not reading the comments and ended up with a great loss in a CP contest. :(Pylos
Can combine logarithms with bit operators: 1 << ceil(log2(x))Frediafredie
The main problem with this is say you're using a 64-bit integer value for x and ceil/log2 use a 64-bit double then there is no way to calculate for all values of x.Lodie
"This works" - No, it doesn't. The solution is mathematically correct but makes a fatal mistake: It (implicitly) assumes that floating point numbers in computing would model rational numbers. They don't. "good to know the maths, eh?" - Absolutely. But when you blindly translate an algorithm without acknowledging a change in rules, this becomes dangerous quickly. Please update this proposed answer or delete it if you cannot afford the diligence.Dejected
F
77

I think this works, too:

int power = 1;
while(power < x)
    power*=2;

And the answer is power.

Finnougric answered 20/9, 2012 at 4:46 Comment(7)
Fair enough the question asked for no loops. But as clever as some of the other functions are, for code that is not performance sensitive the answer that is quickly and easily understood and verified to be correct always wins for me.Leto
This is not returning the nearest power of 2, butthe power of that is immediatly bigger than X. Still very goodDanialah
Instead of multiplying, some bitwise "magic" can be used instead power <<= 1Capacitor
@Vallentin That should be automatically optimized by a compiler.Fez
Beware of infinite loop if x is too large (i.e. not enough bits to represent the next power of 2).Kiger
@Fez godbolt.org/z/ajvGsTs5b Clang trunk ARM 64 bit does not appear to optimize the loop at the moment.Cheatham
@Kiger It's worse than that. This invokes undefined behavior if power overflows.According
Q
55
unsigned long upper_power_of_two(unsigned long v)
{
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v++;
    return v;

}
Quadrilateral answered 21/1, 2009 at 17:39 Comment(12)
Would be nice if you have attributed it (unless you discovered it). It comes from the bit twiddling hacks page.Wyoming
Is that for a 32-bit number? Extension for 64-bit?Ruwenzori
Jonathan, you need to do it for the upper half, and if that is zero, you do it for the lower half.Wyoming
@florin, if v is a 64-bit type, couldn't you just add a "c |= v >> 32" after the one for 16?Atwater
Yup - it's from the bit-twiddling hacks as florin notedQuadrilateral
Becareful when using signed int. Values > 0x4000_0000_0000_0000 will return 0x8000_0000_0000_0000.Radioisotope
Input values > 0x8000_0000_0000_0000 will return 0. Input of 0 returns 0.Radioisotope
@Radioisotope printf("%lx\n", upper_power_of_two(0x8000000000000000)); --> 8000000000000000 when upper_power_of_two() uses 64-bit unsigned long.Jacal
I found some application in JDK ArrayDeque, but it doesn't have the first statement v-- I tested a few cases, seems okay, not sure what's the differenceUtimer
@Utimer I know this is an old comment, but I was curious, and after trying it it seems that without the decrement if the input is power of two, then the output is double the input, whereas with the decrement, a power of two is returned unchanged.Miniaturize
Code that only works for a specific bit width should be using fixed-width types instead of minimum-width types. This function should take and return a uint32_t.Singband
If you are using DPDK, rte_common.h has helper functions which use this mechanism such as rte_align32pow2 which aligns to the next power of 2.Heber
L
47

If you're using GCC, you might want to have a look at Optimizing the next_pow2() function by Lockless Inc.. This page describes a way to use built-in function builtin_clz() (count leading zero) and later use directly x86 (ia32) assembler instruction bsr (bit scan reverse), just like it's described in another answer's link to gamedev site. This code might be faster than those described in previous answer.

By the way, if you're not going to use assembler instruction and 64bit data type, you can use this

/**
 * return the smallest power of two value
 * greater than x
 *
 * Input range:  [2..2147483648]
 * Output range: [2..2147483648]
 *
 */
__attribute__ ((const))
static inline uint32_t p2(uint32_t x)
{
#if 0
    assert(x > 1);
    assert(x <= ((UINT32_MAX/2) + 1));
#endif

    return 1 << (32 - __builtin_clz (x - 1));
}
Lindemann answered 13/4, 2012 at 14:59 Comment(9)
Note that this returns the smallest power of 2 greater than OR equal to x. Changing (x -1) to x changes the function to return the smaller power of 2 greater than x.Prelusive
You can use _BitScanForward on Visual C++Polychrome
You can also use __builtin_ctz()Ponderous
@Ponderous __builtin_ctz() won't be useful to round any non power of 2 number up to the next power of twoLindemann
Please add in your answer a link to Wikipedia list of builtin bitwise functions for other compilers: en.wikipedia.org/wiki/Find_first_set#Tool_and_library_support                                Please provide also a 64-bits version. I propose the following C++11 function:              constexpr uint64_t nextPowerOfTwo64 (uint64_t x) { return 1ULL<<(sizeof(uint64_t) * 8 - __builtin_clzll(x)); }Wheaton
8 bits per byte is usually fine but CHAR_BIT is more accurate.Superpower
This looks quite strange and does't look portable, no offense.Small
@oHo: If you're using C++, C++20 provides std::countl_zero = CLZ and std::bit_width (index of MSB + 1 = x86 BSR+1) to finally provide a portable way for the language to expose something that most CPUs have been providing for years. In fact, C++20 even has std::bit_ceil which is the answer to this whole question. But this is a C question, and ISO C is still stuck in the dark ages in this regard, necessitating GNU extensions or other intrinsics to do it efficiently.Basilicata
@oHo: Also if you're using C++, use std::numeric_limits to get the number of bits in a type. uint64_t is specifically guaranteed to have 64, but it's an optional type. CHAR_BIT shouldn't be assumed to be 8 if you're going for max portability, e.g. to DSPs, but std::numeric_limits<T>::digits makes it easy.Basilicata
C
31

In standard c++20 this is included in <bit>. The answer is simply

#include <bit>
unsigned long upper_power_of_two(unsigned long v)
{
    return std::bit_ceil(v);
}

NOTE: The solution I gave is for c++, not c, I would give an answer this question instead, but it was closed as a duplicate of this one!

Castleman answered 6/4, 2021 at 19:50 Comment(3)
I agree that "solve this in C" is not a duplicate of "solve this in C++". You are demonstrating that. So I have reopened the duplicate you link to. Please feel free to move your C++ answer there.Jamboree
I've added a link to the C++ question to this C question (and even a mention of std::bit_ceil), so this no longer needs to be an answer here. Please delete. Your answer here incorrectly states that the linked question is closed; as Drew says it's been reopened.Basilicata
C23 will introduce stdc_bit_ceil_*() APIs. You can find the latest draft of the standard and read it.Cockcroft
C
24

One more, although I use cycle, but thi is much faster than math operands

power of two "floor" option:

int power = 1;
while (x >>= 1) power <<= 1;

power of two "ceil" option:

int power = 2;
x--;    // <<-- UPDATED
while (x >>= 1) power <<= 1;

UPDATE

As mentioned in comments there was mistake in ceil where its result was wrong.

Here are full functions:

unsigned power_floor(unsigned x) {
    int power = 1;
    while (x >>= 1) power <<= 1;
    return power;
}

unsigned power_ceil(unsigned x) {
    if (x <= 1) return 1;
    int power = 2;
    x--;
    while (x >>= 1) power <<= 1;
    return power;
}
Crabbed answered 19/7, 2014 at 20:46 Comment(3)
the result is not correct if x is power of 2. A micro to test if input is power of 2 is needed. #define ISPOW2(x) ((x) > 0 && !((x) & (x-1)))Dumbhead
@zorksylar more efficiently would be to if (x == 0) return 1; /* Or 0 (Which is what I use) */ x--; /* Rest of program */Sumy
Good solution! but the power of two "ceil" option is not correct. For example, when x = 2 the result should be 2 instead of 4Eckard
H
14

For any unsigned type, building on the Bit Twiddling Hacks:

#include <climits>
#include <type_traits>

template <typename UnsignedType>
UnsignedType round_up_to_power_of_2(UnsignedType v) {
  static_assert(std::is_unsigned<UnsignedType>::value, "Only works for unsigned types");
  v--;
  for (size_t i = 1; i < sizeof(v) * CHAR_BIT; i *= 2) //Prefer size_t "Warning comparison between signed and unsigned integer"
  {
    v |= v >> i;
  }
  return ++v;
}

There isn't really a loop there as the compiler knows at compile time the number of iterations.

Hild answered 13/1, 2015 at 5:31 Comment(4)
Note that the question is about C.Weaks
@Weaks Just replace UnsignedType and process it manually. I am pretty sure a C programmer can expand this simple template ignoring the std::is_unsigned<UnsignedType>::value assertion.Schenck
@Schenck Sure, it would be nice to have an answer in Javascript as well, just in case somebody wants to translate that to C.Weaks
@Weaks UnsignedType in JavaScript? Anyway, this solution shows how to do it for any UnsignedType, and it happen to be written in C++, rather than pseudocode [ sizeof(v)*CHAR_BIT instead of something like number of bits in an object of UnsignedType ].Schenck
M
13

Despite the question is tagged as c here my five cents. Lucky us, C++ 20 would include std::ceil2 and std::floor2 (see here). It is consexpr template functions, current GCC implementation uses bitshifting and works with any integral unsigned type.

Mettlesome answered 24/1, 2019 at 7:24 Comment(2)
They recently renamed it to bit_ceil open-std.org/JTC1/SC22/WG21/docs/papers/2020/p1956r1.pdfShwa
bit_floor and bit_ceil are now available in C++20 from the <bit> header. en.cppreference.com/w/cpp/header/bitTimpani
P
10

For IEEE floats you'd be able to do something like this.

int next_power_of_two(float a_F){
    int f = *(int*)&a_F;
    int b = f << 9 != 0; // If we're a power of two this is 0, otherwise this is 1

    f >>= 23; // remove factional part of floating point number
    f -= 127; // subtract 127 (the bias) from the exponent

    // adds one to the exponent if were not a power of two, 
    // then raises our new exponent to the power of two again.
    return (1 << (f + b)); 
}

If you need an integer solution and you're able to use inline assembly, BSR will give you the log2 of an integer on the x86. It counts how many right bits are set, which is exactly equal to the log2 of that number. Other processors have similar instructions (often), such as CLZ and depending on your compiler there might be an intrinsic available to do the work for you.

Piggin answered 21/1, 2009 at 18:15 Comment(3)
This is an interesting one eventhough not related to the question ( I want to roundoff only integers), will try out this one..Eastwood
Came up with it after reading the wikipedia article on floats. Besides that, I've used it to calculate square-roots in integer precision. Also nice, but even more unrelated.Piggin
This breaks the strict aliasing rules. On some compilers it may not work or issue a warning.Weaks
C
8

Here's my solution in C. Hope this helps!

int next_power_of_two(int n) {
    int i = 0;
    for (--n; n > 0; n >>= 1) {
        i++;
    }
    return 1 << i;
}
Crifasi answered 29/10, 2018 at 7:32 Comment(0)
L
7

In x86 you can use the sse4 bit manipulation instructions to make it fast.

//assume input is in eax
mov    ecx,31      
popcnt edx,eax   //cycle 1
lzcnt  eax,eax   //cycle 2
sub    ecx,eax
mov    eax,1
cmp    edx,1     //cycle 3
jle @done        //cycle 4 - popcnt says its a power of 2, return input unchanged
shl    eax,cl    //cycle 5
@done: rep ret   //cycle 5

In c you can use the matching intrinsics.

Or jumpless, which speeds up things by avoiding a misprediction due to a jump, but slows things down by lengthening the dependency chain. Time the code to see which works best for you.

//assume input is in eax
mov    ecx,31
popcnt edx,eax    //cycle 1
lzcnt  eax,eax
sub    ecx,eax
mov    eax,1      //cycle 2
cmp    edx,1
mov    edx,0     //cycle 3 
cmovle ecx,edx   //cycle 4 - ensure eax does not change
shl    eax,cl    
@done: rep ret   //cycle 5
Lenna answered 31/3, 2016 at 15:49 Comment(4)
Perhaps I'm missing something, but it doesn't look like this will work. It essentially left-shifts the value 2 by the count of leading zeros in the original number i.e. 2<<lzcnt(x). So if the argument is 3, which has 30 leading zeros, then the result is 2<<30 = 2147483648, but the result should be 4.Basketball
You should try and avoid the branch: replace it with mov edx,0; cmovle ecx,edxAniline
Most CPUs with BMI1 also have BMI2, so you can use shlx for single-uop variable-count shifts on Intel if you don't mind excluding some old AMD CPUs. Also, if popcnt(x) <= 1 is just to detect already being a power of 2, use the ZF output of BMI1 blsr dummy, eax - if clearing the lowest set bit made the result zero, the input was a power of 2, or zero. It's single-uop on Intel, 2 uops on AMD before Zen 4. (uops.info)Basilicata
It's easy to do this in half as many instructions and without BMI2. See my answer: https://mcmap.net/q/20261/-rounding-up-to-next-power-of-2Solmization
G
5
/*
** http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog
*/
#define __LOG2A(s) ((s &0xffffffff00000000) ? (32 +__LOG2B(s >>32)): (__LOG2B(s)))
#define __LOG2B(s) ((s &0xffff0000)         ? (16 +__LOG2C(s >>16)): (__LOG2C(s)))
#define __LOG2C(s) ((s &0xff00)             ? (8  +__LOG2D(s >>8)) : (__LOG2D(s)))
#define __LOG2D(s) ((s &0xf0)               ? (4  +__LOG2E(s >>4)) : (__LOG2E(s)))
#define __LOG2E(s) ((s &0xc)                ? (2  +__LOG2F(s >>2)) : (__LOG2F(s)))
#define __LOG2F(s) ((s &0x2)                ? (1)                  : (0))

#define LOG2_UINT64 __LOG2A
#define LOG2_UINT32 __LOG2B
#define LOG2_UINT16 __LOG2C
#define LOG2_UINT8  __LOG2D

static inline uint64_t
next_power_of_2(uint64_t i)
{
#if defined(__GNUC__)
    return 1UL <<(1 +(63 -__builtin_clzl(i -1)));
#else
    i =i -1;
    i =LOG2_UINT64(i);
    return 1UL <<(1 +i);
#endif
}

If you do not want to venture into the realm of undefined behaviour the input value must be between 1 and 2^63. The macro is also useful to set constant at compile time.

Genesa answered 27/11, 2013 at 7:21 Comment(3)
This is the probably the worst solution (it's also missing ULL suffix on the 64-bit constant). This will generate 32 tests per input in all cases. Better to use a while loop, it'll always be faster or at the same speed.Atelier
BUT... this can be evaluated by the preprocessor if the input is a constant, and thus ZERO operation at run time!Odellodella
@Odellodella For a version that correctly handles zero and one and generates 292 less assembly instructions in MSVC and half as many in GCC/Clang, see my answer: https://mcmap.net/q/20261/-rounding-up-to-next-power-of-2Solmization
C
4

For completeness here is a floating-point implementation in bog-standard C.

double next_power_of_two(double value) {
    int exp;
    if(frexp(value, &exp) == 0.5) {
        // Omit this case to round precise powers of two up to the *next* power
        return value;
    }
    return ldexp(1.0, exp);
}
Contumelious answered 29/12, 2013 at 0:24 Comment(2)
Random browsers, if you read this comment, choose this code.This is clearly the best answer, no special instructions, no bit twiddling, just efficient, portable and standard code. Guessing why no one else upvoted it ^^Danialah
Random browsers, this is going to be dead slow if you don't have specialized floating point hardware. On x86 you can run circles around this code using bit twiddling. rep bsr ecx,eax; mov eax,0; cmovnz eax,2; shl eax,cl is about 25x faster.Lenna
W
4

An efficient Microsoft (e.g., Visual Studio 2017) specific solution in C / C++ for integer input. Handles the case of the input exactly matching a power of two value by decrementing before checking the location of the most significant 1 bit.

inline unsigned int ExpandToPowerOf2(unsigned int Value)
{
    unsigned long Index;
    _BitScanReverse(&Index, Value - 1);
    return (1U << (Index + 1));
}

// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

#if defined(WIN64) // The _BitScanReverse64 intrinsic is only available for 64 bit builds because it depends on x64

inline unsigned long long ExpandToPowerOf2(unsigned long long Value)
{
    unsigned long Index;
    _BitScanReverse64(&Index, Value - 1);
    return (1ULL << (Index + 1));
}

#endif

This generates 5 or so inlined instructions for an Intel processor similar to the following:

dec eax
bsr rcx, rax
inc ecx
mov eax, 1
shl rax, cl

Apparently the Visual Studio C++ compiler isn't coded to optimize this for compile-time values, but it's not like there are a whole lot of instructions there.

Edit:

If you want an input value of 1 to yield 1 (2 to the zeroth power), a small modification to the above code still generates straight through instructions with no branch.

inline unsigned int ExpandToPowerOf2(unsigned int Value)
{
    unsigned long Index;
    _BitScanReverse(&Index, --Value);
    if (Value == 0)
        Index = (unsigned long) -1;
    return (1U << (Index + 1));
}

Generates just a few more instructions. The trick is that Index can be replaced by a test followed by a cmove instruction.

Whoa answered 31/7, 2018 at 20:17 Comment(4)
A small mistake: It should return 1 for 1, it does not though.Arrest
Thanks. In the application for which it was developed we explicitly needed 2 to the first power when 1 is input. 1 could be taken as a special case with a conditional without generating too many more instructions I imagine.Whoa
Updated the answer to include a version that returns 1 for an input value of 1.Whoa
For a version that correctly handles zero and compiles down to only 4 instructions instead of the 9 present in this post, see my answer below: https://mcmap.net/q/20261/-rounding-up-to-next-power-of-2Solmization
C
4

The C23 standard will add a new series of APIs named stdc_bit_ceil(). There are a type-generic version (stdc_bit_ceil() macro) and type-specific versions (stdc_bit_ceil_*()) defined in the header <stdbit.h>. In the near future this will be the standard way to round integers to a power of 2.

The latest draft of the C23 standard (N3096)

If you are writing in C++, C++20 has std::bit_ceil which does the exact same thing.

Both standard interfaces have one undefined behavior when the result cannot be represented in the output data type (i.e. it would overflow).


If your compiler and C library do not support C23 yet, here is an implementation that I consider it an "ultimate" solution. I intended the library writers to copy this code and adapt appropriately.

The following code

  • is targeted for C language (not C++),

  • uses compiler built-ins to yield efficient code (CLZ or BSR instruction) if compiler supports any,

  • is portable (standard C and no assembly) with the exception of built-ins, and

  • addresses the undefined behavior by producing the 0 result instead (not required by the standard, but it helps in applications handling untrusted inputs).

#include <limits.h>

#ifdef _MSC_VER
# if _MSC_VER >= 1400
/* _BitScanReverse is introduced in Visual C++ 2005 and requires
   <intrin.h> (also introduced in Visual C++ 2005). */
#include <intrin.h>
#pragma intrinsic(_BitScanReverse)
#pragma intrinsic(_BitScanReverse64)
#  define HAVE_BITSCANREVERSE 1
# endif
#endif

/* Macro indicating that the compiler supports __builtin_clz().
   The name HAVE_BUILTIN_CLZ seems to be the most common, but in some
   projects HAVE__BUILTIN_CLZ is used instead. */
#ifdef __has_builtin
# if __has_builtin(__builtin_clz)
#  define HAVE_BUILTIN_CLZ 1
# endif
#elif defined(__GNUC__)
# if (__GNUC__ > 3)
#  define HAVE_BUILTIN_CLZ 1
# elif defined(__GNUC_MINOR__)
#  if (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)
#   define HAVE_BUILTIN_CLZ 1
#  endif
# endif
#endif


unsigned int stdc_bit_ceil_ui(unsigned int x);
unsigned long stdc_bit_ceil_ul(unsigned long x);
unsigned long long stdc_bit_ceil_ull(unsigned long long x);

/**
 * Returns the smallest power of two that is not smaller than x.
 */
unsigned int stdc_bit_ceil_ui(unsigned int x)
{
    if (x <= 1) {
        return 1;
    }
    x--;

#ifdef HAVE_BITSCANREVERSE
    if (x > (UINT_MAX >> 1)) {
        /* This is undefined behavior in C23, but we play it safe. */
        return 0;
    } else {
        unsigned long index;
        (void) _BitScanReverse(&index, x);
        return (1U << (index + 1));
    }
#elif defined(HAVE_BUILTIN_CLZ)
    if (x > (UINT_MAX >> 1)) {
        /* This is undefined behavior in C23. */
        return 0;
    }
    return (1U << (sizeof(x) * CHAR_BIT - __builtin_clz(x)));
#else
    /* Solution from "Bit Twiddling Hacks"
       <http://www.graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2>
       but converted to a loop for smaller code size.
       ("gcc -O3" will unroll this.) */
    {
        unsigned int shift;
        for (shift = 1; shift < sizeof(x) * CHAR_BIT; shift <<= 1) {
            x |= (x >> shift);
        }
    }
    return (x + 1);
#endif
}

/**
 * Returns the smallest power of two that is not smaller than x.
 */
unsigned long stdc_bit_ceil_ul(unsigned long x)
{
    if (x <= 1) {
        return 1;
    }
    x--;

#ifdef HAVE_BITSCANREVERSE
    if (x > (ULONG_MAX >> 1)) {
        return 0;
    } else {
        unsigned long index;
        (void) _BitScanReverse(&index, x);
        return (1UL << (index + 1));
    }
#elif defined(HAVE_BUILTIN_CLZ)
    if (x > (ULONG_MAX >> 1)) {
        return 0;
    }
    return (1UL << (sizeof(x) * CHAR_BIT - __builtin_clzl(x)));
#else
    {
        unsigned int shift;
        for (shift = 1; shift < sizeof(x) * CHAR_BIT; shift <<= 1) {
            x |= (x >> shift);
        }
    }
    return (x + 1);
#endif
}


/**
 * Returns the smallest power of two that is not smaller than x.
 */
unsigned long long stdc_bit_ceil_ull(unsigned long long x)
{
    if (x <= 1) {
        return 1;
    }
    x--;

#if (defined(HAVE_BITSCANREVERSE) && \
    ULLONG_MAX == 18446744073709551615ULL)
    if (x > (ULLONG_MAX >> 1)) {
        return 0;
    } else {
        /* assert(sizeof(__int64) == sizeof(long long)); */
        unsigned long index;
        (void) _BitScanReverse64(&index, x);
        return (1ULL << (index + 1));
    }
#elif defined(HAVE_BUILTIN_CLZ)
    if (x > (ULLONG_MAX >> 1)) {
        return 0;
    }
    return (1ULL << (sizeof(x) * CHAR_BIT - __builtin_clzll(x)));
#else
    {
        unsigned int shift;
        for (shift = 1; shift < sizeof(x) * CHAR_BIT; shift <<= 1) {
            x |= (x >> shift);
        }
    }
    return (x + 1);
#endif
}
Cockcroft answered 1/7, 2021 at 3:25 Comment(0)
D
2

Portable solution in C#:

int GetNextPowerOfTwo(int input) {
    return 1 << (int)Math.Ceiling(Math.Log2(input));
}

Math.Ceiling(Math.Log2(value)) calculates the exponent of the next power of two, the 1 << calculates the real value through bitshifting.

Faster solution if you have .NET Core 3 or above:

uint GetNextPowerOfTwoFaster(uint input) {
    return (uint)1 << (sizeof(uint) * 8 - System.Numerics.BitOperations.LeadingZeroCount(input - 1));
}

This uses System.Numerics.BitOperations.LeadingZeroCount() which uses a hardware instruction if available:

https://github.com/dotnet/corert/blob/master/src/System.Private.CoreLib/shared/System/Numerics/BitOperations.cs

Update:

RoundUpToPowerOf2() is Coming in .NET 6! The internal implementation is mostly the same as the .NET Core 3 solution above.

Here's the community update.

Delubrum answered 23/10, 2020 at 3:25 Comment(0)
R
1

Here is what I'm using to have this be a constant expression, if the input is a constant expression.

#define uptopow2_0(v) ((v) - 1)
#define uptopow2_1(v) (uptopow2_0(v) | uptopow2_0(v) >> 1)
#define uptopow2_2(v) (uptopow2_1(v) | uptopow2_1(v) >> 2)
#define uptopow2_3(v) (uptopow2_2(v) | uptopow2_2(v) >> 4)
#define uptopow2_4(v) (uptopow2_3(v) | uptopow2_3(v) >> 8)
#define uptopow2_5(v) (uptopow2_4(v) | uptopow2_4(v) >> 16)

#define uptopow2(v) (uptopow2_5(v) + 1)  /* this is the one programmer uses */

So for instance, an expression like:

uptopow2(sizeof (struct foo))

will nicely reduce to a constant.

Ruin answered 15/2, 2019 at 20:19 Comment(0)
L
1

You might find the following clarification to be helpful towards your purpose:

Laocoon answered 16/10, 2019 at 11:0 Comment(0)
E
1

constexpr version of clp2 for C++14

#include <iostream>
#include <type_traits>

// Closest least power of 2 minus 1. Returns 0 if n = 0.
template <typename UInt, std::enable_if_t<std::is_unsigned<UInt>::value,int> = 0>
  constexpr UInt clp2m1(UInt n, unsigned i = 1) noexcept
    { return i < sizeof(UInt) * 8 ? clp2m1(UInt(n | (n >> i)),i << 1) : n; }

/// Closest least power of 2 minus 1. Returns 0 if n <= 0.
template <typename Int, std::enable_if_t<std::is_integral<Int>::value && std::is_signed<Int>::value,int> = 0>
  constexpr auto clp2m1(Int n) noexcept
    { return clp2m1(std::make_unsigned_t<Int>(n <= 0 ? 0 : n)); }

/// Closest least power of 2. Returns 2^N: 2^(N-1) < n <= 2^N. Returns 0 if n <= 0.
template <typename Int, std::enable_if_t<std::is_integral<Int>::value,int> = 0>
  constexpr auto clp2(Int n) noexcept
    { return clp2m1(std::make_unsigned_t<Int>(n-1)) + 1; }

/// Next power of 2. Returns 2^N: 2^(N-1) <= n < 2^N. Returns 1 if n = 0. Returns 0 if n < 0.
template <typename Int, std::enable_if_t<std::is_integral<Int>::value,int> = 0>
  constexpr auto np2(Int n) noexcept
    { return clp2m1(std::make_unsigned_t<Int>(n)) + 1; }

template <typename T>
  void test(T v) { std::cout << clp2(v) << std::endl; }

int main()
{
    test(-5);                          // 0
    test(0);                           // 0
    test(8);                           // 8
    test(31);                          // 32
    test(33);                          // 64
    test(789);                         // 1024
    test(char(260));                   // 4
    test(unsigned(-1) - 1);            // 0
    test<long long>(unsigned(-1) - 1); // 4294967296

    return 0;
}
Excoriate answered 15/9, 2021 at 7:28 Comment(0)
R
1

Here's a snippet, using the builtin GCC or Clang count leading zeros. Also available usually with some inline assembly.

#include <stdint.h>

uint32_t nextPo2(uint32_t in) { // Return the next power of 2 of the given number.

 in = in > 2 ? in : 2; // We don't want 0 or 1 as input.

 int32_t nlz = __builtin_clz(in - 1);

 return nlz ? (1u << (32 - nlz)) : 0xffffffff;

}

Note that the return value of nextPo2(8) == 8, nextPo2(16) == 16, ...

It probably is faster than the bit twiddling hack because of the clz() command. This answer has been given already in other questions.

Returning 0xffffffff when the given integer is > UINT_MAX/2 + 1 is a design choice. An alternative would be to make the function return uint64_t and make it return 0x100000000LL but that all depends upon the context of usage.

Rectum answered 9/1 at 13:55 Comment(1)
This is non-portable and the generated assembly is poor quality, generating a jump in GCC and 12 assembly instructions on x86_64 in Clang. See my answer to do it in 4 instructions: https://mcmap.net/q/20261/-rounding-up-to-next-power-of-2Solmization
S
1

Fast, portable 4-instructions for GCC/Clang+MSVC

For the best performance on common compilers and platforms with fallbacks for everything else:

#include <stdint.h>
#include <limits.h>

#ifdef _MSC_VER
    unsigned char _BitScanReverse64(unsigned long*Index,unsigned __int64 Mask);
    #pragma intrinsic(_BitScanReverse64)
#endif

#if defined(INTPTR_MAX) && INTPTR_MAX > INT32_MAX
uint32_t nextPow2int32(uint64_t in) { // Returns next power of 2
#else
uint32_t nextPow2int32(uint32_t in) { // Returns next power of 2
#endif
    #if defined(__GNUC__) && defined(__x86_64__)
        if (in > UINT32_MAX) __builtin_unreachable();
        uint64_t out, tmp;
        __asm__("lea{q    -1(,%q[in],2), %q[tmp]|     %q[tmp], [%q[in]*2-1]}"
            "\n\tbsr{q    %q[tmp], %q[tmp]|     %q[tmp], %q[tmp]}"
            "\n\txor{l    %k[out], %k[out]|     %k[out], %k[out]}"
            "\n\tbtc{q    %q[tmp], %q[out]|     %q[out], %q[tmp]}"
            : [out]"=r"(out),[tmp]"=r"(tmp):[in]"r"(in));
        return (uint32_t)out;
    #elif defined(__GNUC__) && defined(__aarch64__)
        uint64_t x = (uint64_t)in*2 - 1;
        #ifdef __builtin_arm_rbitll
            x = __builtin_arm_rbitll(x);
        #else
            __asm__ ("rbit %0, %1" : "=r"(x) : "r"(x));
        #endif
        uint64_t y = x & -x;
        #ifdef __builtin_arm_rbitll
            y = __builtin_arm_rbitll(y);
        #else
            __asm__ ("rbit %0, %1" : "=r"(y) : "r"(y));
        #endif
        return (uint32_t) y; 
    #elif defined(__GNUC__) 
        if (in > UINT32_MAX) __builtin_unreachable();
        unsigned long long inT2M1=2*(unsigned long long)in-1, clz;
        clz = __builtin_clzll(inT2M1)^(sizeof(long long)*8-1);
        inT2M1 = inT2M1 == 0 ? inT2M1 : clz;
        return (uint32_t)((unsigned long long)1 << inT2M1);
    #elif defined(_MSC_VER) 
        unsigned __int64 inT2M1=2*(unsigned __int64)in-1;
        unsigned long res = (unsigned long)inT2M1;
        _BitScanReverse64(&res, inT2M1);
        return (uint32_t)((__int64)1 << res);
    #else // otherwise, use the slow fallback
        in -= 1;
        if (in >> 27) in |= in >> 27; // very unlikely, so free
        in|=(in>>9)|(in>>18), in|=(in>>3)|(in>>6);
        return (in | (in>>1) | (in>>2)) + 1;
    #endif
}

This yields the following ultra compact 4-instruction assembly on x86_64 GCC/Clang:

nextPow2int32:
        lea     rdx, [rdi*2-1]
        bsr     rdx, rdx
        xor     eax, eax
        btc     rax, rdx
        ret

For MSVC on x86_64:

nextPow2int32 PROC
        lea     rcx, QWORD PTR [rcx*2-1]
        mov     eax, 1
        bsr     rcx, rcx
        shl     rax, cl
        ret     0
nextPow2int32 ENDP

Test:

#include <stdio.h>
#include <inttypes.h>

int main() {
    printf("nextPowerOf2(0) = %" PRIu32 "\n", nextPowerOf2(0)); // 0
    printf("nextPowerOf2(1) = %" PRIu32 "\n", nextPowerOf2(1)); // 1
    printf("nextPowerOf2(2) = %" PRIu32 "\n", nextPowerOf2(2)); // 2
    printf("nextPowerOf2(3) = %" PRIu32 "\n", nextPowerOf2(3)); // 4
    printf("nextPowerOf2(4) = %" PRIu32 "\n", nextPowerOf2(4)); // 4
    printf("nextPowerOf2(5) = %" PRIu32 "\n", nextPowerOf2(5)); // 8
    printf("nextPowerOf2(6) = %" PRIu32 "\n", nextPowerOf2(6)); // 8
    printf("nextPowerOf2(7) = %" PRIu32 "\n", nextPowerOf2(7)); // 8
}
Solmization answered 15/2 at 21:7 Comment(0)
M
0

Many processor architectures support log base 2 or very similar operation – count leading zeros. Many compilers have intrinsics for it. See https://en.wikipedia.org/wiki/Find_first_set

Mauldon answered 4/10, 2013 at 21:52 Comment(1)
it's not about finding the highest set bit (=bsr) or counting leading zeros. he wants to round up to nearest power of 2. the answer with "subtract 1, then do bsr and shift 1 left" does that.Esmaria
A
0

Assuming you have a good compiler & it can do the bit twiddling before hand thats above me at this point, but anyway this works!!!

    // http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious
    #define SH1(v)  ((v-1) | ((v-1) >> 1))            // accidently came up w/ this...
    #define SH2(v)  ((v) | ((v) >> 2))
    #define SH4(v)  ((v) | ((v) >> 4))
    #define SH8(v)  ((v) | ((v) >> 8))
    #define SH16(v) ((v) | ((v) >> 16))
    #define OP(v) (SH16(SH8(SH4(SH2(SH1(v))))))         

    #define CB0(v)   ((v) - (((v) >> 1) & 0x55555555))
    #define CB1(v)   (((v) & 0x33333333) + (((v) >> 2) & 0x33333333))
    #define CB2(v)   ((((v) + ((v) >> 4) & 0xF0F0F0F) * 0x1010101) >> 24)
    #define CBSET(v) (CB2(CB1(CB0((v)))))
    #define FLOG2(v) (CBSET(OP(v)))

Test code below:

#include <iostream>

using namespace std;

// http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious
#define SH1(v)  ((v-1) | ((v-1) >> 1))  // accidently guess this...
#define SH2(v)  ((v) | ((v) >> 2))
#define SH4(v)  ((v) | ((v) >> 4))
#define SH8(v)  ((v) | ((v) >> 8))
#define SH16(v) ((v) | ((v) >> 16))
#define OP(v) (SH16(SH8(SH4(SH2(SH1(v))))))         

#define CB0(v)   ((v) - (((v) >> 1) & 0x55555555))
#define CB1(v)   (((v) & 0x33333333) + (((v) >> 2) & 0x33333333))
#define CB2(v)   ((((v) + ((v) >> 4) & 0xF0F0F0F) * 0x1010101) >> 24)
#define CBSET(v) (CB2(CB1(CB0((v)))))
#define FLOG2(v) (CBSET(OP(v))) 

#define SZ4         FLOG2(4)
#define SZ6         FLOG2(6)
#define SZ7         FLOG2(7)
#define SZ8         FLOG2(8) 
#define SZ9         FLOG2(9)
#define SZ16        FLOG2(16)
#define SZ17        FLOG2(17)
#define SZ127       FLOG2(127)
#define SZ1023      FLOG2(1023)
#define SZ1024      FLOG2(1024)
#define SZ2_17      FLOG2((1ul << 17))  // 
#define SZ_LOG2     FLOG2(SZ)

#define DBG_PRINT(x) do { std::printf("Line:%-4d" "  %10s = %-10d\n", __LINE__, #x, x); } while(0);

uint32_t arrTble[FLOG2(63)];

int main(){
    int8_t n;

    DBG_PRINT(SZ4);    
    DBG_PRINT(SZ6);    
    DBG_PRINT(SZ7);    
    DBG_PRINT(SZ8);    
    DBG_PRINT(SZ9); 
    DBG_PRINT(SZ16);
    DBG_PRINT(SZ17);
    DBG_PRINT(SZ127);
    DBG_PRINT(SZ1023);
    DBG_PRINT(SZ1024);
    DBG_PRINT(SZ2_17);

    return(0);
}

Outputs:

Line:39           SZ4 = 2
Line:40           SZ6 = 3
Line:41           SZ7 = 3
Line:42           SZ8 = 3
Line:43           SZ9 = 4
Line:44          SZ16 = 4
Line:45          SZ17 = 5
Line:46         SZ127 = 7
Line:47        SZ1023 = 10
Line:48        SZ1024 = 10
Line:49        SZ2_16 = 17
Ambitious answered 19/8, 2016 at 2:10 Comment(0)
S
0

I'm trying to get nearest lower power of 2 and made this function. May it help you.Just multiplied nearest lower number times 2 to get nearest upper power of 2

int nearest_upper_power(int number){
    int temp=number;
    while((number&(number-1))!=0){
        temp<<=1;
        number&=temp;
    }
    //Here number is closest lower power 
    number*=2;
    return number;
}
Spinney answered 16/1, 2017 at 22:23 Comment(0)
S
0

Adapted Paul Dixon's answer to Excel, this works perfectly.

 =POWER(2,CEILING.MATH(LOG(A1)/LOG(2)))
Staggers answered 25/9, 2018 at 8:1 Comment(0)
T
0

A variant of @YannDroneaud answer valid for x==1, only for x86 plateforms, compilers, gcc or clang:

__attribute__ ((const))
static inline uint32_t p2(uint32_t x)
{
#if 0
    assert(x > 0);
    assert(x <= ((UINT32_MAX/2) + 1));
#endif
  int clz;
  uint32_t xm1 = x-1;
  asm(
    "lzcnt %1,%0"
    :"=r" (clz)
    :"rm" (xm1)
    :"cc"
    );
    return 1 << (32 - clz);
}
Toffic answered 9/10, 2018 at 16:39 Comment(0)
M
-1

If you need it for OpenGL related stuff:

/* Compute the nearest power of 2 number that is 
 * less than or equal to the value passed in. 
 */
static GLuint 
nearestPower( GLuint value )
{
    int i = 1;

    if (value == 0) return -1;      /* Error! */
    for (;;) {
         if (value == 1) return i;
         else if (value == 3) return i*4;
         value >>= 1; i *= 2;
    }
}
Molly answered 21/1, 2009 at 17:40 Comment(2)
florin: it is. and it is used as a loop here, isn't it?Filigree
DrJokepu - I think florin meant to say here that the OP asked for a loop-less solutionTotemism
P
-1

The g++ compiler provides a builtin function __builtin_clz that counts leading zeros:

So we could do:

int nextPowerOfTwo(unsigned int x) {
  return 1 << sizeof(x)*8 - __builtin_clz(x);
}

int main () {
  std::cout << nextPowerOfTwo(7)  << std::endl;
  std::cout << nextPowerOfTwo(31) << std::endl;
  std::cout << nextPowerOfTwo(33) << std::endl;
  std::cout << nextPowerOfTwo(8)  << std::endl;
  std::cout << nextPowerOfTwo(91) << std::endl;
  
  return 0;
}

Results:

8
32
64
16
128

But note that, for x == 0, __builtin_clz return is undefined.

Profligate answered 26/4, 2021 at 2:49 Comment(2)
There's another undefined behavior in your code. When x > (UINT_MAX / 2), __builtin_clz would return 0 causing the left shift (<<) operator to overflow. Also, you should use 1U, not 1, as the left operand of <<, otherwise your defined domain will be smaller ((INT_MAX / 2) instead of (UINT_MAX / 2)).Cockcroft
Returning "16" for the input "8" is not really the right answer.Integrant
E
-1
from math import ceil, log2
pot_ceil = lambda N: 0x1 << ceil(log2(N))

Test:

for i in range(10):
    print(i, pot_ceil(i))

Output:

1 1
2 2
3 4
4 4
5 8
6 8
7 8
8 8
9 16
10 16
Egmont answered 22/10, 2021 at 10:0 Comment(1)
This question was tagged C, not Python.Ticking
J
-3

If you want an one-line-template. Here it is

int nxt_po2(int n) { return 1 + (n|=(n|=(n|=(n|=(n|=(n-=1)>>1)>>2)>>4)>>8)>>16); }

or

int nxt_po2(int n) { return 1 + (n|=(n|=(n|=(n|=(n|=(n-=1)>>(1<<0))>>(1<<1))>>(1<<2))>>(1<<3))>>(1<<4)); }
Jeana answered 1/5, 2020 at 8:56 Comment(2)
This is undefined behaviour in C or C++ and will lead to errors. Modifying n multiple times without a sequence point is invalid. You wrote it as if n-=1 should happen first but the only guarantee here is that n contains its new value after the ; and the parentheses do not change that.Angelaangele
@samhocevar: good catch, but I want to point out that starting with C++11 there is no longer a concept of sequence points, instead operations are defined as being sequenced before / after other operations. I don't know whether it affects this UB - probably not, it's probably still UB.Abwatt

© 2022 - 2024 — McMap. All rights reserved.