Is bit shifting O(1) or O(n)?
44

Are shift operations O(1) or O(n) ?

Does it make sense that computers generally require more operations to shift 31 places than to shift 1 place?

Or does it make sense that the number of operations required for shifting is constant regardless of how many places we need to shift?

PS: wondering if hardware is an appropriate tag..

Anacreontic answered 31/1, 2012 at 17:5 Comment(10)
Do they require more operations to shift left 31? – Downwards
My CPU knowledge is pretty old, but every shift instruction I've seen shifts by one bit, so you need to run a loop to shift more than once. I suppose it's possible that modern CPUs have shift instructions that shift by a specified number of bits in one clock cycle. – Chretien
On my machine int test(int i) { return i << 30; } becomes sall $30, %eax – Downwards
When you go down to the assembly, it's like shl or shr on x86. – Mancuso
x86 asm's shl and shr allow arbitrary shift counts, but the actual required CPU cycles depend on what you're doing and the CPU itself. For 286s it's O(n); for 386+ it's O(1). – Worsham
@Robert Harvey The shr/shl instructions on x86 have accepted a shift count since at least the 8086, that is, since 1978. You definitely need to refresh your CPU knowledge ;) – Wivina
@awoodland Is that Java or C#? – Anacreontic
@MarcB zsmith.co/intel/intel_s.html#sal has nice details on sall – Downwards
@unbeli: Yes, but that doesn't necessarily mean those operations are O(1). – Chretien
The Cortex-M4 manual states that it takes 1 cycle. developer.arm.com/documentation/ddi0439/b/Programmers-Model/… – Banda
18

Some instruction sets are limited to one bit shift per instruction. And some instruction sets allow you to specify any number of bits to shift in one instruction, which usually takes one clock cycle on modern processors (modern being an intentionally vague word). See dan04's answer about a barrel shifter, a circuit that shifts more than one bit in one operation.

It all boils down to the logic algorithm. Each bit in the result is a logic function based on the input. For a single right shift, the algorithm would be something like:

  • If the instruction is [shift right] and bit 1 of the input is 1, then bit 0 of the result is 1; else bit 0 is 0.
  • If the instruction is [shift right], then result bit 1 = input bit 2.
  • And so on.

But the logic equation could just as easily be:

  • If the instruction is [shift right] and the amount operand is 1, then result bit 0 = input bit 1.
  • If the amount is 2, then result bit 0 = input bit 2.
  • And so on.

The logic gates, being combinational rather than clocked, can do all of this in one clock cycle. Yet it is true that the single-bit shift allows a faster clock cycle and fewer gates to settle, if all you are comparing is these two flavors of the instruction. The alternative is to let the logic take longer to settle, so the instruction takes 2 or 3 clocks or whatever, with the logic counting to 3 and then latching the result.

The MSP430, for example, only has single bit rotate right instructions (because you can perform a single bit shift or a rotate left with another instruction, which I will leave to the reader to figure out).

The ARM instruction set allows both immediate and register-based multi-bit rotates, arithmetic shifts, and logical shifts. I think there is only one actual rotate instruction and the other is an alias, because a rotate left by 1 is the same as a rotate right by 31; you only need a one-direction barrel shifter to implement a multi-bit rotate.

SHL in the x86 allows more than one bit per instruction, but it used to take more than one clock.

And so on; you can easily examine any of the instruction sets out there.

The answer to your question is that it is not fixed. Sometimes it is one operation, one cycle, one instruction. Sometimes it is one instruction multiple clock cycles. Sometimes it is multiple instructions, multiple clock cycles.

Compilers often optimize for these sorts of things. Say you have a 16-bit instruction set with a swap-byte instruction and an AND-with-immediate instruction, but only a single-bit shift. You might think shifting 8 bits would require 8 shift instruction cycles, but you could just swap bytes (one instruction) and then AND the lower half to zeros (which might take two instructions, or might be a variable-word-length instruction of two words, or might encode into a single instruction), so it only takes 2 or 3 instruction/clock cycles instead of 8. For a shift of 9 bits, you can do the same thing and add a shift, making it 3 or 4 clocks vs 9. Also, on some architectures it is faster to multiply by 256 than to shift by 8, etc., etc. Each instruction set has its own limitations and tricks.

It is not even the case that most instruction sets provide multi-bit shifts, or that most limit you to a single bit. Processors in the "computer" category, like x86, ARM, PowerPC, and MIPS, lean toward one operation per shift. Expand to all processors commonly in use today, not just "computers", and it tips the other way: I would say more of them are single-bit than multi-bit, so multiple operations are needed to perform a multi-bit shift.

Registry answered 31/1, 2012 at 18:6 Comment(0)
17

A barrel shifter performs the shift in O(log n) multiplexer stages (where n is the word width), all of which can settle within the same clock cycle, making shifting an O(1) operation.

Milson answered 31/1, 2012 at 17:14 Comment(0)
16

As already noted, a barrel shifter can shift an operand an arbitrary distance in constant time. A barrel shifter, however, consumes a fair amount of space on a CPU die, so they're not included in all CPU designs.

Just for one fairly well known example, the Intel Pentium III included a barrel shifter -- but the Pentium IV did not. Code written for a Pentium III assuming a barrel shifter was present sometimes slowed down quite a bit on a Pentium IV. I had some encryption code (which includes lots of shifting and rotating) that ran about 4 times faster on a 1.2 GHz Pentium III than it did on a 2.8 GHz Pentium IV.

Buatti answered 31/1, 2012 at 20:12 Comment(0)
10

Bit shifting is O(1) on practically every current processor.

Take a look, for example, at the x86 "shrw" instruction. The first operand (in AT&T syntax) is the number of bits to shift. How a compiler implements shifting is dependent on the compiler, but it would be silly to put shifts in a loop when a processor can shift N bits in one go.

Addendum: Re: "Do they require more operations to shift left 31?" There are different kinds of shifts (if you're wondering why, consider what to do with the bits that are shifted off the register), but most processors can do a single-instruction shift of as many bits as a GPR can hold. A 40-bit shift on a 32-bit architecture would require shifting across multiple registers (this assumes a 64-bit number stored across two 32-bit registers), which on every processor I know of requires more instructions. It would still be O(1), just probably not one clock. As an interesting side note, the Pentium IV processor is amazingly slow at bit shifts. This is ironic, because Intel has historically recommended optimizing power-of-2 divides and multiplies into bit shifts. See: this PDF and Google for more info if interested.

Billibilliard answered 31/1, 2012 at 17:12 Comment(5)
Only on 386+. When using arbitrary shift counts on a 286, it's O(n), as the clock cycles required are 5+n. – Worsham
@MarcB: Because, of course, I still use a 286, and therefore need to know this. :P – Chretien
Do you mean that 90% of processors these days are like the x86 in that they are able to shift N bits in one go? – Anacreontic
@Pacerier: Probably more like 99% or 99.9%. – Chretien
@RobertHarvey: Essentially everything that targets something you'd think of as a "computer" will do constant-time shifts/rotates. But virtually no 8-bit microcontroller does, and they sell lots of units (my computer has one CPU that can do constant-time shifts, and around a dozen microcontrollers that can't). – Buatti
3

Ahem, I tested this out of curiosity in C# and got funny results.

using System;
using System.Diagnostics;

var sw = Stopwatch.StartNew();
long l = 1;
for (long i = 0; i < 20000000; i++) {
    l = l << 60; l = l >> 60;
    l = l << 60; l = l >> 60;
    l = l << 60; l = l >> 60;
    //...
    // 50 of ^them^ total
}
Console.WriteLine(l + " " + sw.Elapsed);

That takes 1.2 secs on my PC. But if I replace

l = l << 60; l = l >> 60;

with

l = l << 1; l = l >> 1;

then the time increases to 2.0 secs. I have no idea what kind of optimizations are in play here, but it looks weird.

Raid answered 31/1, 2012 at 17:24 Comment(4)
Just because your compiler doesn't do smart things doesn't mean the instructions aren't there in hardware. – Downwards
It actually does smart things, as I get the same time for all shift counts between 1 and 31: 2.0 seconds. Then I get 0.6 seconds if I shift by exactly 32 bits, but all of a sudden I get only 1.2 seconds if the shift count is larger than 32. That's just weird. – Raid
You might be better off trying this in C instead of a managed language. The jitter is probably doing some things under the hood that you're not seeing. – Chretien
This may depend on whether your CPU is 32-bit or 64-bit. On 32-bit x86, 64-bit numbers require two words to represent, so a 32-bit shift is just a register move. A shift of more than 32 throws away one word and then shifts what is left. The .NET JIT is surely performing such optimizations. – Spaak
2

For normal hardware with fixed-size registers, it's constant regardless of how many places you shift.

Also note that the usage of O notation is quite odd here; you would normally use it to denote algorithmic complexity based on the size of the number being shifted, not the number of places to shift.

Shirk answered 31/1, 2012 at 17:13 Comment(3)
Yes, I'd thought O notation was weird here, but otherwise the title would be very long. – Anacreontic
1) By normal, do you mean modern? 2) I don't find the O notation strange here at all. – Chretien
@yi_H Does that include stuff like MIPS and ARM? – Downwards
0

As a concrete example, according to Table C-17. General Purpose Instructions of the Intel® 64 and IA-32 Architectures Optimization Reference Manual:

SAL/SAR/SHL/SHR reg, imm   1 cycle latency
SAL/SAR/SHL/SHR reg, cl    1.5 cycles latency

So that's still a constant factor, and O(1.5) = O(1). There may be simpler microarchitectures as outliers, but in general it's O(1).

Culmination answered 26/5, 2019 at 18:34 Comment(0)
0

Any implementation, even one that requires n operations, will be considered O(1) here.

The register width is capped, so there is a maximum number of places a value can be shifted.

Even if a shift takes one operation per bit, each operation takes some time X, so the worst case is MAX_BITS * X.

So the maximum time for the operation is constant.

O notation describes how time grows with the size of the task.

Example: you perform random bit shifts in a loop.

Whether you run it 1,000 or 100,000 times, the time you need grows strictly as O(n), where n is the number of iterations; each individual shift is bounded by the constant MAX_BITS * X, so it cannot change the asymptotic growth.

Heterozygote answered 29/2 at 13:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.