Why does left shift operation invoke Undefined Behaviour when the left side operand has negative value?

In C, the bitwise left shift operation invokes Undefined Behaviour when the left-side operand has a negative value.

Relevant quote from ISO C99 (6.5.7/4)

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

But in C++ the behaviour is well defined.

ISO C++03 (5.8/2)

The value of E1 << E2 is E1 (interpreted as a bit pattern) left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 multiplied by the quantity 2 raised to the power E2, reduced modulo ULONG_MAX+1 if E1 has type unsigned long, UINT_MAX+1 otherwise. [Note: the constants ULONG_MAX and UINT_MAX are defined in the header <climits>. ]

That means

int a = -1, b = 2, c;
c = a << b;

invokes Undefined Behaviour in C but the behaviour is well defined in C++.
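
For illustration, here are a few more cases (my own examples, assuming a 32-bit int) sorted by the C99 clause above:

#include <stdio.h>

int main(void) {
    unsigned u  = 1u << 31;  /* well defined: unsigned wraps modulo 2^32   */
    int      ok = 1  << 30;  /* well defined: 2^30 is representable in int */
    /* int bad = 1  << 31;      undefined in C99: 2^31 overflows int       */
    /* int neg = -1 << 2;       undefined in C99: negative left operand    */
    printf("%u %d\n", u, ok);
    return 0;
}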

What forced the ISO C++ committee to consider that behaviour well defined as opposed to the behaviour in C?

On the other hand, the behaviour is implementation-defined for the bitwise right shift operation when the left operand is negative, right?

My question is: why does the left shift operation invoke Undefined Behaviour in C, and why does the right shift operator invoke just implementation-defined behaviour?

P.S.: Please don't give answers like "It is undefined behaviour because the Standard says so". :P

Craner answered 24/9, 2010 at 7:26 Comment(7)
C and C++ are different languages standardized by different committees. I don't see anything astonishing about this.Nils
Further, C++ was based on C89/C90. The C committee then moved in different directions for C99. C99 and C++ are both based on the original C standard, but the divergences were not at all coordinated.Fsh
Your C++ citation only defines the behavior when the type is unsigned. Did you forget to copy the paragraph about signed values?Carvelbuilt
@R.. the text defines behavior for signed by its first sentence. It then further details on behavior of unsigned by its other sentences.Knucklebone
I see that earlier standards had that difference, but the C99 and C++11 standards are in line for both left and right shifts, mandating the same behaviour for both signed and unsigned integral types.Pupiparous
It IS undefined behavior, because the Standard says so in 5p5.Costate
Isn't https://mcmap.net/q/23255/-why-does-left-shift-operation-invoke-undefined-behaviour-when-the-left-side-operand-has-negative-value the correct answer as of c++20?Lubra

The paragraph you copied is talking about unsigned types. The behavior is undefined in C++. From the last C++0x draft:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

EDIT: I had a look at the C++98 paper. It just doesn't mention signed types at all. So it's still undefined behavior.

Right-shifting a negative value is implementation-defined, right. Why? In my opinion: it's easy to implementation-define because there are no truncation-from-the-left issues. When you shift left you must say not only what's shifted in from the right but also what happens to the rest of the bits, e.g. with two's-complement representation, which is another story.
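
To see why it is easy to implementation-define, here is a minimal sketch (my own, not from the standard) of the two results an implementation may choose for a negative right shift:

#include <stdio.h>

int main(void) {
    int x = -8;
    /* Implementation-defined: most compilers sign-extend (arithmetic
       shift), giving -4 on two's-complement hardware...              */
    printf("%d\n", x >> 1);
    /* ...but zero-filling (logical shift) would be equally conforming;
       this forces it, giving 2147483644 for a 32-bit int.             */
    printf("%u\n", (unsigned)x >> 1);
    return 0;
}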

Oringas answered 24/9, 2010 at 7:46 Comment(4)
This paragraph is neither present in C++03 nor in C++98.Craner
@Prasoon Saurav: That paragraph is part of the current C++0x final draft, and it shows that the C++ Standard Committee considered that to be a flaw in the current standard and fixed it by actually stating that it is undefined --instead of implicitly not defining the result.Trimetrogon
@David "EDIT: got a look at C++98 paper. It just doesn't mention signed types at all. So it's still undefined behavior." I don't agree with that interpretation. "The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled." is a clear statement, and does not exclude signed types. I think they just overlooked the case of signed negative operands.Knucklebone
@JohannesSchaub-litb: It's explicitly undefined on account of 5p5: "If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined". You're correct that the first part applies to all types, the second part forces operations on unsigned types to generate a representable value, because otherwise unsigned left shift would also cause undefined behavior if any bits overflowed.Costate

In C, the bitwise left shift operation invokes Undefined Behaviour when the left-side operand has a negative value. [...] But in C++ the behaviour is well defined. [...] why [...]

The easy answer is: Because the standards say so.

A longer answer is: it probably has something to do with the fact that C and C++ both allow representations for negative numbers other than 2's complement. Giving fewer guarantees about what's going to happen makes it possible to use the languages on other hardware, including obscure and/or old machines.

For some reason, the C++ standardization committee felt like adding a little guarantee about how the bit representation changes. But since negative numbers may still be represented via 1's complement or sign+magnitude, the resulting values still vary.

Assuming 16 bit ints, we'll have

 -1 = 1111111111111111  // 2's complement
 -1 = 1111111111111110  // 1's complement
 -1 = 1000000000000001  // sign+magnitude

Shifted to the left by 3, we'll get

 -8 = 1111111111111000  // 2's complement
-15 = 1111111111110000  // 1's complement
  8 = 0000000000001000  // sign+magnitude
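
If what you want is the two's-complement bit-pattern result without the Undefined Behaviour, a common workaround (my sketch, assuming two's-complement int) is to do the shift on the value's unsigned representation:

/* Shift the bit pattern of a possibly negative int by doing the
   shift in unsigned arithmetic, which is always well defined.    */
int shift_left_bits(int v, int n) {
    /* Converting back to int is implementation-defined if the result
       doesn't fit, but it is not Undefined Behaviour.                */
    return (int)((unsigned)v << n);
}

/* shift_left_bits(-1, 3) yields -8 on two's-complement machines. */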

What forced the ISO C++ committee to consider that behaviour well defined as opposed to the behaviour in C?

I guess they made this guarantee so that you can use << appropriately when you know what you're doing (i.e. when you're sure your machine uses 2's complement).

On the other hand the behaviour is implementation defined for bitwise right shift operation when the left operand is negative, right?

I'd have to check the standard. But you may be right. A right shift without sign extension on a 2's complement machine isn't particularly useful. So the current state is definitely better than requiring vacated bits to be zero-filled, because it leaves room for machines that do sign extension -- even though that is not guaranteed.
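
If you don't want to rely on that implementation-defined choice, a sign-extending right shift can be written explicitly; a sketch of mine, assuming two's-complement int:

/* Portable arithmetic right shift: rounds toward negative infinity
   regardless of whether >> sign-extends negative values.           */
int asr(int v, int n) {
    if (v >= 0)
        return v >> n;   /* well defined for non-negative values */
    /* ~v is non-negative, so shifting it is well defined; the outer
       complement restores the sign (a two's-complement identity).   */
    return ~(~v >> n);
}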

Rehearing answered 24/9, 2010 at 18:2 Comment(2)
One of the goals when writing the Standard was to assure, as much as possible, that if any implementation did something useful in a certain situation, a conforming implementation should be allowed to behave likewise. Cases where an implementation might usefully trap in a fashion outside the Standard's jurisdiction are labeled as invoking Undefined Behavior. The authors of the C Standard could imagine that some implementations might trap when left-shifting at least some negative values, and someone might find that useful, so the behavior was left Undefined.Roundtree
Some existing implementations zero-filled on right shifts, while others sign-extended, and because some code written for the former implementations might have been relying upon the behavior it was left as Implementation-Defined. I think the C++ committee fixed the left-shift behavior when they realized that while it might have been plausible that some platforms might trap when left-shifting negative values, none actually did so and there was nothing to be gained by allowing future implementations to start doing so.Roundtree

To answer your real question as stated in the title: as for any operation on a signed type, this has undefined behavior if the result of the mathematical operation doesn't fit in the target type (under- or overflow). Signed integer types are designed like that.

For the left shift operation, if the value is positive or 0, the definition of the operator as multiplication by a power of 2 makes sense, so everything is OK unless the result overflows; nothing surprising.

If the value is negative, you could keep the same interpretation of multiplication by a power of 2, but if you just think in terms of bit shifting, this would perhaps be surprising. Obviously the standards committee wanted to avoid such ambiguity.

My conclusion:

  • if you want to do real bit pattern operations use unsigned types
  • if you want to multiply a value (signed or not) by a power of two, do just that, something like

    i * (1u << k)

your compiler will transform this into decent assembler in any case.
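
For instance (my own snippet), multiplying a signed value by a power of two via the unsigned shift:

#include <stdio.h>

int main(void) {
    int i = -3;
    /* The shift operates on an unsigned 1, so no negative value is
       ever shifted; converting the product back to int is at worst
       implementation-defined, never Undefined Behaviour.            */
    int r = (int)(i * (1u << 2));
    printf("%d\n", r);   /* -12 on two's-complement machines */
    return 0;
}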

Ermaermanno answered 24/9, 2010 at 8:18 Comment(1)
Setting the sign bit of a two's complement number is equivalent to setting an infinite number of bits to the left. The values that are representable in a 32-bit number are those where all bits to the left of the 31st have the same value. There is nothing unusual or anomalous about shifting negative two's-complement values except in cases where there would be values past the sign bit whose state doesn't match the sign bit.Roundtree

A lot of these kinds of things are a balance between what common CPUs can actually support in a single instruction and what's useful enough to expect compiler writers to guarantee even if it takes extra instructions. Generally, a programmer using bit-shifting operators expects them to map to single instructions on CPUs that have them, so that's why there's undefined or implementation-defined behaviour where CPUs had various ways of handling "edge" conditions, rather than mandating a behaviour and making the operation unexpectedly slow. Keep in mind that additional pre/post handling instructions may be needed even for the simpler use cases. Undefined behaviour may have been necessary where some CPUs generated traps/exceptions/interrupts (as distinct from C++ try/catch type exceptions) or generally useless/inexplicable results, while if the set of CPUs considered by the Standards Committee at the time all provided at least some defined behaviour, then they could make the behaviour implementation-defined.

Thump answered 24/9, 2010 at 7:35 Comment(4)
From what I've been told, on some CPU's, the shift-left-N instruction will perform N shifts. If N is a long which holds -1, that would require roughly four billion cycles to complete. Having an instruction which would normally take a few microseconds instead lock up the CPU for many minutes is enough of an odd side-effect that it would be fair to regard that as "undefined behavior", rather than merely saying the value is "implementation-defined", especially since having one instruction execute for that long could cause something like a watchdog to reset the CPU.Roundtree
Well, thanks to the as-if rule, the compiler only has to add in as many shift instructions on such an architecture as there are bits in the number. So for a 64-bit number, it could implement it as 64 shifts at most (or either set to 0 or shift up to 63, depending on how the compiler chooses to implement it).Gann
Unfortunately, since the time you wrote the above, things have changed. Even when running on a processor with a left-shift instruction that would behave exactly as one would expect in two's-complement arithmetic, hyper-modern compiler philosophy would suggest that's no reason to make the behavior of such a left-shift obey the laws of time and causality. Modern philosophy dictates that, given if (x >= 0) launch_missiles(); x<<=1; a compiler should recognize that it is allowed to do anything it likes if x is negative, and therefore it can launch missiles unconditionally.Roundtree
Personally, I find such hyper-modern thought distressing; the scenario of a jump table which only handles up to 63 and falls off for anything beyond that would be a plausible excuse, but masking would at worst add one instruction to what would even in best-case be a 4-5 instruction sequence.Roundtree

My question is why does left shift operation invoke Undefined Behaviour in C and why does right shift operator invoke just Implementation defined behaviour?

The folks at LLVM speculate the shift operator has constraints because of the way the instruction is implemented on various platforms. From What Every C Programmer Should Know About Undefined Behavior #1/3:

... My guess is that this originated because the underlying shift operations on various CPUs do different things with this: for example, X86 truncates 32-bit shift amount to 5 bits (so a shift by 32-bits is the same as a shift by 0-bits), but PowerPC truncates 32-bit shift amounts to 6 bits (so a shift by 32 produces zero). Because of these hardware differences, the behavior is completely undefined by C...

Note that the discussion was about shifting by an amount greater than the register size. But it's the closest I've found to explaining the shift constraints from an authority.
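
To make the hardware difference concrete, here is a sketch of mine (not from the LLVM post) of the truncation x86 applies to 32-bit shift counts:

#include <stdint.h>

/* What a 32-bit x86 shift effectively computes: the count is truncated
   to its low 5 bits, so shifting by 32 behaves like shifting by 0.
   PowerPC keeps 6 bits, so a count of 32 yields zero instead.          */
uint32_t shl_x86_style(uint32_t value, uint32_t count) {
    return value << (count & 31u);
}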

I think a second reason is the potential sign change on a 2's complement machine. But I've never read that anywhere (no offense to @sellibitze; I happen to agree with him).

Rhythmics answered 5/4, 2014 at 17:3 Comment(1)
You appear to be discussing signedness of the right operand; the question is only looking at the left one.Costate

In C89, the behavior of left-shifting negative values was unambiguously defined on two's-complement platforms which did not use padding bits on signed and unsigned integer types. C89 required the value bits that signed and unsigned types had in common to be in the same places, and the only place the sign bit for a signed type could go was in the same place as the upper value bit for unsigned types, which in turn had to be to the left of everything else.

The C89 mandated behaviors were useful and sensible for two's-complement platforms without padding, at least in cases where treating them as multiplication would not cause overflow. The behavior may not have been optimal on other platforms, or on implementations that seek to reliably trap signed integer overflow. The authors of C99 probably wanted to allow implementations flexibility in cases where the C89 mandated behavior would have been less than ideal, but nothing in the rationale suggests an intention that quality implementations shouldn't continue to behave in the old fashion in cases where there was no compelling reason to do otherwise.

Unfortunately, even though there have never been any implementations of C99 that don't use two's-complement math, the authors of C11 declined to define the common-case (non-overflow) behavior; IIRC, the claim was that doing so would impede "optimization". Having the left-shift operator invoke Undefined Behavior when the left-hand operand is negative allows compilers to assume that the shift will only be reachable when the left-hand operand is non-negative.
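
As a hypothetical illustration (mine, echoing the comments below) of the license that gives an optimizer:

int f(int x) {
    int shifted = x << 4;   /* UB if x is negative, so a compiler may
                               assume x >= 0 on any path reaching here */
    if (x < 0)              /* ...and may then treat this test as dead */
        return -1;          /* code and delete the branch entirely     */
    return shifted;
}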

I'm dubious as to how often such optimizations are genuinely useful, but the rarity of such usefulness actually weighs in favor of leaving the behavior undefined. If the only situations where two's-complement implementations wouldn't behave in commonplace fashion are those where the optimization would actually be useful, and if no such situations actually exist, then implementations would behave in commonplace fashion with or without a mandate, and there's no need to mandate the behavior.

Roundtree answered 17/4, 2015 at 23:43 Comment(6)
But it's the return x<<4; line that would trigger UB and the compiler can hardly change the well-defined semantics of code preceding that line. I tested with both -O2 and -O3 and at least gcc doesn't perform that optimization you are suggesting that it could do.Insectile
@BjörnLindqvist: The present main-line version of gcc doesn't perform dead code elimination as aggressively as the Standard allows, but language was added to the Standard which expressly states that if an execution of the code with given input would result in Undefined Behavior, then the Standard imposes no requirements on any behavior of the program, even before the UB would have taken place. Personally I think the Standard would be much better if most of the things which are presently UB were constrained sufficiently that it would be possible for a program whose requirements are...Roundtree
"(1) When given valid input, produce valid output; (2) Respect laws of time and causality even when given invalid input", to meet such requirements even when things like arithmetic overflow occur, but the Standard imposes no such requirement.Roundtree
@BjörnLindqvist: The compiler is allowed to start doing whatever it wants as soon as it can determine that UB is inevitable with the input that it's going to receive. Allowing UB a certain degree of exemption from the laws of time is reasonable, from the standpoint that it would allow a compiler to do things like hoist loop-invariant expressions without having to first validate that they won't trigger overflows. The language of the Standard, however, does not limit the extent to which compilers can "exploit" Undefined Behavior, and some authors seek to maximize such opportunities.Roundtree
@BjörnLindqvist: From what I read, it appears that left-shift of a negative number was originally left Undefined Behavior to allow for the possibility that there might exist a machine somewhere where it would trigger a trap; given the lack of evidence of such machines' ever having existed the Committee considered changing the spec to have it merely yield an unspecified value, but compiler authors argued against the change saying it would "impede optimization". My answer was perhaps overly snarky, but given that compiler researchers are working to find ways of pruning code which would...Roundtree
...only be relevant in cases where code invokes Undefined Behavior (a concept which might be useful for some forms of UB, but is decidedly unhelpful for others) my interpretation of their desire to keep a left-shift of a negative value Undefined is that they want to let the compiler assume that no value which is going to be left-shifted will ever be negative and omit any code which would only be relevant when a value which will be left-shifted, is negative.Roundtree

The behavior in C++03 is the same as in C++11 and C99, you just need to look beyond the rule for left-shift.

Section 5p5 of the Standard says that:

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined

The left-shift expressions which are specifically called out in C99 and C++11 as being undefined behavior are the same ones that evaluate to a result outside the range of representable values.

In fact, the sentence about unsigned types using modular arithmetic is there specifically to avoid generating values outside the representable range, which would automatically be undefined behavior.
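
A quick illustration (my example) of that modular rule keeping unsigned shifts in range:

#include <stdio.h>

int main(void) {
    /* Mathematically 0x80000000 * 2 is 2^32, which is out of range for
       a 32-bit unsigned; the modular rule reduces it to 0, so there is
       no Undefined Behaviour for unsigned left shifts.                 */
    unsigned u = 0x80000000u << 1;
    printf("%u\n", u);   /* prints 0 */
    return 0;
}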

Costate answered 19/7, 2014 at 20:6 Comment(2)
In two's-complement notion, the bitwise representation of -1 is ...111[111].000... with computers often just storing the middle portion, duplicating the MSB to the left, and padding the right with zeroes; shifting that left one bit should give ...111[110].000... i.e. -2. In one's-complement notation, -1 is ...111[110].111... with computers storing the middle portion and duplicating the leftmost bit on both sides. Shifting that left one should give ...[101].111..., i.e. -2, though some implementations might shift in a zero rather than duplicating the sign bit.Roundtree
In any case, the result of either operation should certainly be within the range of the indicated types. Only for sign-magnitude system should there be any real problem.Roundtree

The result of shifting depends upon the numeric representation. Shifting behaves like multiplication only when numbers are represented as two's complement. But the problem is not exclusive to negative numbers. Consider a 4-bit signed number represented in excess-8 (aka offset binary). The number 1 is represented as 1+8, or 1001. If we left shift this as bits, we get 0010, which is the representation for -6. Similarly, -1 is represented as -1+8, or 0111, which becomes 1110 when left-shifted, the representation for +6. The bitwise behavior is well-defined, but the numeric behavior is highly dependent on the system of representation.
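
A small sketch (mine) of that excess-8 arithmetic:

#include <stdio.h>

/* 4-bit excess-8 (offset binary): value n is stored as n + 8. */
unsigned encode_excess8(int n)      { return (unsigned)(n + 8) & 0xFu; }
int      decode_excess8(unsigned b) { return (int)b - 8; }

int main(void) {
    unsigned one = encode_excess8(1);                        /* 1001 */
    printf("%d\n", decode_excess8((one << 1) & 0xFu));       /* -6   */
    unsigned minus_one = encode_excess8(-1);                 /* 0111 */
    printf("%d\n", decode_excess8((minus_one << 1) & 0xFu)); /* +6   */
    return 0;
}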

Mazer answered 29/3, 2016 at 21:9 Comment(3)
I see that I have received a couple of negative ratings for this post. I expect that this is due to a conflict with the statement in the spec "The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2 ....". The excess-N representation does not have this property. This implies that C/C++ cannot be implemented to spec on such a machine.Mazer
Under C89 your statement would have been correct. C99, however, simultaneously added an explicit statement saying that left-shifting a negative value yields Undefined Behavior, while effectively forbidding anything other than two's-complement implementations on machines with a word size of 64 bits or fewer (so far as I can tell, the number of non-contrived non-two's-complement C99 implementations stands at zero).Roundtree
@Prasoon Saurav Isn't this the correct answer as per C++20 standard?Lubra
