Purpose of integer literal suffix in left shift
Asked Answered
E

3

9

In C, many operations employ bit shifting, in which an integer literal is often used. For example, consider the following code snippet:

#define test_bit(n, flag) (1UL << (n) & (flag))

As far as I know, the integer literal suffix UL is supposed to suppress unwanted behavior in a shift, e.g. sign-extending a signed integer may result in multiple bits being set. However, if the case is doing a left shift only, as shown above, do we still need the integer literal suffix?

As a left shift won't cause unintended behavior, I can't figure what its purpose is. Code like the above often appears in projects such as Linux kernel, which makes me think that there must be a need for it. Does anyone know the purpose of the UL suffix in this case?

Eccentricity answered 15/1 at 16:46 Comment(5)
I'm not sure what you mean by "integer literal", that's not a defined term in the C language. Are you asking about why we use 1UL instead of simply 1? The UL stands for unsigned long and so 1UL is an integer expression of type unsigned long, whereas 1 would just be int.Keratoplasty
I suppose it is a way to specify how many bits the number has, in case (n) is larger than 32.Pammie
"As a left shift won't cause un-intended behavior". C18 says If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.Search
I believe "suffix" is the term you meant to use.Matthaus
@NateEldredge: Re “… that's not a defined term in the C language”: No, but it should be. “Literal” is used in programming to mean something whose value is conveyed in its letters (characters). "abc" and 123 are literals because you know what their values are just from looking at them. That is not true of an enum member foo, which is also an integer constant but is not a literal. The C standard would have been to use the term “literal” for all the literal constants, instead of just string literals, and that is how OP is using it.Akira
L
9

Sign extending only applies to right shifts, so that's not applicable.


<< is defined as follows:

C23 §6.5.7 ¶4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2E2, wrapped around. If E1 has a signed type and nonnegative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

There are two ways in which left-shifting values can result in undefined behaviour based on E1:[1]

  • E1 has a signed type and negative value.
  • E1 has a signed type and nonnegative value, and E1 × 2E2 is unrepresentable.

In our case, E1 is a positive value, so the former isn't applicable. However, the latter could apply depending on the type of E1.

Let's look at what results we get for different types on two systems.

  • System "L" has a 32-bit int and a 64-bit long (e.g. Linux on x86-64).
  • System "W" has a 32-bit int and a 32-bit long (e.g. Windows on x86-64).
Implementation Usage Result on "L" Result on "W"
1 << (n) test_bit( 31, flag ) Undefined behaviour Undefined behaviour
1L << (n) test_bit( 31, flag ) ok (since long is 64 bits) Undefined behaviour
1U << (n) test_bit( 31, flag ) ok ok
1U << (n) test_bit( 63, flag ) Incorrect result
1L << (n) test_bit( 63, flag ) Undefined behaviour
1UL << (n) test_bit( 63, flag ) ok

So, assuming you want to be able to test any of the bits of flag

  • 1U is needed if flag can be a signed int or an unsigned int or shorter.
  • 1UL is needed if flag can also be a signed long or an unsigned long.

  1. Undefined behaviour can also result based on the value of E2. This happens if E2 is negative, equal to the width of E1, or greater than the width of E1. This puts a constraint on the valid values for test_bit's first argument.
Lamented answered 15/1 at 17:7 Comment(2)
There's strictly speaking also a 3rd form of UB, in case E2 is negative. "If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined."Brahmin
ah, I see, I made it sound like the list I presented was exhaustive. I've addressed this.Lamented
K
12

If your int is 32 bits, and you have

#define test_bit(n, flag) ((1 << (n)) & (flag))

then test_bit(31, flag) has undefined behavior because of the signed integer overflow. Why is unsigned integer overflow defined behavior but signed integer overflow isn't?

Making the 1 an unsigned type avoids the UB. Making it unsigned long (which is what 1UL achieves) allows the same macro to be used for masks that are wider. For instance, on a system where int is 32 bits and long is 64, by using 1UL you can safely use bits up through test_bit(63, flag).

Keratoplasty answered 15/1 at 17:4 Comment(5)
This isn't an integer overflow as such, since C traditionally does not mandate one particular signedness format. It is simply UB because the left shift operator says so (C17 6.5.7) - we simply aren't allowed to shift data into the sign bit(s).Brahmin
@Lundin: True, by "if int is 32 bits" I'm assuming 32 bits two's-complement or something like that. But I think of "integer overflow is UB" as coming from 6.5p5, which also applies here, since the mathematical result of 1 << 31 (namely 2^31) is not in the range of representable values for such an int.Keratoplasty
@Lundin: All three of the allowed signed-integer representations (ones or twos complement, and sign/magnitude) are unable to represent 1<<31 in a 32-bit signed integer. It's implementation-defined which is used, but IIRC it has to be one of the 3. They all use the same bit-patterns as unsigned for non-negative integers, so have the same INT_MAX for the same number of value-bits. C signed left shifts are defined in terms of arithmetic doubling, not in terms of fiddling with the bit-pattern, aren't they?Apiarian
@PeterCordes In theory variables may also have padding bits (which may be trap representations). Not that I have ever seen such a system or understand how it would make sense, but the standard allows it. Might make sense for signed magnitude on some oddball ISA to place the sign bit in a byte of its own I guess.Brahmin
@Lundin: Exactly, that's why I said "same INT_MAX for same number of value-bits", rather than same sizeof(int) * CHAR_BIT total bits in the object-representation. With the no-padding case being the upper bound on number of value bits in a "32-bit integer". Only unsigned char and the optional fixed-width types like int32_t (which also guarantees 2's complement if it exists) guarantee no padding. And yeah, IDK why you'd have padding in a primitive type like intApiarian
L
9

Sign extending only applies to right shifts, so that's not applicable.


<< is defined as follows:

C23 §6.5.7 ¶4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2E2, wrapped around. If E1 has a signed type and nonnegative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

There are two ways in which left-shifting values can result in undefined behaviour based on E1:[1]

  • E1 has a signed type and negative value.
  • E1 has a signed type and nonnegative value, and E1 × 2E2 is unrepresentable.

In our case, E1 is a positive value, so the former isn't applicable. However, the latter could apply depending on the type of E1.

Let's look at what results we get for different types on two systems.

  • System "L" has a 32-bit int and a 64-bit long (e.g. Linux on x86-64).
  • System "W" has a 32-bit int and a 32-bit long (e.g. Windows on x86-64).
Implementation Usage Result on "L" Result on "W"
1 << (n) test_bit( 31, flag ) Undefined behaviour Undefined behaviour
1L << (n) test_bit( 31, flag ) ok (since long is 64 bits) Undefined behaviour
1U << (n) test_bit( 31, flag ) ok ok
1U << (n) test_bit( 63, flag ) Incorrect result
1L << (n) test_bit( 63, flag ) Undefined behaviour
1UL << (n) test_bit( 63, flag ) ok

So, assuming you want to be able to test any of the bits of flag

  • 1U is needed if flag can be a signed int or an unsigned int or shorter.
  • 1UL is needed if flag can also be a signed long or an unsigned long.

  1. Undefined behaviour can also result based on the value of E2. This happens if E2 is negative, equal to the width of E1, or greater than the width of E1. This puts a constraint on the valid values for test_bit's first argument.
Lamented answered 15/1 at 17:7 Comment(2)
There's strictly speaking also a 3rd form of UB, in case E2 is negative. "If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined."Brahmin
ah, I see, I made it sound like the list I presented was exhaustive. I've addressed this.Lamented
E
5

the integer literal "UL"

... is not called an "integer literal". That term is not used in C language specification at all, and it risks confusion with "integer constant", of which the whole 1UL would be an example. Informally, you might call a (whole) integer constant an "integer literal", but not the "UL" part alone. The "UL" by itself can be called a "suffix", and if we draw names from the formal grammar in the language spec then we could call it an "integer suffix" more specifically.

IMHO, the integer literal "UL" is suppose to suppress unwanted shift, e.g. sign-extending a signed integer may result in multiple bits being set.

The primary purpose of expressing an integer constant with a suffix is to control its data type. 1UL has type unsigned long int, whereas unsuffixed 1 has type int.

However, if the case is, doing logical left shift only, as shown above, do we still need the integer literal?

In a bitwise shift operation, the type of the left operand is the type of the result, and it can affect the value of the result. It can also affect whether evaluating the expression even has defined behavior at all.

In the case of the particular macro you present, whether it is important to use 1UL instead of just 1 depends on how the macro is used. But there are wholly reasonable uses for which using 1UL produces the desired effect but just 1 does not.

As a left shift won't cause un-intended behavior,

Did you mean undefined behavior? A left shift of a value of a signed type (even a positive one) absolutely can have undefined behavior as far as C is concerned. And in cases where the behavior is undefined, you are totally unjustified in assuming the result will be what you expect or intend.

I can't figure what is its purpose then.

Even if we ignore questions around signedness, if int and long int are different size, as often they are, then 1U << n may yield a different, well defined result than 1UL does. In such a case, it is unreasonable to expect 1 << n to evaluate to the same result that 1UL << n does, undefined behavior notwithstanding.

Extrapolate answered 15/1 at 17:26 Comment(3)
Re “… is not called an "integer literal"”: Yes, it is. The C standard is not the only source of terms used in programming, and “literal” is a term for something whose value is conveyed in its letters (characters). 123 is a literal because you can see its value from its source code characters, while enum member foo is an integer constant but is not a literal.Akira
If anything, the fact that C standard doesn't use the word literal (unless combined with other words) legitimizes its use since there are no conflicts.Lamented
@EricPostpischil, the OP is calling the "UL" suffix itself an "integer literal". The suffix alone absolutely is not called by that name.Extrapolate

© 2022 - 2024 — McMap. All rights reserved.