C - Casting ~0 to unsigned long

I’ve seen low-level bitwise expressions that use ~0 to generate a bit pattern of all 1s, which is then used as a mask, etc. For example, in K&R on page 45:

/* getbits: get n bits from position p */
unsigned getbits(unsigned x, int p, int n)
{
    return (x >> (p+1-n)) & ~(~0 << n);
}

On my machine, (unsigned long) ~0 evaluates to 0x FF FF FF FF FF FF FF FF. This lets us easily generate all-ones masks wider than int, which is kind of nice.

However, shouldn’t (unsigned long) ~0 instead evaluate to 0x 00 00 00 00 FF FF FF FF? Without any suffixes, 0 is considered an integer constant, so ~0 evaluates to 0x FF FF FF FF. Why doesn’t casting this to unsigned long result in a zero-padded value? What am I missing here?

Edit: On my machine, sizeof(int) and sizeof(long) are 4 and 8, respectively.

Insincere answered 10/3, 2023 at 5:23 Comment(9)
Zero-padded value? During the integer conversion it pads with the sign bit value (i.e. it is 1-padded for negative values); otherwise (long)-1 would not be equal to -1L. - Elin
~0 produces (int)0xffffffff = -1. Casting to an unsigned type produces the unique unsigned value which is equal to the original value modulo 2^n, where n is the number of bits in the unsigned type. In your case, n=64, and the unique 64-bit value which is equal to -1 modulo 2^64 is 0xffffffffffffffff. - Plumbery
@Bob__ sizeof(int) = 4 and sizeof(long) = 8. I also added this to the post. - Insincere
Hermit, to easily generate an (unsigned long) of all 1's, use ULONG_MAX or perhaps ~0UL or -1UL. Avoid using signed constants for this task. - Tedder
@IłyaBursov This seems to be the case. I didn’t know C sign-extends the values even when they are being cast to larger unsigned types. - Insincere
@Insincere This isn't actually sign extension. The purpose of sign extension is to use more bits for storing a value while keeping the original value. When you cast a negative value to an unsigned type, the original value isn't kept. - Process
@chux-ReinstateMonica This works well for ints and longs. Is there also a clean way to generate chars and shorts without casting a ~0u to char? - Insincere
Oh god, this code is actually in K&R... For the love of undefined behavior, put down that horrible book! You cannot left-shift negative signed numbers in C; this was widely known before K&R 2nd edition was published. Just get rid of the book, it is actively harmful. - Zygospore
@Insincere For char, use char all_ones = CHAR_MAX | CHAR_MIN;. Likewise for short. Yet be advised: avoid signed types for bit manipulation and stay with unsigned char, unsigned short, unsigned, ... - Tedder
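
The suggestions in the comments above can be checked with a small sketch. The function name check_all_ones is hypothetical and not from the thread; the sketch only confirms that the unsigned spellings mentioned (ULONG_MAX, ~0UL, -1UL) agree, and that converting ~0u to a narrower unsigned type yields that type's maximum.

```c
#include <assert.h>
#include <limits.h>

/* check_all_ones: hypothetical self-check, not part of the original thread.
   Every assertion relies only on unsigned arithmetic, which is fully defined. */
void check_all_ones(void)
{
    /* Three spellings of "all bits set" for unsigned long. */
    assert(~0UL == ULONG_MAX);
    assert(-1UL == ULONG_MAX);   /* unary minus on an unsigned value wraps */

    /* The same idea for unsigned int. */
    assert(~0u == UINT_MAX);

    /* Narrowing an unsigned value is reduced modulo (max + 1),
       so the narrower types also end up with all bits set. */
    assert((unsigned char)~0u == UCHAR_MAX);
    assert((unsigned short)~0u == USHRT_MAX);
}
```

For the narrow types, UCHAR_MAX and USHRT_MAX from <limits.h> are probably the most readable option.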

On your system ~0 is an int with bit pattern FF FF FF FF, which represents the decimal value -1.

When you cast to unsigned long you convert one integer type to another integer. The rules for that can be found in 6.3.1.3 of the C (draft) standard n1570:

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. (footnote 60)

Footnote 60: The rules describe arithmetic on the mathematical value, not the value of a given type of expression.

The first rule doesn't apply, as -1 can't be represented in unsigned long. The second rule applies, i.e. the new value is found as follows:

new-value = original-value + (ULONG_MAX + 1)

In your case:

new-value = -1 + (ULONG_MAX + 1) = ULONG_MAX

so the new value is ULONG_MAX which has the representation FF FF FF FF FF FF FF FF on your system.
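
As a rough sketch of the rule above, the following helpers contrast converting a negative int with converting an unsigned int. The names to_ulong, widen_unsigned, and check_conversions are hypothetical and only serve the illustration; the sketch assumes the usual two's complement representation, where ~0 is -1.

```c
#include <assert.h>
#include <limits.h>

/* to_ulong: hypothetical helper. A negative int is not representable in
   unsigned long, so 6.3.1.3 paragraph 2 applies: ULONG_MAX + 1 is added
   (mathematically) until the value is in range. */
unsigned long to_ulong(int value)
{
    return (unsigned long)value;
}

/* widen_unsigned: hypothetical helper. Every unsigned int value is
   representable in unsigned long, so paragraph 1 applies and the value
   is unchanged; the upper bits end up zero. */
unsigned long widen_unsigned(unsigned int value)
{
    return (unsigned long)value;
}

/* check_conversions: hypothetical self-check of both cases. */
void check_conversions(void)
{
    assert(to_ulong(~0) == ULONG_MAX);          /* -1 + (ULONG_MAX + 1) */
    assert(to_ulong(-2) == ULONG_MAX - 1);      /* -2 + (ULONG_MAX + 1) */
    assert(widen_unsigned(~0u) == UINT_MAX);    /* value preserved      */
}
```

The zero-padded result expected in the question is what widen_unsigned(~0u) produces when unsigned long is wider than unsigned int: the source value is UINT_MAX rather than -1, so it is kept unchanged.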

Process answered 10/3, 2023 at 5:56 Comment(1)
To add a little detail: when doing these conversions, the equations should be carried out as pure abstract math without taking the widths of the operands into account. There is a helpful footnote at 6.3.1.3: "The rules describe arithmetic on the mathematical value, not the value of a given type of expression." Maybe add that note to the answer for completeness. - Zygospore

In addition to the excellent response by @SupportUkraine, you should be aware that return (x >> (p+1-n)) & ~(~0 << n); invokes undefined behavior:

6.5.7 Bitwise shift operators

[...]

4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, wrapped around. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

~0 has type int and a value of -1, hence ~0 << n has undefined behavior for n > 0. You should use ~0U << n instead, and use a wider constant such as ~0UL if the type of x is larger than unsigned int. A generic expression for this is ~(0U * x) << n, which works for all unsigned types of x.
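
A minimal sketch of how getbits might look with the unsigned mask is shown here. The names getbits_safe and getbits_u64 are hypothetical, and the assertions encode assumptions added for the illustration: the shift count must be smaller than the width of the type, and the types are assumed to have no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* getbits_safe: hypothetical rewrite of K&R's getbits using ~0U.
   Returns the n-bit field of x whose leftmost bit is at position p
   (bit 0 is the least significant bit). */
unsigned getbits_safe(unsigned x, int p, int n)
{
    const int width = (int)(sizeof x * CHAR_BIT);
    assert(n > 0 && n < width);        /* shifting by >= width is undefined */
    assert(p >= n - 1 && p < width);   /* the field must fit inside x */
    return (x >> (p + 1 - n)) & ~(~0U << n);
}

/* getbits_u64: the same idea for a wider type, using the generic
   ~(0U * x) form so the mask takes on the type of x. */
unsigned long long getbits_u64(unsigned long long x, int p, int n)
{
    const int width = (int)(sizeof x * CHAR_BIT);
    assert(n > 0 && n < width);
    assert(p >= n - 1 && p < width);
    return (x >> (p + 1 - n)) & ~(~(0U * x) << n);
}
```

For example, getbits_safe(0xABCDu, 11, 8) shifts right by 4 to get 0xABC and masks with 0xFF, giving 0xBC.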

Smoothspoken answered 10/3, 2023 at 14:23 Comment(0)
