Is there a safe way to get the unsigned absolute value of a signed integer, without triggering overflow?
Asked Answered
E

2

17

Consider a typical absolute value function (where for the sake of argument the integral type of maximum size is long):

unsigned long abs(long input);

A naive implementation of this might look something like:

unsigned long abs(long input)
{
    if (input >= 0)
    {
        // input is positive
        // We know this is safe, because the maximum positive signed
        // integer is always less than the maximum positive unsigned one
        return static_cast<unsigned long>(input);
    }
    else
    {
        return static_cast<unsigned long>(-input); // ut oh...
    }
}

This code triggers undefined behavior, because the negation of input may overflow, and triggering signed integer overflow is undefined behavior. For instance, on 2s complement machines, the absolute value of std::numeric_limits<long>::min() will be 1 greater than std::numeric_limits<long>::max().

What can a library author do to work around this problem?

Elyseelysee answered 26/6, 2013 at 7:3 Comment(0)
E
24

One can cast to the unsigned variant first to avoid any undefined behavior:

unsigned long uabs(long input)
{
    if (input >= 0)
    {
        // input is positive
        return static_cast<unsigned long>(input);
    }
    else
    {
        return -static_cast<unsigned long>(input); // read on...
    }
}

In the above code, we invoke two well defined operations. Converting the signed integer to the unsigned one is well defined by N3485 4.7 [conv.integral]/2:

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]

This basically says that when making the specific conversion of going from signed to unsigned, one can assume unsigned-style wraparound.

The negation of the unsigned integer is well defined by 5.3.1 [expr.unary.op]/8:

The negative of an unsigned quantity is computed by subtracting its value from 2^n , where n is the number of bits in the promoted operand.

These two requirements effectively force implementations to operate like a 2s complement machine would, even if the underlying machine is a 1s complement or signed magnitude machine.


A generalized C++11 version that returns the unsigned version of an integral type:

#include <type_traits>

template <typename T>
constexpr
typename std::make_unsigned<T>::type uabs(T x)
{
    typename std::make_unsigned<T>::type ux = x;
    return (x<0) ? -ux : ux;   // compare signed x, negate unsigned x
}

This compiles on the Godbolt compiler explorer, with a test case showing that gcc -O3 -fsanitize=undefined finds no UB in uabs(std::numeric_limits<long>::min()); after constant-propagation, but does in std::abs().

Further template stuff should be possible to make a version that would return the unsigned version of integral types, but return T for floating-point types, if you want a general-purpose replacement for std::abs.

Elyseelysee answered 26/6, 2013 at 7:11 Comment(3)
Nice answer, albeit to your own question, but +1.Quickfreeze
You can write this compactly as return x<0 ? 0UL - x : x;. That uses implicit conversion to the unsigned type in 0UL - x, vs. your way explicitly documenting when the conversion happens. The explicit way is probably good for clarity, although you might do unsigned long ux = static_cast<unsigned long>(x); return x<0 : -ux : ux; to still write it as a ternary, hinting compilers to do it branchlessly. And to avoid repeating the conversion, especially if you want to template it to deduce the unsigned version of the input type so the cast is even longer.Kenning
A name like uabs or absu would be better than abs. I know abs is different from std::abs in C++, but for human readers it's worth the reminder that this is a function that returns unsigned, and is well-defined for every input, unlike the inconvenient std::abs, and C's abs from stdlib.h. (People who regularly work with C as well as C++ will tend to think of that.)Kenning
P
1

Just add one if negative.

unsigned long absolute_value(long x) {
  if (x >= 0) return (unsigned long)x;
  x = -(x+1);
  return (unsigned long)x + 1;
}
Phenobarbital answered 3/2, 2019 at 23:32 Comment(2)
Fun fact: compilers are smart enough to untangle this back to an efficient abs() idiom for whatever target machine. e.g. godbolt.org/z/34h3v8KrP shows GCC for RISC-V compiling it to a branchless bithack, with -O2. And clang for x86 using neg/cmov. But I still wouldn't write the source this way; I'd much rather use unsigned operations to avoid UB, without extra operations that I want the compiler to remove. Also in terms of readability and easy verification of correctness just from looking at it, this takes a small amount of effort vs. near zero.Kenning
It's interesting to observe why my answer works: in two's complement negation means bitwise not, plus 1. So (-(x+1) + 1) reduces to (~x + 1) reduces to -x. No compiler heroics required! But I certainly agree with your point regarding readibility.Phenobarbital

© 2022 - 2024 — McMap. All rights reserved.