Handling overflow when casting doubles to integers in C

Asked 8/2, 2009 at 17:27 Answered 1/7 at 23:27

Today, I noticed that when I cast a double that is greater than the maximum possible integer to an integer, I get -2147483648. Similarly, when I cast a double that is less than the minimum possible integer, I also get -2147483648.

Is this behavior defined for all platforms?
What is the best way to detect this under/overflow? Is putting if statements for min and max int before the cast the best solution?

Trihedron answered 8/2, 2009 at 17:27 Comment(1)

Casting minimum 32-bit integer (-2147483648) to float gives positive number (2147483648.0) – Decay 20/7, 2016 at 15:49

When casting floats to integers, overflow causes undefined behavior. From the C99 spec, section 6.3.1.4 Real floating and integer:

When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
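For values that are known to be in range, the truncation rule quoted above can be illustrated with a small sketch. The helper name truncate_in_range is made up for this example, and it assumes the caller has already established that the value fits in an int:

```c
/* Hypothetical helper: a plain conversion. It is only valid when d is
   known to lie inside the range of int; otherwise the behavior is
   undefined, as the quoted paragraph says. */
int truncate_in_range(double d)
{
    return (int)d;  /* the fractional part is discarded (toward zero) */
}
```

Under that assumption, truncate_in_range(2.9) yields 2 and truncate_in_range(-2.9) yields -2, since truncation goes toward zero rather than toward negative infinity.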

You have to check the range manually, but don't use code like:

// DON'T use code like this!
if (my_double > INT_MAX || my_double < INT_MIN)
    printf("Overflow!");

INT_MAX is an integer constant that may not have an exact floating-point representation. When comparing to a float, it may be rounded to the nearest higher or nearest lower representable floating point value (this is implementation-defined). With 64-bit integers, for example, INT_MAX is 2^63 - 1 which will typically be rounded to 2^63, so the check essentially becomes my_double > INT_MAX + 1. This won't detect an overflow if my_double equals 2^63.

For example with gcc 4.9.1 on Linux, the following program

#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    double  d = pow(2, 63);
    int64_t i = INT64_MAX;
    // PRId64 is the portable printf format specifier for int64_t.
    printf("%f > %" PRId64 " is %s\n", d, i, d > i ? "true" : "false");
    return 0;
}

prints

9223372036854775808.000000 > 9223372036854775807 is false

It's hard to get this right if you don't know the limits and internal representation of the integer and double types beforehand. But if you convert from double to int64_t, for example, you can use floating point constants that are exact doubles (assuming two's complement and IEEE doubles):

if (!(my_double >= -9223372036854775808.0   // -2^63
   && my_double <   9223372036854775808.0)  // 2^63
) {
    // Handle overflow.
}

The construct !(A && B) also handles NaNs correctly. A portable, safe, but slightly inaccurate version for ints is:

if (!(my_double > INT_MIN && my_double < INT_MAX)) {
    // Handle overflow.
}

This errs on the side of caution and will falsely reject values that equal INT_MIN or INT_MAX. But for most applications, this should be fine.
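As a rough sketch, the cautious check could be wrapped in a helper that only converts when the test passes. The function name double_to_int_checked and its out-parameter interface are assumptions for illustration, not part of the original answer:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores the converted value in *out and returns
   true only when d is strictly between INT_MIN and INT_MAX. NaN fails
   both comparisons, so it is rejected as well. */
bool double_to_int_checked(double d, int *out)
{
    if (!(d > INT_MIN && d < INT_MAX))
        return false;   /* out of range, exactly on a limit, or NaN */
    *out = (int)d;      /* defined: d lies strictly inside the range */
    return true;
}
```

A caller would test the return value before using *out. As noted above, inputs exactly equal to INT_MIN or INT_MAX are rejected even though they could be converted.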

Hydrosphere answered 24/5, 2015 at 14:7 Comment(4)

I have just done a little empirical testing, and this answer appears to be correct (again, assuming two's complement integers; if you can't assume that, maybe Boost or SafeInt is the only reasonable way to go). You should upvote this answer and downvote the incorrect answer that advocates my_double > INT_MAX || my_double < INT_MIN; that is in fact incorrect. – Debase 5/2, 2016 at 0:43

@Debase I just checked, and both Boost and SafeInt make the same mistake that I discuss in my answer. – Hydrosphere 6/6, 2016 at 18:53

Yikes. Did you report the problem to them? – Debase 8/6, 2016 at 22:12

@Debase Yes, here and here. – Hydrosphere 8/6, 2016 at 22:39

limits.h has constants for the maximum and minimum possible values of the integer data types. You can check your double variable before casting, like this:

if (my_double > nextafter(INT_MAX, 0) || my_double < nextafter(INT_MIN, 0))
    printf("Overflow!");
else
    my_int = (int)my_double;

EDIT: nextafter() will solve the problem mentioned by nwellnhof

Bacterium answered 8/2, 2009 at 17:42 Comment(7)

This doesn't work for me for a float. I try and do this float f = INT_MAX; f++; ConvertToInt(f) with the limit checking that you have above and it does not overflow. What's different? – Saxen 11/8, 2014 at 19:58

@Pittfall: float (all except very exotic platforms use IEEE-754 floats) has 24 significant binary digits. So when you set it to INT_MAX, which is 2³¹-1 (INT_MAX on 32-bit platform), the last digit is 128s. So if you add anything smaller than 128, the result is the original number, that is (float)INT_MAX + 1.f == (float)INT_MAX. With double, which has more significant digits than int, it will work. – Rocaille 6/10, 2014 at 14:2

INT_MAX and INT_MIN are the C way of checking. C++ way is using std::numeric_limits<int>::max() and …::min(). – Rocaille 6/10, 2014 at 14:4

This answer is wrong, because it's not guaranteed that INT_MIN and INT_MAX have precise floating point representations. With 64-bit integers for example, INT_MAX is 2^63-1 and (double)INT_MAX will be rounded to 2^63, so this check won't detect an overflow if my_double is 2^63. Changing the check to my_double >= INT_MAX || my_double < INT_MIN should actually work with two's complement integers, even if it looks wrong. – Hydrosphere 24/5, 2015 at 13:37

I agree that this answer is wrong. It should be removed. – Debase 5/2, 2016 at 0:45

The only problem with nextafter is that a double that equals INT_MIN will be detected as overflow, although it could be converted to int. Other than that, it's a nice, portable solution. – Hydrosphere 6/6, 2016 at 18:43

This code remains wrong. Consider my_double = −2,147,483,648 (−2**31), which is exactly representable in ordinary floating-point formats. With a 32-bit int, INT_MIN is this number, but nextafter(INT_MIN, 0) produces a greater number (closer to zero), so my_double < nextafter(INT_MIN, 0) reports “Overflow!”, but the conversion to int would not overflow. – Marismarisa 1/7 at 11:44
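Two points from the comments above can be checked with a short sketch: the lost increment on (float)INT_MAX, and the nextafter-based lower bound rejecting a double equal to INT_MIN. It assumes a 32-bit int and IEEE-754 float and double; the function names are illustrative only:

```c
#include <limits.h>
#include <math.h>
#include <stdbool.h>

/* True when adding 1 to (float)INT_MAX leaves the value unchanged.
   volatile forces each result to be rounded to float precision. */
bool float_increment_is_lost(void)
{
    volatile float f = (float)INT_MAX;  /* not exact in a 24-bit significand */
    volatile float g = f + 1.0f;        /* the added 1 is below float resolution here */
    return g == f;
}

/* True when the nextafter-based lower check reports overflow for a
   double that equals INT_MIN and could in fact be converted. */
bool nextafter_check_rejects_int_min(void)
{
    double my_double = (double)INT_MIN;        /* exactly representable */
    return my_double < nextafter(INT_MIN, 0);  /* bound lies just above INT_MIN */
}
```

Under the stated assumptions both functions should return true, which matches the explanations given in the comments.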

To answer your question: The behaviour when you cast out of range floats is undefined or implementation specific.

Speaking from experience: I've worked on a MIPS64 system that didn't implement these kinds of casts at all. Instead of doing something deterministic, the CPU threw a CPU exception. The exception handler that ought to emulate the cast returned without doing anything to the result.

I've ended up with random integers. Guess how long it took to trace back a bug to this cause. :-)

You'd better do the range check yourself if you aren't sure that the number can't get out of the valid range.

Braque answered 8/2, 2009 at 20:16 Comment(0)

A portable way for C++ is to use the SafeInt class:

http://www.codeplex.com/SafeInt

The implementation allows normal addition, subtraction, etc. on a C++ number type, including casts. It will throw an exception whenever an overflow scenario is detected.

SafeInt<int> s1 = INT_MAX;
SafeInt<int> s2 = 42;
SafeInt<int> s3 = s1 + s2;  // throws

I highly advise using this class in any place where overflow is an important scenario. It makes it very difficult to overflow silently. In cases where there is a recovery scenario for an overflow, simply catch the SafeIntException and recover as appropriate.

SafeInt now works on GCC as well as Visual Studio.

Brythonic answered 8/2, 2009 at 17:57 Comment(3)

A rough test led me to believe it's probably impossible to detect overflow in C++ (without serious overhead or total change of paradigm, such as wrapping every integer as object). This dedicated class can't seem to handle SafeInt<size_t> x = std::numeric_limits<size_t>::max() + 100 (it doesn't throw). – Cornett 2/12, 2010 at 13:2

are you kidding...? your right hand side did already overflow before it even reaches the SafeInt ctor. You can't blame SafeInt for that – Indulgence 8/1, 2011 at 0:2

SafeInt makes the same mistake that I discuss in my answer. SafeInt<int64_t> v = pow(2.0, 63.0) doesn't throw. – Hydrosphere 6/6, 2016 at 18:36

What is the best way to detect this under/overflow?

Compare the truncated double to exact limits near INT_MIN,INT_MAX.

The trick is to exactly convert limits based on INT_MIN,INT_MAX into double values. A double may not exactly represent INT_MAX as the number of bits in an int may exceed that floating point's precision.^*1 In that case, the conversion of INT_MAX to double suffers from rounding. The number after INT_MAX is a power-of-2 and is certainly representable as a double. 2.0*(INT_MAX/2 + 1) generates the whole number one greater than INT_MAX.

The same applies to INT_MIN on non-2s-complement machines.

INT_MAX is always a power-of-2 - 1.
INT_MIN is always:
-INT_MAX (not 2's complement) or
-INT_MAX-1 (2's complement)

int double_to_int(double x) {
  x = trunc(x);
  if (x >= 2.0*(INT_MAX/2 + 1)) Handle_Overflow();
  #if -INT_MAX == INT_MIN
  if (x <= 2.0*(INT_MIN/2 - 1)) Handle_Underflow();
  #else

  // Fixed 2022
  // if (x < INT_MIN) Handle_Underflow();
  if (x - INT_MIN < -1.0) Handle_Underflow();

  #endif
  return (int) x;
}

To detect NaN and not use trunc()

#define DBL_INT_MAXP1 (2.0*(INT_MAX/2+1)) 
#define DBL_INT_MINM1 (2.0*(INT_MIN/2-1)) 

int double_to_int(double x) {
  if (x < DBL_INT_MAXP1) {
    #if -INT_MAX == INT_MIN
    if (x > DBL_INT_MINM1) {
      return (int) x;
    }
    #else
    if (ceil(x) >= INT_MIN) {
      return (int) x;
    }
    #endif 
    Handle_Underflow();
  } else if (x > 0) {
    Handle_Overflow();
  } else {
    Handle_NaN();
  }
}

[Edit 2022] Corner error corrected after 6 years.

double values in the range (INT_MIN - 1.0 ... INT_MIN) (non-inclusive end-points) convert well to int. Prior code failed those.

^*1 This also applies to INT_MIN - 1 when int has more precision than double. Although this is rare, the issue readily applies to long long. Consider the difference between:

  if (x < LLONG_MIN - 1.0) Handle_Underflow(); // Bad
  if (x - LLONG_MIN < -1.0) Handle_Underflow();// Good

With 2's complement, some_int_type_MIN is a (negative) power-of-2 and exactly converts to a double. Thus x - LLONG_MIN is exact in the range of concern while LLONG_MIN - 1.0 may suffer precision loss in the subtraction.
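The two observations in this answer, that the power of two just above the maximum can be built exactly and that subtracting 1.0 from the minimum may lose precision, can be sketched for long long. This assumes a 64-bit two's complement long long and IEEE-754 doubles with round-to-nearest; the function names are made up for illustration:

```c
#include <limits.h>
#include <stdbool.h>

/* LLONG_MAX + 1 built without rounding: LLONG_MAX / 2 + 1 is a power of
   two that converts exactly, and doubling a double is exact. */
double llong_max_plus_one(void)
{
    return 2.0 * (double)(LLONG_MAX / 2 + 1);
}

/* True when LLONG_MIN - 1.0 rounds back to LLONG_MIN, i.e. the
   subtraction cannot represent the extra 1. */
bool min_minus_one_is_rounded(void)
{
    volatile double lo = (double)LLONG_MIN;
    volatile double shifted = lo - 1.0;
    return shifted == lo;
}

/* Lower-bound test in the form marked "Good" above: the difference
   x - LLONG_MIN is exact near the boundary, so no bound gets rounded. */
bool below_llong_range(double x)
{
    return x - (double)LLONG_MIN < -1.0;
}
```

Under these assumptions, llong_max_plus_one() equals 2^63, min_minus_one_is_rounded() returns true, and below_llong_range() accepts a double equal to LLONG_MIN while rejecting the next representable double below it.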

Lucho answered 19/7, 2016 at 23:36 Comment(5)

You say that a power of 2, the size of MAX_INT+1, is certainly representable as a double. Can you explain why? What are your assumptions? Is assuming IEEE enough? – Naturopathy 15/10, 2017 at 2:31

Ok, I can see now that you are assuming that FLT_RADIX is 2 and that INT_MAX is well below DBL_MAX, and nothing else beyond what is guaranteed by the language standard. That's a wonderful and creative solution. I love it. – Naturopathy 15/10, 2017 at 3:18

One comment: What you call underflow, I believe is formally called negative overflow. Underflow is when result snaps to zero. Overflow is when magnitude is impacted, roughly speaking. – Naturopathy 15/10, 2017 at 5:55

@KristianSpangsege In C, underflow with FP matches the comment and C uses overflow for results outside the int range (+ and -). In the vernacular, I have heard and read underflow used with excessively negative results and since OP used it that way, made sense to answer likewise. negative overflow is a good term - even if a bit verbose. – Lucho 15/10, 2017 at 6:7

Strictly speaking, there is one more assumption, namely that DBL_MANT_DIG <= DBL_MAX_EXP such that the result of ceil() is always representable. IEEE types satisfy this condition. I picked this up from the Linux man page for ceil(). – Naturopathy 15/10, 2017 at 6:39

Another option is to use boost::numeric_cast which allows for arbitrary conversion between numerical types. It detects loss of range when a numeric type is converted, and throws an exception if the range cannot be preserved.

The website referenced above also provides a small example which should give a quick overview on how this template can be used.

Of course, this isn't plain C anymore ;-)

Hagerman answered 8/2, 2009 at 19:1 Comment(1)

boost::numeric_cast makes the same mistake that I discuss in my answer. numeric_cast<int64_t>(pow(2.0, 63.0)) doesn't throw. – Hydrosphere 6/6, 2016 at 18:52

We ran into the same problem. For example:

double d = 9223372036854775807L;
int i = (int)d;

On Linux/Windows, i = -2147483648, but on AIX 5.3, i = 2147483647.

If the double is outside the range of int:

Linux/Windows always returns INT_MIN.
AIX returns INT_MAX if the double is positive and INT_MIN if the double is negative.

Burnish answered 23/1, 2013 at 9:36 Comment(3)

Linux definitely does not always return INT_MIN, and I'd be surprised if Windows did, because this is a function of the processor architecture itself (it isn't operating system specific). – Audrey 4/6, 2019 at 10:1

printf("%d\n", (int) (double) (9223372036854775807L)); prints 2147483647, but double d = 9223372036854775807L; int i = (int) d; printf("%d", i); prints -2147483648. MinGW730_64 on Windows – Ignatzia 14/10, 2021 at 12:30

The behavior is a combination of multiple factors, including the processor architecture and the compiler. Some compilers will compute values during compilation and may use a different algorithm than the processor’s built-in conversion instruction. – Marismarisa 1/7 at 11:40

Here is C code to test and report whether a double can be converted to int without overflow and, if it can, return the resulting int. I copied it from my answer here. This code takes pains to use behavior defined by the C standard in a variety of C implementations.

This approach uses the definition of floating-point formats in the C standard—as a signed base-b numeral multiplied by a power of b. Knowing the number of digits in the significand (provided by DBL_MANT_DIG) and the exponent limit (provided by DBL_MAX_EXP) allows us to prepare exact double values as end points.

I believe it will work in all conforming C implementations subject to the modest additional requirements stated in the initial comment.

/*  This code demonstrates safe conversion of double to int in which the
    input double is converted to int if and only if it is in the supported
    domain for such conversions (the open interval (INT_MIN-1, INT_MAX+1)).
    If the input is not in range, an error is indicated (by way of an
    auxiliary argument) and no conversion is performed, so all behavior is
    defined.

    There are a few requirements not fully covered by the C standard.  They
    should be uncontroversial and supported by all reasonable C implementations:

        Conversion of an int that is representable in double produces the
        exact value.

        The following operations are exact in floating-point:

            Dividing by the radix of the floating-point format, within its
            range.

            Multiplying by +1 or -1.

            Adding or subtracting two values whose sum or difference is
            representable.

        FLT_RADIX is representable in int.

        DBL_MIN_EXP is not greater than -DBL_MANT_DIG.  (The code can be
        modified to eliminate this requirement.)
*/


#include <float.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>


/*  These values will be initialized to the greatest double value not greater
    than INT_MAX+1 and the least double value not less than INT_MIN-1.
*/
static double UpperBound, LowerBound;


/*  Return the double of the same sign of x that has the greatest magnitude
    less than x+s, where s is -1 or +1 according to whether x is negative or
    positive.
*/
static double BiggestDouble(int x)
{
    /*  All references to "digits" in this routine refer to digits in base
        FLT_RADIX.  For example, in base 3, 77 would have four digits (2212).

        In this routine, "bigger" and "smaller" refer to magnitude.  (3 is
        greater than -4, but -4 is bigger than 3.)
    */

    //  Determine the sign.
    int s = 0 < x ? +1 : -1;

    //  Count how many digits x has.
    int digits = 0;
    for (int t = x; t; ++digits)
        t /= FLT_RADIX;

    /*  If the double type cannot represent finite numbers this big, return the
        biggest finite number it can hold, with the desired sign.
    */
    if (DBL_MAX_EXP < digits)
        return s*DBL_MAX;

    //  Determine whether x is exactly representable in double.
    if (DBL_MANT_DIG < digits)
    {
        /*  x is not representable, so we will return the next lower
            representable value by removing just as many low digits as
            necessary.  Note that x+s might be representable, but we want to
            return the biggest double less than it, which is also the biggest
            double less than x.
        */

        /*  Figure out how many digits we have to remove to leave at most
            DBL_MANT_DIG digits.
        */
        digits = digits - DBL_MANT_DIG;

        //  Calculate FLT_RADIX to the power of digits.
        int t = 1;
        while (digits--) t *= FLT_RADIX;

        return x / t * t;
    }
    else
    {
        /*  x is representable.  To return the biggest double smaller than
            x+s, we will fill the remaining digits with FLT_RADIX-1.
        */

        //  Figure out how many additional digits double can hold.
        digits = DBL_MANT_DIG - digits;

        /*  Put a 1 in the lowest available digit, then subtract from 1 to set
            each digit to FLT_RADIX-1.  (For example, 1 - .001 = .999.)
        */
        double t = 1;
        while (digits--) t /= FLT_RADIX;
        t = 1-t;

        //  Return the biggest double smaller than x+s.
        return x + s*t;
    }
}


/*  Set up supporting data for DoubleToInt.  This should be called once prior
    to any call to DoubleToInt.
*/
static void InitializeDoubleToInt(void)
{
    UpperBound = BiggestDouble(INT_MAX);
    LowerBound = BiggestDouble(INT_MIN);
}


/*  Perform the conversion.  If the conversion is possible, return the
    converted value and set *error to zero.  Otherwise, return zero and set
    *error to ERANGE.
*/
static int DoubleToInt(double x, int *error)
{
    if (LowerBound <= x && x <= UpperBound)
    {
        *error = 0;
        return x;
    }
    else
    {
        *error = ERANGE;
        return 0;
    }
}


#include <string.h>


static void Test(double x)
{
    int error, y;
    y = DoubleToInt(x, &error);
    printf("%.99g -> %d, %s.\n", x, y, error ? strerror(error) : "No error");
}


#include <math.h>


int main(void)
{
    InitializeDoubleToInt();
    printf("UpperBound = %.99g\n", UpperBound);
    printf("LowerBound = %.99g\n", LowerBound);

    Test(0);
    Test(0x1p31);
    Test(nexttoward(0x1p31, 0));
    Test(-0x1p31-1);
    Test(nexttoward(-0x1p31-1, 0));
}

Marismarisa answered 1/7 at 23:27 Comment(0)

-1

I am not sure about this but I think it may be possible to "turn on" floating point exceptions for under/overflow...take a look at this Dealing with Floating-point Exceptions in MSVC7\8 so you might have an alternative to if/else checks.

Thigmotaxis answered 8/2, 2009 at 17:39 Comment(1)

Answers should have key information included in the answer, not merely referred to as a link. The link in this answer is dead; it leads to a “404” not-found page. – Marismarisa 1/7 at 11:42

-2

I can't tell you for certain whether it is defined for all platforms, but that is pretty much what's happened on every platform I've used. Except, in my experience, it rolls. That is, if the value of the double is INT_MAX + 2, then the result of the cast ends up being INT_MIN + 2.

As for the best way to handle it, I'm really not sure. I've run up against the issue myself, and have yet to find an elegant way to deal with it. I'm sure someone will respond that can help us both there.

Donnettedonni answered 8/2, 2009 at 17:40 Comment(1)

Even if the conversion did wrap, INT_MAX+2 would wrap to INT_MIN+1 with two’s complement, not INT_MIN+2, as that is the result of subtracting 2**w, where w is the width of int. – Marismarisa 1/7 at 11:45
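The arithmetic in this comment can be checked with a small sketch that models wrapping modulo 2^w in a wider type. It is arithmetic only: the actual cast of an out-of-range double remains undefined. The helper name wrap_to_int_range is hypothetical, and the code assumes a two's complement int and a long long that is wider than int:

```c
#include <limits.h>

/* Hypothetical helper: the value x would have after wrapping modulo 2^w,
   where w is the width of int. Computed in long long so that nothing
   overflows along the way. */
long long wrap_to_int_range(long long x)
{
    const long long modulus = (long long)UINT_MAX + 1;  /* 2^w */
    long long r = x % modulus;       /* now -2^w < r < 2^w */
    if (r > INT_MAX) r -= modulus;   /* fold the upper excess down */
    if (r < INT_MIN) r += modulus;   /* fold the lower excess up */
    return r;
}
```

With these assumptions, wrap_to_int_range((long long)INT_MAX + 2) gives INT_MIN + 1, as the comment states.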
