No useful and reliable way to detect integer overflow in C/C++?
No, this is not a duplicate of How to detect integer overflow?. The issue is the same but the question is different.


The gcc compiler can optimize away an overflow check (with -O2), for example:

int a, b;
b = abs(a);                     // will overflow if a = 0x80000000
if (b < 0) printf("overflow");  // optimized away

The gcc people argue that this is not a bug. Overflow is undefined behavior, according to the C standard, which allows the compiler to do anything. Apparently, anything includes assuming that overflow never happens. Unfortunately, this allows the compiler to optimize away the overflow check.

The safe way to check for overflow is described in a recent CERT paper. This paper recommends doing something like this before adding two integers:

if ( ((si1^si2) | (((si1^(~(si1^si2) & INT_MIN)) + si2)^si2)) >= 0) { 
  /* handle error condition */
} else {
  sum = si1 + si2;
}

Apparently, you have to do something like this before every +, -, *, / and every other operation in a series of calculations when you want to be sure that the result is valid, for example if you want to make sure an array index is not out of bounds. This is so cumbersome that practically nobody does it. At least, I have never seen a C/C++ program that does this systematically.
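
To make that concrete, here is what wrapping such a precondition into a helper might look like. This is only a sketch, with the same intent as the CERT check above but written as plain range comparisons rather than bit manipulation, and checked_add is a made-up name:

#include <climits>

// Sketch: precondition test for signed addition.  Returns true and stores the
// sum when si1 + si2 cannot overflow; returns false (and performs no addition)
// when it would.
static bool checked_add(int si1, int si2, int *sum)
{
    if ((si2 > 0 && si1 > INT_MAX - si2) ||
        (si2 < 0 && si1 < INT_MIN - si2))
        return false;
    *sum = si1 + si2;
    return true;
}

Every arithmetic step in a calculation would have to go through a helper like this, which is exactly the burden described above.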

Now, this is a fundamental problem:

  • Checking an array index before accessing the array is useful, but not reliable.

  • Checking every operation in the series of calculations with the CERT method is reliable but not useful.

  • Conclusion: There is no useful and reliable way of checking for overflow in C/C++!

I refuse to believe that this was intended when the standard was written.

I know that there are certain command line options that can fix the problem, but this doesn't alter the fact that we have a fundamental problem with the standard or the current interpretation of it.

Now my question is: Are the gcc people taking the interpretation of "undefined behavior" too far when it allows them to optimize away an overflow check, or is the C/C++ standard broken?

Added note: Sorry, you may have misunderstood my question. I am not asking how to work around the problem - that has already been answered elsewhere. I am asking a more fundamental question about the C standard. If there is no useful and reliable way of checking for overflow then the language itself is dubious. For example, if I make a safe array class with bounds checking then I should be safe, but I'm not if the bounds checking can be optimized away.
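
As an aside, the bounds check itself can be written entirely in unsigned arithmetic, which the optimizer may only drop if it can prove the index is in range. A minimal sketch (SafeArray is a hypothetical name, not a standard class):

#include <cstddef>
#include <stdexcept>

// Sketch: the index is a size_t compared directly against the size, so the
// check never relies on a signed value that has already overflowed.
template <typename T, std::size_t N>
class SafeArray {
    T data_[N];
public:
    T &at(std::size_t i) {
        if (i >= N)                         // well-defined unsigned comparison
            throw std::out_of_range("index out of bounds");
        return data_[i];
    }
};

Of course, if the index was computed with signed arithmetic that already overflowed, the damage is done before the check runs - which is exactly the worry expressed above.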

If the standard allows this to happen then either the standard needs revision or the interpretation of the standard needs revision.

Added note 2: People here seem unwilling to discuss the dubious concept of "undefined behavior". The fact that the C99 standard lists 191 different kinds of undefined behavior (link) is an indication of a sloppy standard.

Many programmers readily accept the statement that "undefined behavior" gives the license to do anything, including formatting your hard disk. I think it is a problem that the standard puts integer overflow into the same dangerous category as writing outside array bounds.

Why are these two kinds of "undefined behavior" different? Because:

  • Many programs rely on integer overflow being benign, but few programs rely on writing outside array bounds when you don't know what is there.

  • Writing outside array bounds actually can do something as bad as formatting your hard disk (at least in an unprotected OS like DOS), and most programmers know that this is dangerous.

  • When you put integer overflow into the dangerous "anything goes" category, it allows the compiler to do anything, including lying about what it is doing (in the case where an overflow check is optimized away).

  • An error such as writing outside array bounds can be found with a debugger, but the error of optimizing away an overflow check cannot, because optimization is usually off when debugging.

  • The gcc compiler evidently refrains from the "anything goes" policy in the case of integer overflow. There are many cases where it refrains from optimizing, for example a loop, unless it can verify that overflow is impossible. For some reason, the gcc people have recognized that we would get too many errors if they followed the "anything goes" policy here, but they take a different attitude to the problem of optimizing away an overflow check.

Maybe this is not the right place to discuss such philosophical questions. At least, most answers here are off the point. Is there a better place to discuss this?

Maleate answered 28/7, 2011 at 8:42 Comment(4)
You are conflating two fundamentally different things here – array overflow checks, which are rarely useful except for debugging, and arithmetic overflow checks, which are often useful.Perfidy
"I refuse to believe" - that's your call, but if you refuse to believe true things then you're going to have difficulty. Signed integer types are no use in C or C++ for calculations that might overflow. That's a consequence of the standard being designed to avoid imposing overhead on machine architectures whose "natural" behavior is different from whatever you or I think the behavior should be. That consequence was considered worth paying to avoid the implementation burden.Guido
Calculations with array bounds are especially suitable for unsigned arithmetic, as you know a priori that you should never get a negative end result anyway.Sabayon
"If there is no useful and reliable way of checking for overflow then the language itself is dubious" - then the language is dubious by your definition. If you don't find the language useful, don't use it (or only use implementations and command-line options which provide additional guarantees about overflow). Probably where you part company with the C standard committee is that you think checking for overflow before the operation is useless, and they think it isn't. Possibly you also think unsigned types are useless, and they don't.Guido

The gcc developers are entirely correct here. When the standard says that the behavior is undefined, that means exactly that there are no requirements on the compiler.

As a valid program cannot do anything that causes UB (as then it would not be valid anymore), the compiler can very well assume that UB doesn't happen. And if it does happen, anything the compiler does is OK.
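
A concrete illustration of that license (a sketch; whether the folding actually happens depends on the compiler and optimization level):

// Because the compiler may assume that the signed addition x + 1 never
// overflows, it may fold this comparison to a constant 'false' at -O2.
static bool will_increment_overflow(int x)
{
    return x + 1 < x;
}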

For your problem with overflow, one solution is to consider what ranges the calculations are supposed to handle. For example, when balancing my bank account I can assume that the amounts would be well below 1 billion, so a 32-bit int will work.

For your application domain you can probably do similar estimates about exactly where an overflow could be possible. Then you can add checks at those points or choose another data type, if available.
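
As a sketch of the "choose another data type" option: do the risky step in a wider type and range-check the exact result afterwards. (add_in_range is a made-up name, and a 64-bit integer type is assumed to be available.)

#include <cstdint>
#include <limits>

// Sketch: the 64-bit addition of two 32-bit values is always exact, so the
// overflow question reduces to a well-defined range check on the result.
static bool add_in_range(std::int32_t a, std::int32_t b, std::int32_t *out)
{
    const std::int64_t wide = static_cast<std::int64_t>(a) + b;
    if (wide < std::numeric_limits<std::int32_t>::min() ||
        wide > std::numeric_limits<std::int32_t>::max())
        return false;                       // the result does not fit in 32 bits
    *out = static_cast<std::int32_t>(wide);
    return true;
}

The same pattern works for 32-bit subtraction and multiplication, since their exact results also fit in 64 bits.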

Plosion answered 28/7, 2011 at 8:51 Comment(5)
"For example, when balancing my bank account I can assume that the amounts would be well below 1 billion, so a 32-bit int will work." - and has the advantage that in the event of hyper-inflation, you are guaranteed a job fixing your own code. You really don't want to be unemployed during hyper-inflation.Guido
This kind of assumption is exactly what caused the Y2K problem.Piroshki
@Graham - No it isn't. If you do numeric computations you must know the intended domain of your application. If the domain changes, you will have to reevaluate. Or are you already preparing for the Y10K problem?Plosion
It's sort of true. The thinking, "I will use a fixed-size type/field, because it is faster and never fails out-of-memory, and works provided that the range of numbers I handle is limited", is the kind of assumption that means Y2k, Y10k, and inflation at 1000% all cause software problems. The difference between the Y2k and the Y10k problem is just that in the 60's, when people started writing lots of business software, they didn't know their software and its conventions would still be in use in Y2K. Now we know it was, and don't know whether our software and conventions will be in use in Y10K.Guido
Oh, and another difference is I'm more confident that I'll be dead in 8000 years than they should have been that they'd be dead in 40. Hence, less scope for embarrassment even if I'm just as wrong as they were.Guido

int a, b;
b = abs(a); // will overflow if a = 0x80000000
if (b < 0) printf("overflow");  // optimized away

(You seem to be assuming 2s complement... let's run with that)

Who says abs(a) "overflows" if a has that binary pattern (more accurately, if a is INT_MIN)? The Linux man page for abs(int) says:

Trying to take the absolute value of the most negative integer is not defined.

Not defined doesn't necessarily mean overflow.

So, your premise that b could ever be less than 0, and that this is somehow a test for "overflow", is fundamentally flawed from the start. If you want to test, you cannot do it on a result that may have undefined behaviour - do it before the operation instead!
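
For the abs example from the question, testing before the operation comes down to something like this sketch (checked_abs is a made-up name):

#include <climits>
#include <cstdlib>

// Sketch: test the precondition instead of the (undefined) result.
// abs() is well defined for every int except INT_MIN.
static bool checked_abs(int a, int *out)
{
    if (a == INT_MIN)
        return false;                       // would overflow, so don't call abs()
    *out = std::abs(a);
    return true;
}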

If you care about this, you can use C++'s user-defined types (i.e. classes) to implement your own set of tests around the operations you need (or find a library that already does that). The language does not need built-in support for this, as it can be implemented equally efficiently in such a library, with the resulting semantics of use unchanged. That fundamental power is one of the great things about C++.

Fixing answered 28/7, 2011 at 9:1 Comment(0)

Ask yourself: how often do you actually need checked arithmetic? If you need it often, you should write a checked_int class that overloads the common operators and encapsulate the checks into this class. Props for sharing the implementation on an Open Source website.

Better yet (arguably), use a big_integer class so that overflows can’t happen in the first place.
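
A minimal sketch of what such a checked_int might look like, showing only addition (the class here is illustrative, not an existing library):

#include <limits>
#include <stdexcept>

// Sketch: wrap an int and test the precondition before each operation.
// A real class would also cover -, *, /, comparisons, conversions, etc.
class checked_int {
    int value_;
public:
    explicit checked_int(int v) : value_(v) {}
    int value() const { return value_; }

    checked_int operator+(checked_int rhs) const {
        const int a = value_, b = rhs.value_;
        if ((b > 0 && a > std::numeric_limits<int>::max() - b) ||
            (b < 0 && a < std::numeric_limits<int>::min() - b))
            throw std::overflow_error("checked_int: addition would overflow");
        return checked_int(a + b);
    }
};

With such a type, an addition that would overflow throws instead of silently invoking undefined behaviour.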

Perfidy answered 28/7, 2011 at 8:50 Comment(0)

Just use the correct type for b:

int a;
unsigned b = a;
if (b == (unsigned)INT_MIN) printf("overflow");  // never optimized away
else b = abs(a);

Edit: Testing for overflow in C can be done safely with unsigned types. Unsigned types simply wrap around on arithmetic, and signed types are safely converted to them, so you can perform any test on them that you like. On modern processors this conversion is usually just a reinterpretation of a register or so, so it comes at no runtime cost.
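
A sketch of that technique for addition, assuming the usual two's-complement representation (wrap_add_overflows is a made-up name):

// Sketch: do the addition on unsigned copies, where wraparound is well
// defined, then inspect the sign bits to decide whether the corresponding
// signed addition would have overflowed.
static bool wrap_add_overflows(int a, int b)
{
    const unsigned ua = static_cast<unsigned>(a);   // conversion is well defined (modulo 2^N)
    const unsigned ub = static_cast<unsigned>(b);
    const unsigned ur = ua + ub;                    // unsigned addition wraps, never UB
    const unsigned sign = ~0u ^ (~0u >> 1);         // mask with only the sign bit set
    // Overflow happens exactly when a and b share a sign bit and the wrapped
    // result has the opposite sign bit.
    return ((~(ua ^ ub)) & (ua ^ ur) & sign) != 0;
}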

Stamen answered 28/7, 2011 at 8:50 Comment(5)
I've never heard of UINT_MIN before - isn't that just 0?Consultant
Doesn't make sense indeed, and the cast is weird too. Besides, it's probably not the test that was intended; the 2s complement INT_MIN that causes the overflow would be equal to UINT_MAX/2 + 1. So I think the intended test was if (b==unsigned(INT_MAX)). But why not the clearer if (a==INT_MAX)?Sabayon
@A Fog, test for overflow in C can be safely done with the unsigned type. Unsigned types just wrap around on arithmetic and signed types are safely converted to them. So you can do any test on them that you like. On modern processors this conversion is usually just a reinterpretation of a register or so, so it comes for no cost.Stamen
@Kerrek, @MSalters, INT_MIN, indeed. I just left it in the way the OP put it to show where the test for overflow would go.Stamen
Agreed. THE CORRECT WAY TO CHECK FOR OVERFLOW IN C/C++ IS UNSIGNED ARITHMETIC. There is a reason why it is specified to wrap in the standard!Chert
