Why are unsigned integers error prone?

I was looking at this video. Bjarne Stroustrup says that unsigned ints are error prone and lead to bugs, so you should only use them when you really need them. I've also read in one of the questions on Stack Overflow (but I don't remember which one) that using unsigned ints can lead to security bugs.

How do they lead to security bugs? Can someone clearly explain it by giving a suitable example?

Elamitic answered 22/5, 2015 at 11:11 Comment(11)
I'd argue strongly for using unsigned types. If you get loop conditions wrong, you're a bad developer. It's very simple math to make it work with unsigned integers, and it feels much more natural to me that quantities are unsigned. – Continuous
Problem is most developers are bad... – Voyles
They can certainly magnify off-by-one errors. Consider the VLT that awarded a man 2³²-1 cents. thestar.com/news/ontario/2009/03/18/… Of course there is a similar problem with signed numbers, where the smallest is only one off from the largest, but since we often play near 0, the cliff edge is closer with unsigned numbers. – Fe
Working on the old "CISC" IBM System/38, we low-level programmers preferred to use unsigned, since unsigned arithmetic would produce an exception if addition/subtraction overflowed, helping us find our bugs. Absent that feature, though, it's pretty easy to have unsigned silently screw things up pretty royally. – Genova
Signed ints are also error prone. I spent an hour debugging a problem in Java when shifting a "byte" value produced weird results. It was due to promotion and sign extension. I'd rather have both and choose the right type for the job. – Dulcie
@MattiVirkkunen: Beyond signed and unsigned, I'd rather have types with explicit wrapping semantics, explicit checking semantics, loose mod 2ⁿ semantics, and overflow-equals-UB semantics. Separating out different kinds of integers would make it possible to write code which is more portable, more robust, and more optimizable than is possible with the types that are available today and the rules associated with them [which in many cases require smaller signed types to behave with clean wrapping semantics, but allow math on smaller unsigned types to generate Undefined Behavior]. – Riot
Amen, @supercat. The overflow-equals-UB semantics are often called saturating semantics, and underflow-equals-0 as well. – Vaca
@DougMcClean: By "UB" I meant "Undefined Behavior". I'm not sure there are enough cases where saturating semantics are more useful than checked semantics to justify their inclusion in a language. Actually, I'm not sure how often allowing overflow to be Undefined Behavior would allow genuinely useful optimizations which wouldn't also be permissible under loose modular-arithmetic standards (which would dictate that if int is 32 bits, calculations on int that overflow must yield values which are congruent to the correct results mod 2³², but would not be required to behave as values within... – Riot
...the range of the type. Among other things, loose semantics would say that given int32_t x=INT32_MAX; x++; int64_t y1=x,y2=x; a compiler would not be required to assign the same value to y1 and y2, but casting y1 and y2 both to uint32_t would be required to give the same value, i.e. (INT32_MAX + 1u). I would expect explicit checked-integer semantics could allow some very useful optimizations if the compiler were allowed to hold correct calculations beyond specified precision, and only had to trap when precision was lost. Given icheck32_t w,x,y,z;, the expression w=x+y+z;... – Riot
...would only be required to yield the correct result if both x+y and (x+y)+z were representable in icheck32_t, but the compiler would be free to trap or not at its leisure if x+y was not representable but x+y+z was. – Riot
The saturating semantics can sometimes be useful for signal processing applications, but I see what you mean. – Vaca

One possible aspect is that unsigned integers can lead to somewhat hard-to-spot problems in loops, because underflow produces large numbers. I cannot count (even with an unsigned integer!) how many times I have made a variant of this bug:

for (size_t i = foo.size(); i >= 0; --i)  // i >= 0 is always true for an unsigned i
    ...

Note that, by definition, i >= 0 is always true for an unsigned i, so this loop never terminates. (What leads to this in the first place is that if i is signed, the compiler will warn about the signed/unsigned conversion involving the size_t returned by size().)
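
For comparison, here is one common way to write the reverse loop so that it terminates with an unsigned index; it is the same idiom as the i --> 0 trick from the comments. This is a minimal sketch: reverse_indices is a hypothetical name, and the loop merely records the indices it visits instead of doing real work.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of foo in reverse order: size()-1, ..., 1, 0.
// The condition i-- > 0 tests i *before* decrementing it, so the body
// only ever sees valid indices and the loop ends right after index 0.
// For an empty vector the condition is 0 > 0, and the body never runs.
std::vector<std::size_t> reverse_indices(const std::vector<int>& foo) {
    std::vector<std::size_t> visited;
    for (std::size_t i = foo.size(); i-- > 0;) {
        visited.push_back(i);
    }
    return visited;
}
```

Called on {10, 20, 30} this yields {2, 1, 0}, and on an empty vector it yields an empty result; the decrement past zero does happen once, but only after the condition has already failed.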

There are other reasons mentioned in "Danger – unsigned types used here!", the strongest of which, in my opinion, is the implicit type conversion between signed and unsigned.

Imine answered 22/5, 2015 at 11:19 Comment(6)
I would accept this answer because it's the only one that a compiler wouldn't warn about. – Unsphere
@AndyT Get a better compiler. coliru.stacked-crooked.com/a/c79fc9148dfb5f3f – Pulpboard
@AndyT And by the way, my example actually does not get a warning, unlike the above. :) – Pulpboard
@BaummitAugen: indeed :) – Unsphere
Time to use operator--> ( go down to ): for (size_t i = sz; i --> 0;) ... iterates from sz-1 to 0. – Ibiza
This doesn't demonstrate a problem with unsigned integers. This demonstrates a problem with the code itself. Advocating avoiding the appropriate tools for a job because they can be used poorly is not doing anyone any favors. Just don't use them poorly. – Trilingual

One big factor is that it makes loop logic harder: Imagine you want to iterate over all but the last element of an array (which does happen in the real world). So you write your function:

#include <cstddef>
#include <vector>
void do_something(int);  // hypothetical; defined elsewhere
void fun (const std::vector<int> &vec) {
    for (std::size_t i = 0; i < vec.size() - 1; ++i)
        do_something(vec[i]);
}

Looks good, doesn't it? It even compiles cleanly with very high warning levels! So you put this in your code, all tests run smoothly, and you forget about it.

Later on, somebody comes along and passes an empty vector to your function. With a signed integer, you would hopefully have noticed the sign-compare compiler warning, introduced the appropriate cast, and not published the buggy code in the first place.

But in your implementation with the unsigned integer, vec.size() - 1 wraps around and the loop condition becomes i < SIZE_MAX. Disaster: undefined behavior and, most likely, a crash!

"I want to know how they lead to security bugs?"

This is also a security problem; in particular, it is a buffer overflow (strictly speaking, an out-of-bounds read). One way to exploit this would be if do_something did something that the attacker can observe. They might then be able to find out what input went into do_something, and in that way data the attacker should not be able to access would be leaked from your memory. This would be a scenario similar to the Heartbleed bug. (Thanks to ratchet freak for pointing that out in a comment.)
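
A version that avoids the wraparound, using the i + 1 < vec.size() condition suggested in the comments below, could look like the following sketch. sum_all_but_last is a hypothetical stand-in for fun, with the running sum standing in for do_something.

```cpp
#include <cstddef>
#include <vector>

// Sums every element except the last one.
// i + 1 < vec.size() never underflows: for an empty vector the condition
// is 1 < 0, which is simply false, so the loop body never runs.
// (The buggy i < vec.size() - 1 computes 0 - 1, which wraps to SIZE_MAX.)
long long sum_all_but_last(const std::vector<int>& vec) {
    long long total = 0;
    for (std::size_t i = 0; i + 1 < vec.size(); ++i) {
        total += vec[i];
    }
    return total;
}
```

The design choice is to keep all arithmetic on the side of the comparison that can only grow, so no subtraction of unsigned values is needed.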

Pulpboard answered 22/5, 2015 at 11:18 Comment(18)
I've always felt uneasy about this alleged counter-example. It's true that just by looking myopically at the code you would think that signed integers are better here. However, this ignores the larger algorithmic issue: the algorithm clearly wants to treat the last element of the range specially. Therefore, this algorithm should have some kind of precondition or branching that actually ensures that the range has a last element! And with such branching in place, unsigned integers would work just fine. – Cragsman
@KerrekSB It is probably a bit contrived, I agree, but not completely unrealistic: the vector could be a member of a class where the last element depends on the n previous ones. Now, if n == 0, you do not need the last element either, so you leave the vector empty. Our algorithm at hand never handles the last element, so we have the loop I showed. But I still admit such a thing never happened in any code I actually saw (probably because I wrote most of the code I've seen so far, and I don't do such things). – Pulpboard
If do_something leaks its input to the world, then you have a data leak scenario (think Heartbleed). – Effortless
@BaummitAugen: Again, using half-open range notation, you would express such a dependency by saying that element n depends on elements [0, n), and there'd be no problem with signedness :-S – Cragsman
@KerrekSB First of all, I do not claim that this bug would not be fixable using unsigned integers. I also do not say that such bugs would be common in well designed/written code. However, I feel like this happening to people is certainly not off-limits; I've seen weirder stuff. This might not be the best example there is, but if you know a better one or want to prove Stroustrup wrong, I would be glad to see your answer. :) – Pulpboard
Why does everybody have to use subtraction here? Why not for (std::size_t i = 0; i + 1 < vec.size(); ++i)? – Escape
@SiyuanRen I used subtraction because it is wrong. The whole point of this question and answer is highlighting potential bugs. No one is trying to argue that those bugs are not fixable or avoidable. I just argue that something like this could happen, and it would be bad. So yes, you can use your code, and then you have correct code. The point is that one can (kind of easily) get it wrong (like I intentionally did in my answer). – Pulpboard
Again, it's bad code, not a bad variable type. Doesn't make the case. Integers aren't error-prone; programming is error-prone. – Trilingual
@Trilingual Neither Stroustrup nor anyone here argues that you cannot write correct code with unsigned types. We all just point out that there are mistakes that can (more or less) easily be made with unsigned types which would have (in this case) at least produced a compiler warning with signed types. Again, the point is not that you cannot get it right, just that you can get it wrong. It's the same thing as with new[] vs std::vector. – Pulpboard
@Trilingual And of course programming is error prone. We look for tools that prevent errors (in this case signed types) because programming is error prone, and because people make mistakes. – Pulpboard
@fyngyrz: On systems where the maximum length of an object is too large to fit in a signed int, but will fit in an unsigned int, optimal semantics would have been obtained by having size_t be a type which is stored in unsigned-int format, but promotes to long on rvalue conversion. Having such sizes use one word rather than two is beneficial, but the present rules, which cause calculations involving them to be evaluated using modular arithmetic, are far less so. – Riot
@fyngyrz: IMHO, unsigned int is a perfectly fine variable type in cases where one wants to perform modular arithmetic, but it is a semantically inappropriate [not "bad"] type in cases where one is representing quantities. – Riot
Oh lord, the edit-for-5-minutes thing got me AGAIN. C'mon Stack Overflow, what is the POINT of that? – Trilingual
"look(ing) for tools that prevent errors (in this case signed types)..." That creates weaker programmers and less powerful tools. Programming is, and should be, a tough craft; the craftspeople in it know what they're doing. No need to tone it down for apprentices and hobbyists. Better to learn to deal with the effective hammers and saws available, not blunted edges and rubber mallets. "Make it easy" has gone off the rails. Just write it correctly in the first place, that's all. Building skill is always better than tearing down tools. uint? Use the right size uint. Properly. No problem. – Trilingual
@Trilingual That would probably make sense if there were a sufficiently big supply of programmers who do not make mistakes and get everything right on the first try. Unfortunately, I have yet to meet such a human being. Until we get enough perfect programmers, we need tools to prevent and detect all the mistakes that will happen. – Pulpboard
Baum, it's not about perfect. Nor has it ever been. You think carpenters and other skilled workers are "perfect"? No. But do they go looking for crippled tools? Again, no. They work to get better. Good programmers do the same. Weak programmers -- who will never get very strong -- gravitate towards the crippleware. Ask them to write up an assembly language PCB router.... Hardly. Whereas a good programmer would grab the CPU manual and hand you the router in a few days. And it would work. Bug or two? Perhaps. Not the end of the world in 99.99% of situations. Easily fixed, too, most likely. – Trilingual
@Trilingual Mediocre programmers in general use tools that are difficult to use correctly and not the better tools that are easy to use, simply because they are too inept to even imagine that their choice of tools might be suboptimal. – Interposition

I'm not going to watch a video just to answer a question, but one issue is the confusing conversions which can happen if you mix signed and unsigned values. For example:

#include <iostream>

int main() {
    unsigned n = 42;
    int i = -42;
    if (i < n) {
        std::cout << "All is well\n";
    } else {
        std::cout << "ARITHMETIC IS BROKEN!\n";
    }
}

The usual arithmetic conversions mean that i is converted to unsigned for the comparison, turning -42 into a large positive number (UINT_MAX - 41), so the program prints the surprising "ARITHMETIC IS BROKEN!".
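
One way to get the mathematically expected answer is to handle the negative case explicitly before comparing. This is a minimal sketch with hypothetical helper names; builtin_less simply spells out what the built-in i < n does when i is an int and n is unsigned.

```cpp
// What the built-in i < n actually does when i is int and n is unsigned:
// i is converted to unsigned first, so -42 becomes UINT_MAX - 41.
bool builtin_less(int i, unsigned n) {
    return static_cast<unsigned>(i) < n;
}

// Hypothetical helper giving the mathematically expected answer:
// a negative i is always less than any unsigned n; otherwise i is
// non-negative and converts to unsigned without changing its value.
bool safe_less(int i, unsigned n) {
    return i < 0 || static_cast<unsigned>(i) < n;
}
```

If C++20 is available, std::cmp_less from <utility> provides the same value-correct comparison for any pair of standard integer types.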

Afc answered 22/5, 2015 at 11:23 Comment(11)
Any reason for the downvote? I'd like to correct the answer if it's wrong. – Afc
Did not downvote, but just a guess: if your compiler lets you do this, then you are compiling with too few warning flags. – Computer
@Computer - your compiler must let you do this; the code is well formed and its meaning is well defined. Granted, a warning may help spot the logic error, but that's not the compiler's primary responsibility. – Newmann
The result could be made more interesting by performing comparisons among unsigned n=2; int i=-1, j=1; One will then observe that n < i, i < j, and j < n are all true. – Riot
The text should read "C++ IS BROKEN". @PeteBecker says “its meaning is well defined”; formally that is true, but the definition is mathematically ludicrous. Casting i to unsigned is harder to avoid if you are producing an integer result, but for a comparison it is trivial to define the language correctly. Even COBOL has ON SIZE ERROR, but C(++) just gives you enough rope to hang yourself! On VMS, DEC C (don’t know about ++) warns about signed/unsigned comparison/assignment, quite right too (given the broken language). – Grassgreen
@Grassgreen - the rule is the same in C: the code is valid and well-defined. Sure, it would be possible to require the compiler to generate more code for mixed comparisons, but that goes against the fundamental design principle of C and C++. If you want big and slow, program in C#. – Newmann
@PeteBecker I know that the compiler has to allow this - but if you are not compiling e.g. with -Wall -Wextra -pedantic -Wconversion -Wshadow -Werror (my personal minimal list of warning flags), it is your own fault really... (even without the -Werror you would get a warning. Only without -Wall would you not get a warning - and that is just evil) – Computer
@PeteBecker You really think the code for mixed comparisons is “big and slow”? In the very small number of applications where an extra 1 (branch on negative) or 2 (test, branch) instructions would matter, people can optimise by getting their typing right. Is the fundamental design principle “fast and wrong”? – Grassgreen
@Grassgreen - you really aren't doing the OP a service by pulling this thread down a rathole. – Newmann
@PeteBecker: I reckon you’re right about that – I’ll leave it at that. – Grassgreen
@PeteBecker "If you want big and slow, program in C#." This is the kind of unprofessional comment that doesn't behoove a mature programmer like you, especially with the kind of contributions you have made to C++. If anyone is taking anything down the rathole, it is you. – Barely

Although this may be considered only a variant of the existing answers: according to "Signed and unsigned types in interfaces," C++ Report, September 1995, by Scott Meyers, it's particularly important to avoid unsigned types in interfaces.

The problem is that it becomes impossible to detect certain errors that clients of the interface could make (and if they could make them, they will make them).

The example given there is:

template <class T>
class Array {
public:
    Array(unsigned int size);
    ...
};

and a possible instantiation of this class

int f(); // f and g are functions that return
int g(); // ints; what they do is unimportant
Array<double> a(f()-g()); // array size is f()-g()

The difference of the values returned by f() and g() might be negative, for an awful number of reasons. The constructor of the Array class will receive this difference as a value that has been implicitly converted to unsigned. Thus, as the implementor of the Array class, one cannot distinguish between an erroneously passed value of -1 and a very large array allocation.
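
The remedy the article argues for is a signed parameter, so that a negative argument arrives intact and can be rejected. A minimal sketch under that assumption: the class name CheckedArray, the choice of std::ptrdiff_t for the size, and throwing std::invalid_argument are all illustrative choices, not the article's code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical variant of the Array template with a signed size parameter.
// A negative argument such as f() - g() == -1 arrives as -1 and can be
// rejected, instead of silently becoming a huge unsigned value.
template <class T>
class CheckedArray {
public:
    explicit CheckedArray(std::ptrdiff_t size) {
        if (size < 0) {
            throw std::invalid_argument("array size must be non-negative");
        }
        data_.resize(static_cast<std::size_t>(size));
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<T> data_;  // storage; the class only illustrates the check
};
```

With this interface, CheckedArray<double> a(f() - g()); throws when f() - g() is negative instead of attempting an enormous allocation.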

Homeopathy answered 22/5, 2015 at 15:4 Comment(2)
Wouldn't the same argument hold true for references or values? Clearly someone could wrongly pass a null pointer to Array<double>(*ptrToSize). – Manofwar
@Manofwar: You could check this. An assert(ptr != nullptr) could be sufficient there. Something like assert(size < theSizeThatIsLikelyToBeAllocated) doesn't work. Of course, one could still misuse the API with signed types. It's just harder, and the likeliest errors (that are caused by things like implicit conversions) can be covered. – Homeopathy

The big problem with unsigned int is that if you subtract 1 from an unsigned int 0, the result isn't a negative number and it isn't less than the number you started with; instead, the result is the largest possible unsigned int value.

#include <cstdio>

int main() {
    unsigned int x = 0;
    unsigned int y = x - 1;  // wraps around to UINT_MAX

    if (y > x) std::printf("What a surprise! \n");
}

And this is what makes unsigned int error prone. Of course unsigned int works exactly as it is designed to work. It's absolutely safe if you know what you are doing and make no mistakes. But most people make mistakes.

If you use a good compiler and turn on all the warnings it can produce, it will tell you when you do dangerous things that are likely to be mistakes.
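
Besides compiler warnings, the mistake can be avoided by checking before subtracting. A minimal sketch, where checked_sub is a hypothetical helper (not a standard function) that refuses to produce a wrapped-around result:

```cpp
// Hypothetical helper: stores a - b in out only when the mathematical
// result is representable as unsigned int (i.e. b <= a).
// Returns false instead of letting the subtraction wrap around.
bool checked_sub(unsigned int a, unsigned int b, unsigned int& out) {
    if (b > a) {
        return false;  // a - b would wrap to a huge value near UINT_MAX
    }
    out = a - b;
    return true;
}
```

With such a check, the x - 1 in the example above becomes a reported failure rather than a silent jump to UINT_MAX.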

Sabin answered 22/5, 2015 at 11:59 Comment(3)
A nastier gotcha is that given uint32_t x,y,z; expressions like x-y > z will have very different meanings on 32-bit and 64-bit systems. – Riot
@Riot AFAICT it will have the same result on LP32, LP64 and LLP64 systems. Only ILP64 systems will differ. – Engstrom
@plugwash: I should have clarified: on systems where int is 64 bits. IMHO the Standard would have benefited from defining non-promoting types whose behavior would be consistent on all compilers that accepted code using them. Operations that use wrap32_t should either yield a result of that type when possible, or refuse compilation altogether (e.g. because the compiler doesn't support the required semantics, or because code is trying to add a wrap16_t and a wrap32_t together, an action that cannot possibly yield a result satisfying both constraints). – Riot

The problem with unsigned integer types is that depending upon their size they may represent one of two different things:

  1. Unsigned types smaller than int (e.g. uint8_t) hold numbers in the range 0..2ⁿ-1, and calculations with them will behave according to the rules of integer arithmetic provided they don't exceed the range of the int type. Under present rules, if such a calculation exceeds the range of an int, the behavior is undefined: a compiler is allowed to do anything it likes with the code, even going so far as to negate the laws of time and causality (some compilers will do precisely that!), and this holds even if the result of the calculation would be assigned back to an unsigned type smaller than int.
  2. The types unsigned int and larger hold members of the abstract wrapping algebraic ring of integers congruent mod 2ⁿ; this effectively means that if a calculation goes outside the range 0..2ⁿ-1, the system will add or subtract whatever multiple of 2ⁿ is required to get the value back in range.
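
A small sketch of the two categories, assuming uint8_t is 8 bits and int is wider (as on all common platforms); the function names are hypothetical. The uint8_t operands are promoted to int and add like ordinary numbers, while unsigned int wraps around by itself:

```cpp
#include <cstdint>

// uint8_t operands are promoted to int, so 200 + 100 is the ordinary
// integer 300, not a value reduced modulo 256.
int promoted_sum(std::uint8_t a, std::uint8_t b) {
    return a + b;
}

// Storing that result back into a uint8_t reduces it modulo 256 (300 -> 44).
std::uint8_t stored_sum(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// unsigned int is not promoted; its arithmetic itself wraps modulo 2^N,
// so 1 - 2 yields the largest unsigned int value.
unsigned int uint_diff(unsigned int a, unsigned int b) {
    return a - b;
}
```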

Consequently, given uint32_t x=1, y=2; the expression x-y may have one of two meanings depending upon whether int is larger than 32 bits.

  1. If int is larger than 32 bits, the expression will subtract the number 2 from the number 1, yielding the number -1. Note that a variable of type uint32_t can't hold the value -1 regardless of the size of int, and storing -1 into one would make it hold 0xFFFFFFFF; but unless or until the value is coerced to an unsigned type, it will behave like the signed quantity -1.
  2. If int is 32 bits or smaller, the expression will yield a uint32_t value which, when added to the uint32_t value 2, will yield the uint32_t value 1 (i.e. the uint32_t value 0xFFFFFFFF).

IMHO, this problem could be solved cleanly if C and C++ were to define new unsigned types [e.g. unum32_t and uwrap32_t] such that a unum32_t would always behave as a number, regardless of the size of int (possibly requiring the operand of a unary minus, or the right-hand operand of a subtraction, to be promoted to the next larger signed type if int is 32 bits or smaller), while a uwrap32_t would always behave as a member of an algebraic ring (blocking promotions even if int were larger than 32 bits). In the absence of such types, however, it's often impossible to write code which is both portable and clean, since portable code will often require type coercions all over the place.
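
To make those coercions concrete, here is a minimal sketch of the two portable spellings, assuming the <cstdint> fixed-width types are available; the function names are hypothetical. Casting the result back to std::uint32_t requests the ring interpretation, while widening both operands to std::int64_t before subtracting requests the numeric one; either way, the result for uint32_t inputs no longer depends on the width of int.

```cpp
#include <cstdint>

// Ring (wrapping) interpretation: force the result back to uint32_t,
// so 1 - 2 gives 0xFFFFFFFF whether int is 32 or 64 bits wide.
std::uint32_t wrapping_diff(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>(x - y);
}

// Numeric interpretation: widen both operands to a signed 64-bit type
// first, so 1 - 2 is -1 regardless of the width of int.
std::int64_t numeric_diff(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::int64_t>(x) - static_cast<std::int64_t>(y);
}
```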

Riot answered 22/5, 2015 at 17:45 Comment(10)
A thoroughly confusing answer. Are you saying that the wrapping and promotion rules for unsigned integers depend on their size as well as the size of "base" int? – Plainspoken
@MartinBa: yes, that's what he's saying. Since you understood it I guess it wasn't confusing, but it might be surprising to some :-) Integer types smaller than int are a complete PITA, unsigned ones especially so. – Madras
@MartinBa: The answer is confusing because the underlying rules are. I've added a little more to the first couple of points; does that help? – Riot
@SteveJessop: Do you agree with my characterization of uint32_t as behaving like one of two very different types depending upon the size of int? I wouldn't call integer types smaller than int a PITA in cases where one is expecting them to behave like numbers within their legitimate range. What's problematic are types that are smaller than int on some platforms but not others; the only advantage I see to having the behavior change depending upon int size is that it allows existing code written for particular int sizes to work on systems where int size is as expected. – Riot
Cheers, it's clearer now. The second "1." bullet is still weird though: it's a uint, how can you get -1?? – Plainspoken
@MartinBa: Unsigned types smaller than int will get promoted to signed int whenever any calculations are performed on them. On common 32-bit machines, this is most widely observable with types uint8_t and uint16_t. Promotion to int is often useful when the unsigned values represent quantities, but can be disastrous in cases where they represent things that are supposed to wrap. Note that given uint16_t x=65533; x*=x; a compiler for a system where unsigned int is 16 bits, or where it is larger than 32 bits, must set x=9, but on a system where unsigned is 17 to 32 bits... – Riot
...the Standard allows compilers to negate the laws of time and causality instead (the numerical product is 4294574089; dividing by 65536 yields 65530 remainder 9, so performing the computation as uint16_t, or performing it in an int type large enough to hold 4294574089 and storing to uint16_t, will yield 9). If the value gets promoted to an int type that can't hold 4294574089, though, the overflow may have consequences far beyond the value of x. – Riot
Cheers. My personal take on this is to try to make sure to also have wrapping signed ints: MSVC has it anyway, I think, and gcc has a switch for it, so even if the standard calls it out as UB, all(?) implementations seem to offer a way to have it defined. – Plainspoken
@MartinBa: While you are correct that nearly all implementations in practice offer a "wrapping signed int" option, there are a couple of weaknesses: (1) There is no standard means by which a C program can request such semantics, or refuse compilation if the compiler cannot provide them; (2) Requiring that integer values (whether signed or unsigned) wrap precludes many optimizations which are often useful (though sometimes disastrous). I would really like to see C offer a variety of different kinds of integers with different semantics chosen to offer many good optimization opportunities... – Riot
...but minimal surprises. The more flexibility compilers offer programmers with regard to saying what they care about, the more opportunities programmers can provide to aggressively optimize things that won't impact correctness. – Riot

Numeric conversion rules in C and C++ are a byzantine mess. Using unsigned types exposes yourself to that mess to a much greater extent than using purely signed types.

Take for example the simple case of a comparison between two variables, one signed and the other unsigned.

  • If both operands are smaller than int then they will both be converted to int and the comparison will give numerically correct results.
  • If the unsigned operand is smaller than the signed operand then both will be converted to the type of the signed operand and the comparison will give numerically correct results.
  • If the unsigned operand is greater than or equal in size to the signed operand and also greater than or equal in size to int then both will be converted to the type of the unsigned operand. If the value of the signed operand is less than zero this will lead to numerically incorrect results.
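
The three cases can be seen in a sketch like the following, which assumes the common sizes of a 32-bit int and a 64-bit std::int64_t; the function names are hypothetical. Each function compares -1 with 1, so the numerically correct answer is always true:

```cpp
#include <cstdint>

// Case 1: both operands narrower than int, so both are promoted to int
// and -1 < 1 compares as ordinary numbers.
bool both_narrower_than_int() {
    std::int8_t s = -1;
    std::uint8_t u = 1;
    return s < u;
}

// Case 2: the unsigned operand (32 bits) is narrower than the signed one
// (64 bits), so u is converted to the signed type and the result is correct.
bool unsigned_narrower() {
    std::int64_t s = -1;
    std::uint32_t u = 1;
    return s < u;
}

// Case 3: the unsigned operand is as wide as the signed one and as wide as
// int (assumed 32 bits), so s is converted to std::uint32_t and -1 becomes
// 0xFFFFFFFF. Compilers typically warn here (e.g. GCC's -Wsign-compare).
bool unsigned_as_wide() {
    std::int32_t s = -1;
    std::uint32_t u = 1;
    return s < u;
}
```

Only the third function returns false, which is exactly the numerically incorrect case described in the last bullet.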

To take another example consider multiplying two unsigned integers of the same size.

  • If the operand size is greater than or equal to the size of int then the multiplication will have defined wraparound semantics.
  • If the operand size is smaller than int but greater than or equal to half the size of int then there is the potential for undefined behaviour.
  • If the operand size is less than half the size of int then the multiplication will produce numerically correct results. Assigning this result back to a variable of the original unsigned type will produce defined wraparound semantics.
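
As a concrete instance of the middle case, take the uint16_t x = 65533 example from the comments on another answer, on a platform with a 32-bit int: x * x promotes both operands to int, and 65533 * 65533 = 4294574089 exceeds INT_MAX. One common workaround, sketched below with a hypothetical function name, is to multiply by 1u first so that the arithmetic is done in unsigned int:

```cpp
#include <cstdint>

// With a 32-bit int, x * x promotes both uint16_t operands to signed int,
// and 65533 * 65533 overflows int: undefined behaviour.
// Multiplying by 1u first makes every step unsigned int arithmetic, which
// has defined wraparound, and the cast then reduces the result mod 65536.
std::uint16_t square_u16(std::uint16_t x) {
    return static_cast<std::uint16_t>(1u * x * x);
}
```

For x = 65533 this returns 9, matching the remainder computed in that comment.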
Engstrom answered 22/5, 2017 at 12:7 Comment(0)

In addition to the range/wrap issues with unsigned types, using a mix of unsigned and signed integer types has a significant performance impact on the processor: less than a floating-point cast, but too much to ignore. Additionally, the compiler may insert a range check for the value and change the behavior of further checks.

Kato answered 22/5, 2015 at 13:31 Comment(4)
Could you elaborate on what significant performance issues, and give example code? – Benevolence
If you cast an unsigned to int or vice versa, the binary representations are identical. So there is no overhead for the CPU when you cast one to the other. – Computer
(provided the C++ implementation uses two's complement representation for negative integers) – Countercharge
@Computer The binary layout is not the same. An unsigned value occupies all of the bit space (8, 16, 32, 64), but a signed value uses the most significant bit for the sign, which reduces the value space by 1 bit. In the case of SIMD instructions, there is no instruction that performs a calculation on both types at once. Conversion with saturation takes place, and that is where performance drops. – Kato

© 2022 - 2024 — McMap. All rights reserved.