Is using an unsigned rather than signed int more likely to cause bugs? Why?
C

7

85

In the Google C++ Style Guide, on the topic of "Unsigned Integers", it is suggested that

Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point. The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler.

What is wrong with modular arithmetic? Isn't that the expected behaviour of an unsigned int?

What kind of bugs (a significant class) does the guide refer to? Overflowing bugs?

Do not use an unsigned type merely to assert that a variable is non-negative.

One reason I can think of for using signed int over unsigned int is that if it does overflow (to a negative value), it is easier to detect.

Capparidaceous answered 3/8, 2018 at 17:57 Comment(19)
Try to do unsigned int x = 0; --x; and see what x becomes. Without limit checks, the size could suddenly get some unexpected value that could easily lead to UB.Highstepper
At least unsigned overflow has a well-defined behavior and produces expected results.Maegan
On an unrelated (to your question but not to Google styleguides) note, if you search a little you will find some (sometimes rightfully) criticism of the Google styleguides. Don't take them as gospel.Highstepper
On the other hand, int overflow and underflow are UB. You are less likely to experience a situation where an int would try to express a value it can't than a situation that decrements an unsigned int below zero but the kind of people that would be surprised by the behavior of unsigned int arithmetic are the kind of people that could also write code that would cause int overflow related UB like using a < a + 1 to check for overflow.Doloritas
Seconding SomeProgrammerDude here... the Google Style Guide is a guide for Google in-house development, and some of its recommendations might hold water for their specific environment... but make for poor general advice.Dalmatia
@Someprogrammerdude I just had to quickly skim the linked section to take issue with the style guide. It says to use the fixed-width types, with the only exception being the use of int. Ignoring that fixed width types are not guaranteed to exist on a given system, I took exception with "You should assume that an int is at least 32 bits"Argufy
Well, you know what they say about assumptions. You do know right? Oh crap. I made an assumption.Molini
@user7586189: For an alternative guide to good C++, Sutter / Alexandrescu, "C++ Coding Standards". But, the best guide really, is -- use common sense, and never stop learning. ;-)Dalmatia
I think this question may fall into the opinion based category, as it's premised on int being safer than unsigned int, which isn't a universally agreed upon opinion.Doloritas
Like every religion, style guides and coding standards which are made by theoreticians have to be believed. There is no evidence. If you want to know why they think it is, ask them. This is not google (at the time of writing this at least).Dean
If unsigned integer overflows, it's well defined. If signed integer overflows, it's undefined behaviour. I prefer well defined behaviour, but if your code can't handle overflowed values, you are lost with both. Difference is: for signed you are already lost for the overflowing operation, for unsigned in the following code. The only point I agree is if you need negative values, an unsigned integer type is the wrong choice - obviously.Dean
@toohonestforthissite can I add back the tag of "C"? This question is not restricted to C++, though the style guide is in C++.Capparidaceous
I tend to think of a good portion of the Google C++ Style Guide (and non-Google one's that cover the same kind of ground) as an indicator that the team is using the wrong language for their problem at hand. C++ is a tool, it has some sharp edges. It's not the only tool (programming language) available, and its not the best for all problems (there is no best for all problems; there are only suitable or less suitable for given problems).Kiln
@Someprogrammerdude w.r.t. fixed-size types they probably want to say that you shouldn't use something like long long if you want a 64-bit integer and should use something like int64_t instead, which makes sense.Airborne
If you're writing your own general-purpose vector class, then you still should allow size to be size_t and provide access with a size_t index, otherwise you are unreasonably limiting the possible size of your array. But otherwise, when you know your collections don't have billions of elements, having int for sizes and indices is quite convenient. It allows doing computation on indices, where intermediate results can be negative, without casts and compiler warnings.Airborne
@VTT It's true, so you could argue that unsigned ints are safer. However, if we assume that the user doesn't intend the modulo 2^n behavior (often this is the case), the problem is that it is not legal for the compiler to insert checks for overflow or underflow. For the case of signed integers, since overflow or underflow is UB, you can have an undefined behavior sanitizer which checks for it, making it easier to catch overflow or underflow bugs. TL;DR: unsigned integer arithmetic being well defined means that it's harder to catch overflow and underflow bugs. This can be a bad thing.Quoin
Somehow I feel: the signed vs. unsigned discussion is the new Big Endian vs. Little Endian. But, actually, Jonathan Swift already gave the answer: There is no solution which will satisfy everybody. Funny enough: He let Gulliver provide a solution which satisfied nobody. (When I read this as child I believed it were only a funny story. Meanwhile I learned it provided deep wisdom about real life problems.) ;-)Breeding
try to use iterators and containers rather than pointers and sizes, try not to mix signedness, and try to avoid unsigned types (except for representing bitfields or modular arithmetic)Melancholic
@AbdulAhad Ideally unsigned integers should only be used where (mod 2^n) is wanted and when the sign bit is problematic (bit fields). But the existing C/C++ baggage means that sizeof, size_t and many other types are unsigned. We have to live with all these annoying unsigned integers. Sad!Detrition
C
78

Some of the answers here mention the surprising promotion rules between signed and unsigned values, but that seems more like a problem relating to mixing signed and unsigned values, and doesn't necessarily explain why signed variables would be preferred over unsigned outside of mixing scenarios.

In my experience, outside of mixed comparisons and promotion rules, there are two primary reasons why unsigned values are bug magnets.

Unsigned values have a discontinuity at zero, the most common value in programming

Both unsigned and signed integers have discontinuities at their minimum and maximum values, where they wrap around (unsigned) or cause undefined behavior (signed). For unsigned these points are at zero and UINT_MAX. For int they are at INT_MIN and INT_MAX. Typical values of INT_MIN and INT_MAX on a system with 4-byte int values are -2^31 and 2^31-1, and on such a system UINT_MAX is typically 2^32-1.

The primary bug-inducing problem with unsigned that doesn't apply to int is that it has a discontinuity at zero. Zero, of course, is a very common value in programs, along with other small values like 1,2,3. It is common to add and subtract small values, especially 1, in various constructs, and if you subtract anything from an unsigned value and it happens to be zero, you just got a massive positive value and an almost certain bug.

Consider code that iterates over all values in a vector by index except the last [0.5]:

for (size_t i = 0; i < v.size() - 1; i++) { // do something }

This works fine until one day you pass in an empty vector. Instead of doing zero iterations, you get v.size() - 1 == a giant number [1], and you'll do 4 billion iterations and almost certainly have a buffer overflow vulnerability.

You need to write it like this:

for (size_t i = 0; i + 1 < v.size(); i++) { // do something }

So it can be "fixed" in this case, but only by carefully thinking about the unsigned nature of size_t. Sometimes you can't apply the fix above because instead of a constant one you have some variable offset you want to apply, which may be positive or negative: so which "side" of the comparison you need to put it on depends on the signedness - now the code gets really messy.

There is a similar issue with code that tries to iterate down to and including zero. Something like while (index-- > 0) works fine, but the apparently equivalent while (--index >= 0) will never terminate for an unsigned value. Your compiler might warn you when the right hand side is literal zero, but certainly not if it is a value determined at runtime.

Counterpoint

Some might argue that signed values also have two discontinuities, so why pick on unsigned? The difference is that both discontinuities are very (maximally) far away from zero. I really consider this a separate problem of "overflow": both signed and unsigned values may overflow at very large values. In many cases overflow is impossible due to constraints on the possible range of the values, and overflow of many 64-bit values may be physically impossible. Even if possible, the chance of an overflow-related bug is often minuscule compared to an "at zero" bug, and overflow occurs for unsigned values too. So unsigned combines the worst of both worlds: potential overflow with very large magnitude values, and a discontinuity at zero. Signed only has the former.

Many will argue "you lose a bit" with unsigned. This is often true - but not always (if you need to represent differences between unsigned values you'll lose that bit anyway: so many 32-bit things are limited to 2 GiB anyway, or you'll have a weird grey area where, say, a file can be 4 GiB but you can't use certain APIs on the second 2 GiB half).

Even in the cases where unsigned buys you a bit, it doesn't buy you much: if you had to support more than 2 billion "things", you'll probably soon have to support more than 4 billion.

Logically, unsigned values are a subset of signed values

Mathematically, unsigned values (non-negative integers) are a subset of signed integers (just called integers) [2]. Yet signed values naturally pop out of operations solely on unsigned values, such as subtraction. We might say that unsigned values aren't closed under subtraction. The same isn't true of signed values.

Want to find the "delta" between two unsigned indexes into a file? Well, you had better do the subtraction in the right order, or else you'll get the wrong answer. Of course, you often need a runtime check to determine the right order! When dealing with unsigned values as numbers, you'll often find that (logically) signed values keep appearing anyway, so you might as well start off with signed.

Counterpoint

As mentioned in footnote (2) above, unsigned values in C++ aren't actually a subset of signed values of the same size, so unsigned values can represent the same number of results that signed values can.

True, but the range is less useful. Consider subtraction: with unsigned numbers in a range of 0 to 2N and signed numbers in a range of -N to N, arbitrary subtractions produce results in the range -2N to 2N in both cases, and either type of integer can only represent half of it. It turns out that the region centered around zero, -N to N, is usually far more useful (contains more actual results in real-world code) than the range 0 to 2N. Consider any typical distribution other than uniform (log, Zipfian, normal, whatever) and consider subtracting randomly selected values from that distribution: far more values end up in [-N, N] than in [0, 2N] (indeed, the resulting distribution is always centered at zero).

64-bit closes the door on many of the reasons to use unsigned values as numbers

I think the arguments above were already compelling for 32-bit values, but the overflow cases, which affect both signed and unsigned at different thresholds, do occur for 32-bit values, since "2 billion" is a number that can be exceeded by many abstract and physical quantities (billions of dollars, billions of nanoseconds, arrays with billions of elements). So if someone is convinced enough by the doubling of the positive range for unsigned values, they can make the case that overflow does matter and that it slightly favors unsigned.

Outside of specialized domains 64-bit values largely remove this concern. Signed 64-bit values have an upper range of 9,223,372,036,854,775,807 - more than nine quintillion. That's a lot of nanoseconds (about 292 years worth), and a lot of money. It's also a larger array than any computer is likely to have RAM in a coherent address space for a long time. So maybe 9 quintillion is enough for everybody (for now)?

When to use unsigned values

Note that the style guide doesn't forbid or even necessarily discourage use of unsigned numbers. It concludes with:

Do not use an unsigned type merely to assert that a variable is non-negative.

Indeed, there are good uses for unsigned variables:

  • When you want to treat an N-bit quantity not as an integer, but simply a "bag of bits". For example, as a bitmask or bitmap, or N boolean values or whatever. This use often goes hand-in-hand with the fixed width types like uint32_t and uint64_t since you often want to know the exact size of the variable. A hint that a particular variable deserves this treatment is that you only operate on it with the bitwise operators such as ~, |, &, ^, >> and so on, and not with the arithmetic operations such as +, -, *, / etc.

    Unsigned is ideal here because the behavior of the bitwise operators is well-defined and standardized. Signed values have several problems, such as undefined and unspecified behavior when shifting, and an unspecified representation.

  • When you actually want modular arithmetic. Sometimes you actually want 2^N modular arithmetic. In these cases "overflow" is a feature, not a bug. Unsigned values give you what you want here since they are defined to use modular arithmetic. Signed values cannot be (easily, efficiently) used at all since they have an unspecified representation and overflow is undefined.


[0.5] After I wrote this I realized this is nearly identical to Jarod's example, which I hadn't seen - and for good reason: it's a good example!

[1] We're talking about size_t here, so usually 2^32-1 on a 32-bit system or 2^64-1 on a 64-bit one.

[2] In C++ this isn't exactly the case, because unsigned values contain more values at the upper end than the corresponding signed type, but the basic problem exists: manipulating unsigned values can produce (logically) signed values, while there is no corresponding issue with signed values (since signed values already include unsigned values).

Club answered 4/8, 2018 at 4:37 Comment(13)
I agree with everything you've posted, but "64 bits should be enough for everyone" sure seems way too close to "640k ought to be enough for everyone".Malady
@Andrew - yup, I chose my words carefully :).Club
Well, you might want to work a bit on equating defined wrap-around for unsigned types with full undefined behavior for signed types.Soembawa
"64-bit closes the door on unsigned values" --> Disagree. Some integer programming tasks are simply not a case of counting and do not need negative values yet need power-of-2 widths: Passwords, encryption, bit graphics, benefit with unsigned math. Many ideas here point out why code could use signed math when able, yet fall very short of making unsigned types useless and closing the door on them.And
@chux - the title of that section is probably a bit overstated: I really mean what I say in the following text, that it closes the loophole in the arguments above for most types of "counting" and "indexing" values. Unsigned absolutely still has its place for the "bag of bits" type scenarios, where you aren't doing math on it. Can you elaborate a bit on the integer programming case? Are those values used with mathematical expressions? There are certainly exceptions to every rule, and integer programming may be a good one (too obscure to reverse this type of style guide entry though).Club
@chux - I added a section on why you might use unsigned, including the "bag of bits" scenario (encryption, graphics in your list) and also when you want modular arithmetic. I'm not sure if that covers your integer programming example. BTW, I had assumed you were referring to "Integer Programming" as in ILP (e.g., wikipedia) but now I see you might have just been referring to "programming tasks that involve integers". Can you confirm?Club
@AndrewHenle 640K?, Heck, we went to the moon on 32K, that was the same amount of memory I had in my HP-28S -- who would ever need any more.... See Apollo Guidance ComputerDenier
@Soembawa - yeah, I left it out since it seems more or less like a tie. On the side of unsigned mod-2^N wraparound you at least have a defined behavior and no unexpected "optimizations" will kick in. On the side of UB, any overflow during arithmetic on unsigned or signed is probably a bug in the overwhelming majority of cases (outside of the few who expect mod arithmetic), and compilers provide options like -ftrapv that can catch all signed overflow, but not all unsigned overflow. The performance impact isn't too bad, so it might be reasonable to compile with -ftrapv in some scenarios.Club
@Club That's about the age of the universe measured in nanoseconds. I doubt that. The universe is about 13.7*10^9 years old which is 4.32*10^17 s or 4.32*10^26 ns. To represent 4.32*10^26 as int you need at least 90 bits. 9,223,372,036,854,775,807 ns would only be about 292.5 years.Stilu
@Stilu - yup, I forgot to convert from seconds to years - oops! The real number is a much more modest 292 years. So it is certainly plausible that on some type of system built for extreme longevity you'd exceed a 63-bit or 64-bit quantity, and 64-bit nanosecond values are already inappropriate for point-in-time representation if you need to represent dates more than a hundred or so years in the past and future. Also, after some research, my "all the RAM in the world" claim seems to be totally BS as well, so I modified that to be RAM in one coherently-addressed computer.Club
Did you mean 'unsigned' here: 64-bit closes the door on many of the reasons to use signed values as numbers?Bhili
@Bhili - yes, I did, thanks! Fixed now.Club
@Club generally it's easier for compilers to reason and prove things about signed values (youtu.be/1hm5ZVmBEvo?t=778) because OF is UB.Dissimilitude
B
37

As stated, mixing unsigned and signed might lead to unexpected behaviour (even if well defined).

Suppose you want to iterate over all elements of a vector except for the last five; you might wrongly write:

for (int i = 0; i < v.size() - 5; ++i) { foo(v[i]); } // Incorrect
// for (int i = 0; i + 5 < v.size(); ++i) { foo(v[i]); } // Correct

Suppose v.size() < 5. Then, as v.size() is unsigned, v.size() - 5 would be a very large number, and so i < v.size() - 5 would be true for a far larger range of values of i than expected. And UB then happens quickly (out-of-bounds access once i >= v.size()).

If v.size() returned a signed value, then v.size() - 5 would have been negative, and in the above case the condition would have been false immediately.

On the other side, an index should be in [0; v.size()[, so unsigned makes sense. Signed has its own issues too, such as UB on overflow or implementation-defined behaviour for right shift of a negative signed number, but these are a less frequent source of iteration bugs.

Braziel answered 3/8, 2018 at 18:36 Comment(16)
While I myself use signed numbers whenever I can, I don't think that this example is strong enough. Someone who uses unsigned numbers for a long time surely knows this idiom: instead of i<size()-X, one should write i+X<size(). Sure, it's a thing to remember, but it is not that hard to get accustomed to, in my opinion.Paradis
@geza: I already encountered this issue, but if you have a better example, share it. I personally prefer unsigned for size, but I understand the reason for the guideline, which also exists to avoid mistakes from "beginners".Braziel
What you are saying is basically one has to know the language and the coercion rules between types. I don't see how this changes whether one uses signed or unsigned as the question asks. Not that I recommend using signed at all if there is no need for negative values. I agree with @geza, only use signed when necessary. This makes the google guide questionable at best. Imo it's bad advice.Dean
@toohonestforthissite The point is the rules are arcane, silent and major causes of bugs. Using exclusively signed types for arithmetic relieves you of the issue. BTW using unsigned types for the purpose of enforcing positive values is one of the worst abuse for them.Albur
Related: youtube.com/watch?v=XrE-RfAYrzs. TL;DW subtractions with unsigned numbers are naughty and cause security bugs.Christean
Thankfully, modern compilers and IDEs give warnings when mixing signed and unsigned numbers in an expression.Airborne
Just to clarify, when you write "between [a; b[" you mean "in the range from a to b, including a but not including b", right? (I haven't seen that notation very much, and I wouldn't be at all surprised if others didn't understand it, either)Yaw
@PasserBy: If you call them arcane, you have to add the integer promotions and the UB for overflow of signed types arcane, too. And the very common sizeof operator returns an unsigned anyway, so you do have to know about them. Said that: if you don't want to learn the language details, just don't use C or C++! Considering google promotes go, maybe that's exactly their goal. The days of "don't be evil" are long gone …Dean
@YetAnotherUser: Only if you don't understand the language. They are well defined, as opposed to certain behaviour of signed types. sigh I wish people knew a language before discussing language features.Dean
@Jarod42: That's was my point. Most programmers know about this issue (like you do as well), so it is not a strong point. Sure, beginners will make this mistake, but after some time, they learn it. It's just one of the thousand rules that C/C++ programmer has to keep in mind (Note: I use signed for size, I have my own containers which returns signed for size, so I don't have any problems with signed/unsigned comparison. The only problem is that sizeof returns unsigned: #47777471).Paradis
Guidelines are there to have a consistent way to write code (so it is easier to read) and to help avoid mistakes. Depending on team size, its "level", and code legacy, some rules apply more or less.Braziel
@Jarod42: Good developers of any area know the rules they have to follow and those which are plain nonsense and everything in-between. A good coding style respects this and leaves some freedom for them. Much more relevant is a good team structure and some way of peer working. Peer review is a very good thing and saves a lot of time for debugging/testing, but most companies didn't accept it and prefer nonsense automated (unit)tests.Dean
I realized my initial example is almost identical to yours, down to the choice of vector and the variable name v. You'll have to believe me that I didn't copy, it's just a good example that I've "used" before, including "using it" by introducing this bug in code I've written.Club
So there are about a dozen good developers in the world? No, honestly, good developers have developed good habits and they know what to stay away from. That's why I'm sure 90% of the experienced developers (including basically all high-profile programmers on the internet) will tell you to just use signed whenever possible for arithmetic, which is most pedestrian code. Use unsigned for bit twiddling (to avoid lots of signed UB issues) and storage (especially unsigned char).Actress
@toohonestforthissite Check the accepted answer above, which was written by a person who has overcome self-deception and accepted real-world evidence in order to avoid making mistakes.Actress
And to the argument that sizeof is unsigned so we have to use unsigned values also for arithmetic: You can just cast a sizeof expression to an int or assign it to an int variable or pass it to a function that takes an int, and the compiler will not issue a warning because it notices that (of course) the value fits inside int just fine. It's unfortunate that sizeof is unsigned, and by now almost everybody agrees that it is, but it is not the end of the world.Actress
P
21

One of the most hair-raising examples of an error is when you MIX signed and unsigned values:

#include <iostream>
int main()  {
    auto qualifier = -1 < 1u ? "makes" : "does not make";
    std::cout << "The world " << qualifier << " sense" << std::endl;
}

The output:

The world does not make sense

Unless you have a trivial application, it's inevitable you'll end up with either dangerous mixes between signed and unsigned values (resulting in runtime errors) or, if you crank up warnings and make them compile-time errors, a lot of static_casts in your code. That's why it's best to strictly use signed integer types for math or logical comparison. Only use unsigned for bitmasks and types representing bits.

Modeling a type as unsigned based on the expected domain of the values of your numbers is a Bad Idea. Most numbers are closer to 0 than they are to 2 billion, so with unsigned types a lot of your values are closer to the edge of the valid range. To make things worse, the final value may be in a known positive range, but while evaluating expressions, intermediate values may underflow, and if those intermediate values are used they may be VERY wrong. Finally, even if your values are expected to always be positive, that doesn't mean they won't interact with other variables that can be negative, and so you end up forced into mixing signed and unsigned types, which is the worst place to be.

Promiscuity answered 3/8, 2018 at 18:45 Comment(16)
Modeling a type to be unsigned based on the expected domain of the values of your numbers is a Bad Idea *if you don't treat implicit conversions as warnings and are too lazy to use proper type casts.* Modeling your types on their expected valid values is completely reasonable, just not in C/C++ with built-in types.Breton
@villasv, " just not in C/C++ with built-in types" interesting. Could you elaborate on that please?Capparidaceous
@Capparidaceous It's a good practice to make invalid data impossible to instantiate, so having positive-only variables for sizes is perfectly reasonable. But you can't fine tune C/C++ built-in types to disallow by default bad casts like the one in this answer and the validity ends up being responsibility of someone else. If you're in a language with stricter casts (even between built-ins), expected-domain modeling is a pretty good idea.Breton
Note, I did mention cranking up warnings and setting them to errors, but not everyone does. I still disagree @Breton with your statement about modeling values. By choosing unsigned, you are ALSO implicitly modeling every other value it may come into contact with without having much foresight of what that will be. And almost certainly getting it wrong.Promiscuity
@ChrisUzdavinis you're not modeling future interactions if you disallow implicit casts by default. You're forbidding random future interactions, which is in most general cases a good thing because types shouldn't be interacting with each other without explicit design. Note that I did upvote and agree to the most part, except that modeling with domain in mind is a bad idea. It's not, the reasons shown here apply specifically to C/C++ built-ins.Breton
Modeling with the domain in mind is a good thing. Using unsigned to model the domain is NOT. (Signed vs unsigned should be chosen based on types of usage, not range of values, unless it's impossible to do otherwise.)Promiscuity
@Breton A warning on any implicit conversion? That's crazyDetrition
Once your codebase has a mix of signed and unsigned values, when you turn up warnings and promote them to errors, the code ends up littered with static_casts to make the conversions explicit (because the math still needs to be done.) Even when correct, it's error-prone, harder to work with, and harder to read.Promiscuity
I know of at least one style guide that require the use of unsigneds as the variable in for loops. The thinking was that you couldn't then underflow an array you'd no doubt be indexing. I didn't like it - not every for loop indexes an array, and unsigned wrap around was just as likely to result in indexing non-existent memory as calculating a negative index.Herriot
Very useful analysis and suggestions about the mix of signed and unsigned values. (except for the example:-) Chose BeeOnRope's answer mainly because it is a more general summary.Capparidaceous
@chqrlie: Especially when you consider that a comparison like that will behave differently in ILP32 and LLP64 (only long longs and maybe pointers are 64 bits) and LP64 dialects (longs and pointers are 64 bits).Croatia
@supercat: That's exactly what I hinted at... the result of -1L < 1U is implementation defined while -1 < 1U is defined and somewhat counterintuitive.Ferrand
@chqrlie: Things get even more fun if sizeof or pointer-difference expressions are included in the mix. Interestingly, on many 16-bit platforms where ptrdiff_t was 16 bits, p1-p2 >= 0x8000 would correctly identify whether the pointers were 32769 or more bytes apart, but p1-p2 >= 32768 would not, and nor would p1-p2 > 0x7FFF. IMHO, C would be a nicer language if the Standard had deprecated many kinds of operations involving mixed signed and unsigned types. and neither required compilers to accept them nor made them constraint violations.Croatia
@chqrlie: Even if it would have been awkward to describe in standardese, I would have like to have seen the Standard encourage compilers to accept operations involving mixed signed and unsigned types in cases where evaluation as unsigned or signed would yield equivalent behavior, and reject mixed-signedness operations in other cases. It's easier to add a cast in cases where a compiler is squawking than in cases where a code occasionally acts goofy because of overflow on values that got promoted to signed type before computations whose result would be placed into an unsigned container.Croatia
@Croatia From Herb Sutter's blog, "For example, we are now on a path to pursue deprecating or outright removing the ability to compare two unrelated enumerations (which is just plain suspicious code), and to require -1 < 0u to give the mathematically correct answer even when in the worst case that means generating two comparison machine instructions instead of just one (any code that does such ill-advised things would change from being a correctness problem into “just” a one-extra-instruction performance problem)."Promiscuity
@ChrisUzdavinis: What's needed to actually solve the problem is to have a set of "number types" and a set of "algrebraic ring" types, in addition to the old types that sometimes behave as numbers and sometimes as members of an algebraic ring.Croatia
A
12

Why is using an unsigned int more likely to cause bugs than using a signed int?

Using an unsigned type is not more likely to cause bugs than using a signed type with certain classes of tasks.

Use the right tool for the job.

What is wrong with modular arithmetic? Isn't that the expected behaviour of an unsigned int?
Why is using an unsigned int more likely to cause bugs than using a signed int?

If the task is well-matched: nothing wrong. No, not more likely.

Security, encryption, and authentication algorithms count on unsigned modular math.

Compression/decompression algorithms, as well as various graphic formats, also benefit from and are less buggy with unsigned math.

Any time bit-wise operators and shifts are used, the unsigned operations do not get messed up with the sign-extension issues of signed math.


Signed integer math has an intuitive look and feel readily understood by all, including learners to coding. C/C++ was not originally targeted as an intro language, nor should it be now. For rapid coding that employs safety nets concerning overflow, other languages are better suited. For lean, fast code, C assumes that coders know what they are doing (that they are experienced).

A pitfall of signed math today is the ubiquitous 32-bit int, which for so many problems is wide enough for common tasks without range checking. This leads to complacency: overflow is not coded against. Instead, for (int i = 0; i < n; i++) and int len = strlen(s); are viewed as OK, because n is assumed < INT_MAX and strings will never be too long, rather than range-protecting the first case fully, or using size_t, unsigned, or even long long in the second.

C/C++ developed in an era that included 16-bit as well as 32-bit int, and the extra bit that an unsigned 16-bit size_t affords was significant. Attention was needed regarding overflow issues, be it int or unsigned.

Google's 32-bit (or wider) applications, on platforms where int/unsigned are not 16-bit, can afford a lack of attention to +/- overflow of int, given its ample range. It makes sense for such applications to encourage int over unsigned. Yet int math is not well protected.

The narrow 16-bit int/unsigned concerns still apply today to select embedded applications.

Google's guidelines apply well to the code they write today. They are not a definitive guideline for the wider range of C/C++ code.


One reason that I can think of using signed int over unsigned int, is that if it does overflow (to negative), it is easier to detect.

In C/C++, signed int overflow is undefined behavior, so it is not necessarily easier to detect than the defined behavior of unsigned math.


As @Chris Uzdavinis well commented, mixing signed and unsigned is best avoided by all (especially beginners) and otherwise coded carefully when needed.

And answered 4/8, 2018 at 3:53 Comment(3)
You make a good point that an int doesn't model the behavior of an "actual" integer either. Undefined behavior on overflow is not how a mathematician thinks of integers: there's no possibility of "overflow" with an abstract integer. But these are machine storage units, not a math guy's numbers.Liverpool
@tchrist: Unsigned behavior on overflow is how a mathematician would think about an abstract algebraic ring of integers congruent mod (type_MAX+1).Croatia
If you're using gcc, signed int overflow is easy to detect (with -ftrapv), while unsigned "overflow" is hard to detect.Tribesman
B
5

I have some experience with Google's style guide, AKA the Hitchhiker's Guide to Insane Directives from Bad Programmers Who Got into the Company a Long Long Time Ago. This particular guideline is just one example of the dozens of nutty rules in that book.

Errors only occur with unsigned types if you try to do arithmetic with them (see Chris Uzdavinis's example above), in other words if you use them as numbers. Unsigned types are not intended to store numeric quantities; they are intended to store counts such as the size of containers, which can never be negative, and they can and should be used for that purpose.

The idea of using arithmetical types (like signed integers) to store container sizes is idiotic. Would you use a double to store the size of a list, too? That there are people at Google storing container sizes using arithmetical types and requiring others to do the same thing says something about the company. One thing I notice about such dictates is that the dumber they are, the more they need to be strict do-it-or-you-are-fired rules because otherwise people with common sense would ignore the rule.

Bharal answered 4/8, 2018 at 20:23 Comment(7)
While I get your drift, the blanket statements made would virtually eliminate bitwise operations if unsigned types could only hold counts and not be used in arithmetic. So the "Insane Directives from Bad Programmers" part makes more sense.Denier
@DavidC.Rankin Please don't take it as a "blanket" statement. Obviously there are multiple legitimate uses for unsigned integers (like storing bitwise values).Bharal
Yes, yes -- I didn't, that's why I said "I get your drift."Denier
Counts are often compared to things which have arithmetic done on them, such as indices. The way C handles comparisons involving signed and unsigned numbers can lead to many weird quirks. Except in the situations where the top value of a count would fit in an unsigned but not the corresponding signed type (common in the days where int was 16 bits, but far less so today) it's better to have counts that behave like numbers.Croatia
"Errors only occur with unsigned types if you try to do arithmetic with them" - Which happens all the time. "The idea of using arithmetical types (like signed integers) to store container sizes is idiotic" - It isn't and the C++ committee now considers it a historical mistake to use size_t. The reason? Implicit conversions.Window
@ÁtilaNeves First of all, I share Linus Torvalds opinions about C++, so if you are citing some C++ guy as an authority on language design, that is not a winning argument with me. Secondly, If you are passing a signed value to an unsigned interface, that is a type error as far as I am concerned, not something that should be converted.Bharal
@TylerDurden Not "some C++ guy" - the committee. A signed value being passed to an unsigned interface should be a type error, but unfortunately isn't in the C type system (and by extension C++ and D).Window
O
1

Using unsigned types to represent non-negative values...

  • is more likely to cause bugs involving type promotion, when mixing signed and unsigned values, as other answers demonstrate and discuss in depth, but
  • is less likely to cause bugs involving the choice of types whose domains can represent undesirable/disallowed values. In some places you'll assume the value is in the domain, and may get unexpected and potentially hazardous behavior when other values sneak in somehow.

The Google coding guidelines put emphasis on the first kind of consideration. Other guideline sets, such as the C++ Core Guidelines, put more emphasis on the second point. For example, consider Core Guideline I.12:

I.12: Declare a pointer that must not be null as not_null

Reason

To help avoid dereferencing nullptr errors. To improve performance by avoiding redundant checks for nullptr.

Example

int length(const char* p);            // it is not clear whether length(nullptr) is valid
length(nullptr);                      // OK?
int length(not_null<const char*> p);  // better: we can assume that p cannot be nullptr
int length(const char* p);            // we must assume that p can be nullptr

By stating the intent in source, implementers and tools can provide better diagnostics, such as finding some classes of errors through static analysis, and perform optimizations, such as removing branches and null tests.

Of course, you could argue for a non_negative wrapper for integers, which avoids both categories of errors, but that would have its own issues...

Omen answered 4/8, 2018 at 23:6 Comment(0)
H
0

The Google statement is about using unsigned as a size type for containers. In contrast, the question appears to be more general. Please keep that in mind while you read on.

Since most answers so far have reacted to the Google statement, and less so to the bigger question, I will start my answer with negative container sizes and subsequently try to convince anyone (hopeless, I know...) that unsigned is good.

Signed container sizes

Let's assume someone coded a bug which results in a negative container index. The result is either undefined behavior or an exception / access violation. Is that really better than getting undefined behavior or an exception / access violation when the index type was unsigned? I think not.

Now, there is a class of people who love to talk about mathematics and what is "natural" in this context. How can an integral type with negative numbers be natural to describe something which is inherently >= 0? Using arrays with negative sizes much? IMHO, especially mathematically inclined people would find this mismatch of semantics (the size/index type says negative is possible, while a negative-sized array is hard to imagine) irritating.

So, the only question remaining on this matter is whether - as stated in the Google comment - a compiler could actually actively assist in finding such bugs. And whether it could do even better than the alternative, which would be underflow-protected unsigned integers (x86-64 assembly and probably other architectures have means to achieve that; only C/C++ does not use those means). The only way I can fathom is if the compiler automatically added run-time checks (if (index < 0) throwOrWhatever) or, in the case of compile-time actions, produced a lot of potentially false-positive warnings/errors: "The index for this array access could be negative." I have my doubts this would be helpful.

Also, for people who actually write runtime checks for their array/container indices, it is more work dealing with signed integers. Instead of writing if (index < container.size()) { ... }, you now have to write: if (index >= 0 && index < container.size()) { ... }. Looks like forced labor to me and not like an improvement...

Languages without unsigned types suck...

Yes, this is a stab at Java. Now, I come from an embedded programming background, and we worked a lot with field buses, where binary operations (and, or, xor, ...) and bit-wise composition of values are literally the bread and butter. For one of our products, we - or rather a customer - wanted a Java port... and I sat opposite the luckily very competent guy who did the port (I refused...). He tried to stay composed... and suffer in silence... but the pain was there; he could not stop cursing after a few days of constantly dealing with signed integral values which SHOULD be unsigned... Even writing unit tests for those scenarios is painful, and personally, I think Java would have been better off if it had omitted signed integers and just offered unsigned... at least then you do not have to care about sign extensions etc., and you can still interpret numbers as 2's complement.

Those are my 5 cents on the matter.

Halfprice answered 17/9, 2020 at 22:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.