Why is int rather than unsigned int used for C and C++ for loops?

This is a rather silly question, but why is int commonly used instead of unsigned int when defining a for loop over an array in C or C++?

for (int i = 0; i < arraySize; i++) {}
for (unsigned int i = 0; i < arraySize; i++) {}

I recognize the benefits of using int when doing something other than array indexing, and the benefits of an iterator when using C++ containers. Is it just because it does not matter when looping through an array? Or should I avoid it altogether and use a different type such as size_t?

Kidder answered 20/9, 2011 at 16:57 Comment(6)
It's less to write.Apocryphal
It's like asking why you don't address another person by their first and last names all the time, but just by their first name?Hallam
Actually for indexes I prefer to use size_t, which is guaranteed to be big enough and is much less typing than unsigned int.Oppression
A very good article that explains why do we need size_t and ptrdiff_t: viva64.com/en/a/0050Glenoid
@Blagovest: The motivation part of that article is good, but the rest of it is full of misinformation (equivalence of types, ability to store pointers in size_t, etc.) and completely glosses over the signed overflow and range issues with ptrdiff_t. I would hesitate to call it "very good"..Bogbean
unsigned is the same as unsigned int.Pergolesi
T
32

This is a more general phenomenon: often people don't use the correct types for their integers. Modern C has semantic typedefs that are much preferable to the primitive integer types. E.g. everything that is a "size" should just be typed as size_t. If you use the semantic types systematically for your application variables, loop variables come much more easily with these types, too.

And I have seen several difficult-to-detect bugs that came from using int or the like: code that all of a sudden crashed on large matrices and such. Just coding correctly with the correct types avoids that.
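
For example, a small sketch of what that looks like in practice (the function and names here are hypothetical, and the code is valid as both C and C++):

#include <stddef.h>

/* dimensions typed as size_t throughout, so loop variables follow suit */
void scale_matrix(double *a, size_t rows, size_t cols, double factor) {
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            a[i * cols + j] *= factor; /* i * cols is done in size_t */
}

With int indices, i * cols would be computed in int and could overflow (undefined behavior) on a matrix far smaller than memory allows.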

Tret answered 20/9, 2011 at 17:29 Comment(8)
The correct type for a size is size_t; unfortunately size_t has itself been defined using the wrong type (unsigned), and this is the source of a large number of bugs. I prefer using semantically correct types in code (e.g. int) rather than formally correct but semantically wrong types. With ints you may run into bugs for very large (incredibly large) values... with unsigned values the crazy behavior is much closer to everyday use (0).Carlcarla
@6502, opinions seem to vary a lot on that. You could have a look at my blog post about that: gustedt.wordpress.com/2013/07/15/…Tret
@JensGustedt: that the semantics are wrong is not an opinion, unless you think it's correct that a.size() - b.size() should be about four billion when b has one element and a has none. That some people think unsigned is a fantastic idea for non-negative numbers is true, but my impression is that they put too much weight on the name rather than on the real meaning. Among those who think that unsigned is a bad idea for counters and indexes is Bjarne Stroustrup... see https://mcmap.net/q/75098/-why-is-size_t-unsigned/320726Carlcarla
@6502, as I said, opinions vary a lot. And SO shouldn't be a place to discuss opinions, especially of people not involved in the discussion themselves. Stroustrup is certainly a reference for many things, but not for C.Tret
@Carlcarla Sorry, but the semantics you think are correct are not. size_t - size_t should be off_t, not size_t.Tristis
@MilesRout: not sure what you're talking about. Are you saying that it is logically correct for a.size()-b.size() to be about four billion when a has two elements and b has three? This happens in C++ simply because size_t is an unsigned type, and choosing such a type for container sizes was a logical mistake that cannot be fixed now. If you say that the difference of two "non-negative" values should be a "possibly negative" value, then I agree... but non-negative and unsigned are very different concepts, and in C++ the difference of two unsigneds is unsigned.Carlcarla
@Carlcarla No, I'm saying that it's absurd to claim that 'unsigned' and 'nonnegative' are different concepts. They're the same concept. The issue is not unsigned. The issue is that the subtraction of two unsigned values should be signed. a.size() should be size_t, but a.size() - b.size() should be ptrdiff_t, just as the subtraction of two pointers doesn't give you a pointer, but a ptrdiff_t. A pointer is, after all, basically the same as a size_t.Tristis
@MilesRout: unsigned has a very precise meaning in C++ and that meaning is unrelated to "non-negative". Maybe you like the name of the type, but the name is irrelevant and what counts is the semantics. unsigned means "modulo integer" or "bitmask"... and saying that the size of a container should be a modulo integer or a bitmask is the error that was made long ago and that unfortunately there's no way to fix now. More details on why unsigned is different from "non-negative" in this video... youtu.be/4afySnY-XgYCarlcarla

Using int is more correct from a logical point of view for indexing an array.

The unsigned semantics in C and C++ don't really mean "not negative"; they are more like "bitmask" or "modulo integer".

To understand why unsigned is not a good type for a "non-negative" number, please consider these totally absurd statements:

  • Adding a possibly negative integer to a non-negative integer gives a non-negative integer
  • The difference of two non-negative integers is always a non-negative integer
  • Multiplying a non-negative integer by a negative integer gives a non-negative result

Obviously none of the above statements makes any sense... but it is exactly how the unsigned semantics of C and C++ work.
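
A minimal demonstration of that modulo semantic (assuming 32-bit unsigned, as on most current platforms):

#include <iostream>

int main() {
    unsigned n = 3;
    std::cout << (n + -5) << "\n";  // 4294967294, not -2
    std::cout << (n - 7)  << "\n";  // 4294967292, not -4
    std::cout << (n * -1) << "\n";  // 4294967293, not -3
}

All three results are "non-negative", exactly as the absurd statements above promise.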

Actually, using an unsigned type for the size of containers is a design mistake of C++, and unfortunately we're now doomed to this wrong choice forever (for backward compatibility). You may like the name "unsigned" because it's similar to "non-negative", but the name is irrelevant and what counts is the semantics... and unsigned is very far from "non-negative".

For this reason, when coding most loops over vectors, my personally preferred form is:

for (int i=0,n=v.size(); i<n; i++) {
    ...
}

(of course assuming the size of the vector does not change during the iteration and that I actually need the index in the body; otherwise for (auto& x : v) ... is better).

This running away from unsigned as soon as possible, back to plain integers, has the advantage of avoiding the traps that are a consequence of the unsigned size_t design mistake. For example, consider:

// draw lines connecting the dots
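// careful: if pts is empty, pts.size()-1 wraps around to a huge value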
for (size_t i=0; i<pts.size()-1; i++) {
    drawLine(pts[i], pts[i+1]);
}

The code above will have problems if the pts vector is empty, because pts.size()-1 is then a huge nonsense number. Dealing with expressions where a < b-1 is not the same as a+1 < b, even for commonly used values, is like dancing in a minefield.

Historically, the justification for making size_t unsigned was the ability to use the extra bit for values, e.g. to have arrays of up to 65535 elements instead of just 32767 on 16-bit platforms. In my opinion, even at that time the extra cost of this wrong semantic choice was not worth the gain (and if 32767 elements are not enough now, then 65535 won't be enough for long anyway).

Unsigned values are great and very useful, but NOT for representing container sizes or indexes; for sizes and indexes, regular signed integers work much better because their semantics are what you would expect.

Unsigned values are the ideal type when you need the modulo arithmetic property or when you want to work at the bit level.
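
For instance (a minimal sketch, not part of the original answer), wrap-around is exactly what a hash mixer wants, and flag masks are the other natural fit:

#include <cstdint>

// multiplicative mixing relies on well-defined unsigned wrap-around
std::uint32_t mix(std::uint32_t x) {
    x ^= x >> 16;
    x *= 0x45d9f3bu;   // the multiply wraps modulo 2^32, by definition
    x ^= x >> 16;
    return x;
}

const std::uint32_t FLAG_DIRTY  = 1u << 0;
const std::uint32_t FLAG_LOCKED = 1u << 1;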

Carlcarla answered 8/6, 2014 at 9:36 Comment(15)
I think you are right, because Java (an "improved" C++) does not support unsigned int. Also, I think the correct way to write that line is: size_t arr_index; for (size_t i=1; i<=pts.size(); i++) { arr_index = i - 1; }Nancynandor
@carlos: No. That would be the correct way if size_t had been defined correctly. Unfortunately a design error made size_t unsigned, and therefore those values ended up with bitmask semantics. Unless you think it's correct for the size of a container to be a bitmask, using size_t is the wrong choice. A choice that unfortunately was made by the standard C++ library, but no one forces me to repeat the same error in my code. My suggestion is to just run away from size_t and use regular ints as soon as you can, instead of bending logic so that it also works with size_t.Carlcarla
It's not just 16-bit platforms. With the current size_t you can use e.g. a vector<char> of size 2.1G on IA-32 Linux with a 3G/1G memory split. If size_t were signed, what would happen if you grew your vector from just under 2G to a bit more? Suddenly the size would become negative. This just doesn't make any sense. The language should not impose such artificial limits.Abohm
@Ruslan: it's amazing how this very weak argument can stick even with reasonably good programmers: the idea of a single array of single bytes eating up most of your address space is totally absurd and surely not something that comes up often, yet apparently it is considered very important by "unsigned for size" zealots. It would have been nice to have a data type able to use all the bits and with the semantics of a "non-negative" integer, but unfortunately no such type exists in C++, and using unsigned instead of it is nonsense.Carlcarla
@Carlcarla see this code for an example of the hack one needs to implement when an offset type is signed (here off_t/off64_t – not exactly C++, but related: POSIX). Not too inefficient, of course, but ugly. And this code is not of theoretical interest there: it was implemented out of necessity.Abohm
@Ruslan: you mean that code was used to seek into a file bigger than 8 exabytes and so they found a bug?Carlcarla
@Carlcarla 18 exabytes actually. You see, /proc/<pid>/mem file on x86_64 Linux for 64-bit processes does have valid pages in the "negative" address range – namely, [vsyscall] at 0xffffffffff600000. See the original EDB issue that was fixed by the code.Abohm
@Ruslan: using an unsigned type of the exact size for addresses in an address space may indeed make sense. However, it does NOT make sense to have an unsigned type represent the number of elements inside a container (which is what std:: containers do).Carlcarla
I know you invested a lot in this answer and I appreciate your effort and examples. Unfortunately the answer is incorrect, and the unsigned size_t design was the logical and obvious choice (not a mistake). I added my own answer to explain further. Of course, you're free to disagree and downvote my answer.Mistassini
@Myst: I'm old enough to tolerate knowing that someone is wrong on the internet and still sleep well. You are wrong, and apparently don't even understand what overflow is in C++. A few years ago I made a (bad) video to explain in more detail why "non-negative" and "unsigned" are two very different concepts that apparently you are confusing, and what "overflow" means. If you can tolerate a strong Italian accent and terrible audio, this is the link: youtube.com/watch?v=4afySnY-XgYCarlcarla
I loved the video and the analogy to the analog clock. However, this ignores the fact that all mathematics on a limited number of bits (i.e., 32 bits) is modular. The only difference is that with signed integers you get undefined behavior once you hit the modular limit (overflow), while with unsigned integers you know what you're dealing with (wrap-around) and you can handle the edge cases. I personally agree with the school of thought that believes Bjarne was mistaken about this one.Mistassini
@Myst: You still apparently do not understand what undefined behavior means. Signed arithmetic is NOT modular in C++. It's modular on most processors, that's true, but not in C++. For example, a C++ compiler will assume that (a < a+1) is always true for signed integers, because it's true when you're not overflowing, and what happens when you do is irrelevant because that is UB. This is not a bug, it's C++. UB is kind of a weird beast (and getting worse every year) and can even travel back in time... devblogs.microsoft.com/oldnewthing/20140627-00/?p=633Carlcarla
I agree, and this is exactly the problem. Signed arithmetic is not necessarily modular, even though it is limited to the number of bits the type holds... which means that once the limit is reached, you get UB. The compiler might optimize away the code, or you get infinite loops. Even if the code runs, on some CPUs you get a trap, on others modular behavior. With signed integers you do not know what you get. That's why they should be avoided for all arithmetic operations.Mistassini
@Myst: "crazy" behavior of signed integers happens at around 2 billion, a number that is rarely found in code. "Crazy" behavior of unsigned integers happens around 0, probably the numeric value used most in programming (especially with containers). It would have been nice if C++ had a "non-negative" type for sizes, with proper semantics (e.g. that the subtraction of two non-negative integers is a possibly-negative integer); unfortunately no such type exists in C++. Wrap-around is fine and useful when you're looking for wrap-around, not for container sizes.Carlcarla
I do not know if 2 GB is such a rare indexing space for the type of systems C++ code is used on... to me it sounds silly that if I opened a file, my index would cause UB at 2 GB. Besides, it's not about what's "rare", it's about what's correct. All computers are designed to use modular math. The illusion that math on a limited-precision machine is not modular is a cause of errors and part of the reason people mislabel C++ as unsafe, causing things like this.Mistassini

It's purely laziness and ignorance. You should always use the right types for indices, and unless you have further information that restricts the range of possible indices, size_t is the right type.

Of course if the dimension was read from a single-byte field in a file, then you know it's in the range 0-255, and int would be a perfectly reasonable index type. Likewise, int would be okay if you're looping a fixed number of times, like 0 to 99. But there's still another reason not to use int: if you use i%2 in your loop body to treat even/odd indices differently, i%2 is a lot more expensive when i is signed than when i is unsigned...
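
The difference comes from signed % rounding toward zero in C and C++: (-3) % 2 is -1, so the compiler cannot lower a signed i % 2 to a single AND when the remainder's value is used (a small illustration; exact codegen varies by compiler):

int parity_u(unsigned i) { return i % 2; }  // can compile to: i & 1
int parity_s(int i)      { return i % 2; }  // (-3) % 2 == -1, so a plain
                                            // AND would be wrong; a sign
                                            // fixup is needed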

Bogbean answered 20/9, 2011 at 18:45 Comment(2)
see #3 on my answer, it's not "purely" laziness/ignoranceEmigration
That doesn't change the fact that the code is wrong. Here's one way to fix it: for (size_t i=10; i-->0; )Bogbean

Not much difference. One benefit of int is that it is signed. Thus i < 0 makes sense for an int, while for an unsigned int it doesn't make much sense.

If indexes are calculated, that may be beneficial (for example, you might get cases where you will never enter a loop if some result is negative).
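
For example (a hypothetical function, just to illustrate):

void process_range(int first, int last) {
    // with int, a negative or empty computed range simply skips the loop
    for (int i = first; i < last - 1; i++) {
        // ... work on element i ...
    }
    // had these been unsigned and last == 0, last - 1 would wrap around
    // to a huge value and the loop would run wild
}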

And yes, it is less to write :-)

Jungle answered 20/9, 2011 at 17:1 Comment(4)
typedef unsigned us; and it's more to write.Cootch
@WTP - you're one of those who will not understand sarcasm even with the ":-)" right next to it? Well, I guess no cure there...Jungle
A negative size or negative index makes no senseTristis
@MilesRout: An attempt to operate on a negative number of items will generally have different implications from an attempt to operate on a really large positive number of items. If a function which is supposed to operate all but the last item of a collection is passed a collection with no items, having the number of items to process be recognizable as -1 seems cleaner than having it be SIZE_MAX.Crandall

Using int to index an array is legacy, but still widely adopted. int is just a generic number type and does not correspond to the addressing capabilities of the platform. If it happens to be narrower or wider than the platform's address width, you may encounter strange results when trying to index a very large array that goes beyond it.

On modern platforms, off_t, ptrdiff_t and size_t guarantee much more portability.

Another advantage of these types is that they give context to someone who reads the code. When you see the above types you know that the code will do array subscripting or pointer arithmetic, not just any calculation.

So, if you want to write bullet-proof, portable and self-documenting code, you can do it at the expense of a few keystrokes.

GCC even supports a typeof extension which relieves you from typing the same typename all over the place:

typeof(arraySize) i;

for (i = 0; i < arraySize; i++) {
  ...
}

Then, if you change the type of arraySize, the type of i changes automatically.
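
In C++, decltype gives the same effect portably (and C23 later standardized typeof itself):

decltype(arraySize) i;

for (i = 0; i < arraySize; i++) {
  ...
}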

Glenoid answered 20/9, 2011 at 17:3 Comment(2)
Though to be fair, on all but the most obscure 32- and 64-bit platforms, you'd need at least 4 billion elements for such issues to show up. And platforms with smaller ints typically have far less memory as well, making int sufficient in general.Thurlough
@delnan: It's not so simple. This kind of reasoning has led to very serious vulnerabilities in the past, even by folks who think of themselves as security gods like DJB...Bogbean

It really depends on the coder. Some coders prefer type perfectionism, so they'll use whatever type they're comparing against. For example, if they're iterating through a C string, you might see:

size_t sz = strlen("hello");
for (size_t i = 0; i < sz; i++) {
    ...
}

While if they're just doing something 10 times, you'll probably still see int:

for (int i = 0; i < 10; i++) {
    ...
}
Mesenchyme answered 20/9, 2011 at 17:3 Comment(0)

I use int because it requires less typing, and it doesn't matter: the two take up the same amount of space, and unless your array has a few billion elements you won't overflow, provided you're not using a 16-bit compiler, which I'm usually not.
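
That much is guaranteed by the standard, and a trivial check can document it:

static_assert(sizeof(int) == sizeof(unsigned int),
              "a signed type and its unsigned counterpart have the same size");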

Carmon answered 20/9, 2011 at 17:0 Comment(1)
Not using an int also gives more context about the variable and can be regarded as self-documenting code. Also have a read here: viva64.com/en/a/0050Glenoid

Because unless you have an array bigger than two gigabytes of type char, or 4 gigabytes of type short, or 8 gigabytes of type int, etc., it doesn't really matter whether the variable is signed or not.

So, why type more when you can type less?

Organicism answered 20/9, 2011 at 17:3 Comment(5)
But then, if arraySize is variable and you want to write bullet-proof code, off_t, ptrdiff_t and size_t still carry some significance.Glenoid
Yes, that is absolutely necessary if you MAY have such super huge arrays, but since people normally don't, they just use the simple-to-write int. For example, if you are sorting an array of int in O(n^2), you basically have to wait forever for the array to be sorted if there are more than 2M elements, and that is assuming you even have 8 GB of memory. So you see, usually even if you get the indexing right, most programs are useless when given input THAT large. So why make them bullet-proof?Organicism
@Shahbaz: Most of us would find it just unfortunate if passing a giant array made the sort take weeks to complete, but would find it completely unacceptable when passing a giant array yields a root shell.Bogbean
@R.. don't get me wrong, I'm not saying this is good, I'm answering the question that asks why people use int all the time.Organicism
I was responding to your most recent comment.Bogbean

Aside from the issue that it's shorter to type, the reason is that it allows negative numbers.

Since we can't say in advance whether a value can ever be negative, most functions that take integer arguments take the signed variety. Since most functions use signed integers, it is often less work to use signed integers for things like loops. Otherwise, you have the potential of having to add a bunch of typecasts.

As we move to 64-bit platforms, the non-negative range of a signed integer should be more than enough for most purposes. In these cases, there's not much reason not to use a signed integer.
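
A sketch of that cast friction (the API here is hypothetical):

#include <vector>
#include <cstddef>

int clamp(int value, int lo, int hi);   // typical signed-parameter function

void demo(const std::vector<int>& v) {
    // an unsigned loop variable forces a cast at each signed call site
    for (std::size_t i = 0; i < v.size(); i++) {
        int j = clamp(static_cast<int>(i) - 1, 0,
                      static_cast<int>(v.size()) - 1);
        // ... use v[j] ...
    }
}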

Darksome answered 20/9, 2011 at 17:5 Comment(1)
Negative values are a key point, and yours is the only answer that makes more than a token mention of them. But sadly, there are implicit Standard conversions between signed and unsigned parameter types, which mean mixing them can just stuff up, rather than the inconvenient-but-safe scenario you describe of "having to add a bunch of typecasts". And "As we move to 64-bit platforms, the non-negative range of a signed integer..." isn't actually growing for most compilers/OSes - ints still tend to be 32 bits, with longs moving from 32 to 64.Hawthorn

Consider the following simple example:

int max = some_user_input; // or some_calculation_result
for(unsigned int i = 0; i < max; ++i)
    do_something;

If max happens to hold a negative value, say -1, the -1 will be regarded as UINT_MAX (when two integers of the same rank but different signedness are compared, the signed one is converted to unsigned). On the other hand, the following code would not have this issue:

int max = some_user_input;
for(int i = 0; i < max; ++i)
    do_something;

Given a negative max input, the loop will be safely skipped.

Contravention answered 24/5, 2017 at 1:17 Comment(0)

Using a signed int is - in most cases - a mistake that can easily result in bugs as well as undefined behavior.

Using size_t matches the system's word size (64 bits on 64-bit systems, 32 bits on 32-bit systems), always giving the correct range for the loop and minimizing the risk of integer overflow.
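
That correspondence is typical rather than guaranteed (the standard only requires size_t to be able to hold the size of the largest object), but it is easy to check:

#include <cstddef>

static_assert(sizeof(std::size_t) == sizeof(void*),
              "holds on common flat-memory platforms, not mandated by the standard");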

The int recommendation arose to solve an issue where reverse for loops were often written incorrectly by inexperienced programmers (and of course, int might not have the correct range for the loop):

/* a correct reverse for loop */
for (size_t i = count; i > 0;) {
   --i; /* note that this is not part of the `for` statement */
   /* code for loop where i is for zero based `index` */
}
/* an incorrect reverse for loop (bug on count == 0) */
for (size_t i = count - 1; i > 0; --i) {
   /* i might have overflowed and undefined behavior occurs */
}

In general, signed and unsigned variables shouldn't be mixed together, so at times using an int is unavoidable. However, the correct type for a for loop is, as a rule, size_t.

There's a nice talk about this misconception that signed variables are better than unsigned variables; you can find it on YouTube (Signed Integers Considered Harmful by Robert Seacord).

TL;DR: Signed variables are more dangerous and require more code than unsigned variables (which should be preferred in almost all cases, and definitely whenever negative values aren't logically expected).

With unsigned variables the only concern is the overflow boundary, which has strictly defined behavior (wrap-around) and uses clearly defined modular mathematics.

This allows a single edge-case test to catch an overflow, and that test can be performed after the mathematical operation has executed.
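
For example, the classic post-operation test (a minimal sketch of the pattern described above):

#include <iostream>

// wrap-around is well-defined, so the check can come *after* the addition
bool checked_add(unsigned a, unsigned b, unsigned& out) {
    out = a + b;
    return out >= a;   // wrapped if and only if the sum became smaller
}

int main() {
    unsigned r;
    if (!checked_add(4000000000u, 500000000u, r))
        std::cout << "overflow detected\n";
}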

However, with signed variables the overflow behavior is undefined (UB), and the negative range is actually larger than the positive range - things that add edge cases that must be tested for and explicitly handled before the mathematical operation can be executed.

i.e., how much is INT_MIN * -1? (the compiler will catch it for compile-time constants, but without that you're in a jam).

P.S.

As for the example offered by @6502 in their answer, the whole thing is again an issue of trying to cut corners and a simple missing if statement.

When a loop assumes at least 2 elements in an array, this assumption should be tested beforehand, i.e.:

// draw lines connecting the dots - forward loop
if(pts.size() > 1) { // first make sure there's enough dots
  for (size_t i=0; i < pts.size()-1; i++) { // then loop
    drawLine(pts[i], pts[i+1]);
  }
}
// or test against i + 1 : which tests the desired pts[i+1]
for (size_t i = 0; i + 1 < pts.size(); i++) { // then loop
  drawLine(pts[i], pts[i+1]);
}
// or start i as 1 : but note that `-` is slower than `+`
for (size_t i = 1; i < pts.size(); i++) { // then loop
  drawLine(pts[i - 1], pts[i]);
}
Mistassini answered 2/12, 2022 at 18:32 Comment(0)
