Why is int rather than unsigned int used for C and C++ for loops?

This is a rather silly question, but why is int commonly used instead of unsigned int when defining a for loop over an array in C or C++?

for (int i = 0; i < arraySize; i++) {}
for (unsigned int i = 0; i < arraySize; i++) {}

I recognize the benefits of using int when doing something other than array indexing, and the benefits of an iterator when using C++ containers. Is it just because it does not matter when looping through an array? Or should I avoid it altogether and use a different type such as size_t?

Kidder answered 20/9, 2011 at 16:57 Comment(6)
It's less to write.Apocryphal
It's like asking why you don't address another person by their first and last names all the time, but just by their first name?Hallam
Actually for indexes I prefer to use size_t, which is guaranteed to be big enough and is much less typing than unsigned int.Oppression
A very good article that explains why do we need size_t and ptrdiff_t: viva64.com/en/a/0050Glenoid
@Blagovest: The motivation part of that article is good, but the rest of it is full of misinformation (equivalence of types, ability to store pointers in size_t, etc.) and completely glosses over the signed overflow and range issues with ptrdiff_t. I would hesitate to call it "very good"..Bogbean
unsigned is the same as unsigned int.Pergolesi
T
32

This is a more general phenomenon: often people don't use the correct types for their integers. Modern C has semantic typedefs that are much preferable to the primitive integer types. E.g. everything that is a "size" should just be typed as size_t. If you use the semantic types systematically for your application variables, loop variables come much more easily with these types, too.

And I have seen several difficult-to-detect bugs that came from using int or the like: code that all of a sudden crashed on large matrices and such. Just coding correctly with the correct types avoids that.
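
For example, a small sketch of what that looks like in practice (the function and names here are hypothetical, and the code is valid as both C and C++):

#include <stddef.h>

/* dimensions typed as size_t throughout, so loop variables follow suit */
void scale_matrix(double *a, size_t rows, size_t cols, double factor) {
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            a[i * cols + j] *= factor; /* i * cols is done in size_t */
}

With int indices, i * cols would be computed in int and could overflow (undefined behavior) on a matrix far smaller than memory allows.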

Tret answered 20/9, 2011 at 17:29 Comment(8)
The correct type for a size is size_t; unfortunately size_t has itself been defined using the wrong type (unsigned), and this is the source of a large number of bugs. I prefer using semantically correct types in code (e.g. int) rather than formally correct but semantically wrong types. With ints you may run into bugs for very large (incredibly large) values... with unsigned values the crazy behavior is much closer to everyday use (0).Carlcarla
@6502, opinions seem to vary a lot on that. You could have a look at my blog post about that: gustedt.wordpress.com/2013/07/15/…Tret
@JensGustedt: that the semantics are wrong is not an opinion, unless you think it's correct that a.size() - b.size() should be about four billion when b has one element and a has none. That some people think unsigned is a fantastic idea for non-negative numbers is true, but my impression is that they put too much weight on the name rather than on the real meaning. Among those who think that unsigned is a bad idea for counters and indexes is Bjarne Stroustrup... see https://mcmap.net/q/75098/-why-is-size_t-unsigned/320726Carlcarla
@6502, as I said, opinions vary a lot. And SO shouldn't be a place to discuss opinions, especially of people not involved in the discussion themselves. Stroustrup is certainly a reference for many things, but not for C.Tret
@Carlcarla Sorry, but the semantics you think are correct are not. size_t - size_t should be off_t, not size_t.Tristis
@MilesRout: not sure what you're talking about. Are you saying that it is logically correct for a.size()-b.size() to be about four billion when a has two elements and b has three? This happens in C++ simply because size_t is an unsigned type, and choosing such a type for container sizes was a logical mistake that cannot be fixed now. If you say that the difference of two "non-negative" values should be a "possibly negative" value, then I agree... but non-negative and unsigned are very different concepts, and in C++ the difference of two unsigneds is unsigned.Carlcarla
@Carlcarla No, I'm saying that it's absurd to claim that 'unsigned' and 'nonnegative' are different concepts. They're the same concept. The issue is not unsigned. The issue is that the subtraction of two unsigned values should be signed. a.size() should be size_t, but a.size() - b.size() should be ptrdiff_t, just as the subtraction of two pointers doesn't give you a pointer, but a ptrdiff_t. A pointer is, after all, basically the same as a size_t.Tristis
@MilesRout: unsigned has a very precise meaning in C++ and that meaning is unrelated to "non-negative". Maybe you like the name of the type, but the name is irrelevant and what counts is the semantics. unsigned means "modulo integer" or "bitmask"... and saying that the size of a container should be a modulo integer or a bitmask is the error that was made long ago and that unfortunately there's no way to fix now. More details on why unsigned is different from "non-negative" in this video... youtu.be/4afySnY-XgYCarlcarla

Using int is more correct from a logical point of view for indexing an array.

The unsigned semantics in C and C++ don't really mean "not negative"; they are more like "bitmask" or "modulo integer".

To understand why unsigned is not a good type for a "non-negative" number, please consider these totally absurd statements:

  • Adding a possibly negative integer to a non-negative integer gives a non-negative integer
  • The difference of two non-negative integers is always a non-negative integer
  • Multiplying a non-negative integer by a negative integer gives a non-negative result

Obviously none of the above statements makes any sense... but it is exactly how the unsigned semantics of C and C++ work.
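
A minimal demonstration of that modulo semantic (assuming 32-bit unsigned, as on most current platforms):

#include <iostream>

int main() {
    unsigned n = 3;
    std::cout << (n + -5) << "\n";  // 4294967294, not -2
    std::cout << (n - 7)  << "\n";  // 4294967292, not -4
    std::cout << (n * -1) << "\n";  // 4294967293, not -3
}

All three results are "non-negative", exactly as the absurd statements above promise.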

Actually, using an unsigned type for the size of containers is a design mistake of C++, and unfortunately we're now doomed to this wrong choice forever (for backward compatibility). You may like the name "unsigned" because it's similar to "non-negative", but the name is irrelevant and what counts is the semantics... and unsigned is very far from "non-negative".

For this reason, when coding most loops over vectors, my personally preferred form is:

for (int i=0,n=v.size(); i<n; i++) {
    ...
}

(of course assuming the size of the vector does not change during the iteration and that I actually need the index in the body; otherwise for (auto& x : v) ... is better).

This running away from unsigned as soon as possible, back to plain integers, has the advantage of avoiding the traps that are a consequence of the unsigned size_t design mistake. For example, consider:

// draw lines connecting the dots
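// careful: if pts is empty, pts.size()-1 wraps around to a huge value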
for (size_t i=0; i<pts.size()-1; i++) {
    drawLine(pts[i], pts[i+1]);
}

The code above will have problems if the pts vector is empty, because pts.size()-1 is then a huge nonsense number. Dealing with expressions where a < b-1 is not the same as a+1 < b, even for commonly used values, is like dancing in a minefield.

Historically, the justification for making size_t unsigned was the ability to use the extra bit for values, e.g. to have arrays of up to 65535 elements instead of just 32767 on 16-bit platforms. In my opinion, even at that time the extra cost of this wrong semantic choice was not worth the gain (and if 32767 elements are not enough now, then 65535 won't be enough for long anyway).

Unsigned values are great and very useful, but NOT for representing container sizes or indexes; for sizes and indexes, regular signed integers work much better because their semantics are what you would expect.

Unsigned values are the ideal type when you need the modulo arithmetic property or when you want to work at the bit level.
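
For instance (a minimal sketch, not part of the original answer), wrap-around is exactly what a hash mixer wants, and flag masks are the other natural fit:

#include <cstdint>

// multiplicative mixing relies on well-defined unsigned wrap-around
std::uint32_t mix(std::uint32_t x) {
    x ^= x >> 16;
    x *= 0x45d9f3bu;   // the multiply wraps modulo 2^32, by definition
    x ^= x >> 16;
    return x;
}

const std::uint32_t FLAG_DIRTY  = 1u << 0;
const std::uint32_t FLAG_LOCKED = 1u << 1;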

Carlcarla answered 8/6, 2014 at 9:36 Comment(15)
I think you are right, because Java (an "improved" C++) does not support unsigned int. Also, I think the correct way to write that line is: size_t arr_index; for (size_t i=1; i<=pts.size(); i++) { arr_index = i - 1; }Nancynandor
@carlos: No. That would be the correct way if size_t had been defined correctly. Unfortunately a design error made size_t unsigned, and therefore those values ended up with bitmask semantics. Unless you think it's correct for the size of a container to be a bitmask, using size_t is the wrong choice. A choice that unfortunately was made by the standard C++ library, but no one forces me to repeat the same error in my code. My suggestion is to just run away from size_t and use regular ints as soon as you can, instead of bending logic so that it also works with size_t.Carlcarla
It's not just 16-bit platforms. With the current size_t you can use e.g. a vector<char> of size 2.1G on IA-32 Linux with a 3G/1G memory split. If size_t were signed, what would happen if you grew your vector from just under 2G to a bit more? Suddenly the size would become negative. This just doesn't make any sense. The language should not impose such artificial limits.Abohm
@Ruslan: it's amazing how this very weak argument can stick even with reasonably good programmers: the idea of a single array of single bytes eating up most of your address space is totally absurd and surely not something that comes up often, yet apparently it is considered very important by "unsigned for size" zealots. It would have been nice to have a data type able to use all the bits and with the semantics of a "non-negative" integer, but unfortunately no such type exists in C++, and using unsigned instead of it is nonsense.Carlcarla
@Carlcarla see this code for an example of the hack one needs to implement when an offset type is signed (here off_t/off64_t – not exactly C++, but related: POSIX). Not too inefficient, of course, but ugly. And this code is not of theoretical interest there: it was implemented out of necessity.Abohm
@Ruslan: you mean that code was used to seek into a file bigger than 8 exabytes and so they found a bug?Carlcarla
@Carlcarla 18 exabytes actually. You see, /proc/<pid>/mem file on x86_64 Linux for 64-bit processes does have valid pages in the "negative" address range – namely, [vsyscall] at 0xffffffffff600000. See the original EDB issue that was fixed by the code.Abohm
@Ruslan: using an unsigned type of the exact size for addresses in an address space may indeed make sense. However, it does NOT make sense to have an unsigned type represent the number of elements inside a container (which is what std:: containers do).Carlcarla
I know you invested a lot in this answer and I appreciate your effort and examples. Unfortunately the answer is incorrect, and the unsigned size_t design was the logical and obvious choice (not a mistake). I added my own answer to explain further. Of course, you're free to disagree and downvote my answer.Mistassini
@Myst: I'm old enough to tolerate knowing that someone is wrong on the internet and still sleep well. You are wrong, and apparently don't even understand what overflow is in C++. A few years ago I made a (bad) video to explain in more detail why "non-negative" and "unsigned" are two very different concepts that apparently you are confusing, and what "overflow" means. If you can tolerate a strong Italian accent and terrible audio, this is the link: youtube.com/watch?v=4afySnY-XgYCarlcarla
I loved the video and the analogy to the analog clock. However, this ignores the fact that all mathematics on a limited number of bits (i.e., 32 bits) is modular. The only difference is that with signed integers you get undefined behavior once you hit the modular limit (overflow), while with unsigned integers you know what you're dealing with (wrap-around) and you can handle the edge cases. I personally agree with the school of thought that believes Bjarne was mistaken about this one.Mistassini
@Myst: You still apparently do not understand what undefined behavior means. Signed arithmetic is NOT modular in C++. It's modular on most processors, that's true, but not in C++. For example, a C++ compiler will assume that (a < a+1) is always true for signed integers, because it's true when you're not overflowing, and what happens when you do is irrelevant because that is UB. This is not a bug, it's C++. UB is kind of a weird beast (and getting worse every year) and can even travel back in time... devblogs.microsoft.com/oldnewthing/20140627-00/?p=633Carlcarla
I agree, and this is exactly the problem. Signed arithmetic is not necessarily modular, even though it is limited to the number of bits the type holds... which means that once the limit is reached, you get UB. The compiler might optimize away the code, or you get infinite loops. Even if the code runs, on some CPUs you get a trap, on others modular behavior. With signed integers you do not know what you get. That's why they should be avoided for all arithmetic operations.Mistassini
@Myst: "crazy" behavior of signed integers happens at around 2 billion, a number that is rarely found in code. "Crazy" behavior of unsigned integers happens around 0, probably the numeric value used most in programming (especially with containers). It would have been nice if C++ had a "non-negative" type for sizes, with proper semantics (e.g. that the subtraction of two non-negative integers is a possibly-negative integer); unfortunately no such type exists in C++. Wrap-around is fine and useful when you're looking for wrap-around, not for container sizes.Carlcarla
I do not know if 2 GB is such a rare indexing space for the type of systems C++ code is used on... to me it sounds silly that if I opened a file, my index would cause UB at 2 GB. Besides, it's not about what's "rare", it's about what's correct. All computers are designed to use modular math. The illusion that math on a limited-precision machine is not modular is a cause of errors and part of the reason people mislabel C++ as unsafe, causing things like this.Mistassini

It's purely laziness and ignorance. You should always use the right types for indices, and unless you have further information that restricts the range of possible indices, size_t is the right type.

Of course if the dimension was read from a single-byte field in a file, then you know it's in the range 0-255, and int would be a perfectly reasonable index type. Likewise, int would be okay if you're looping a fixed number of times, like 0 to 99. But there's still another reason not to use int: if you use i%2 in your loop body to treat even/odd indices differently, i%2 is a lot more expensive when i is signed than when i is unsigned...
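
The difference comes from signed % rounding toward zero in C and C++: (-3) % 2 is -1, so the compiler cannot lower a signed i % 2 to a single AND when the remainder's value is used (a small illustration; exact codegen varies by compiler):

int parity_u(unsigned i) { return i % 2; }  // can compile to: i & 1
int parity_s(int i)      { return i % 2; }  // (-3) % 2 == -1, so a plain
                                            // AND would be wrong; a sign
                                            // fixup is needed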

Bogbean answered 20/9, 2011 at 18:45 Comment(2)
see #3 on my answer, it's not "purely" laziness/ignoranceEmigration
That doesn't change the fact that the code is wrong. Here's one way to fix it: for (size_t i=10; i-->0; )Bogbean

Not much difference. One benefit of int is that it is signed. Thus i < 0 makes sense for an int, while for an unsigned int it doesn't make much sense.

If indexes are calculated, that may be beneficial (for example, you might get cases where you will never enter a loop if some result is negative).
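
For example (a hypothetical function, just to illustrate):

void process_range(int first, int last) {
    // with int, a negative or empty computed range simply skips the loop
    for (int i = first; i < last - 1; i++) {
        // ... work on element i ...
    }
    // had these been unsigned and last == 0, last - 1 would wrap around
    // to a huge value and the loop would run wild
}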

And yes, it is less to write :-)

Jungle answered 20/9, 2011 at 17:1 Comment(4)
typedef unsigned us; and it's more to write.Cootch
@WTP - you're one of those who will not understand sarcasm even with the ":-)" right next to it? Well, I guess no cure there...Jungle
A negative size or negative index makes no senseTristis
@MilesRout: An attempt to operate on a negative number of items will generally have different implications from an attempt to operate on a really large positive number of items. If a function which is supposed to operate all but the last item of a collection is passed a collection with no items, having the number of items to process be recognizable as -1 seems cleaner than having it be SIZE_MAX.Crandall

Using int to index an array is legacy, but still widely adopted. int is just a generic number type and does not correspond to the addressing capabilities of the platform. If it happens to be narrower or wider than the platform's address width, you may encounter strange results when trying to index a very large array that goes beyond it.

On modern platforms, off_t, ptrdiff_t and size_t guarantee much more portability.

Another advantage of these types is that they give context to someone who reads the code. When you see the above types you know that the code will do array subscripting or pointer arithmetic, not just any calculation.

So, if you want to write bullet-proof, portable and self-documenting code, you can do it at the expense of a few keystrokes.

GCC even supports a typeof extension which relieves you from typing the same typename all over the place:

typeof(arraySize) i;

for (i = 0; i < arraySize; i++) {
  ...
}

Then, if you change the type of arraySize, the type of i changes automatically.
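
In C++, decltype gives the same effect portably (and C23 later standardized typeof itself):

decltype(arraySize) i;

for (i = 0; i < arraySize; i++) {
  ...
}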

Glenoid answered 20/9, 2011 at 17:3 Comment(2)
Though to be fair, on all but the most obscure 32- and 64-bit platforms, you'd need at least 4 billion elements for such issues to show up. And platforms with smaller ints typically have far less memory as well, making int sufficient in general.Thurlough
@delnan: It's not so simple. This kind of reasoning has led to very serious vulnerabilities in the past, even by folks who think of themselves as security gods like DJB...Bogbean

It really depends on the coder. Some coders prefer type perfectionism, so they'll use whatever type they're comparing against. For example, if they're iterating through a C string, you might see:

size_t sz = strlen("hello");
for (size_t i = 0; i < sz; i++) {
    ...
}

While if they're just doing something 10 times, you'll probably still see int:

for (int i = 0; i < 10; i++) {
    ...
}
Mesenchyme answered 20/9, 2011 at 17:3 Comment(0)

I use int because it requires less typing, and it doesn't matter: the two take up the same amount of space, and unless your array has a few billion elements you won't overflow, provided you're not using a 16-bit compiler, which I'm usually not.
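
That much is guaranteed by the standard, and a trivial check can document it:

static_assert(sizeof(int) == sizeof(unsigned int),
              "a signed type and its unsigned counterpart have the same size");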

Carmon answered 20/9, 2011 at 17:0 Comment(1)
Not using an int also gives more context about the variable and can be regarded as self-documenting code. Also have a read here: viva64.com/en/a/0050Glenoid

Because unless you have an array bigger than two gigabytes of type char, or 4 gigabytes of type short, or 8 gigabytes of type int, etc., it doesn't really matter whether the variable is signed or not.

So, why type more when you can type less?

Organicism answered 20/9, 2011 at 17:3 Comment(5)
But then, if arraySize is variable and you want to write bullet-proof code, off_t, ptrdiff_t and size_t still carry some significance.Glenoid
Yes, that is absolutely necessary if you MAY have such super huge arrays, but since people normally don't, they just use the simple-to-write int. For example, if you are sorting an array of int in O(n^2), you basically have to wait forever for the array to be sorted if there are more than 2M elements, and that is assuming you even have 8 GB of memory. So you see, usually even if you get the indexing right, most programs are useless when given input THAT large. So why make them bullet-proof?Organicism
@Shahbaz: Most of us would find it just unfortunate if passing a giant array made the sort take weeks to complete, but would find it completely unacceptable when passing a giant array yields a root shell.Bogbean
@R.. don't get me wrong, I'm not saying this is good, I'm answering the question that asks why people use int all the time.Organicism
I was responding to your most recent comment.Bogbean

Aside from the issue that it's shorter to type, the reason is that it allows negative numbers.

Since we can't say in advance whether a value can ever be negative, most functions that take integer arguments take the signed variety. Since most functions use signed integers, it is often less work to use signed integers for things like loops. Otherwise, you have the potential of having to add a bunch of typecasts.

As we move to 64-bit platforms, the non-negative range of a signed integer should be more than enough for most purposes. In these cases, there's not much reason not to use a signed integer.
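
A sketch of that cast friction (the API here is hypothetical):

#include <vector>
#include <cstddef>

int clamp(int value, int lo, int hi);   // typical signed-parameter function

void demo(const std::vector<int>& v) {
    // an unsigned loop variable forces a cast at each signed call site
    for (std::size_t i = 0; i < v.size(); i++) {
        int j = clamp(static_cast<int>(i) - 1, 0,
                      static_cast<int>(v.size()) - 1);
        // ... use v[j] ...
    }
}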

Darksome answered 20/9, 2011 at 17:5 Comment(1)
Negative values are a key point, and yours is the only answer that makes more than a token mention of them. But sadly, there are implicit Standard conversions between signed and unsigned parameter types, which mean mixing them can just stuff up, rather than the inconvenient-but-safe scenario you describe of "having to add a bunch of typecasts". And "As we move to 64-bit platforms, the non-negative range of a signed integer..." isn't actually growing for most compilers/OSes - ints still tend to be 32 bits, with longs moving from 32 to 64.Hawthorn

Consider the following simple example:

int max = some_user_input; // or some_calculation_result
for(unsigned int i = 0; i < max; ++i)
    do_something;

If max happens to hold a negative value, say -1, the -1 will be regarded as UINT_MAX (when two integers of the same rank but different signedness are compared, the signed one is converted to unsigned). On the other hand, the following code would not have this issue:

int max = some_user_input;
for(int i = 0; i < max; ++i)
    do_something;

Given a negative max input, the loop will be safely skipped.

Contravention answered 24/5, 2017 at 1:17 Comment(0)

Using a signed int is - in most cases - a mistake that can easily result in bugs as well as undefined behavior.

Using size_t matches the system's word size (64 bits on 64-bit systems, 32 bits on 32-bit systems), always giving the correct range for the loop and minimizing the risk of integer overflow.
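
That correspondence is typical rather than guaranteed (the standard only requires size_t to be able to hold the size of the largest object), but it is easy to check:

#include <cstddef>

static_assert(sizeof(std::size_t) == sizeof(void*),
              "holds on common flat-memory platforms, not mandated by the standard");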

The int recommendation arose to solve an issue where reverse for loops were often written incorrectly by inexperienced programmers (and of course, int might not have the correct range for the loop):

/* a correct reverse for loop */
for (size_t i = count; i > 0;) {
   --i; /* note that this is not part of the `for` statement */
   /* code for loop where i is for zero based `index` */
}
/* an incorrect reverse for loop (bug on count == 0) */
for (size_t i = count - 1; i > 0; --i) {
   /* i might have overflowed and undefined behavior occurs */
}

In general, signed and unsigned variables shouldn't be mixed together, so at times using an int is unavoidable. However, the correct type for a for loop is, as a rule, size_t.

There's a nice talk about this misconception that signed variables are better than unsigned variables; you can find it on YouTube (Signed Integers Considered Harmful by Robert Seacord).

TL;DR: Signed variables are more dangerous and require more code than unsigned variables (which should be preferred in almost all cases, and definitely whenever negative values aren't logically expected).

With unsigned variables the only concern is the overflow boundary, which has strictly defined behavior (wrap-around) and uses clearly defined modular mathematics.

This allows a single edge-case test to catch an overflow, and that test can be performed after the mathematical operation has executed.
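
For example, the classic post-operation test (a minimal sketch of the pattern described above):

#include <iostream>

// wrap-around is well-defined, so the check can come *after* the addition
bool checked_add(unsigned a, unsigned b, unsigned& out) {
    out = a + b;
    return out >= a;   // wrapped if and only if the sum became smaller
}

int main() {
    unsigned r;
    if (!checked_add(4000000000u, 500000000u, r))
        std::cout << "overflow detected\n";
}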

However, with signed variables the overflow behavior is undefined (UB), and the negative range is actually larger than the positive range - things that add edge cases that must be tested for and explicitly handled before the mathematical operation can be executed.

i.e., how much is INT_MIN * -1? (the compiler will catch it for compile-time constants, but without that you're in a jam).

P.S.

As for the example offered by @6502 in their answer, the whole thing is again an issue of trying to cut corners and a simple missing if statement.

When a loop assumes at least 2 elements in an array, this assumption should be tested beforehand, i.e.:

// draw lines connecting the dots - forward loop
if(pts.size() > 1) { // first make sure there's enough dots
  for (size_t i=0; i < pts.size()-1; i++) { // then loop
    drawLine(pts[i], pts[i+1]);
  }
}
// or test against i + 1 : which tests the desired pts[i+1]
for (size_t i = 0; i + 1 < pts.size(); i++) { // then loop
  drawLine(pts[i], pts[i+1]);
}
// or start i as 1 : but note that `-` is slower than `+`
for (size_t i = 1; i < pts.size(); i++) { // then loop
  drawLine(pts[i - 1], pts[i]);
}
Mistassini answered 2/12, 2022 at 18:32 Comment(0)
