Accessing an array out of bounds gives no error, why?
B

18

243

I am assigning values in a C++ program out of the bounds like this:

#include <iostream>
using namespace std;
int main()
{
    int array[2];
    array[0] = 1;
    array[1] = 2;
    array[3] = 3;
    array[4] = 4;
    cout << array[3] << endl;
    cout << array[4] << endl;
    return 0;
}

The program prints 3 and 4. It should not be possible. I am using g++ 4.3.3

Here is compile and run command

$ g++ -W -Wall errorRange.cpp -o errorRange
$ ./errorRange
3
4

Only when assigning array[3000]=3000 does it give me a segmentation fault.

If gcc doesn't check for array bounds, how can I be sure if my program is correct, as it can lead to some serious issues later?

I replaced the above code with

vector<int> vint(2);
vint[0] = 0;
vint[1] = 1;
vint[2] = 2;
vint[5] = 5;
cout << vint[2] << endl;
cout << vint[5] << endl;

and this one also produces no error.

Badgett answered 6/8, 2009 at 16:12 Comment(24)
Related question: #672203Literature
The code is buggy, of course, but it generates undefined behavior. Undefined means it may or may not run to completion. There is no guarantee of a crash.Proudman
"If gcc doesnt check for array bounds, how I can be sure if my program if full correct ?" This is your problem. Welcome to close-to-the-metal programming. But with c++ there are at least ways to finnesse the issue, plain old c gives you even less semantic support for avoiding this problem.Proudman
You can be sure your program is correct by not screwing around with raw arrays. C++ programmers should use container classes instead, except in embedded/OS programming. Read this for reasons to use containers. parashift.com/c++-faq-lite/containers.htmlGorgias
@Hooked: With all due respect, I believe your advice may be too strong. While it's a good idea to use List<> and other containers, there's still a place for raw arrays, even in applications development.Transmigrant
@Hooked: I should have said vector<>, not List<>. Sorry, C# on the brain.Transmigrant
As to your edit: verifying the correctness of a program is (and has been) a very hard problem that many have spent much time researching. In the end (especially in memory unsafe languages like C/C++) the best you can do is use the static/dynamic analysis tools available to you and program defensively.Microclimatology
Bear in mind that vectors do not necessarily range-check using []. Using .at() does the same thing as [] but does range-check.Lithea
That's fine, it is just my opinion. I've never worked on low-level projects where speed was REALLY necessary, so consider me an armchair general.Gorgias
@David: Use iterators! Please don't encourage indices with vectors =(Gorgias
Also OP, you got quite lucky in this example. In most other functions, you probably would've clobbered your return address (at least on x86) with those writes, which would've caused a segfault on function return. gcc tends to allocate some extra space for main's stack frame to align it to 16 bytes.Microclimatology
It seems STL/containers are also not 100 % perfect :) ... Oh yeah they are! Your std::vector is resizing itself to handle the extra elements.Seppala
@plastic: if they are, why it doesnt give me error when I am using [] in above code.Badgett
@plastic: I did not know that a vector resizes with out of bound accesses. @seg: The same reason it doesn't give you an error when using [] in a regular array.Gorgias
@Hooked: In one project, I recoded a method to use an array instead of concatenating to a string, and this gave a three-fold improvement in performance. Raw can be much faster than cooked.Transmigrant
Downvotes: Can you please explain what was wrong in my question to get downvotes, so that I can improve the mistake ??Badgett
@Steven: I understand that raw arrays are generally faster. Just balance readability with efficiency. Use a profiler and stick with std::vector whenever possible.Gorgias
@Steven: In the tests I've done, I haven't ever seen a measurable performance difference between vectors and arrays. (assuming heap-allocated arrays, of course. For stack-allocated, vectors are at a disadvantage of course). A decent vector implementation on a decent compiler will inline everything resulting in runtime overhead equivalent to accessing a heap-allocated array. @plastic chris: No, a vector does not automatically resize itself if you write out of bounds, if that's what you mean. It resizes itself if you use push_back to insert elements of course.Eakin
A vector does not auto-resize when accessing out-of-bounds elements! It's just U.B.!Dutra
For stack-allocated arrays with bound checking, one may use std::tr1::array (and use Boost implementation if it doesn't come with your compiler out of the box).Dutra
Also, some STL implementations offer "checked" container and iterator operations, catching things such as out-of-bounds indices and iterators, or using an iterator invalidated by modifying the container. One example of such is VC++2005 and above, which does that by default in debug builds. A very handy feature.Dutra
@Pavel: Less handy is, of course, the ODR violations that come with the feature unless you control all libraries you might want to link with. And the insane slowdown it causes in some cases. ;)Eakin
@jalf: In the example I gave, a char[] was being contrasted against a string, not a vector<char>, and most of the speed boost came from reusing a fixed-size buffer instead of reallocating.Transmigrant
@Steven: Ah, fair enough then. :)Eakin
E
486

Welcome to every C/C++ programmer's bestest friend: Undefined Behavior.

There is a lot that is not specified by the language standard, for a variety of reasons. This is one of them.

In general, whenever you encounter undefined behavior, anything might happen. The application may crash, it may freeze, it may eject your CD-ROM drive or make demons come out of your nose. It may format your harddrive or email all your porn to your grandmother.

It may even, if you are really unlucky, appear to work correctly.

The language simply says what should happen if you access the elements within the bounds of an array. It is left undefined what happens if you go out of bounds. It might seem to work today, on your compiler, but it is not legal C or C++, and there is no guarantee that it'll still work the next time you run the program. Nor is there any guarantee that it hasn't already overwritten essential data, and that you just haven't run into the resulting problems yet.

As for why there is no bounds checking, there are a couple aspects to the answer:

  • An array is a leftover from C. C arrays are about as primitive as you can get. Just a sequence of elements with contiguous addresses. There is no bounds checking because it is simply exposing raw memory. Implementing a robust bounds-checking mechanism would have been almost impossible in C.
  • In C++, bounds-checking is possible on class types. But an array is still the plain old C-compatible one. It is not a class. Further, C++ is also built on another rule which makes bounds-checking non-ideal. The C++ guiding principle is "you don't pay for what you don't use". If your code is correct, you don't need bounds-checking, and you shouldn't be forced to pay for the overhead of runtime bounds-checking.
  • So C++ offers the std::vector class template, which allows both. operator[] is designed to be efficient. The language standard does not require that it performs bounds checking (although it does not forbid it either). A vector also has the at() member function which is guaranteed to perform bounds-checking. So in C++, you get the best of both worlds if you use a vector. You get array-like performance without bounds-checking, and you get the ability to use bounds-checked access when you want it.
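As a minimal sketch of that last point, the two access styles can be put side by side. The helper names unchecked_get and checked_get are made up for illustration; only operator[] and at() come from the standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Unchecked access: as cheap as a raw array, but the caller must
// guarantee i < v.size(); anything else is undefined behavior.
int unchecked_get(const std::vector<int>& v, std::size_t i) {
    return v[i];
}

// Checked access: at() throws std::out_of_range for a bad index.
// Here the exception is turned into a false return value.
bool checked_get(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With a two-element vector, checked_get(v, 5, out) reports failure, whereas unchecked_get(v, 5) would compile and might even appear to work.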
Eakin answered 6/8, 2009 at 16:18 Comment(13)
@Jaif : we have been using this array thing for so long, but still why are there no test to check such simple error ?Badgett
C++ design principle was that it shouldn't be slower than the equivalent C code, and C doesn't do array bound checking. C design principle was basically speed as it was aimed for system programming. Array bound checking takes time, and so is not done. For most uses in C++, you should be using a container rather than array anyway, and you can have your choice of bound check or no bound check by either accessing an element via .at() or [] respectively.Capitular
@seg Such a check costs something. If you write correct code, you don't want to pay that price. Having said that, I've become a complete convert to std::vector's at() method, which IS checked. Using it has exposed quite a few errors in what I thought was "correct" code.Mailand
@seg.server.fault: Because c++ lets you program close to the metal, and automatic bounds checking 1) costs cycles and 2) gets in the way of some clever hack that you should avoid using. This is a feature, not a bug. If you want a bounds checking array-like object, create one.Proudman
Because C++ lets the programmer decide what is an error. I don't know how other languages do bounds checking, but if it is at runtime, there would be an associated overhead. C++ is often used for application, game, and OS development, where speed is necessary.Gorgias
@Hooked: Actually, C++ is very clear on what is an error. Out of bounds accesses are clear errors. The C++ standard just does not specify the consequences of writing erroneous code.Eakin
I believe old versions of GCC actually launched Emacs and a simulation of Towers of Hanoi in it, when it encountered certain types of undefined behavior. Like I said, anything may happen. ;)Eakin
Everything's already been said, so this only warrants a small addendum. Debug builds can be very forgiving in these circumstances when compared to release builds. Due to debug information being included in debug binaries, there's less of a chance that something vital is overwritten. That's sometimes why debug builds seem to work fine whilst release builds crash.Exhortation
undefined behavior is evil. I am wondering if cpp committee could give some Supplementary Provisions. At least for debug version, sometimes when we debug the code, we don't need performance, but we do need the expected behavior.Juback
@Juback compiler vendors are perfectly at liberty to define particular behaviours for particular cases of UB. Many do have bounds checked [] in debug builds, which is perfectly compliant. The standard does not have to care about the existence of "debug" and "release" configurationsSepticemia
can this cause code to work on one machine, and not work on a different machine?Townsley
There are security features for hardening your code that will result in more expected behavior. At the cost of speed and resources. This is used when hardening a source-based Linux distribution. Of course, they are compiler dependent. And you shouldn't rely on these features because they can be circumvented. I wish they were often turned on by default while debugging.Oenone
I just tried accessing arr[3] in a native array of length 3 and all my porn got emailed to my 👵Douville
R
42

Using g++, you can add the command line option: -fstack-protector-all.

On your example it resulted in the following:

> g++ -o t -fstack-protector-all t.cc
> ./t
3
4
/bin/bash: line 1: 15450 Segmentation fault      ./t

It doesn't really help you find or solve the problem, but at least the segfault will let you know that something is wrong.

Reorganization answered 6/8, 2009 at 17:20 Comment(5)
I just found even a better option: -fmudflapStadler
@Hi-Angel: Modern equivalent is -fsanitize=address which catches this bug both at compile time (if optimizing) and at runtime.Evans
@NateEldredge +1, nowadays I even use -fsanitize=undefined,address. But it's worth noting that there are rare corner cases with std library, when out of bounds access is not detected by sanitizer. For this reason I'd recommend to additionally use -D_GLIBCXX_DEBUG option, which adds even more checks.Stadler
Thank you Hi-Angel. when -fmudflap and -fsanitize=address didn't work for me, -fsanitize=undefined,address found not only a function that wasn't returning an value, it also found the array assignment that was happening out of bounds.Schrimsher
Do we have this on other compilers, i.e apple clang?Hippel
T
15

g++ does not check array bounds. By writing 3 and 4 past the end you may be overwriting something, but nothing really important; if you try with higher indices you'll get a crash.

You are just overwriting parts of the stack that are not in use. You could continue until you reach the end of the space allocated for the stack, and it would eventually crash.

EDIT: The language gives you no way of dealing with that. A static code analyzer could reveal a failure like this one, but this case is very simple; similar (but more complex) failures may go undetected even by static analyzers.

Transfix answered 6/8, 2009 at 16:15 Comment(2)
Where do you get it from that at the address of array[3] and array[4], there is "nothing really important"??Kapor
What numbers might be considered higher numbers?Hippel
G
10

It's undefined behavior as far as I know. Run a larger program with that and it will crash somewhere along the way. Bounds checking is not a part of raw arrays (or even std::vector).

Use std::vector with std::vector::iterator's instead so you don't have to worry about it.
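A small sketch of what that advice looks like in practice (the function names are illustrative, not from the answer): when a loop walks from begin() to end(), there is no index that could run past the last element. Iterators are not a complete safety net, though; dereferencing end() or an invalidated iterator is still undefined behavior.

```cpp
#include <vector>

// Sums the elements with an explicit iterator; no index is ever formed.
int sum_with_iterator(const std::vector<int>& values) {
    int total = 0;
    for (std::vector<int>::const_iterator it = values.begin();
         it != values.end(); ++it) {
        total += *it;
    }
    return total;
}

// The same loop written as a C++11 range-based for.
int sum_with_range_for(const std::vector<int>& values) {
    int total = 0;
    for (int value : values) {
        total += value;
    }
    return total;
}
```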

Edit:

Just for fun, run this and see how long until you crash:

int main()
{
   int arr[1];

   for (int i = 0; i != 100000; i++)
   {
       arr[i] = i;
   }

   return 0; //will be lucky to ever reach this
}

Edit2:

Don't run that.

Edit3:

OK, here is a quick lesson on arrays and their relationships with pointers:

When you use array indexing, you are really doing pointer arithmetic in disguise: array[1] is defined as *(array + 1), so the pointer is dereferenced for you automatically and you get the value at that index.

When you have a pointer to an array, like this:

int arr[5];
int *ptr = arr;

Then the "arr" in the second declaration is really decaying to a pointer to the first element of the array. This is equivalent behavior to this:

int *ptr = &arr[0];

When you try to access beyond what you allocated, you are really just using a pointer to other memory (which C++ won't complain about). Taking my example program above, that is equivalent to this:

int main()
{
   int arr[1];
   int *ptr = arr;

   for (int i = 0; i != 100000; i++, ptr++)
   {
       *ptr = i;
   }

   return 0; //will be lucky to ever reach this
}

The compiler won't complain because in programming, you often have to communicate with other programs, especially the operating system. This is done with pointers quite a bit.

Gorgias answered 6/8, 2009 at 16:16 Comment(4)
I think you forgot to increment "ptr" in your last example there. You've accidentally produced some well-defined code.Pander
Haha, see why you shouldn't be using raw arrays?Gorgias
"This is why instead of *(array[1]), array[1] automatically returns the value at that value." Are you sure *(array[1]) will work properly? I think it should be *(array + 1). p.s : Lol, it is like sending a message to the past. But, anyway:Littles
@Littles lol, you spoke to the past and the past responded. Edited with your suggested changes.Gorgias
S
5

Run this through Valgrind and you might see an error.

As Falaina pointed out, valgrind does not detect many instances of stack corruption. I just tried the sample under valgrind, and it does indeed report zero errors. Valgrind can be instrumental in finding many other types of memory problems; it's just not particularly useful in this case unless you modify your build to include the --stack-check option. If you build and run the sample as

g++ --stack-check -W -Wall errorRange.cpp -o errorRange
valgrind ./errorRange

valgrind will report an error.

Sunbonnet answered 6/8, 2009 at 16:19 Comment(3)
Actually, Valgrind is quite poor at determining incorrect array accesses on the stack. (and rightfully so, the best it can do is mark the entire stack as a valid write location )Microclimatology
@Microclimatology - good point, but Valgrind can detect at least some stack errors.Sunbonnet
And valgrind will see nothing wrong with the code because the compiler is smart enough to optimize the array away and simply output a literal 3 and 4. That optimization happens before gcc checks the array bounds which is why the out-of-bounds warning gcc does have is not shown.Gibbs
R
5

Hint

If you want fast, constant-size arrays with range-error checking, try boost::array (or std::tr1::array from <tr1/array>; it will be a standard container in the next C++ specification). It can be faster than std::vector because it needs no heap allocation: it reserves memory on the stack or inside the class instance, just like int array[].
This is simple sample code:

#include <iostream>
#include <boost/array.hpp>
int main()
{
    boost::array<int,2> array;
    array.at(0) = 1; // checking index is inside range
    array[1] = 2;    // no error check, as fast as int array[2];
    try
    {
       // index is inside range
       std::cout << "array.at(0) = " << array.at(0) << std::endl;

       // index is outside range, throwing exception
       std::cout << "array.at(2) = " << array.at(2) << std::endl; 

       // never comes here
       std::cout << "array.at(1) = " << array.at(1) << std::endl;  
    }
    catch(const std::out_of_range& r)
    {
        std::cout << "Something goes wrong: " << r.what() << std::endl;
    }
    return 0;
}

This program will print:

array.at(0) = 1
Something goes wrong: array<>: index out of range
Ricker answered 6/8, 2009 at 18:11 Comment(1)
Note for readers: Outdated answer. Since C++11 it should be #include<array> and std::array from the standard library instead of the boost equivalents.Ternate
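Following that note, a rough C++11 counterpart of the boost example might look like the sketch below. It assumes a C++11 compiler; the helper name read_checked is made up for illustration, while std::array and its at() member are standard.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Reads values[index] through at(), which checks the index.
// Returns false instead of propagating std::out_of_range.
template <std::size_t N>
bool read_checked(const std::array<int, N>& values, std::size_t index, int& out) {
    try {
        out = values.at(index);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Like boost::array, std::array keeps its elements inline, so operator[] stays unchecked and as cheap as a raw array, while at() is the checked alternative.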
S
4

C or C++ will not check the bounds of an array access.

You are allocating the array on the stack. Indexing the array via array[3] is equivalent to *(array + 3), where array decays to a pointer to its first element, &array[0]. Because index 3 lies outside the array, this results in undefined behavior.
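The equivalence can be checked for in-range indices with a short sketch. The function names below are illustrative only, and the code deliberately stays inside the array so that its behavior is defined.

```cpp
#include <cstddef>

// For a valid index, a[i] and *(a + i) refer to the same object,
// so their addresses compare equal.
bool same_element(const int* a, std::size_t i) {
    return &a[i] == a + i;
}

// An array argument decays to a pointer to its first element; the
// length is lost, which is why nothing here can check the bound.
int third_element(const int* first) {
    return *(first + 2);  // identical to first[2]
}
```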

One way to catch this sometimes in C is to use a static checker, such as splint. If you run:

splint +bounds array.c

on,

int main(void)
{
    int array[1];

    array[1] = 1;

    return 0;
}

then you will get the warning:

array.c: (in function main)
array.c:5:9: Likely out-of-bounds store: array[1]
    Unable to resolve constraint: requires 0 >= 1
    needed to satisfy precondition: requires maxSet(array @ array.c:5:9) >= 1
  A memory write may write to an address beyond the allocated buffer.

Stevenstevena answered 6/8, 2009 at 16:18 Comment(2)
Correction: it's already been allocated by the OS or another program. He is overwriting other memory.Gorgias
Saying that "C/C++ will not check the bounds" isn't entirely correct - there's nothing precluding a particular compliant implementation from doing so, either by default, or with some compilation flags. It's just that none of them bother.Dutra
A
3

You are certainly overwriting your stack, but the program is simple enough that effects of this go unnoticed.

Atmometer answered 6/8, 2009 at 16:17 Comment(1)
Whether the stack is overwritten or not depends on the platform.Dearly
H
2

libstdc++, which is part of gcc, has a special debug mode for error checking. It is enabled by compiler flag -D_GLIBCXX_DEBUG. Among other things it does bounds checking for std::vector at the cost of performance. Here is online demo with recent version of gcc.

So actually you can do bounds checking with libstdc++ debug mode but you should do it only when testing because it costs notable performance compared to normal libstdc++ mode.
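As a hedged sketch of how this is used: the source code does not change at all, only the build command does. The file name demo.cpp and the function names are hypothetical; _GLIBCXX_DEBUG is the macro named above and applies to libstdc++ only.

```cpp
#include <cstddef>
#include <vector>

// Build normally:        g++ demo.cpp -o demo
// Build with checking:   g++ -D_GLIBCXX_DEBUG -g demo.cpp -o demo

// Plain operator[]: unchecked in a normal build. In a libstdc++ debug-mode
// build, an out-of-range index here is expected to stop the program with a
// diagnostic instead of silently touching other memory.
int element(const std::vector<int>& v, std::size_t i) {
    return v[i];
}

// Reports whether this translation unit was compiled in debug mode,
// which a test suite can use to confirm the checked build is active.
bool debug_mode_enabled() {
#ifdef _GLIBCXX_DEBUG
    return true;
#else
    return false;
#endif
}
```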

Holmes answered 10/6, 2017 at 19:48 Comment(2)
Do we have this on other compilers, i.e apple clang?Hippel
Yes, LLVM implementation of C++ Standard Library also has a similar debug mode, see libcxx.llvm.org/DesignDocs/DebugMode.html.Holmes
K
1

Undefined behavior working in your favor. Whatever memory you're clobbering apparently isn't holding anything important. Note that C and C++ do not do bounds checking on arrays, so stuff like that isn't going to be caught at compile or run time.

Kibosh answered 6/8, 2009 at 16:18 Comment(2)
No, Undefined behavior "works in your favor" when it crashes cleanly. When it appears to work, that's about the worst possible scenario.Eakin
@JohnBode: Then it would be better if you correct wording as per jalf's commentAurist
A
1

When you write 'array[index]' in C, the compiler translates it into machine instructions.

The translation goes something like:

  1. 'get the address of array'
  2. 'get the size of the type of objects array is made up of'
  3. 'multiply the size of the type by index'
  4. 'add the result to the address of array'
  5. 'read what's at the resulting address'

The result addresses something which may, or may not, be part of the array. In exchange for the blazing speed of machine instructions you lose the safety net of the computer checking things for you. If you're meticulous and careful it's not a problem. If you're sloppy or make a mistake you get burnt. Sometimes it might generate an invalid instruction that causes an exception, sometimes not.
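The five steps can be written out by hand as a rough sketch. The function names are made up for illustration, and a real compiler emits this arithmetic directly rather than calling anything. Note that nothing in the computation knows how long the array is.

```cpp
#include <cstddef>

// Computes the address of array[index] the way the steps describe.
const int* element_address(const int* array, std::size_t index) {
    const char* base = reinterpret_cast<const char*>(array);  // 1. address of array
    std::size_t element_size = sizeof(int);                   // 2. size of one element
    std::size_t offset = element_size * index;                // 3. size times index
    return reinterpret_cast<const int*>(base + offset);       // 4. add to the address
}

// 5. Read what is at the resulting address. The index is never compared
// against the array length, so the caller must keep it in range.
int read_element(const int* array, std::size_t index) {
    return *element_address(array, index);
}
```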

Amiss answered 6/8, 2009 at 19:34 Comment(0)
T
0

When you initialize the array with int array[2], space for 2 integers is allocated; but the identifier array simply points to the beginning of that space. When you then access array[3] and array[4], the compiler then simply increments that address to point to where those values would be, if the array was long enough; try accessing something like array[42] without initializing it first, you'll end up getting whatever value happened to already be in memory at that location.

Edit:

More info on pointers/arrays: http://home.netcom.com/~tjensen/ptr/pointers.htm

Telluride answered 6/8, 2009 at 16:21 Comment(0)
T
0

As I understand, local variables are allocated on stack, so going out of bounds on your own stack can only overwrite some other local variable, unless you go oob too much and exceed your stack size. Since you have no other variables declared in your function - it does not cause any side effects. Try declaring another variable/array right after your first one and see what will happen with it.

Though answered 6/8, 2009 at 19:25 Comment(0)
B
0

A nice approach that I have often seen, and have actually used, is to place a sentinel element (NULL, or a made-up value such as uint THIS_IS_INFINITY = 82862863263;) at the end of the array.

Then check for it in the loop condition. Here TYPE *pagesWords is some kind of pointer array:

// pagesWordsLength is the current element count. It has to be tracked
// separately: sizeof(pagesWords) on a pointer gives the pointer size,
// not the number of elements.

// Grow the array by one slot and keep the pointer that realloc returns.
TYPE *grown = (TYPE *)realloc(pagesWords, sizeof(pagesWords[0]) * (pagesWordsLength + 1));
if (grown != NULL)
{
  pagesWords = grown;
  pagesWords[pagesWordsLength] = MY_NULL; // sentinel marks the end
}

for (uint i = 0; i < 1000; i++)
{
  if (pagesWords[i] == MY_NULL)
  {
    break;
  }
}

This solution won't work if the array is filled with struct types.

Bludge answered 9/10, 2013 at 18:12 Comment(0)
L
0

As the question now mentions, using std::vector::at solves the problem: it performs a bounds check before accessing the element.

If you need a constant-size array located on the stack, as in your first example, use the C++11 container std::array; like vector, it provides an at function (std::array::at). In fact, at exists in every standard container where it is meaningful, i.e., where operator[] is defined (deque, map, unordered_map), with the exception of std::bitset, where the checked accessor is called std::bitset::test.
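A brief sketch of those other checked accessors, assuming C++11 for the lambdas. All helper names are made up for illustration; deque::at, map::at, and bitset::test are the standard members, and each throws std::out_of_range on a bad index or missing key.

```cpp
#include <bitset>
#include <cstddef>
#include <deque>
#include <map>
#include <stdexcept>
#include <string>

// Runs access() and reports whether it threw std::out_of_range.
template <typename F>
bool throws_out_of_range(F access) {
    try {
        access();
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

bool deque_index_ok(const std::deque<int>& d, std::size_t i) {
    return !throws_out_of_range([&] { (void)d.at(i); });
}

// For map, at() checks that the key exists rather than a numeric bound.
bool map_has_key(const std::map<std::string, int>& m, const std::string& key) {
    return !throws_out_of_range([&] { (void)m.at(key); });
}

// bitset has no at(); test() is its bounds-checked accessor.
bool bit_index_ok(const std::bitset<8>& bits, std::size_t i) {
    return !throws_out_of_range([&] { (void)bits.test(i); });
}
```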

Lindquist answered 10/5, 2015 at 21:17 Comment(0)
S
0

If you change your program slightly:

#include <iostream>
using namespace std;
int main()
{
    int array[2];
    INT NOTHING;
    CHAR FOO[4];
    STRCPY(FOO, "BAR");
    array[0] = 1;
    array[1] = 2;
    array[3] = 3;
    array[4] = 4;
    cout << array[3] << endl;
    cout << array[4] << endl;
    COUT << FOO << ENDL;
    return 0;
}

(Changes in capitals -- put those in lower case if you're going to try this.)

You will see that the variable foo has been trashed. Your code will store values into the nonexistent array[3] and array[4], and be able to properly retrieve them, but the actual storage used will be from foo.

So you can "get away" with exceeding the bounds of the array in your original example, but at the cost of causing damage elsewhere -- damage which may prove to be very hard to diagnose.

As to why there is no automatic bounds checking -- a correctly written program does not need it. Once that has been done, there is no reason to do run-time bounds checking and doing so would just slow down the program. Best to get that all figured out during design and coding.

C++ is based on C, which was designed to be as close to assembly language as possible.

Shawntashawwal answered 8/11, 2017 at 15:26 Comment(1)
There is no guarantee that this will happen, but it may happen.Ternate
B
-1

When you declare int array[2]; you reserve 2 memory slots of 4 bytes each (in a 32-bit program). If you write array[4] in your code it still compiles as a valid access; only at run time may it fail, for example with an unhandled exception, and even that is not guaranteed. C++ uses manual memory management. This is actually a security flaw that has been exploited to hack programs.

this can help understanding:

int * somepointer;

somepointer[0]=somepointer[5];

Bejewel answered 6/8, 2009 at 16:44 Comment(0)
C
-1

The behavior can depend on your system. Typically, you will have some margin for out-of-bounds accesses, which may hold 0 or garbage values. For the details, check the memory allocation mechanism used by your OS. On top of that, a language like C/C++ does not check bounds when you use some containers, such as arrays. So you get an "undefined event", because you do not know what the OS did below the surface. A language like Java, by contrast, does check bounds: if you step outside them, you get an exception.

Condemnation answered 15/4, 2021 at 0:52 Comment(0)
