Is there a max array length limit in C++?

210

Is there a max length for an array in C++?

Is it a C++ limit or does it depend on my machine? Is it tweakable? Does it depend on the type the array is made of?

Can I break that limit somehow, or do I have to search for a better way of storing information? And what would be the simplest way?

What I have to do is store long long ints in an array; I'm working in a Linux environment. My question is: what do I have to do if I need to store an array of N long long integers, where N has more than 10 digits?

I need this because I'm writing some cryptographic algorithms (for example, the p-Pollard) for school, and I hit this wall of integer and array-length representation.

Waverly answered 19/10, 2008 at 10:39 Comment(0)
174

There are two limits, neither enforced by C++ but rather by the hardware.

The first limit (which should never be reached) is set by the restrictions of the size type used to describe an index into the array (and its size). It is given by the maximum value the system's std::size_t can take. This data type is large enough to contain the size in bytes of any object.

The other limit is a physical memory limit. The larger your objects in the array are, the sooner this limit is reached because memory is full. For example, a vector<int> of a given size n typically takes several times as much memory as a vector<char> of the same size (minus a small constant value), since int is usually bigger than char. Therefore, a vector<char> may contain more items than a vector<int> before memory is full. The same holds for raw C-style arrays like int[] and char[].

Additionally, this upper limit may be influenced by the type of allocator used to construct the vector because an allocator is free to manage memory any way it wants. A very odd but nonetheless conceivable allocator could pool memory in such a way that identical instances of an object share resources. This way, you could insert a lot of identical objects into a container that would otherwise use up all the available memory.

Apart from that, C++ doesn't enforce any limits.
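
To put numbers on that first limit, here is a minimal sketch (portable standard C++) that prints the theoretical ceilings on the current platform:

#include <cstddef>
#include <iostream>
#include <limits>

int main() {
    // Largest object size (in bytes) the platform can describe,
    // and thus the largest possible index
    std::cout << "max std::size_t:    "
              << std::numeric_limits<std::size_t>::max() << '\n';
    // Pointer subtraction yields std::ptrdiff_t; its signed range is
    // the practical ceiling for element counts in a single array
    std::cout << "max std::ptrdiff_t: "
              << std::numeric_limits<std::ptrdiff_t>::max() << '\n';
    return 0;
}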

Exerciser answered 19/10, 2008 at 10:44 Comment(11)
Also, you can normally easily hit stack size limits, especially when using threads, which again is implementation-specific (but can be changed).Haplology
@Alaric: True. I didn't want to go too deep into system specifics because they differ very much and I'm no expert in any of them.Exerciser
@Konrad, interesting point about allocator types and not something I was aware of. Thanks for the info.Kropotkin
std::size_t is usually (always?) the size of a pointer, not the size of the biggest integer that has native hardware support in the integer math unit. On every x86 OS I've used, size_t is 32-bits for a 32-bit OS and 64-bits for a 64-bit OS.Bruni
My understanding is that the maximum limit of an array is the maximum value of the processor's word. This is due to the indexing operator. For example, a machine may have a word size of 16 bits but an addressing register of 32 bits. A chunk of memory is limited in size by the parameter passed to new or malloc. A chunk of memory larger than an array can be accessed via pointer.Martlet
Assuming wikipedia is correct ( en.wikipedia.org/wiki/Word_(computing) ), on x86, a word is 16-bits (and this is consistent with Microsoft's terminology at very least). I wonder what size_t would be for real mode 8088 (since the address space is 20 bits, it uses a 32-bit representation, but standard pointer arithmetic only works in 16 bit chunks).Bruni
Sorry for commenting on an ancient reply, but take a look at this: Oct 15, 2011 at 8:06 PM comment : channel9.msdn.com/Shows/Checking-In-with-Erik-Meijer/… Basically, ptrdiff_t is signed, so that limits the size of an array in 32-bit apps to 2^31, while size_t can support up to 2^32.Systole
@Systole Is the array size / index encoded as a ptrdiff_t, then?Exerciser
AFAIK no, but if I understand the link correctly you can't do pointer arithmetic on char arrays that have 2G+ elements. Unfortunately all my OSes are 64-bit, so I can't try to allocate a 2^31+1 char array and do the pointer diff.Systole
@Gabriel In a 64-bit system, the address of a location is stored in 8 bytes. So theoretically, the maximum index of a byte array in a process would be (2^64)-1, wouldn't it? Even if I have more hardware memory than that, a process can't cross that limit. Is this assumption correct? Also, this limit is implied by the system itself, and not by the language, right?Sp
@SouravKannanthaB Yes, the limit is implied by the system rather than by the language (to the extent that the language limits itself to use natively supported data types for integers). And yes, if your hypothetical computer has more hardware than that, you won’t be able to address it. But neither will the operating system. And your computer won’t have more hardware than that: that’s 10 billion gigabytes. The world’s largest supercomputer has a tiny fraction of that.Exerciser
195

Nobody mentioned the limit on the size of the stack frame.

There are two places memory can be allocated:

  • On the heap (dynamically allocated memory).
    The size limit here is a combination of available hardware and the OS's ability to simulate space by using other devices to temporarily store unused data (i.e. move pages to hard disk).
  • On the stack (Locally declared variables).
    The size limit here is compiler defined (with possible hardware limits). If you read the compiler documentation you can often tweak this size.

Thus if you allocate an array dynamically, the limit is large (and described in detail by other posts):

int* a1 = new int[SIZE];  // SIZE limited only by OS/Hardware

Alternatively if the array is allocated on the stack then you are limited by the size of the stack frame. N.B. vectors and other containers have a small presence in the stack but usually the bulk of the data will be on the heap.

int a2[SIZE]; // SIZE limited by COMPILER to the size of the stack frame
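
On Linux (the OP's environment), the stack limit is also an OS setting that can be inspected, and sometimes raised, at runtime; a hedged sketch using the POSIX getrlimit/setrlimit calls:

#include <cstdio>
#include <sys/resource.h>

int main() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        std::printf("stack soft limit: %llu bytes\n",
                    (unsigned long long)rl.rlim_cur);
        // Raise the soft limit up to the hard limit. Note this mainly
        // helps threads/processes created afterwards; the main thread's
        // stack may already be fixed by then.
        rl.rlim_cur = rl.rlim_max;
        setrlimit(RLIMIT_STACK, &rl);
    }
    return 0;
}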
Theorist answered 19/10, 2008 at 17:52 Comment(12)
Preferred allocation of large arrays is not on the stack or globally defined, but rather through dynamic allocation (via new or malloc).Martlet
@Thomas Matthews: Not in my world. Dynamically allocated objects require management. If it needs to be dynamically allocated, I would use a stack object that represents the dynamically allocated memory, like a std::vector.Theorist
@LokiAstari int* a1 = new int[SIZE]Frolic
There is one corner case missing: global arrays. While not a beauty and best avoided, these do not fall under the restrictions of the stack, and you do not need malloc/free to work with them.Subirrigate
@ted, why should global arrays be "best avoided"? To be more precise I think you mean statically allocated arrays. Their scope does not have to be global. I would argue they are better than dynamic arrays because you can use absolute addressing with them (at least on Linux) which you can't do with dynamically allocated arrays.Massingill
@Zboson: No, I believe he means globally accessible arrays. Because global mutable state is usually bad. Functions no longer depend on just their inputs but also on global state (that can be mutated elsewhere in the code). This makes unit tests exceedingly hard to write and functional tests harder still. programmers.stackexchange.com/questions/148108/…Theorist
@Zboson see Loki Astari's comment above, he explains it. Why do you think absolute addressing is such a benefit? Whatever way you put it, an array access with an index is basically pointer-to-base plus index, and once you have that base pointer loaded there is (AFAIK) no further performance penalty.Subirrigate
@ted, in that case your "one corner case" was a good point but is too limited. Instead of saying a global array you should say a statically allocated array. foo() { static int x[10]; } is a statically allocated array; the only difference between it and a global array is the scope. So although a global array may be best avoided, a statically allocated array with function scope is an interesting case which is often overlooked.Massingill
@ted, in regards to absolute addressing, some ports such as port 7 on Haswell only work with [register + constant] and not [register + register]. Also micro-op fusion, as far as I can tell, requires [register + constant].Massingill
@Zboson what do you mean by port? And yes, while there are static arrays and I overlooked those, you should mind the difference between global and static, which includes, as you correctly pointed out, the different scope.Subirrigate
@ted, port7 is described at anandtech.com/show/6355/intels-haswell-architecture/8. What other difference is there between a global array and one declared with the static keyword than scope?Massingill
Very important point. I recently came across a "production-quality" open-source project that provided a configurable max-buffer size. All of the buffers were allocated on the stack, so configuring a large enough value would cause the program to immediately segfault on launch.Chitkara
13

Looking at it from a practical rather than theoretical standpoint, on a 32-bit Windows system, the maximum total amount of memory available for a single process is 2 GB. You can break the limit by going to a 64-bit operating system with much more physical memory, but whether to do this or look for alternatives depends very much on your intended users and their budgets. You can also extend it somewhat using PAE.

The type of the array is very important, as default structure alignment on many compilers is 8 bytes, which is very wasteful if memory usage is an issue. If you are using Visual C++ to target Windows, check out the #pragma pack directive as a way of overcoming this.
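
For illustration, a small sketch of that directive (the 9-byte figure assumes the usual 1-byte char and 8-byte long long; packed structures trade memory for slower, and on some architectures illegal, unaligned access):

#pragma pack(push, 1)   // pack members with no padding
struct Record {
    char tag;           // 1 byte
    long long id;       // 8 bytes
};                      // sizeof(Record) == 9 here, instead of 16
#pragma pack(pop)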

Another thing to do is look at what in-memory compression techniques might help you, such as sparse matrices, on-the-fly compression, etc. Again, this is highly application-dependent. If you edit your post to give some more information as to what is actually in your arrays, you might get more useful answers.

Edit: Given a bit more information on your exact requirements, your storage needs appear to be between 7.6 GB and 76 GB uncompressed, which would require a rather expensive 64-bit box to store as an array in memory in C++. It raises the question of why you want to store the data in memory, where one presumes it is for speed of access and to allow random access. The best way to store this data outside of an array is pretty much based on how you want to access it. If you need to access array members randomly, for most applications there tend to be ways of grouping clumps of data that tend to get accessed at the same time. For example, in large GIS and spatial databases, data often gets tiled by geographic area. In C++ programming terms you can override the [] array operator to fetch portions of your data from external storage as required.
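
As a minimal sketch of that last idea (the class name and file layout are hypothetical, and real code would cache blocks of neighbouring elements rather than hit the disk per element):

#include <cstdint>
#include <fstream>

class DiskArray {
public:
    explicit DiskArray(const char* path)
        : file_(path, std::ios::in | std::ios::out | std::ios::binary) {}

    // Read element i straight from the backing file
    long long get(std::uint64_t i) {
        long long value = 0;
        file_.seekg(static_cast<std::streamoff>(i * sizeof value));
        file_.read(reinterpret_cast<char*>(&value), sizeof value);
        return value;
    }

    // Write element i back to the backing file
    void set(std::uint64_t i, long long value) {
        file_.seekp(static_cast<std::streamoff>(i * sizeof value));
        file_.write(reinterpret_cast<const char*>(&value), sizeof value);
    }

private:
    std::fstream file_;
};

A true operator[] would need to return a proxy object so that both reads and assignments can be intercepted.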

Kropotkin answered 19/10, 2008 at 11:54 Comment(1)
There are system calls that allow allocation of memory outside the program space; but this is OS dependent and not portable. We used them in embedded systems.Martlet
7

As annoyingly non-specific as all the current answers are, they're mostly right, but with many caveats, not always mentioned. The gist is, you have two upper limits, and only one of them is actually defined, so YMMV:

1. Compile-time limits

Basically, what your compiler will allow. For Visual C++ 2017 on an x64 Windows 10 box, this is my maximum at compile time before incurring the 2 GB limit:

unsigned __int64 max_ints[255999996]{0};

If I did this instead,

unsigned __int64 max_ints[255999997]{0};

I'd get:

Error C1126 automatic allocation exceeds 2G

I'm not sure how 2G correlates to 255999996/7. I googled both numbers, and the only thing I could find that was possibly related was this *nix Q&A about a precision issue with dc. Either way, it doesn't appear to matter which type of int array you're trying to fill, just how many elements can be allocated.

2. Run-time limits

Your stack and heap have their own limitations. These limits change based both on available system resources and on how "heavy" your app itself is. (For reference, MSVC reserves a 1 MB stack by default, and 257400 four-byte ints is just about 1 MB, which lines up with the numbers below.) For example, with my current system resources, I can get this to run:

int main()
{
    int max_ints[257400]{ 0 };
    return 0;
}

But if I tweak it just a little bit...

int main()
{
    int max_ints[257500]{ 0 };
    return 0;
}

Bam! Stack overflow!

Exception thrown at 0x00007FF7DC6B1B38 in memchk.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x000000AA8DE03000). Unhandled exception at 0x00007FF7DC6B1B38 in memchk.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x000000AA8DE03000).

And just to detail the whole heaviness of your app point, this was good to go:

int main()
{
    int maxish_ints[257000]{ 0 };
    int more_ints[400]{ 0 };
    return 0;
}  

But this caused a stack overflow:

int main()
{
    int maxish_ints[257000]{ 0 };
    int more_ints[500]{ 0 };
    return 0;
}  
Privateer answered 27/9, 2018 at 20:28 Comment(0)
4

I would agree with the above: if you're initializing your array with

 int myArray[SIZE] 

then SIZE is limited by the size of an integer. But you can always malloc a chunk of memory and have a pointer to it, as big as you want, so long as malloc doesn't return NULL.
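
A hedged sketch of that approach (in C++ you would more idiomatically reach for new[] or std::vector, but the idea is the same):

#include <cstdio>
#include <cstdlib>

int main() {
    const std::size_t n = 100000000;   // 100 million long longs, ~800 MB
    long long* p = (long long*)std::malloc(n * sizeof(long long));
    if (p == NULL) {                   // allocation failed: handle it, don't crash
        std::puts("malloc returned NULL");
        return 1;
    }
    p[n - 1] = 42;                     // use it like an ordinary array
    std::free(p);
    return 0;
}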

Vessel answered 19/10, 2008 at 10:49 Comment(2)
I'm not sure if this is incorrect, or I misunderstood you, or something else. For example, this is prevented by the MSVC17 compiler: int oops[INT_MAX]{0}; It generates: C2148 - total size of array must not exceed 0x7fffffff bytesPrivateer
With 16GB DDR4 and about 66% memory currently used before launching my app as debug on Windows 10 with VS2017, I have an undefined limit on how large of an int-array I can initialize with 0. Sometimes I can do it with ~257k elements, sometimes I get a stack overflow. If I add anything to my app besides the main and the array, that number goes down (obviously). I had to experiment to determine this number, so I don't see how this metric can be relied upon beyond knowing your theoretical limits in a vacuum.Privateer
4

To summarize the responses, extend them, and to answer your question directly:

No, C++ does not impose any limits for the dimensions of an array.

But as the array has to be stored somewhere in memory, memory-related limits imposed by other parts of the computer system apply. Note that these limits do not directly relate to the dimensions (= number of elements) of the array, but rather to its size (= amount of memory taken). The dimensions (D) and in-memory size (S) of an array are not the same; they are related by the memory taken by a single element (E): S = D * E.

Now E depends on:

  • the type of the array elements (elements can be smaller or bigger)
  • memory alignment (to increase performance, elements are placed at addresses
    which are multiples of some value), which introduces ‘wasted space’
    (padding) between elements
  • size of static parts of objects (in object-oriented programming static components of objects of the same type are only stored once, independent from the number of such same-type objects)

Also note that you generally get different memory-related limitations by allocating the array data on the stack (as an automatic variable: int t[N]), on the heap (dynamic allocation with malloc()/new or using STL mechanisms), or in the static part of process memory (as a static variable: static int t[N]). Even when allocating on the heap, you still need some tiny amount of memory on the stack to store references to the heap-allocated blocks of memory (but this is negligible, usually).

The size of the size_t type has no influence on the programmer (I assume the programmer uses size_t for indexing, as it is designed for that), since the compiler provider has to typedef it to an integer type big enough to address the maximal amount of memory possible for the given platform architecture.

The sources of the memory-size limitations stem from

  • amount of memory available to the process (which is limited to 2^32 bytes for 32-bit applications, even on 64-bit OS kernels),
  • the division of process memory (e.g. amount of the process memory designed for stack or heap),
  • the fragmentation of physical memory (many scattered small free memory fragments are not applicable to storing one monolithic structure),
  • amount of physical memory,
  • and the amount of virtual memory.

They cannot be ‘tweaked’ at the application level, but you are free to use a different compiler (to change stack size limits), or port your application to 64-bit, or port it to another OS, or change the physical/virtual memory configuration of the (virtual? physical?) machine.

It is not uncommon (and even advisable) to treat all the above factors as external disturbances and thus as possible sources of runtime errors, and to carefully check for and react to memory-allocation related errors in your program code.
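
A minimal sketch of such a check (the ~8 TiB request is just a size picked to fail on today's machines; note that on Linux with memory overcommit the allocation may appear to succeed and the process can be killed later instead):

#include <iostream>
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::length_error
#include <vector>

int main() {
    try {
        std::vector<long long> v(1ULL << 40);   // 2^40 elements = ~8 TiB
        std::cout << "allocated " << v.size() << " elements\n";
    } catch (const std::bad_alloc&) {
        std::cerr << "out of memory\n";               // allocation itself failed
    } catch (const std::length_error&) {
        std::cerr << "request exceeds max_size()\n";  // too large to even attempt
    }
    return 0;
}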

So finally: while C++ does not impose any limits, you still have to check for adverse memory-related conditions when running your code... :-)

Fluxmeter answered 16/5, 2016 at 6:57 Comment(0)
3

As many excellent answers have noted, there are a lot of limits that depend on your C++ compiler version, operating system, and computer characteristics. However, I suggest the following Python script that checks the limit on your machine.

It uses binary search, and on each iteration checks whether the middle size is possible by generating code that attempts to create an array of that size. The script tries to compile it (sorry, this part works only on Linux) and adjusts the binary search depending on the success. Check it out:

import os

cpp_source = 'int a[{}]; int main() {{ return 0; }}'

def check_if_array_size_compiles(size):
    # Write the test program to 1.cpp
    with open('1.cpp', 'w') as f:
        f.write(cpp_source.format(size))
    # Attempt to compile it, redirecting error output to a file
    os.system('g++ 1.cpp 2> errors')
    # Read the errors file
    errors = open('errors', 'r').read()
    # Compilation succeeded if there were no errors
    return len(errors) == 0

# Make a binary search. Try to compile an array of size m and
# adjust the l and r borders depending on whether we succeeded
# or not
l = 0
r = 10 ** 50
while r - l > 1:
    m = (r + l) // 2
    if check_if_array_size_compiles(m):
        l = m
    else:
        r = m

answer = l + check_if_array_size_compiles(r)
print('{} is the maximum available length'.format(answer))

You can save it to your machine and launch it, and it will print the maximum size you can create. For my machine it is 2305843009213693951.

Gilberto answered 8/8, 2016 at 20:38 Comment(0)
3

I'm surprised the max_size() member function of std::vector has not been mentioned here.

"Returns the maximum number of elements the container is able to hold due to system or library implementation limitations, i.e. std::distance(begin(), end()) for the largest container."

We know that std::vector is implemented as a dynamic array under the hood, so max_size() should give a very close approximation of the maximum length of a dynamic array on your machine.

The following program builds a table of approximate maximum array length for various data types.

#include <iostream>
#include <vector>
#include <string>
#include <limits>

template <typename T>
std::string mx(T e) {
    std::vector<T> v;
    return std::to_string(v.max_size());
}

std::size_t maxColWidth(std::vector<std::string> v) {
    std::size_t maxWidth = 0;

    for (const auto &s: v)
        if (s.length() > maxWidth)
            maxWidth = s.length();

    // Add 2 for space on each side
    return maxWidth + 2;
}

constexpr long double maxStdSize_t = std::numeric_limits<std::size_t>::max();

// cs stands for compared to std::size_t
template <typename T>
std::string cs(T e) {
    std::vector<T> v;
    long double maxSize = v.max_size();
    long double quotient = maxStdSize_t / maxSize;
    return std::to_string(quotient);
}

int main() {
    bool v0 = 0;
    char v1 = 0;

    int8_t v2 = 0;
    int16_t v3 = 0;
    int32_t v4 = 0;
    int64_t v5 = 0;

    uint8_t v6 = 0;
    uint16_t v7 = 0;
    uint32_t v8 = 0;
    uint64_t v9 = 0;

    std::size_t v10 = 0;
    double v11 = 0;
    long double v12 = 0;

    std::vector<std::string> types = {"data types", "bool", "char", "int8_t", "int16_t",
                                      "int32_t", "int64_t", "uint8_t", "uint16_t",
                                      "uint32_t", "uint64_t", "size_t", "double",
                                      "long double"};

    std::vector<std::string> sizes = {"approx max array length", mx(v0), mx(v1), mx(v2),
                                      mx(v3), mx(v4), mx(v5), mx(v6), mx(v7), mx(v8),
                                      mx(v9), mx(v10), mx(v11), mx(v12)};

    std::vector<std::string> quotients = {"max std::size_t / max array size", cs(v0),
                                          cs(v1), cs(v2), cs(v3), cs(v4), cs(v5), cs(v6),
                                          cs(v7), cs(v8), cs(v9), cs(v10), cs(v11), cs(v12)};

    std::size_t max1 = maxColWidth(types);
    std::size_t max2 = maxColWidth(sizes);
    std::size_t max3 = maxColWidth(quotients);

    for (std::size_t i = 0; i < types.size(); ++i) {
        while (types[i].length() < (max1 - 1)) {
            types[i] = " " + types[i];
        }

        types[i] += " ";

        for  (int j = 0; sizes[i].length() < max2; ++j)
            sizes[i] = (j % 2 == 0) ? " " + sizes[i] : sizes[i] + " ";

        for  (int j = 0; quotients[i].length() < max3; ++j)
            quotients[i] = (j % 2 == 0) ? " " + quotients[i] : quotients[i] + " ";

        std::cout << "|" << types[i] << "|" << sizes[i] << "|" << quotients[i] << "|\n";
    }

    std::cout << std::endl;

    std::cout << "N.B. max std::size_t is: " <<
        std::numeric_limits<std::size_t>::max() << std::endl;

    return 0;
}

On my macOS (clang version 5.0.1), I get the following:

|  data types | approx max array length | max std::size_t / max array size |
|        bool |   9223372036854775807   |             2.000000             |
|        char |   9223372036854775807   |             2.000000             |
|      int8_t |   9223372036854775807   |             2.000000             |
|     int16_t |   9223372036854775807   |             2.000000             |
|     int32_t |   4611686018427387903   |             4.000000             |
|     int64_t |   2305843009213693951   |             8.000000             |
|     uint8_t |   9223372036854775807   |             2.000000             |
|    uint16_t |   9223372036854775807   |             2.000000             |
|    uint32_t |   4611686018427387903   |             4.000000             |
|    uint64_t |   2305843009213693951   |             8.000000             |
|      size_t |   2305843009213693951   |             8.000000             |
|      double |   2305843009213693951   |             8.000000             |
| long double |   1152921504606846975   |             16.000000            |

N.B. max std::size_t is: 18446744073709551615

On ideone gcc 8.3 I get:

|  data types | approx max array length | max std::size_t / max array size |
|        bool |   9223372036854775744   |             2.000000             |
|        char |   18446744073709551615  |             1.000000             |
|      int8_t |   18446744073709551615  |             1.000000             |
|     int16_t |   9223372036854775807   |             2.000000             |
|     int32_t |   4611686018427387903   |             4.000000             |
|     int64_t |   2305843009213693951   |             8.000000             |
|     uint8_t |   18446744073709551615  |             1.000000             |
|    uint16_t |   9223372036854775807   |             2.000000             |
|    uint32_t |   4611686018427387903   |             4.000000             |
|    uint64_t |   2305843009213693951   |             8.000000             |
|      size_t |   2305843009213693951   |             8.000000             |
|      double |   2305843009213693951   |             8.000000             |
| long double |   1152921504606846975   |             16.000000            |

N.B. max std::size_t is: 18446744073709551615

It should be noted that this is a theoretical limit and that on most computers, you will run out of memory long before you reach it. For example, we see that for type char on gcc, the maximum number of elements is equal to the max of std::size_t. Trying this, we get the error:

prog.cpp: In function ‘int main()’:
prog.cpp:5:61: error: size of array is too large
  char* a1 = new char[std::numeric_limits<std::size_t>::max()];

Lastly, as @MartinYork points out, for static arrays the maximum size is limited by the size of your stack.

Alessandraalessandria answered 10/3, 2020 at 23:37 Comment(0)
2

One thing I don't think has been mentioned in the previous answers.

I'm always sensing a "bad smell" in the refactoring sense when people are using such things in their design.

That's a huge array, and possibly not the best way to represent your data, from both an efficiency and a performance point of view.

cheers,

Rob

Fanfaronade answered 19/10, 2008 at 12:14 Comment(4)
Have you got any suggestion on what I should use?Waverly
If you can tell us what the data is that you're storing then maybe we can. (-:Fanfaronade
Sorry Luis, my first response was very flippant. It will be driven by the nature of your data. The relationships of your data will drive the model you use to represent the data. Then the collection should be apparent from that. If not, I would worry about the data model.Fanfaronade
not so flippant to me: how about a cached database with a toy like this? tweaktown.com/news/22066/…Solifidian
2

If you have to deal with data that large you'll need to split it up into manageable chunks. It won't all fit into memory on any small computer. You can probably load a portion of the data from disk (whatever reasonably fits), perform your calculations and changes to it, store it to disk, then repeat until complete.
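
A rough sketch of that loop (the file name and the per-element transformation are placeholders):

#include <cstdio>
#include <vector>

int main() {
    const std::size_t CHUNK = 1 << 20;   // ~8 MB of long longs per pass
    std::vector<long long> buf(CHUNK);

    std::FILE* f = std::fopen("data.bin", "rb+");
    if (!f) return 1;

    long long pos = 0;                   // element index of the current chunk
    std::size_t got;
    while ((got = std::fread(buf.data(), sizeof(long long), CHUNK, f)) > 0) {
        for (std::size_t i = 0; i < got; i++)
            buf[i] += 1;                                     // placeholder work
        std::fseek(f, pos * sizeof(long long), SEEK_SET);    // back to chunk start
        std::fwrite(buf.data(), sizeof(long long), got, f);  // store the results
        pos += got;
        std::fseek(f, pos * sizeof(long long), SEEK_SET);    // reposition for next read
    }
    std::fclose(f);
    return 0;
}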

Syne answered 18/12, 2009 at 22:43 Comment(1)
See also Merge Sort for an example of an algorithm that handles data too large to fit into memory.Martlet
0

As has already been pointed out, array size is limited by your hardware and your OS (man ulimit). Your software though, may only be limited by your creativity. For example, can you store your "array" on disk? Do you really need long long ints? Do you really need a dense array? Do you even need an array at all?

One simple solution would be to use 64-bit Linux. Even if you do not physically have enough RAM for your array, the OS will allow you to allocate memory as if you do, since the virtual memory available to your process is likely much larger than the physical memory. If you really need to access everything in the array, this amounts to storing it on disk. Depending on your access patterns, there may be more efficient ways of doing this (e.g. using mmap(), or simply storing the data sequentially in a file, in which case 32-bit Linux would suffice).
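
Since the OP is on Linux, a hedged sketch of the mmap() route (assumes a 64-bit build; the file name and sizes are placeholders):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    const size_t n = 1ULL << 31;              // ~2.1 billion long longs (16 GiB)
    const size_t bytes = n * sizeof(long long);

    int fd = open("data.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;
    if (ftruncate(fd, bytes) != 0) return 1;  // size the backing file

    // The kernel pages data in and out on demand; only touched pages use RAM
    long long* a = (long long*)mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);
    if (a == MAP_FAILED) return 1;

    a[1234567890] = 42;                       // use it like an ordinary array

    munmap(a, bytes);
    close(fd);
    return 0;
}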

Jud answered 19/10, 2008 at 17:47 Comment(1)
Hmm, disks, arrays, ... anybody hear of virtual memory? OSes that support virtual memory will start using an external device for memory, such as a hard disk, and swap chunks in and out of internal memory.Martlet
0

I would get around this by making a 2D dynamic array:

long long** a = new long long*[x];   // x row pointers
for (unsigned i = 0; i < x; i++) a[i] = new long long[y];   // each row holds y elements
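
Remember the matching cleanup when you're done, or the memory leaks (free each row first, then the array of row pointers):

for (unsigned i = 0; i < x; i++) delete[] a[i];
delete[] a;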

more on this here https://mcmap.net/q/21711/-how-do-i-declare-a-2d-array-in-c-using-new

Karykaryl answered 25/10, 2015 at 1:17 Comment(0)
