Why is stack memory size so limited?
When you allocate memory on the heap, the only limit is free RAM (or virtual memory): you can allocate gigabytes.

So why is stack size so limited (around 1 MB)? What technical reason prevents you from creating really big objects on the stack?

Update: my intent might not be clear. I do not want to allocate huge objects on the stack, and I do not need a bigger stack; this question is pure curiosity!

Muskogee answered 7/5, 2012 at 13:29 Comment(8)
Why would it be practical to create large objects on the heap? (Call chains typically go on the stack.)Layoff
I think the real answer is simpler than most of the answers portray: "because it's how we've always done it, and it's been all right so far so why change?"Essential
@JerryCoffin Have you read any of the answers posted so far? They give more insight into this question.Carper
@user1202136: I've read all of them -- but people are guessing, and my guess is that many of the factors they're citing probably weren't even considered in making the original decisions on the subject. To coin a phrase, "sometimes a cigar is only a cigar."Essential
@JerryCoffin: True, nobody was able to mention the original reasons, which are most likely related to some legacy architecture. Still, people did come up with arguments why too big a stack size would be harmful.Carper
The biggest advantage of the stack is its excellent locality of reference, anything you read or write is almost always going to be present in the L1 cache and TLB. Allocating huge arrays on the stack just throws that advantage out of the window.Tani
"How big should we make the default stack?" "Oh, I dunno, how many threads can we run?" "It blows up somewhere over a K" "OK, then, we'll call it 2K, we've got 2 Gig of virtual, so how about 1 meg?" "Yeah, OK, what's the next issue?"Unroof
I find it really strange that there is no other reason than "it works fine this way". But maybe that's the good one after all, since no one seems to come up with a good documented answer.Muskogee
64

My intuition is the following. The stack is not as easy to manage as the heap. The stack needs to be stored in contiguous memory locations. This means that you cannot allocate the stack lazily wherever there happens to be room; you need to at least reserve virtual addresses for that purpose up front. The larger the reserved virtual address space, the fewer threads you can create.

For example, a 32-bit application generally has a virtual address space of 2GB. This means that if the stack size is 2MB (the default in pthreads), then you can create a maximum of 1024 threads. This can be too few for applications such as web servers. Increasing the stack size to, say, 100MB (i.e., you reserve 100MB, but do not necessarily allocate 100MB to the stack immediately) would limit the number of threads to about 20, which can be limiting even for simple GUI applications.

An interesting question is: why do we still have this limit on 64-bit platforms? I do not know the answer, but I assume that people are already used to some "stack best practices": be careful to allocate huge objects on the heap and, if needed, manually increase the stack size. Therefore, nobody found it useful to add "huge" stack support on 64-bit platforms.

Carper answered 7/5, 2012 at 13:42 Comment(5)
Many 64-bit machines have only 48-bit addresses (granted, a large gain over 32-bit, but still limited). Even with the additional space you have to worry about how the reservation interacts with page tables; that is, there is always overhead in having more space. It's probably just as cheap, if not cheaper, to allocate a new segment (mmap) instead of reserving huge stack spaces for each thread.Lupien
@edA-qamort-ora-y: This answer isn't talking about allocation, it's talking about virtual memory reservation, which is almost free, and certainly much faster than mmap.Colombo
This answer is mostly incorrect. The stack is contiguous in virtual memory, but it doesn't have to be in physical memory. And the memory for the stack isn't physically allocated straight away (especially if you have a 2MB stack); it's allocated as you start using it: en.wikipedia.org/wiki/Demand_pagingSeamanship
@Seamanship I thought the answer is clear that the issue is reserving virtual address space, which -- as you highlight -- is orthogonal to allocating physical memory.Carper
'My intuition is the following. The stack is not as easy to manage as the heap.' - good intuition: the stack is kept non-fragmented, while the heap is left more "messy". Of course, defragmenting the stack would cost more.Discrimination
52

One aspect that nobody has mentioned yet:

A limited stack size is an error detection and containment mechanism.

Generally, the main job of the stack in C and C++ is to keep track of the call stack and local variables, and if the stack grows out of bounds, it is almost always an error in the design and/or the behaviour of the application.

If the stack were allowed to grow arbitrarily large, these errors (like infinite recursion) would be caught very late, only after the operating system's resources were exhausted. Setting an arbitrary limit on the stack size prevents this. The actual size is not that important, beyond it being small enough to prevent system degradation.

Acrilan answered 16/12, 2014 at 9:14 Comment(1)
You might have a similar issue with heap-allocated objects (one way to replace recursion is to manage a stack manually). That limitation forces the use of other approaches (which are not necessarily safer/simpler). (Note the number of remarks about toy list implementations needing a hand-written destructor with std::unique_ptr rather than relying on the smart pointer.)Enceladus
24

It is just a default size. If you need more, you can get more - most often by telling the linker to allocate extra stack space.

The downside to having large stacks is that if you create many threads, they will need one stack each. If every stack reserves multiple MBs without using them, the space will be wasted.

You have to find the proper balance for your program.


Some people, like @BJovke, believe that virtual memory is essentially free. It is true that you don't need physical memory backing all of the virtual memory. You do, however, have to be able to at least give out addresses for the virtual memory.

However, on a typical 32-bit PC the size of the virtual memory is the same as the size of the physical memory - because we only have 32 bits for any address, virtual or not.

Because all threads in a process share the same address space, they have to divide it between them. And after the operating system has taken its part, there is "only" 2-3 GB left for an application. And that size is the limit for both the physical and the virtual memory, because there just aren't any more addresses.

Roundel answered 7/5, 2012 at 13:46 Comment(4)
The biggest threading issue is that you cannot easily signal stack objects to other threads. Either the producer thread has to synchronously wait for the consumer thread to release the object or expensive and contention-generating deep copies have to be made.Unroof
@MartinJames: Nobody is saying that all objects should be on the stack, we're discussing why the default stack size is small.Colombo
Space will not be wasted; the stack size is just a reservation of contiguous virtual address space. So if you set a stack size of 100 MB, the amount of RAM actually used depends on stack consumption in the threads.Rheinlander
@Rheinlander - But the virtual address space will still be used up. In a 32-bit process this is limited to a few GB, so just reserving 20*100MB will cause you problems.Roundel
8

For one thing, the stack is contiguous, so if you allocate 12MB, you must pop that 12MB off again before you can release anything allocated below it. Moving objects around also becomes much harder. Here is a real-world example that may make things easier to understand:

Say you are stacking boxes around a room. Which is easier to manage:

  • stacking boxes of any weight on top of each other, but when you need to get something on the bottom you have to undo your entire pile. If you want to take an item out of the pile and give it to someone else, you must take off all of the boxes above it and move the box to the other person's pile (Stack only)
  • putting all of your boxes (except for really small boxes) over in a special area where you do not stack stuff on top of other stuff, writing down where you put each one on a piece of paper (a pointer), and putting the paper on the pile. If you need to give a box to someone else, you just hand them the slip of paper from your pile, or give them a photocopy of the paper and leave the original where it was in your pile. (Stack + heap)

Those two examples are gross generalizations and there are some points that are blatantly wrong in the analogy but it is close enough that it hopefully will help you see the advantages in both cases.

Pt answered 7/5, 2012 at 13:41 Comment(4)
@MooingDuck Yes, but you are working in virtual memory in your program. If I enter a subroutine, put something on the stack, then return from the subroutine, I will need to either de-allocate or move the object I created before I can unwind the stack to get back to where I came from.Pt
though my comment was due to a misinterpretation (and I deleted it), I still don't agree with this answer. Removing 12MB off the top of the stack is literally one opcode. It's basically free. Also compilers can and do cheat the "stack" rule, so no, they don't have to copy/move the object before unwinding to return it. So I think your comment is also incorrect.Colombo
Well, it usually doesn't matter much that deallocating 12MB takes one opcode on stack over 100 on heap - it's probably below the noise level of actually processing the 12MB buffer. If compilers want to cheat when they notice that a ridiculously large object is being returned, (eg. by moving the SP before the call to make the object space part of the callers stack), then that's fine though, TBH, developers who return such objects, (rather than pointers/refs), are somewhat programming-challenged.Unroof
@MartinJames: The C++ spec also says that the function can usually put the data directly into the destination buffer and not use the temporary, so if you're careful, there's no overhead to returning a 12MB buffer by value.Colombo
8

Think of the stack in the order of near to far. Registers are close to the CPU (fast), the stack is a bit further (but still relatively close) and the heap is far away (slow access).

The stack lives on the heap of course, but still, since it's being used continuously, it probably never leaves the CPU cache(s), making it faster than the average heap access. This is a reason to keep the stack reasonably sized: to keep it cached as much as possible. Allocating big stack objects (possibly automatically resizing the stack as you get overflows) goes against this principle.

So it's a good paradigm for performance, not just a left-over from old times.

Talkington answered 17/3, 2014 at 9:39 Comment(4)
While I do believe that caching plays a big role in the reason of artificially reducing the stack size, I must correct you on the statement "the stack lives on the heap". Both the stack and the heap live in the memory (virtually or physically).Lati
How is "near or far" related to access speed?Geriatric
@MinhNghĩa Well, variables in RAM get cached in L2 memory, that is cached in L1 memory, and even those get cached in the registers. Access to RAM is slow, L2 is faster, L1 is faster still, and registers are fastest. What I think OP meant is that variables stored on the stack are supposed to be accessed quickly, so the CPU will try its best to keep stack variables close to it; hence you want to keep the stack small, so the CPU can access the variables faster.Grout
@157239n By "stack variables", do you mean the stack as a whole? Because if its values can be cached separately, then the size limit is beside the point with regard to this answer.Zurkow
4

Allocating large objects in a, say, 100MB stack would make it impossible on most machines to have them loaded at once into cache, which pretty much defeats the purpose of the stack.

The point of the stack is to have small objects that belong to the same scope (and are, therefore, usually needed together or close to each other) stored together in contiguous memory addresses, so that the program can have them all loaded into cache at the same time, minimizing cache misses and, in general, the time the CPU has to wait until it gets some missing piece of data from the slower RAM.

A 50MB object stored on the stack would not fit into the cache, meaning that after every cache line there would be a CPU wait until the next piece of data is brought from RAM, meaning one would be clogging the call stack and not getting any significant benefit (in terms of speed) compared to loading from the heap.

Setula answered 7/4, 2021 at 16:11 Comment(0)
1

Many of the things you think you need a big stack for can be done some other way.

Sedgewick's "Algorithms" has a couple of good examples of "removing" recursion from recursive algorithms such as QuickSort by replacing the recursion with iteration. In reality, the algorithm is still recursive, and there is still a stack, but you allocate the sorting stack on the heap rather than using the runtime stack.

(I favor the second edition, with algorithms given in Pascal. It can be had used for eight bucks.)
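The transformation can be sketched like this (an illustration in the spirit of Sedgewick's technique, not his code): the recursion is replaced by an explicit stack of pending subranges that lives on the heap and can grow as needed.

```c
#include <stdlib.h>

/* Iterative quicksort: pending subranges are kept on a heap-allocated
 * stack instead of the call stack. */
void quicksort_iter(int *a, size_t n)
{
    typedef struct { size_t lo, hi; } range;
    size_t cap = 64, top = 0;
    range *stk = malloc(cap * sizeof *stk);
    if (!stk || n < 2) { free(stk); return; }

    stk[top++] = (range){0, n - 1};
    while (top > 0) {
        range r = stk[--top];
        size_t lo = r.lo, hi = r.hi, i = lo, j = hi;
        int pivot = a[lo + (hi - lo) / 2];

        while (i <= j) {                      /* Hoare-style partition */
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
                if (j == 0) break;            /* avoid size_t underflow */
                j--;
            }
        }
        if (top + 2 > cap) {                  /* grow the explicit stack */
            range *tmp = realloc(stk, 2 * cap * sizeof *stk);
            if (!tmp) break;
            stk = tmp;
            cap *= 2;
        }
        if (lo < j) stk[top++] = (range){lo, j};
        if (i < hi) stk[top++] = (range){i, hi};
    }
    free(stk);
}
```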

Another way to look at it: if you think you need a big stack, your code is inefficient. There is a better way that uses less stack.

Amoeboid answered 16/12, 2014 at 10:28 Comment(0)
0

If you could have an infinite stack, then every virtual address could potentially be used by the stack. If the stack can use every address, then there is no place for the heap to go. Every address you picked for a heap variable could be overwritten by a growing stack.

To put it another way, variables on the stack and variables on the heap occupy the same virtual address space. We need some way of preventing the heap allocator from allocating data where the stack might grow into. A stack size is an easy way to do it. The heap allocator knows that the stack addresses are taken and so it uses something else.

Chutney answered 23/2, 2022 at 4:38 Comment(0)
0

Stacks in the past were expensive, since there wasn't much memory. Also, stacks generally hold only a handful of local variables per function call; large objects generally have a longer lifespan, so they are not local variables but live at a higher scope instead (application- or thread-wide). And recursive calls hardly ever go really deep, not much further than 30 functions or so. So 30x a reasonable amount of per-call stack space and you have covered 99% of your needs.

Other than that, a stack is super simple: it is just a stack pointer. Reserving 16 bytes? Just move the stack pointer by 16 bytes; nothing else needs to be done (except potentially checking the new value, but an implementation can use page faults for that).
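The "just move the stack pointer" idea can be sketched as a toy bump allocator (made-up names; a real stack usually grows downward, which is ignored here):

```c
#include <stddef.h>

#define ARENA_SIZE 4096

static unsigned char arena[ARENA_SIZE];
static size_t sp = 0; /* "stack pointer": offset of the next free byte */

/* Allocation is a bounds check plus a pointer bump, which is all a
 * stack push amounts to. */
void *stack_alloc(size_t bytes)
{
    if (sp + bytes > ARENA_SIZE)
        return NULL;              /* the "stack overflow" check */
    void *p = &arena[sp];
    sp += bytes;
    return p;
}

/* Freeing is just moving the pointer back (LIFO order only). */
void stack_free(size_t bytes)
{
    sp -= bytes;
}
```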

People keeping large objects on the stack are just implicitly using memcpy() more than they should, creating slower code.

Talkington answered 19/12, 2023 at 16:19 Comment(0)
-10

I don't think there is any technical reason, but it would be a strange app that created just one huge super-object on the stack. Stack objects lack flexibility, which becomes more problematic with increasing size: you cannot return without destroying them, and you cannot queue them to other threads.

Unroof answered 7/5, 2012 at 13:48 Comment(6)
Nobody is saying that all objects should be on the stack, we're discussing why the default stack size is small.Colombo
It is not small! Just how many function calls would you have to make to use up 1MB of stack? The defaults are easily changed in the linker anyway, so we are left with 'why use the stack instead of the heap?'Unroof
one function call. int main() { char buffer[1048576]; } It's a very common newbie problem. Sure there's an easy workaround, but why should we have to workaround the stack size?Colombo
Well, for one thing, I wouldn't want the 12MB (or indeed 1MB) of stack requirement inflicted on the stack of every thread that calls the afflicted function. That said, I have to agree that 1MB is a little stingy. I would be happy with a default 100MB; after all, there's nothing stopping me turning it down to 128K, just as there's nothing stopping other developers turning it up.Unroof
Why wouldn't you want to inflict 12MB of stack on your thread? The only reason for that is because stacks are small. That's a recursive argument.Colombo
Stacks are not small. 128K max size works just fine on my apps - that's getting towards small. 1MB is far too much for most apps but, well, only matters with apps that run ~2K threads. I have never felt the slightest inkling to allocate even 100K of stack space for anything. Anyway I answered the OP question - I know of no technical reason to limit stack space to 1MB. It's not even a firm limit - it's per-app configurable.Unroof

© 2022 - 2024 — McMap. All rights reserved.