(Added 26 Sept. 2020)
On 24 Oct. 2009, as @pixelbeat first pointed out here, Bruno Haible empirically discovered the following default thread stack sizes for several systems. He said that in a multithreaded program, "the default thread stack size is" as follows. I added in the "Actual" size column because @Peter.Cordes indicates in his comments below my answer, however, that the odd tested numbers shown below do not include all of the thread stack, since some of it was used in initialization. If I run ulimit -s
to see "the maximum stack size" that my Linux computer is configured for, it outputs 8192
kB, which is exactly 8 MB, not the odd 7.4 MB listed in the table below for my x86-64 computer with the gcc compiler and glibc. So, you can probably add a little to the numbers in the table below to get the actual full stack size for a given thread.
Note also that the below "Tested" column units are all in MB and KB (base 1000 numbers), NOT MiB and KiB (base 1024 numbers). I've proven this to myself by verifying the 7.4 MB case.
Thread stack sizes
System and std library Tested Actual
---------------------- ------ ------
- glibc i386, x86_64 7.4 MB 8 MiB (8192 KiB, as shown by `ulimit -s`)
- Tru64 5.1 5.2 MB ?
- Cygwin 1.8 MB ?
- Solaris 7..10 1 MB ?
- MacOS X 10.5 460 KB ?
- AIX 5 98 KB ?
- OpenBSD 4.0 64 KB ?
- HP-UX 11 16 KB ?
Bruno Haible also stated that:
32 KB is more than you can safely allocate on the stack in a multithreaded program
And he said:
And the default stack size for sigaltstack, SIGSTKSZ, is
- only 16 KB on some platforms: IRIX, OSF/1, Haiku.
- only 8 KB on some platforms: glibc, NetBSD, OpenBSD, HP-UX, Solaris.
- only 4 KB on some platforms: AIX.
Bruno
He wrote the following simple Linux C program to empirically determine the above values. You can run it on your system today to quickly see what your maximum thread stack size is, or you can run it online on GDBOnline here: https://onlinegdb.com/rkO9JnaHD.
Explanation: It simply creates a single new thread, so as to check the thread stack size and NOT the program stack size, in case they differ, then it has that thread repeatedly allocate 128 bytes of memory on the stack (NOT the heap), using the Linux alloca()
call, after which it writes a 0 to the first byte of this new memory block, and then it prints out how many total bytes it has allocated. It repeats this process, allocating 128 more bytes on the stack each time, until the program crashes with a Segmentation fault (core dumped)
error. The last value printed is the estimated maximum thread stack size allowed for your system.
Important note: alloca()
allocates on the stack: even though this looks like dynamic memory allocation onto the heap, similar to a malloc()
call, alloca()
does NOT dynamically allocate onto the heap. Rather, alloca()
is a specialized Linux function to "pseudo-dynamically" (I'm not sure what I'd call this, so that's the term I chose) allocate directly onto the stack as though it was statically-allocated memory. Stack memory used and returned by alloca()
is scoped at the function-level, and is therefore "automatically freed when the function that called alloca()
returns to its caller." That's why its static scope isn't exited and memory allocated by alloca()
is NOT freed each time a for
loop iteration is completed and the end of the for
loop scope is reached. See man 3 alloca
for details. Here's the pertinent quote (emphasis added):
DESCRIPTION
The alloca()
function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca()
returns to its caller.
RETURN VALUE
The alloca()
function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behavior is undefined.
Here is Bruno Haible's program from 24 Oct. 2009, copied directly from the GNU mailing list here:
Again, you can run it live online here.
// By Bruno Haible
// 24 Oct. 2009
// Source: https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html
// =============== Program for determining the default thread stack size =========
#include <alloca.h>
#include <pthread.h>
#include <stdio.h>
void* threadfunc (void*p) {
int n = 0;
for (;;) {
printf("Allocated %d bytes\n", n);
fflush(stdout);
n += 128;
*((volatile char *) alloca(128)) = 0;
}
}
int main()
{
pthread_t thread;
pthread_create(&thread, NULL, threadfunc, NULL);
for (;;) {}
}
When I run it on GDBOnline using the link above, I get the exact same results each time I run it, as both a C and a C++17 program. It takes about 10 seconds or so to run. Here are the last several lines of the output:
Allocated 7449856 bytes
Allocated 7449984 bytes
Allocated 7450112 bytes
Allocated 7450240 bytes
Allocated 7450368 bytes
Allocated 7450496 bytes
Allocated 7450624 bytes
Allocated 7450752 bytes
Allocated 7450880 bytes
Segmentation fault (core dumped)
So, the thread stack size is ~7.45 MB for this system, as Bruno mentioned above (7.4 MB).
I've made a few changes to the program, mostly just for clarity, but also for efficiency, and a bit for learning.
Summary of my changes:
[learning] I passed in BYTES_TO_ALLOCATE_EACH_LOOP
as an argument to the threadfunc()
just for practice passing in and using generic void*
arguments in C.
- Note: This is also the required function prototype, as required by the
pthread_create()
function, for the callback function (threadfunc()
in my case) passed to pthread_create()
. See: https://www.man7.org/linux/man-pages/man3/pthread_create.3.html.
[efficiency] I made the main thread sleep instead of wastefully spinning.
[clarity] I added more-verbose variable names, such as BYTES_TO_ALLOCATE_EACH_LOOP
and bytes_allocated
.
[clarity] I changed this:
*((volatile char *) alloca(128)) = 0;
to this:
volatile uint8_t * byte_buff =
(volatile uint8_t *)alloca(BYTES_TO_ALLOCATE_EACH_LOOP);
byte_buff[0] = 0;
Here is my modified test program, which does exactly the same thing as Bruno's, and even has the same results:
You can run it online here, or download it from my repo here. If you choose to run it locally from my repo, here's the build and run commands I used for testing:
Build and run it as a C program:
mkdir -p bin && \
gcc -Wall -Werror -g3 -O3 -std=c11 -pthread -o bin/tmp \
onlinegdb--empirically_determine_max_thread_stack_size_GS_version.c && \
time bin/tmp
Build and run it as a C++ program:
mkdir -p bin && \
g++ -Wall -Werror -g3 -O3 -std=c++17 -pthread -o bin/tmp \
onlinegdb--empirically_determine_max_thread_stack_size_GS_version.c && \
time bin/tmp
It takes < 0.5 seconds to run locally on a fast computer with a thread stack size of ~7.4 MB.
Here's the program:
// =============== Program for determining the default thread stack size =========
// Modified by Gabriel Staples, 26 Sept. 2020
// Originally by Bruno Haible
// 24 Oct. 2009
// Source: https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html
#include <alloca.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h> // sleep
/// Thread function to repeatedly allocate memory within a thread, printing
/// the total memory allocated each time, until the program crashes. The last
/// value printed before the crash indicates how big a thread's stack size is.
///
/// Note: passing in a `uint32_t` as a `void *` type here is for practice,
/// to learn how to pass in ANY type to a func by using a `void *` parameter.
/// This is also the required function prototype, as required by the
/// `pthread_create()` function, for the callback function (this function)
/// passed to `pthread_create()`. See:
/// https://www.man7.org/linux/man-pages/man3/pthread_create.3.html
void* threadfunc(void* bytes_to_allocate_each_loop)
{
const uint32_t BYTES_TO_ALLOCATE_EACH_LOOP =
*(uint32_t*)bytes_to_allocate_each_loop;
uint32_t bytes_allocated = 0;
while (true)
{
printf("bytes_allocated = %u\n", bytes_allocated);
fflush(stdout);
// NB: it appears that you don't necessarily need `volatile` here,
// but you DO definitely need to actually use (ex: write to) the
// memory allocated by `alloca()`, as we do below, or else the
// `alloca()` call does seem to get optimized out on some systems,
// making this whole program just run infinitely forever without
// ever hitting the expected segmentation fault.
volatile uint8_t * byte_buff =
(volatile uint8_t *)alloca(BYTES_TO_ALLOCATE_EACH_LOOP);
byte_buff[0] = 0;
bytes_allocated += BYTES_TO_ALLOCATE_EACH_LOOP;
}
}
int main()
{
const uint32_t BYTES_TO_ALLOCATE_EACH_LOOP = 128;
pthread_t thread;
pthread_create(&thread, NULL, threadfunc,
(void*)(&BYTES_TO_ALLOCATE_EACH_LOOP));
while (true)
{
const unsigned int SLEEP_SEC = 10000;
sleep(SLEEP_SEC);
}
return 0;
}
Sample output (same results as Bruno Haible's original program):
bytes_allocated = 7450240
bytes_allocated = 7450368
bytes_allocated = 7450496
bytes_allocated = 7450624
bytes_allocated = 7450752
bytes_allocated = 7450880
Segmentation fault (core dumped)