Measuring stack usage for Linux multi-threaded application

Asked 12/9, 2008 at 10:19 Answered 18/11, 2023 at 15:46

I'm developing a multi-threaded application for a Linux embedded platform.

At the moment I'm setting the stack size for each thread (via pthread_attr_setstacksize) to a fairly large default value. I would like to fine-tune that value for each thread to something smaller to reduce my application's memory usage. I could go the trial-and-error route of setting each thread's stack size to progressively smaller values until the program crashes, but the application uses ~15 threads, each with completely different functionality/attributes, so that approach would be extremely time-consuming.

I would much prefer to measure each thread's stack usage directly. Is there a utility people can recommend for this? (For example, I come from a vxWorks background, and using the 'ti' command from the vxWorks shell directly gives statistics on the stack usage as well as other useful info on the task status.)

Coheman answered 12/9, 2008 at 10:19

Here are two tools that measure (native pthreads) stack usage in Linux applications:

Valgrind

Usage:

valgrind --tool=drd --show-stack-usage=yes PROG

Valgrind is a stable and powerful tool, useful not only for measuring stack usage. It may not support all embedded CPU models though.

Stackusage

Usage:

stackusage PROG

Stackusage is a lightweight tool designed specifically for measuring thread stack usage. It should be portable to most embedded Linux platforms equipped with glibc. It is likely not as well-tested or mature as Valgrind/drd at this point.

Full disclosure: I'm the author of Stackusage.

Signal answered 4/3, 2015 at 13:44

I do not know of any good tools, but as a last resort you could include some code in your application to check it, similar to the following:

#include <stdint.h>

__thread void* stack_start;
__thread long stack_max_size = 0L;

void check_stack_size(void) {
  // address of 'nowhere' approximates the current end of the stack
  char nowhere;
  void* stack_end = (void*)&nowhere;
  // assumes the stack grows downward; double check this on your platform
  long stack_size = (long)((uintptr_t)stack_start - (uintptr_t)stack_end);
  // update stack_max_size for this thread
  if (stack_size > stack_max_size)
    stack_max_size = stack_size;
}

The check_stack_size() function would have to be called in some of the functions that are most deeply nested.

Then, as the last statement in the thread, you could output stack_max_size somewhere.

The stack_start variable would have to be initialized at the start of your thread:

void* thread_proc(void* arg) {
  char nowhere;
  // address of 'nowhere' approximates the start of this thread's stack
  stack_start = (void*)&nowhere;
  // do stuff including calls to check_stack_size()
  // in deeply nested functions
  // output stack_max_size here
  return NULL;
}

Nombles answered 12/9, 2008 at 10:30

Referencing Tobi's answer: if setting a variable at thread initialization is difficult, you can use pthread_attr_getstackaddr to get the base of the stack at any time. (That function is now obsolete; pthread_attr_getstack replaces it, and on glibc pthread_getattr_np fills in the attributes of a running thread.) You can then take the address of an automatic variable in your own function to determine how deep the stack is at that moment.
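A minimal sketch of that idea, assuming glibc: pthread_getattr_np (a non-portable GNU extension) retrieves the attributes of the running thread, and pthread_attr_getstack reports the lowest address and the size of its stack. The name current_stack_depth is made up for illustration, and the code assumes a downward-growing stack.

```c
#define _GNU_SOURCE   /* needed for pthread_getattr_np (GNU extension) */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: approximate number of bytes of stack currently in
 * use by the calling thread, or -1 on failure.
 * Assumption: the stack grows downward. */
long current_stack_depth(void)
{
    pthread_attr_t attr;
    void *stack_low = NULL;   /* lowest address of the stack region */
    size_t stack_size = 0;
    char marker;              /* its address approximates the stack pointer */

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return -1;

    int rc = pthread_attr_getstack(&attr, &stack_low, &stack_size);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    /* Highest address of the region minus the current position. */
    uintptr_t stack_top = (uintptr_t)stack_low + stack_size;
    return (long)(stack_top - (uintptr_t)&marker);
}
```

Calling current_stack_depth() from the most deeply nested functions of a thread and keeping the maximum gives an estimate of its peak usage. For threads created with pthread_create, the figure typically also includes whatever the pthread library itself keeps at the top of the stack region.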

Piker answered 12/9, 2008 at 13:38

In a Linux/glibc multithreaded environment, the pthread library obtains the default thread stack size from getrlimit() with the RLIMIT_STACK parameter. In a shell, you can get this value with a command like:

$ ulimit -s
8192

The above result is in kilobytes. Hence, the default thread stack size on my system is 8 MB of virtual memory.
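The same values can be checked from inside a program. The following is only a sketch: the two helper names are hypothetical, and it relies on pthread_attr_getstacksize reporting the library default for a freshly initialized attribute object, which is my understanding of glibc's behaviour.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: soft RLIMIT_STACK limit in bytes.
 * Returns 0 on error or when the limit is unlimited. */
size_t stack_rlimit_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (size_t)rl.rlim_cur;
}

/* Hypothetical helper: stack size reported for a default-initialized
 * pthread_attr_t, i.e. what pthread_create(&tid, NULL, ...) is expected
 * to use. Returns 0 on error. */
size_t default_thread_stack_bytes(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```

Printing both values at startup is a quick way to confirm which default a given target actually applies before tuning individual threads.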
Let's consider the following program creating one thread:

#include <pthread.h>
#include <unistd.h>

static void *thd(void *p)
{

  pause();

  return NULL;
}

int main(){
  pthread_t tid;

  pthread_create(&tid, NULL, thd, NULL);

  pthread_join(tid, NULL);
  
  return 0;
}

Let's compile and run it:

$ gcc pg.c -o pg -lpthread
$ ./pg &

By default, a thread's stack is preceded by a guard page (a "red zone" page with no read/write access rights) that detects stack overflows. This makes the thread's stack easy to identify in /proc/<pid>/smaps as the mapping right after such a page:

$ cat /proc/`pidof pg`/smaps
[...]
7fd503787000-7fd503788000 ---p 00000000 00:00 0 
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
[...]
7fd503788000-7fd503f88000 rw-p 00000000 00:00 0 
Size:               8192 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   8 kB
[...]

The above snippet of the output shows first the 4 KB red zone (guard page) and, right after it, the thread's 8 MB stack. The Rss field shows the RAM actually consumed by the thread's stack: because Linux allocates memory lazily, only pages of virtual memory that have been touched are backed by physical RAM pages. Here it is 8 KB, which holds the pthread library's internal thread control block (TCB) plus miscellaneous internal information. Let's kill the preceding program:

$ kill `pidof pg`
[2]+  Terminated              ./pg

Let's add a local variable to the thread and write into it to trigger an actual RAM allocation:

#include <pthread.h>
#include <unistd.h>
#include <string.h>

static void *thd(void *p)
{
  char buffer[8192];

  // Force the physical allocation of the corresponding stack space 
  memset(buffer, 0, sizeof(buffer));

  pause();

  return NULL;
}

int main(){
  pthread_t tid;

  pthread_create(&tid, NULL, thd, NULL);

  pthread_join(tid, NULL);
  
  return 0;
}

Compilation and run:

$ gcc pg.c -o pg -lpthread
$ ./pg &
[2] 38167

The memory map shows a bigger RSS equal to 16 KB:

$ cat /proc/`pidof pg`/smaps
[...]
7f7e61244000-7f7e61245000 ---p 00000000 00:00 0 
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
[...]
7f7e61245000-7f7e61a45000 rw-p 00000000 00:00 0 
Size:               8192 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                  16 kB
[...]

The 16 KB consist of the 8 KB seen above for the internal pthread information plus the 8 KB of the thread's local variable buffer.

Hence we have seen a method to capture the actual stack consumption of the threads of a process: look at the Rss of the corresponding memory zone in the process memory map. Note that Rss is counted in whole pages, so the figure is rounded up to the page size.
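The lookup can also be done from inside the process. Below is a rough sketch that scans /proc/self/smaps for the mapping containing a given address and returns its Rss value. The function name mapping_rss_kb is an assumption of mine, and the parsing relies on the smaps layout shown above (a "start-end" header line followed by "Rss: N kB").

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: Rss (in kB) of the mapping in /proc/self/smaps
 * that contains addr, or -1 if the file cannot be read or no mapping
 * contains the address. */
long mapping_rss_kb(const void *addr)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    if (f == NULL)
        return -1;

    uintptr_t target = (uintptr_t)addr;
    char line[512];
    int in_target = 0;   /* set while reading the fields of the wanted mapping */
    long rss = -1;

    while (fgets(line, sizeof line, f) != NULL) {
        uintmax_t start, end;
        long kb;

        /* Header lines look like "7fd503788000-7fd503f88000 rw-p ...". */
        if (sscanf(line, "%jx-%jx", &start, &end) == 2) {
            in_target = (target >= start && target < end);
        } else if (in_target && sscanf(line, "Rss: %ld kB", &kb) == 1) {
            rss = kb;
            break;
        }
    }

    fclose(f);
    return rss;
}
```

A thread could pass the address of one of its own local variables, for example just before it exits, to log how much of its stack mapping has been touched.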

PS: When sizing a thread's stack, don't forget to leave space for signal handling, as a handler executes on the receiving thread's stack. MINSIGSTKSZ (cf. <signal.h>) is defined as the minimum stack size for a signal handler. Alternatively, define an alternate stack for signals: cf. sigaltstack()

Eteocles answered 18/11, 2023 at 15:46
