"Segmentation fault (core dumped)" for: "No such file or directory" for libioP.h, printf-parse.h, vfprintf-internal.c, etc

Sample errors in the core dump files:

1289    vfprintf-internal.c: No such file or directory.
111 printf-parse.h: No such file or directory.
948 libioP.h: No such file or directory.
948 libioP.h: No such file or directory.

I'm working on a fast_malloc() implementation, and I'm getting segmentation faults for unknown reasons once I override malloc() and free() with my own implementations, but NOT before that. In other words, calling fast_malloc() directly works fine, but as soon as malloc() itself is redirected to my implementation, things break.

Why the segfault?

Sample output. It appears before ANYTHING else can be printed, including the print statement at the start of main() and some debug prints inside my fast_malloc():

Segmentation fault (core dumped)

I have turned on core dumps as I explain here.

So, running gdb path/to/my/executable core shows core file info like the following. Note that each run may report a different file as missing in the "No such file or directory" message.

  1. One run:
Reading symbols from build/fast_malloc_unit_tests...

warning: core file may not match specified executable file.
[New LWP 1257155]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fd50fc7ba01 in __vfprintf_internal (s=0x7fd50fdee6a0 <_IO_2_1_stdout_>, 
    format=0x5622fd1b8010 "DEBUG: %s():\n", ap=ap@entry=0x7ffec28300a0, 
    mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1289
1289    vfprintf-internal.c: No such file or directory.
(gdb) bt
#0  0x00007fd50fc7ba01 in __vfprintf_internal (s=0x7fd50fdee6a0 <_IO_2_1_stdout_>, 
    format=0x5622fd1b8010 "DEBUG: %s():\n", ap=ap@entry=0x7ffec28300a0, 
    mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1289
#1  0x00007fd50fc66ebf in __printf (format=<optimized out>) at printf.c:33
#2  0x00005622fd1b53eb in fast_malloc (num_bytes=1024) at src/fast_malloc.c:225
#3  0x00005622fd1b5b66 in malloc (num_bytes=1024) at src/fast_malloc.c:496
#4  0x00007fd50fc86e84 in __GI__IO_file_doallocate (fp=0x7fd50fdee6a0 <_IO_2_1_stdout_>)
    at filedoalloc.c:101
#5  0x00007fd50fc97050 in __GI__IO_doallocbuf (fp=fp@entry=0x7fd50fdee6a0 <_IO_2_1_stdout_>)
    at libioP.h:948
#6  0x00007fd50fc960b0 in _IO_new_file_overflow (f=0x7fd50fdee6a0 <_IO_2_1_stdout_>, ch=-1)
    at fileops.c:745
#7  0x00007fd50fc94835 in _IO_new_file_xsputn (n=7, data=<optimized out>, f=<optimized out>)
    at libioP.h:948
#8  _IO_new_file_xsputn (f=0x7fd50fdee6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=7)
    at fileops.c:1197
#9  0x00007fd50fc7baf2 in __vfprintf_internal (s=0x7fd50fdee6a0 <_IO_2_1_stdout_>, 
    format=0x5622fd1b8010 "DEBUG: %s():\n", ap=ap@entry=0x7ffec28308e0, 
    mode_flags=mode_flags@entry=0) at ../libio/libioP.h:948
#10 0x00007fd50fc66ebf in __printf (format=<optimized out>) at printf.c:33
#11 0x00005622fd1b53eb in fast_malloc (num_bytes=1024) at src/fast_malloc.c:225
#12 0x00005622fd1b5b66 in malloc (num_bytes=1024) at src/fast_malloc.c:496
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) q

  2. Another one:
Reading symbols from build/fast_malloc_unit_tests...

warning: core file may not match specified executable file.
[New LWP 1257787]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f20b0bbba80 in __find_specmb (
    format=0x5644c516d108 "DEBUG:   block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n") at printf-parse.h:111
111 printf-parse.h: No such file or directory.
(gdb) bt
#0  0x00007f20b0bbba80 in __find_specmb (
    format=0x5644c516d108 "DEBUG:   block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n") at printf-parse.h:111
#1  __vfprintf_internal (s=0x7f20b0d2e6a0 <_IO_2_1_stdout_>, 
    format=0x5644c516d108 "DEBUG:   block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n", ap=ap@entry=0x7ffe7f6ea580, mode_flags=mode_flags@entry=0)
    at vfprintf-internal.c:1365
#2  0x00007f20b0ba6ebf in __printf (format=<optimized out>) at printf.c:33
#3  0x00005644c516a47d in fast_malloc (num_bytes=1024) at src/fast_malloc.c:244
#4  0x00005644c516ab4e in malloc (num_bytes=1024) at src/fast_malloc.c:496
#5  0x00007f20b0bc6e84 in __GI__IO_file_doallocate (fp=0x7f20b0d2e6a0 <_IO_2_1_stdout_>)
    at filedoalloc.c:101
#6  0x00007f20b0bd7050 in __GI__IO_doallocbuf (fp=fp@entry=0x7f20b0d2e6a0 <_IO_2_1_stdout_>)
    at libioP.h:948
#7  0x00007f20b0bd60b0 in _IO_new_file_overflow (f=0x7f20b0d2e6a0 <_IO_2_1_stdout_>, ch=-1)
    at fileops.c:745
#8  0x00007f20b0bd4835 in _IO_new_file_xsputn (n=23, data=<optimized out>, f=<optimized out>)
    at libioP.h:948
#9  _IO_new_file_xsputn (f=0x7f20b0d2e6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=23)
    at fileops.c:1197
#10 0x00007f20b0bbbaf2 in __vfprintf_internal (s=0x7f20b0d2e6a0 <_IO_2_1_stdout_>, 
    format=0x5644c516d108 "DEBUG:   block_map_i = %zu (num_bytes requested to allocate = %zu; smallest--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) q

  3. Another:
Reading symbols from build/fast_malloc_unit_tests...

warning: core file may not match specified executable file.
[New LWP 1258037]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f901ef65e4d in __GI__IO_file_doallocate (fp=0x7f901f0cd6a0 <_IO_2_1_stdout_>)
    at libioP.h:948
948 libioP.h: No such file or directory.
(gdb) q
  4. Another:
Reading symbols from build/fast_malloc_unit_tests...

warning: core file may not match specified executable file.
[New LWP 1258336]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f5e4b551a80 in __find_specmb (
    format=0x562fac6d7108 "DEBUG:   block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n") at printf-parse.h:111
111 printf-parse.h: No such file or directory.
(gdb) q

My gcc build options at the moment:

-Wall -Wextra -Werror -O0 -ggdb -std=c11 -save-temps=obj -DDEBUG

This may be related to the DEBUG_PRINTF() macro I have, which I call inside fast_malloc():

#ifdef DEBUG
    /// Debug printf function.
    /// See: https://mcmap.net/q/18199/-debug-print-macro-in-c
    #define DEBUG_PRINTF(...) printf("DEBUG: "__VA_ARGS__)
#else
    #define DEBUG_PRINTF(...) \
        do                    \
        {                     \
        } while (0)
#endif

Why is malloc() getting called before the program starts anyway? I don't call it anywhere. But notice that malloc() is being called with 1024 bytes, as visible in the stack traces of runs 1 and 2 (it happens on every run; those are just the runs for which I pasted enough of the trace to show it).

My malloc() and free() overrides look like this:

inline void* malloc(size_t num_bytes)
{
    return fast_malloc(num_bytes);
}

inline void free(void* ptr)
{
    fast_free(ptr);
}

Is my single-threaded program, in which malloc() is mysteriously being called without my calling it, somehow multi-threaded at startup? Does some unusual program initialization take place? My fast_malloc() implementation, which overrides malloc(), is NOT thread-safe yet, so if Linux makes multi-threaded malloc() calls during some kind of program initialization, that could be the cause of the corruption.

It seems to be related to printing inside malloc(). Is printing inside malloc() forbidden?

Here is the bottom of a recent stack trace from a core dump (the first call is at the very bottom):

#127471 0x00005626d43dca28 in malloc (num_bytes=1024) at src/fast_malloc.c:494
#127472 0x00007faa222a7e84 in __GI__IO_file_doallocate (fp=0x7faa2240f6a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#127473 0x00007faa222b8050 in __GI__IO_doallocbuf (fp=fp@entry=0x7faa2240f6a0 <_IO_2_1_stdout_>) at libioP.h:948
#127474 0x00007faa222b70b0 in _IO_new_file_overflow (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
#127475 0x00007faa222b5835 in _IO_new_file_xsputn (n=13, data=<optimized out>, f=<optimized out>) at libioP.h:948
#127476 _IO_new_file_xsputn (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=13) at fileops.c:1197
#127477 0x00007faa222aa678 in __GI__IO_puts (str=0x5626d43df227 '=' <repeats 13 times>) at libioP.h:948
#127478 0x00005626d43dca28 in malloc (num_bytes=1024) at src/fast_malloc.c:494
#127479 0x00007faa222a7e84 in __GI__IO_file_doallocate (fp=0x7faa2240f6a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#127480 0x00007faa222b8050 in __GI__IO_doallocbuf (fp=fp@entry=0x7faa2240f6a0 <_IO_2_1_stdout_>) at libioP.h:948
#127481 0x00007faa222b70b0 in _IO_new_file_overflow (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
#127482 0x00007faa222b5835 in _IO_new_file_xsputn (n=13, data=<optimized out>, f=<optimized out>) at libioP.h:948
#127483 _IO_new_file_xsputn (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=13) at fileops.c:1197
#127484 0x00007faa222aa678 in __GI__IO_puts (str=0x5626d43df227 '=' <repeats 13 times>) at libioP.h:948
#127485 0x00005626d43dca28 in malloc (num_bytes=1024) at src/fast_malloc.c:494
#127486 0x00007faa222a7e84 in __GI__IO_file_doallocate (fp=0x7faa2240f6a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#127487 0x00007faa222b8050 in __GI__IO_doallocbuf (fp=fp@entry=0x7faa2240f6a0 <_IO_2_1_stdout_>) at libioP.h:948
#127488 0x00007faa222b70b0 in _IO_new_file_overflow (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
#127489 0x00007faa222b5835 in _IO_new_file_xsputn (n=49, data=<optimized out>, f=<optimized out>) at libioP.h:948
#127490 _IO_new_file_xsputn (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=49) at fileops.c:1197
#127491 0x00007faa222aa678 in __GI__IO_puts (str=0x5626d43df238 "Running UNIT tests for the \"fast_malloc\" module.\n") at libioP.h:948
#127492 0x00005626d43dca98 in main () at src/fast_malloc_unit_tests.c:35
(gdb) 

What are __GI__IO_puts and _IO_new_file_xsputn and the other function calls as you move up the stack? Are they calls in other threads? Are they calling malloc() behind the scenes? It appears __GI__IO_file_doallocate is...

Parse asked 30/6, 2021 at 5:29. Comment:
I added my own answer here. HUGE thanks to @Employed Russian for his answer and insight first though, which was absolutely critical to my ability to begin to understand and solve this problem. I had absolutely no idea what was going on or why, at first, and could not have solved this alone. (Parse)

To follow up and answer my own question: @Employed Russian's answer appears to be correct.

To be more specific, I have two main problems:

  1. Infinite recursion between malloc() and printf().
  2. Data corruption by freeing and reusing memory the system thinks it has exclusive access to.

The 1st problem: infinite recursion

I call printf() to do some debug prints inside my fast_malloc() implementation. As long as I do NOT override malloc() with my fast_malloc(), this is fine (provided I protect the print with a mutex to make it thread-safe). BUT, once I do override malloc() with my fast_malloc(), this is NOT fine, because printf() calls malloc() too: the stack traces show the first write to stdout calling malloc() (via __GI__IO_file_doallocate) to allocate stdout's buffer. So, once malloc() is overridden by fast_malloc(), we end up with infinite recursion: printing calls malloc(), which prints, which calls malloc() again because stdout's buffer still hasn't been allocated, which prints... forever, until the stack overflows.

So I see none of my prints, not even the one at the start of main(). In the last stack trace I posted in my question, main() (src/fast_malloc_unit_tests.c:35) is at the very bottom, and there were 127492 stack frames on the stack at the time of the crash, at which point the stack overflowed. Sanity check: for a stack size of ~7.4 MB, that works out to about 7400000/127492 = ~58 bytes per stack frame, which seems reasonable.
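The ~7.4 MB figure in the sanity check can be read rather than estimated: a process's soft stack limit is available from getrlimit(RLIMIT_STACK, ...). The sketch below uses hypothetical helper names; it repeats the division above and reads that limit, but it does not measure how much stack the crashing process had actually used.

```c
#include <stdbool.h>
#include <sys/resource.h>

/* Average bytes per frame if `num_frames` frames exactly filled `stack_bytes`. */
static unsigned long avg_frame_bytes(unsigned long stack_bytes, unsigned long num_frames)
{
    return num_frames == 0 ? 0 : stack_bytes / num_frames;
}

/* Reads this process's soft stack limit in bytes (`ulimit -s` shows it in KiB).
   Returns false if getrlimit() fails or the limit is unlimited. */
static bool stack_soft_limit_bytes(unsigned long long *out_bytes)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return false;
    *out_bytes = (unsigned long long)rl.rlim_cur;
    return true;
}
```

With a common Linux default of `ulimit -s` = 8192 (KiB), i.e. 8388608 bytes, the same frame count gives 8388608 / 127492, or about 65 bytes per frame, so either figure points to small frames.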

The 2nd problem: I'm freeing and reusing memory that the system (glibc) thinks it has safely acquired and still controls

The code I'm running is my fast_malloc_unit_tests.c program, which, among other things, re-initializes the memory pools I'm using under the hood many times. Each time it does this, it considers all previously allocated memory to be freed, and it hands that memory out again when needed. BUT, printf() and other library code that ran earlier, some of it before main() was even entered, have already called malloc() and think they still own this memory. So, I end up mistakenly reusing the memory they are using, causing data corruption and crashes.
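The double ownership can be shown with a toy allocator. In the sketch below, pool_alloc() and pool_reset() are hypothetical stand-ins for fast_malloc()'s pools and their re-initialization, and a GCC/Clang constructor function (an assumption, used only to get code running before main()) plays the role of library code that allocates early. After the reset, the next allocation returns the very block the early caller still believes it owns.

```c
#include <stddef.h>

/* Hypothetical bump-pointer pool standing in for fast_malloc()'s pools. */
#define POOL_SIZE 4096

static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;

static void *pool_alloc(size_t num_bytes)
{
    const size_t align = _Alignof(max_align_t);
    if (num_bytes > POOL_SIZE)
        return NULL;
    size_t rounded = (num_bytes + align - 1) / align * align; /* keep blocks aligned */
    if (rounded > POOL_SIZE - pool_used)
        return NULL;
    void *ptr = &pool[pool_used];
    pool_used += rounded;
    return ptr;
}

/* Re-initializing the pool treats every block as free at once,
   including blocks that other code is still using. */
static void pool_reset(void)
{
    pool_used = 0;
}

/* Hypothetical stand-in for library code that allocates before main().
   The constructor attribute (a GCC/Clang extension) runs this before main(). */
static void *pre_main_block = NULL;

__attribute__((constructor))
static void allocate_before_main(void)
{
    pre_main_block = pool_alloc(1024);
}
```

Inside main(), a call such as pool_reset() followed by pool_alloc(1024) hands out pre_main_block a second time, so two owners now write to the same 1024 bytes.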

After disabling all prints inside my malloc() implementation, thereby removing the infinite recursion problem, I was able to see this behavior. In this case, the code did enter my main() function, I did see up to a few dozen of my prints before the crash, and there were only 2 calls (stack frames) on my stack at the time of the crash (rather than 127492 frames). They were:

#0  0x000055555555589d in fast_malloc_print_stats () at src/fast_malloc.c:464
#1  0x0000555555556228 in main () at src/fast_malloc_unit_tests.c:129

Full output:

Program received signal SIGSEGV, Segmentation fault.
0x000055555555589d in fast_malloc_print_stats () at src/fast_malloc.c:464
464             block = block->next_free_block;
(gdb) bt
#0  0x000055555555589d in fast_malloc_print_stats () at src/fast_malloc.c:464
#1  0x0000555555556228 in main () at src/fast_malloc_unit_tests.c:129

where fast_malloc.c line 464 contains:

while (block != NULL)
{
    free_block_cnt_walked++;
    block = block->next_free_block;  // <==== line 464
}

As far as I can tell, there is nothing wrong with that line itself: it's a simple copy, and the loop condition already guarantees that block is NOT NULL, so reading block->next_free_block can't be dereferencing a NULL pointer. I think the segmentation fault must therefore be due to corrupted memory: because that memory is being used twice, the block pointer is probably a corrupted address outside the range we're allowed to read, hence the segfault.
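If corruption is the suspect, a debug-only walker can test that hypothesis without crashing: before dereferencing each link, check that it points inside the pool, is aligned, and that the walk has not run longer than the pool could hold. In the sketch below, block_t and free_list_is_sane() are hypothetical; only the next_free_block field comes from the code above, and the pool bounds are assumed to be known to the allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-block header; a real one would carry more fields. */
typedef struct block_s
{
    struct block_s *next_free_block;
} block_t;

/* Walks a free list like the loop at fast_malloc.c:464, but validates each
   link first. Returns false at the first pointer that is outside
   [pool_start, pool_end), misaligned, or part of a cycle. */
static bool free_list_is_sane(const block_t *block, const void *pool_start,
                              const void *pool_end, size_t *out_count)
{
    const uintptr_t start = (uintptr_t)pool_start;
    const uintptr_t end = (uintptr_t)pool_end;
    /* A list with more nodes than fit in the pool must contain a cycle. */
    const size_t max_blocks = (end - start) / sizeof(block_t);
    size_t count = 0;

    while (block != NULL)
    {
        const uintptr_t addr = (uintptr_t)block;
        if (addr < start || addr >= end || end - addr < sizeof(block_t) ||
            addr % _Alignof(block_t) != 0 || count >= max_blocks)
            return false; /* corrupted link: dereferencing it could segfault */
        count++;
        block = block->next_free_block;
    }
    *out_count = count;
    return true;
}
```

Running such a check before fast_malloc_print_stats() walks the list would turn the SIGSEGV at line 464 into a false return that can be reported.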


That's it (I think). Now I've got to go do proper fixes and continue work on this. Big thanks goes out to @Employed Russian!

See also:

  1. [my answer: a safe_printf() function which never calls malloc(), thereby solving the infinite recursion problem!] Which print calls in C do NOT ever call malloc() under the hood?
Parse answered 30/6, 2021 at 20:15

You are calling printf within your malloc implementation. That is not going to end well.

In the stack trace, you can clearly see that printf itself calls malloc.

If your malloc is not prepared to be called while in the middle of manipulating its data structures, it will crash (possibly that's what's happening here).

Alternatively, you can also end up with infinite recursion, when malloc calls printf, which calls malloc, which calls printf, etc.
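The cycle can be reproduced in a few lines without overriding the real malloc(). The sketch below is a hypothetical model, not glibc code: model_puts() allocates its buffer lazily on first use, the way the traces show stdout doing via __GI__IO_file_doallocate, and model_malloc() prints a debug line before allocating, like the DEBUG_PRINTF() calls in fast_malloc(). MAX_DEPTH is an assumed stand-in for the stack limit, so the model stops with an error instead of overflowing the stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model, NOT glibc code. */
#define MAX_DEPTH 100            /* stand-in for the real stack limit */

static char *stream_buf = NULL;  /* like stdout's buffer: allocated on first write */
static int depth = 0;            /* current nesting of model_malloc() */
static bool overflowed = false;  /* set when the recursion hits MAX_DEPTH */

static void *model_malloc(size_t num_bytes);

/* Like puts(): the first write must allocate the stream's buffer. */
static int model_puts(const char *str)
{
    (void)str; /* a real stream would copy str into stream_buf */
    if (stream_buf == NULL)
    {
        char *buf = model_malloc(1024); /* re-enters the allocator */
        if (buf == NULL)
            return -1;
        stream_buf = buf;
    }
    return 0;
}

/* Like the overriding malloc(): it prints a debug line before allocating. */
static void *model_malloc(size_t num_bytes)
{
    if (depth >= MAX_DEPTH)
    {
        overflowed = true; /* real program: stack overflow, SIGSEGV */
        return NULL;
    }
    depth++;
    model_puts("DEBUG: malloc()"); /* stream_buf is still NULL, so this recurses */
    depth--;
    return overflowed ? NULL : malloc(num_bytes); /* real malloc backs the model */
}
```

In the real program there is no depth check, so the loop only ends when the stack overflows, which is why the question's trace is 127492 frames deep. Removing the debug print from model_malloc() breaks the cycle: the first model_puts() call then gets its buffer and returns 0.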

TL;DR: when implementing something as low level as malloc, you must stick to either low-level functions which don't themselves allocate anything, or to direct system calls.

Why is malloc() getting called before the program starts anyway?

Because low-level functions in, e.g., the dynamic loader need to allocate memory during their own initialization.

Your malloc must work very early in the process lifetime; long before main.

Is printing inside malloc() forbidden?

Everything that might allocate memory is forbidden.

In practice, you need to call only async-signal-safe routines, because non-async-signal-safe ones may allocate, if not now then in the future.
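One way to follow that advice for debug output is to skip stdio entirely: format numbers by hand into a stack buffer and emit them with write(2), which is on the async-signal-safe list and does not allocate. The helpers below (write_all, format_size, debug_print_size) are hypothetical names for a minimal sketch; they write to stderr, handle only unsigned decimal values, and assume size_t is at most 64 bits.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Writes all `len` bytes to `fd` using only write(2); retries on EINTR
   and on short writes. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0)
    {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Formats `value` in decimal into `out` (at least 20 chars, enough for a
   64-bit size_t) without touching the heap. Returns the digit count. */
static size_t format_size(size_t value, char *out)
{
    char tmp[20];
    size_t n = 0;
    do
    {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    for (size_t i = 0; i < n; i++) /* digits were produced in reverse */
        out[i] = tmp[n - 1 - i];
    return n;
}

/* Prints "DEBUG: <label><value>\n" to stderr without calling malloc(). */
static int debug_print_size(const char *label, size_t value)
{
    char digits[20];
    size_t n = format_size(value, digits);
    if (write_all(STDERR_FILENO, "DEBUG: ", 7) != 0 ||
        write_all(STDERR_FILENO, label, strlen(label)) != 0 ||
        write_all(STDERR_FILENO, digits, n) != 0 ||
        write_all(STDERR_FILENO, "\n", 1) != 0)
        return -1;
    return 0;
}
```

For example, debug_print_size("num_bytes = ", num_bytes) inside an allocator prints the requested size without re-entering malloc().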

Polariscope answered 30/6, 2021 at 6:11. Comments:
I think I have infinite recursion. My stack trace is 127492 calls long and looks repetitive. (Parse)
I never knew printf() called malloc(). That never occurred to me. I suppose it makes sense, though: now that I think about it, when I've implemented printf() on safety-critical microcontrollers in the past, I had to choose a static array large enough to do the formatting in, rather than using malloc() in printf(). (Parse)
This project is taking weeks, not days. :-. I'll come back to this, hopefully tomorrow. I think you got it, though. 2 things: 1) the infinite recursion between malloc() and printf(). 2) the fact that malloc() gets called BEFORE main(). Inside main(), since this is my unit test code, one of the first things I do is re-initialize and free all previously allocated memory, which means it instantly corrupts all the memory the system just thought it had safely acquired before entering main(). (Parse)
Which print calls do NOT allocate? i.e., which can I use inside malloc() for debugging while avoiding infinite recursion? (Parse)
My follow-up question: Which print calls in C do NOT ever call malloc() under the hood? (Parse)
Any idea how to reproduce this infinite recursion problem between malloc() and printf() in a short program? I tried, but failed, here: github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/blob/…. Feel free to take a hack at it. (Parse)
For my own understanding and to help others: "An async-signal-safe function is one that can be safely called from within a signal handler." See: signal-safety(7) - Linux manual page (Parse)