Valgrind errors with boost::thread_specific_ptr on GCC 8.3 + Linux

  • Ubuntu 19 running inside Docker
  • GCC 8.3
  • Boost 1.69
  • Valgrind 3.14.0

When the application shuts down, Valgrind reports these three issues:

==70== Mismatched free() / delete / delete []
==70==    at 0x483997B: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x4870C89: check_free (dlerror.c:202)
==70==    by 0x4870C89: check_free (dlerror.c:186)
==70==    by 0x4870C89: free_key_mem (dlerror.c:221)
==70==    by 0x4870C89: __dlerror_main_freeres (dlerror.c:239)
==70==    by 0x4B59711: __libc_freeres (in /usr/lib/x86_64-linux-gnu/libc-2.29.so)
==70==    by 0x482E19E: _vgnU_freeres (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so)
==70==    by 0x4A0A3A9: __run_exit_handlers (exit.c:132)
==70==    by 0x4A0A3D9: exit (exit.c:139)
==70==    by 0x49E9B71: (below main) (libc-start.c:342)
==70==  Address 0x4f6a570 is 0 bytes inside a block of size 312 alloc'd
==70==    at 0x4838DBF: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x303D6D: boost::detail::make_external_thread_data() (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x305424: boost::detail::add_new_tss_node(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*) (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x3054ED: boost::detail::set_tss_data(void const*, 

[...]

==70== Invalid free() / delete / delete[] / realloc()
==70==    at 0x483997B: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x4870BB4: free_key_mem (dlerror.c:223)
==70==    by 0x4870BB4: __dlerror_main_freeres (dlerror.c:239)
==70==    by 0x4B59711: __libc_freeres (in /usr/lib/x86_64-linux-gnu/libc-2.29.so)
==70==    by 0x482E19E: _vgnU_freeres (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so)
==70==    by 0x4A0A3A9: __run_exit_handlers (exit.c:132)
==70==    by 0x4A0A3D9: exit (exit.c:139)
==70==    by 0x49E9B71: (below main) (libc-start.c:342)
==70==  Address 0x4f6a570 is 0 bytes inside a block of size 312 free'd
==70==    at 0x483997B: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x4870C89: check_free (dlerror.c:202)
==70==    by 0x4870C89: check_free (dlerror.c:186)
==70==    by 0x4870C89: free_key_mem (dlerror.c:221)
==70==    by 0x4870C89: __dlerror_main_freeres (dlerror.c:239)
==70==    by 0x4B59711: __libc_freeres (in /usr/lib/x86_64-linux-gnu/libc-2.29.so)
==70==    by 0x482E19E: _vgnU_freeres (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so)
==70==    by 0x4A0A3A9: __run_exit_handlers (exit.c:132)
==70==    by 0x4A0A3D9: exit (exit.c:139)
==70==    by 0x49E9B71: (below main) (libc-start.c:342)
==70==  Block was alloc'd at
==70==    at 0x4838DBF: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x303D6D: boost::detail::make_external_thread_data() (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x305424: boost::detail::add_new_tss_node(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*) (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x3054ED: boost::detail::set_tss_data(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*, bool) (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x188841: boost::thread_specific_ptr<burningmime::setmatch::MatchState>::reset(burningmime::setmatch::MatchState*) (tss.hpp:105)

[...]

==70== 24 bytes in 1 blocks are definitely lost in loss record 1 of 2
==70==    at 0x4838DBF: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x303F50: boost::detail::make_external_thread_data() (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x305424: boost::detail::add_new_tss_node(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*) (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x3054ED: boost::detail::set_tss_data(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*, bool) (in /build-context/bin/debug/setmatch-tests)

[...]

It looks like Boost is allocating its thread data in the same place where dlerror has allocated its own thread data. A quick search points to a (slightly different?) version of dlerror here.

A quick glance at Boost's code suggests that it's just allocating the TSS block on the heap.

This was not a problem with GCC 7.3.0 + Ubuntu 18 (same Boost version)

Anyone have some insight here?

EDIT: Maybe it's the double-free that was fixed in this commit? Still, I don't see why Boost would be using that at all.

Cavanaugh answered 26/9, 2019 at 6:46 Comment(1)
Is there any reason that you aren't using the latest Valgrind 3.15.0? - Tallyman
L
2

If I modify the glibc upstream test case around the pthread_setspecific call like this (and compile it with g++):

    void *ptr = new char;
    printf("Setting thread local to ptr.\n");
    if (pthread_setspecific(key, ptr) != 0) {
      perror("pthread_setspecific");
      exit(1);
    }
    delete ptr;

I get this error when running against glibc from right before the fix (at commit 5b06f538c5aee0389ed034f60d90a8884d6d54de, using ./testrun.sh --tool=valgrind /path/to/test from the glibc build tree):

==14143== Invalid read of size 8
==14143==    at 0x483B550: check_free (dlerror.c:188)
==14143==    by 0x483BA21: free_key_mem (dlerror.c:221)
==14143==    by 0x483BA21: __dlerror_main_freeres (dlerror.c:239)
==14143==    by 0x4D06AD1: __libc_freeres (in /home/fweimer/src/gnu/glibc/build/libc.so)
==14143==    by 0x48031DE: _vgnU_freeres (vg_preloaded.c:77)
==14143==    by 0x4BDD331: __run_exit_handlers (exit.c:132)
==14143==    by 0x4BDD3C9: exit (exit.c:139)
==14143==    by 0x4BC7E21: (below main) (libc-start.c:342)
==14143==  Address 0x4d750d8 is 23 bytes after a block of size 1 free'd
==14143==    at 0x480CEFC: operator delete(void*) (vg_replace_malloc.c:586)
==14143==    by 0x401344: main (t.c:93)
==14143==  Block was alloc'd at
==14143==    at 0x480BE86: operator new(unsigned long) (vg_replace_malloc.c:344)
==14143==    by 0x4012F4: main (t.c:87)
==14143== 
==14143== Invalid free() / delete / delete[] / realloc()
==14143==    at 0x480CA0C: free (vg_replace_malloc.c:540)
==14143==    by 0x483BA29: free_key_mem (dlerror.c:223)
==14143==    by 0x483BA29: __dlerror_main_freeres (dlerror.c:239)
==14143==    by 0x4D06AD1: __libc_freeres (in /home/fweimer/src/gnu/glibc/build/libc.so)
==14143==    by 0x48031DE: _vgnU_freeres (vg_preloaded.c:77)
==14143==    by 0x4BDD331: __run_exit_handlers (exit.c:132)
==14143==    by 0x4BDD3C9: exit (exit.c:139)
==14143==    by 0x4BC7E21: (below main) (libc-start.c:342)
==14143==  Address 0x4d750c0 is 0 bytes inside a block of size 1 free'd
==14143==    at 0x480CEFC: operator delete(void*) (vg_replace_malloc.c:586)
==14143==    by 0x401344: main (t.c:93)
==14143==  Block was alloc'd at
==14143==    at 0x480BE86: operator new(unsigned long) (vg_replace_malloc.c:344)
==14143==    by 0x4012F4: main (t.c:87)

This is pretty much the same error that you got, minus the nesting of the operator new allocation within Boost. So it does indeed look like the two bugs are the same.

This makes sense: Due to bug 24476, libdl uses an uninitialized pthread_key_t value (without previously calling pthread_key_create on it). For the data segment (where the internal key for libdl is stored), uninitialized means zero, of course, and as you can see from the diagnostic output in the test, the key allocated by the test (and Boost in your case) was in fact key 0:

key = 0

This libdl code is rather convoluted, and I posted a patch which moves dlerror into libc (from libdl) and also avoids using POSIX threads thread-local storage altogether.

To summarize: Whoever maintains the glibc version you use needs to backport the upstream fix into their source tree and release an update. We had to do this as well. On the plus side, this bug only happens when you run your application under valgrind and similar tools because during regular process shutdown, __libc_freeres is not invoked: the process will exit soon anyway, and the kernel cleans up all the resources for us. Unless you use valgrind in production, this means that you will never encounter this bug there. Of course, it's still an annoying issue when you are using valgrind for debugging. Sorry about that.

Luteolin answered 8/10, 2019 at 19:21 Comment(3)
Thanks!! It's great to hear from someone who actually worked on this stuff. - Cavanaugh
I'm so scared of functions moving into glibc. clock_gettime() moving from librt to libc broke so much in the way of backward compatibility. - Cavanaugh
@RobertFraser Please file a bug or post to libc-help if something is broken. - Luteolin
S
4

Please check the versions of all the tools you used. This looks like a version compatibility issue. Try using Valgrind 3.15.0.

See here for the usage of valgrind.

Swift answered 8/10, 2019 at 7:9 Comment(0)
P
1

Perhaps you should upgrade Valgrind to 3.15.0; it should help.

I think the page linked here should help you.

Palestine answered 8/10, 2019 at 7:19 Comment(0)
