Having a weird issue here: if I dlsym() something in a constructor but don't use the result, the shared library segfaults at load.

bug-main.c:

#define _GNU_SOURCE
#include <dlfcn.h>

int main() {
    void *foo = dlopen("./shared.so", RTLD_NOW);
    void (*some_exported_function)() = dlsym(foo, "some_exported_function");
    some_exported_function();
    return 0;
}

bug-shared.c:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

static int (*pointer) ();
static int stub () { return 0; }

__attribute__((constructor))
static void some_init() {
    if ((pointer = dlsym(RTLD_DEFAULT, "anything")) == NULL) {
        pointer = stub;
    }
}

void some_exported_function() {
    printf("in some_exported_function\n");
    printf("%p\n", pointer);
}

Compile with:

gcc -shared -O2 -fPIC bug-shared.c -ldl -o shared.so
gcc -O2 -fPIC bug-main.c -ldl

Now comment out the `printf` of the pointer in bug-shared.c, and it segfaults.

Backtrace:

#0  0x00007ffff7de4b77 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff7951b91 in do_sym () from /usr/lib/libc.so.6
#2  0x00007ffff7bd80f4 in ?? () from /usr/lib/libdl.so.2
#3  0x00007ffff7de9f94 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#4  0x00007ffff7bd85e1 in ?? () from /usr/lib/libdl.so.2
#5  0x00007ffff7bd8148 in dlsym () from /usr/lib/libdl.so.2
#6  0x00007ffff7dea0ea in call_init.part () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7dea1fb in _dl_init () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff7dee627 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7de9f94 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7dede01 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00007ffff7bd7fc9 in ?? () from /usr/lib/libdl.so.2
#12 0x00007ffff7de9f94 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x00007ffff7bd85e1 in ?? () from /usr/lib/libdl.so.2
#14 0x00007ffff7bd8061 in dlopen () from /usr/lib/libdl.so.2
#15 0x0000000000400622 in main ()

$ gcc -dumpversion
5.1.0

Works fine with -O0.
Works fine with clang.
nalaginrut (from the #gcc IRC channel) confirmed that the issue only occurs in GCC 5 at -O2 and higher (the bug is *not* present in GCC 4.9 or at -O1).

Note: only the compilation of shared.so matters; the main binary can be compiled with any settings or compiler and the same behaviour is shown.
I recompiled glibc with debug symbols.

Program received signal SIGSEGV, Segmentation fault.
_dl_lookup_symbol_x (undef_name=0x7ffff7634775 "anything", undef_map=0x7ffff7ffd998 <_rtld_local+2456>, ref=0x7fffffffdca8, symbol_scope=0x0, version=0x0, type_class=0, flags=3, skip_map=0x0) at dl-lookup.c:769
769         while ((*scope)->r_list[i] != skip_map)
(gdb) bt
#0  _dl_lookup_symbol_x (undef_name=0x7ffff7634775 "anything", undef_map=0x7ffff7ffd998 <_rtld_local+2456>, ref=0x7fffffffdca8, symbol_scope=0x0, version=0x0, type_class=0, flags=3, skip_map=0x0) at dl-lookup.c:769
#1  0x00007ffff7951b91 in do_sym (handle=0x0, name=0x7ffff7634775 "anything", who=<optimized out>, vers=0x0, flags=2) at dl-sym.c:161
#2  0x00007ffff7bd80f4 in ?? () from /usr/lib/libdl.so.2
#3  0x00007ffff7de9f94 in _dl_catch_error (objname=0x7ffff7dda0f0, errstring=0x7ffff7dda0f8, mallocedp=0x7ffff7dda0e8, operate=0x7ffff7bd80e0, args=0x7fffffffdea0) at dl-error.c:187
#4  0x00007ffff7bd85e1 in ?? () from /usr/lib/libdl.so.2
#5  0x00007ffff7bd8148 in dlsym () from /usr/lib/libdl.so.2
#6  0x00007ffff7dea0ea in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffe498, env=env@entry=0x7fffffffe4a8) at dl-init.c:72
#7  0x00007ffff7dea1fb in call_init (env=0x7fffffffe4a8, argv=0x7fffffffe498, argc=1, l=<optimized out>) at dl-init.c:30
#8  _dl_init (main_map=main_map@entry=0x601030, argc=1, argv=0x7fffffffe498, env=0x7fffffffe4a8) at dl-init.c:120
#9  0x00007ffff7dee627 in dl_open_worker (a=a@entry=0x7fffffffe168) at dl-open.c:568
#10 0x00007ffff7de9f94 in _dl_catch_error (objname=objname@entry=0x7fffffffe158, errstring=errstring@entry=0x7fffffffe160, mallocedp=mallocedp@entry=0x7fffffffe157, operate=operate@entry=0x7ffff7dee260 <dl_open_worker>, args=args@entry=0x7fffffffe168) at dl-error.c:187
#11 0x00007ffff7dede01 in _dl_open (file=0x4006e4 "./shared.so", mode=-2147483646, caller_dlopen=0x400535 <main+21>, nsid=-2, argc=1, argv=<optimized out>, env=0x7fffffffe4a8) at dl-open.c:652
#12 0x00007ffff7bd7fc9 in ?? () from /usr/lib/libdl.so.2
#13 0x00007ffff7de9f94 in _dl_catch_error (objname=0x7ffff7dda0f0, errstring=0x7ffff7dda0f8, mallocedp=0x7ffff7dda0e8, operate=0x7ffff7bd7f70, args=0x7fffffffe380) at dl-error.c:187
#14 0x00007ffff7bd85e1 in ?? () from /usr/lib/libdl.so.2
#15 0x00007ffff7bd8061 in dlopen () from /usr/lib/libdl.so.2
#16 0x0000000000400535 in main ()
(gdb) info locals
old_hash = 4294967295
current_value = {s = 0x0, m = 0x0}
scope = 0x0
__PRETTY_FUNCTION__ = "_dl_lookup_symbol_x"
i = 0
protected = <optimized out>
(gdb) info args
undef_name = 0x7ffff7634775 "anything"
undef_map = 0x7ffff7ffd998 <_rtld_local+2456>
ref = 0x7fffffffdca8
symbol_scope = 0x0
version = 0x0
type_class = 0
flags = 3
skip_map = 0x0
This is a very funny bug but not related to GCC per se. Firstly, let's consider a minimal repro:

__attribute__((constructor))
static void some_init() {
    dlsym(RTLD_DEFAULT, "anything");
}

(segfaults just as well). Under -O0 this produces a normal call:

    call dlsym@PLT
    ...
    ret

but with -O2 GCC is clever enough to tail-call-optimize it to a plain jump:

    jmp dlsym@PLT

Now dlsym (and the other dl-functions) secretly take a shadow parameter, the return address on the stack:

void *
__dlsym (void *handle, const char *name DL_CALLER_DECL)
{
  ...
  struct dlsym_args args;
  args.who = DL_CALLER;
  args.handle = handle;
  args.name = name;

(from dlsym.c). In our case the jump pushes no return address inside our library, so the address dlsym finds on the stack belongs to the constructor's caller (call_init in the dynamic linker, as the backtrace above shows). args.who therefore does not identify the calling object, which causes the segfault during symbol resolution (the dynamic linker is lax on checks).
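The effect of the tail call on the return address can be reproduced outside glibc. The following is only an illustration of the mechanism: `caller_address`, `via_tail_call` and `via_normal_call` are hypothetical names, not glibc functions, and `__builtin_return_address` (a GCC/Clang builtin) stands in for the return-address lookup described above.

```c
#include <stddef.h>

/* Hypothetical stand-in for a function that, like dlsym, looks at its
   own return address to work out who called it. */
__attribute__((noinline))
void *caller_address(void)
{
    return __builtin_return_address(0);
}

/* The call is the last thing this function does, so at -O2 GCC may
   emit "jmp caller_address" instead of "call" + "ret". */
__attribute__((noinline))
void *via_tail_call(void)
{
    return caller_address();
}

/* The empty asm statement has to run after the call returns, so the
   call is no longer in tail position and stays a normal call. */
__attribute__((noinline))
void *via_normal_call(void)
{
    void *addr = caller_address();
    __asm__ volatile("" : "+r"(addr));
    return addr;
}
```

Built with gcc -O2, via_tail_call would typically return an address inside whoever called via_tail_call, while via_normal_call returns an address inside via_normal_call itself. At -O0 no tail call is made and both report an address inside their own body.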
As this is not a GCC bug, I suggest you
* close this issue (as not-a-bug?)
* report to Glibc folks (perhaps they could do more checking of return address or at least document their calling convention assumptions in manpages)
I think the issue is more complicated. Even if glibc were fixed not to crash, code like the following:

    return dlsym(RTLD_NEXT, "whatever");

would return the wrong result under tco when the caller's caller is in a different dso. GCC probably needs a "notailcall" attribute to fix this, but maybe there are workarounds glibc could do to prevent tco without needing a new attribute...
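Until something like that exists, one source-level workaround is to keep the dlsym call out of tail position, so that its return address still lies in the calling function. A minimal sketch, assuming GCC or Clang inline-asm syntax; `lookup_symbol` is a hypothetical helper name, and the empty asm statement is just one way to force work after the call, not something glibc or GCC prescribes:

```c
#include <dlfcn.h>

/* Hypothetical wrapper: resolves `name` through `handle` while keeping
   the dlsym call out of tail position. */
void *lookup_symbol(void *handle, const char *name)
{
    void *sym = dlsym(handle, name);

    /* Empty asm that reads and "modifies" sym. The compiler must place
       it after the call returns, so the call cannot become a jump. */
    __asm__ volatile("" : "+r"(sym));

    return sym;
}
```

A caller would then write `return lookup_symbol(RTLD_NEXT, "whatever");` (RTLD_NEXT needs _GNU_SOURCE on glibc). The helper has to be compiled into the same shared object as the code doing the lookup, since the return address dlsym sees is now the one inside lookup_symbol.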
(In reply to Rich Felker from comment #5)
> maybe there are workarounds glibc could do to prevent tco without needing a
> new attribute...

X-posted to Glibc BZ: https://sourceware.org/bugzilla/show_bug.cgi?id=21050
https://gist.github.com/daurnimator/a468e01800752d11cd15 Confirmed.