Bug 66826 - tail call to dlsym in constructor results in a segfault due to dlsym needing to know which shared library it is called from
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: unknown
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-10 06:12 UTC by Daurnimator
Modified: 2024-03-10 06:05 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work: 4.9.2
Known to fail: 5.1.0
Last reconfirmed: 2021-09-02 00:00:00


Description Daurnimator 2015-07-10 06:12:58 UTC
Having a weird issue here: if I dlsym() something in a constructor but don't use the result, the shared library segfaults at load.

bug-main.c:

    #define _GNU_SOURCE
    #include <dlfcn.h>
    
    int main() {
        void *foo = dlopen("./shared.so", RTLD_NOW);
        void (*some_exported_function)() = dlsym(foo, "some_exported_function");
        some_exported_function();
        return 0;
    }

bug-shared.c:

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    
    static int (*pointer) ();
    static int stub () { return 0; }
    __attribute__((constructor)) static void some_init() {
        if ((pointer = dlsym(RTLD_DEFAULT, "anything")) == NULL) {
            pointer = stub;
        }
    }
    
    void some_exported_function() {
        printf("in some_exported_function\n");
        printf("%p\n", pointer);
    }


Compile with:

    gcc -shared -O2 -fPIC bug-shared.c -ldl -o shared.so
    gcc -O2 -fPIC bug-main.c -ldl


Now comment out the `printf` of the pointer in bug-shared.c and recompile; loading shared.so then segfaults.

Backtrace:
#0  0x00007ffff7de4b77 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff7951b91 in do_sym () from /usr/lib/libc.so.6
#2  0x00007ffff7bd80f4 in ?? () from /usr/lib/libdl.so.2
#3  0x00007ffff7de9f94 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#4  0x00007ffff7bd85e1 in ?? () from /usr/lib/libdl.so.2
#5  0x00007ffff7bd8148 in dlsym () from /usr/lib/libdl.so.2
#6  0x00007ffff7dea0ea in call_init.part () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7dea1fb in _dl_init () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff7dee627 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7de9f94 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7dede01 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00007ffff7bd7fc9 in ?? () from /usr/lib/libdl.so.2
#12 0x00007ffff7de9f94 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x00007ffff7bd85e1 in ?? () from /usr/lib/libdl.so.2
#14 0x00007ffff7bd8061 in dlopen () from /usr/lib/libdl.so.2
#15 0x0000000000400622 in main ()

$ gcc -dumpversion
5.1.0

Works fine with -O0
Works fine with clang
Comment 1 Daurnimator 2015-07-10 13:11:16 UTC
nalaginrut (from the #gcc IRC channel) confirmed that the issue only occurs in GCC 5 at -O2 and higher (the bug is *not* present in GCC 4.9 or at -O1).

Note: only the compilation of shared.so matters; the main binary can be compiled with any setting or compiler and the same behaviour is shown.
Comment 2 Daurnimator 2015-07-12 14:44:50 UTC
I recompiled glibc with debug symbols.

Program received signal SIGSEGV, Segmentation fault.
_dl_lookup_symbol_x (undef_name=0x7ffff7634775 "anything", undef_map=0x7ffff7ffd998 <_rtld_local+2456>, ref=0x7fffffffdca8, 
    symbol_scope=0x0, version=0x0, type_class=0, flags=3, skip_map=0x0) at dl-lookup.c:769
769	    while ((*scope)->r_list[i] != skip_map)
(gdb) bt
#0  _dl_lookup_symbol_x (undef_name=0x7ffff7634775 "anything", undef_map=0x7ffff7ffd998 <_rtld_local+2456>, ref=0x7fffffffdca8, 
    symbol_scope=0x0, version=0x0, type_class=0, flags=3, skip_map=0x0) at dl-lookup.c:769
#1  0x00007ffff7951b91 in do_sym (handle=0x0, name=0x7ffff7634775 "anything", who=<optimized out>, vers=0x0, flags=2) at dl-sym.c:161
#2  0x00007ffff7bd80f4 in ?? () from /usr/lib/libdl.so.2
#3  0x00007ffff7de9f94 in _dl_catch_error (objname=0x7ffff7dda0f0, errstring=0x7ffff7dda0f8, mallocedp=0x7ffff7dda0e8, 
    operate=0x7ffff7bd80e0, args=0x7fffffffdea0) at dl-error.c:187
#4  0x00007ffff7bd85e1 in ?? () from /usr/lib/libdl.so.2
#5  0x00007ffff7bd8148 in dlsym () from /usr/lib/libdl.so.2
#6  0x00007ffff7dea0ea in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffe498, env=env@entry=0x7fffffffe4a8)
    at dl-init.c:72
#7  0x00007ffff7dea1fb in call_init (env=0x7fffffffe4a8, argv=0x7fffffffe498, argc=1, l=<optimized out>) at dl-init.c:30
#8  _dl_init (main_map=main_map@entry=0x601030, argc=1, argv=0x7fffffffe498, env=0x7fffffffe4a8) at dl-init.c:120
#9  0x00007ffff7dee627 in dl_open_worker (a=a@entry=0x7fffffffe168) at dl-open.c:568
#10 0x00007ffff7de9f94 in _dl_catch_error (objname=objname@entry=0x7fffffffe158, errstring=errstring@entry=0x7fffffffe160, 
    mallocedp=mallocedp@entry=0x7fffffffe157, operate=operate@entry=0x7ffff7dee260 <dl_open_worker>, args=args@entry=0x7fffffffe168)
    at dl-error.c:187
#11 0x00007ffff7dede01 in _dl_open (file=0x4006e4 "./shared.so", mode=-2147483646, caller_dlopen=0x400535 <main+21>, nsid=-2, argc=1, 
    argv=<optimized out>, env=0x7fffffffe4a8) at dl-open.c:652
#12 0x00007ffff7bd7fc9 in ?? () from /usr/lib/libdl.so.2
#13 0x00007ffff7de9f94 in _dl_catch_error (objname=0x7ffff7dda0f0, errstring=0x7ffff7dda0f8, mallocedp=0x7ffff7dda0e8, 
    operate=0x7ffff7bd7f70, args=0x7fffffffe380) at dl-error.c:187
#14 0x00007ffff7bd85e1 in ?? () from /usr/lib/libdl.so.2
#15 0x00007ffff7bd8061 in dlopen () from /usr/lib/libdl.so.2
#16 0x0000000000400535 in main ()
(gdb) info locals
old_hash = 4294967295
current_value = {s = 0x0, m = 0x0}
scope = 0x0
__PRETTY_FUNCTION__ = "_dl_lookup_symbol_x"
i = 0
protected = <optimized out>
(gdb) info args
undef_name = 0x7ffff7634775 "anything"
undef_map = 0x7ffff7ffd998 <_rtld_local+2456>
ref = 0x7fffffffdca8
symbol_scope = 0x0
version = 0x0
type_class = 0
flags = 3
skip_map = 0x0
Comment 3 Yuri Gribov 2017-01-02 21:39:45 UTC
This is a very funny bug, but it is not related to GCC per se. First, let's consider a minimal repro:
    __attribute__((constructor)) static void some_init() {
      dlsym(RTLD_DEFAULT, "anything");
    }
(segfaults just as well). Under -O0 this produces a normal call:
    call    dlsym@PLT
    ...
    ret
but with -O2 GCC is clever enough to tail-call-optimize it to a plain jump:
    jmp     dlsym@PLT

Now dlsym (and the other dl-functions) secretly takes a shadow parameter, the return address on the stack:
    void *
    __dlsym (void *handle, const char *name DL_CALLER_DECL)
    {
    ...
      struct dlsym_args args;
      args.who = DL_CALLER;
      args.handle = handle;
      args.name = name;
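(from dlsym.c). In our case the jump pushes no return address into shared.so, so dlsym sees the return address of whoever called the constructor (call_init in the dynamic linker, frame #6 in the backtrace above). args.who therefore identifies the wrong object, which causes the segfault during symbol resolution (the dynamic linker is lame on checks).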
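To make the shadow parameter concrete, here is a minimal sketch. It is not glibc code: `caller_of_me`, `plain_call`, `tail_position_call` and `check_callers_differ` are hypothetical names, and `caller_of_me` only stands in for a dl-function by returning `__builtin_return_address(0)`, which is assumed here to be the kind of value DL_CALLER captures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dl-function: report the address this call
   will return to (the assumed equivalent of DL_CALLER). */
__attribute__((noinline)) void *caller_of_me(void) {
    return __builtin_return_address(0);
}

/* Not a tail call: the volatile store has to happen after the call,
   so the return address lies inside plain_call. */
__attribute__((noinline)) void *plain_call(void) {
    void *volatile seen = caller_of_me();
    return seen;
}

/* Call in tail position: at -O2 GCC may emit a jump instead of a call,
   in which case the callee sees the return address of *our* caller. */
__attribute__((noinline)) void *tail_position_call(void) {
    return caller_of_me();
}

/* With or without the tail-call optimization, the two wrappers report
   different return addresses; only the second one depends on -O level. */
void check_callers_differ(void) {
    void *from_plain = plain_call();
    void *from_tail = tail_position_call();
    assert(from_plain != NULL && from_tail != NULL);
    assert(from_plain != from_tail);
}
```

Printing the value from `tail_position_call` at -O0 and at -O2 should show it moving from inside the wrapper to inside the wrapper's caller, which is the same shift that hands dlsym an address inside ld.so in the constructor case.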
Comment 4 Yuri Gribov 2017-01-02 21:41:56 UTC
As this is not a GCC bug I suggest you
* close this issue (as not-a-bug?)
* report to Glibc folks (perhaps they could do more checking of return address or at least document their calling convention assumptions in manpages)
Comment 5 Rich Felker 2017-01-04 05:11:29 UTC
I think the issue is more complicated. Even if glibc were fixed not to crash, code like the following:

    return dlsym(RTLD_NEXT, "whatever");

would return the wrong result under tco when the caller's caller is in a different dso. GCC probably needs a "notailcall" attribute to fix this, but maybe there are workarounds glibc could do to prevent tco without needing a new attribute...
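Until something like that attribute exists, a caller-side workaround is to keep the dlsym call out of tail position. The following is only a sketch under that assumption: `lookup_next` and `lookup_next_example` are hypothetical names, and routing the result through a volatile local is one way (not the only one) to force a real call followed by a return.

```c
#define _GNU_SOURCE /* needed for RTLD_NEXT with glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical wrapper: the volatile local must be written after dlsym
   returns and read again before we return, so the call cannot be turned
   into a jump and dlsym sees a return address inside this object. */
void *lookup_next(const char *name) {
    void *volatile result = dlsym(RTLD_NEXT, name);
    return result;
}

/* Usage: a symbol that nobody defines should simply come back as NULL. */
void lookup_next_example(void) {
    assert(lookup_next("no_such_symbol_bug_66826") == NULL);
}
```

A coarser alternative is to build the affected translation unit with -fno-optimize-sibling-calls, which turns off the optimization for every call in that file rather than for this one call.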
Comment 6 Yuri Gribov 2017-01-14 06:40:00 UTC
(In reply to Rich Felker from comment #5)
> maybe there are workarounds glibc could do to prevent tco without needing a
> new attribute...

X-posted to Glibc BZ: https://sourceware.org/bugzilla/show_bug.cgi?id=21050
Comment 7 Andrew Pinski 2021-09-02 05:37:57 UTC
https://gist.github.com/daurnimator/a468e01800752d11cd15

Confirmed.