Created attachment 49667 [details] A testcase On Linux/x86-64, r11-5029 miscompiled dl-lookup.c with -O2 -std=gnu11 -fgnu89-inline -O2 -g -Wall -Wwrite-strings -Wundef -Werror -fmerge-all-constants -frounding-math -fno-stack-protector -Wstrict-prototypes -Wold-style-definition -fmath-errno -fPIC -fno-stack-protector -mno-mmx -fexceptions -fasynchronous-unwind-tables -ftls-model=initial-exec _dl_lookup_symbol_x: .LVL199: .LFB73: .loc 1 835 1 view -0 .cfi_startproc .loc 1 835 1 is_stmt 0 view .LVU632 pushq %r15 .cfi_def_cfa_offset 16 .cfi_offset 15, -16 pushq %r14 .cfi_def_cfa_offset 24 .cfi_offset 14, -24 pushq %r13 .cfi_def_cfa_offset 32 .cfi_offset 13, -32 pushq %r12 .cfi_def_cfa_offset 40 .cfi_offset 12, -40 movq %rdi, %r12 pushq %rbp .cfi_def_cfa_offset 48 .cfi_offset 6, -48 movq %rdx, %rbp pushq %rbx .cfi_def_cfa_offset 56 .cfi_offset 3, -56 .LBB251: .LBB252: .LBB253: .LBB254: .LBB255: .LBB256: .loc 1 640 72 view .LVU633 #APP # 640 "dl-lookup.c" 1 mov %fs:16,%rax # 0 "" 2 #NO_APP Source has /* Make sure nobody can unload the object while we are at it. */ if (__glibc_unlikely (flags & DL_LOOKUP_GSCOPE_LOCK)) { /* We can't just call __rtld_lock_lock_recursive (GL(dl_load_lock)) here, that can result in ABBA deadlock. */ THREAD_GSCOPE_RESET_FLAG (); __rtld_lock_lock_recursive (GL(dl_load_lock)); /* While MAP value won't change, after THREAD_GSCOPE_RESET_FLAG () it can e.g. point to unallocated memory. So avoid the optimizer treating the above read from MAP->l_serial as ensurance it can safely dereference it. */ But the check is removed by GCC 11.
*** Bug 98106 has been marked as a duplicate of this bug. ***
I see first real code changes between r11-5028 and r11-5029 during fre3 which optimizes away some rereads of current_value.m in _dl_lookup_symbol_x, which passes its address to other functions. # DEBUG BEGIN_STMT # PT = nonlocal escaped null _47 = current_value.m; _48 = _47->l_used; _49 = _48 == 0; ... _335 = val.m; - # PT = nonlocal escaped null - _336 = current_value.m; - if (_335 != _336) + if (_47 != _335) ... <bb 204> [local count: 3147328]: - # PT = nonlocal escaped null - _383 = current_value.m; - iftmp.45_384 = _383->l_map_start; + iftmp.45_384 = _47->l_map_start; ... - # PT = nonlocal escaped null - _120 = current_value.m; <bb 217> [local count: 114862414]: # PT = nonlocal escaped null - # _66 = PHI <0B(30), _116(153), _120(216)> + # _66 = PHI <0B(30), _116(153), _47(216)> (plus of course since the modref pass points to info changes).
Do we have a simple reproducer for the miscompiled Glibc library?
¤t_value is passed to do_lookup_x call which does modify it in some cases, e.g. result->s = sym; result->m = (struct link_map *) map; or passes it to other function, do_lookup_unique, which can perform similar changes. But it happens only early: for (size_t start = i; *scope != # 854 "dl-lookup.c" 3 4 ((void *)0) # 854 "dl-lookup.c" ; start = 0, ++scope) if (do_lookup_x (undef_name, new_hash, &old_hash, *ref, ¤t_value, *scope, start, version, flags, skip_map, type_class, undef_map) != 0) break; and then current_value is no longer modified, so unless I've missed do_lookup_x remembering the address of current_value somewhere, that optimization seems to be ok.
(In reply to Martin Liška from comment #3) > Do we have a simple reproducer for the miscompiled Glibc library? All one needs to do is with glibc git trunk CC=trunk-gcc CXX=trunk-g++ ../configure --disable-sanity-check make -jN elf/ld.so ./libc.so.6 Segmentation fault (core dumped)
The crash seems to be: => 0x00007ffff7fdbe20 <_dl_lookup_symbol_x+0>: push %r15 0x00007ffff7fdbe22 <_dl_lookup_symbol_x+2>: push %r14 0x00007ffff7fdbe24 <_dl_lookup_symbol_x+4>: push %r13 0x00007ffff7fdbe26 <_dl_lookup_symbol_x+6>: push %r12 0x00007ffff7fdbe28 <_dl_lookup_symbol_x+8>: mov %rdi,%r12 0x00007ffff7fdbe2b <_dl_lookup_symbol_x+11>: push %rbp 0x00007ffff7fdbe2c <_dl_lookup_symbol_x+12>: mov %rdx,%rbp 0x00007ffff7fdbe2f <_dl_lookup_symbol_x+15>: push %rbx 0x00007ffff7fdbe30 <_dl_lookup_symbol_x+16>: mov %fs:0x10,%rax 0x00007ffff7fdbe39 <_dl_lookup_symbol_x+25>: sub $0xa8,%rsp End of assembler dump. (gdb) bt #0 _dl_lookup_symbol_x (undef_name=0x7ffff7ff414a "__vdso_clock_gettime", undef_map=0x7ffff7ffe850, ref=0x7fffffffd598, symbol_scope=0x7ffff7ffebe8, version=0x7fffffffd5c0, type_class=0, flags=0, skip_map=0x0) at dl-lookup.c:835 #1 0x00007ffff7fd463f in dl_vdso_vsym (name=0x7ffff7ff414a "__vdso_clock_gettime") at ../sysdeps/unix/sysv/linux/dl-vdso.h:52 #2 setup_vdso_pointers () at ../sysdeps/unix/sysv/linux/dl-vdso-setup.h:30 #3 dl_main (phdr=<optimized out>, phnum=12, user_entry=<optimized out>, auxv=0x7fffffffdac8) at rtld.c:1620 #4 0x00007ffff7feb1e7 in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7fffffffd8f0, dl_main=dl_main@entry=0x7ffff7fd34e0 <dl_main>) at ../elf/dl-sysdep.c:252 #5 0x00007ffff7fd3015 in _dl_start_final (arg=0x7fffffffd8f0) at rtld.c:485 #6 _dl_start (arg=0x7fffffffd8f0) at rtld.c:578 #7 0x00007ffff7fd2058 in _start () on the mov %fs:0x10,%rax perhaps %fs isn't initialized yet?
Hm, adding -fno-ipa-sra does not help while -O1 does in order to remove the miscompilation.
(In reply to Jakub Jelinek from comment #6) > on the mov %fs:0x10,%rax perhaps %fs isn't initialized yet? Yes, that's why the access is guarded by flags & DL_LOOKUP_GSCOPE_LOCK. During initial relocation, _dl_lookup_symbol_x is called without this flag.
And we should not end up in the add_dependency part, either because l_type won't be lt_loaded and the DL_LOOKUP_ADD_DEPENDENCY flag hasn't been set, either. The inline asm is marked as volatile, and that should prevent it from being moved across these checks.
Looking at *.optimized dump, mov %%fs:%c1,%0 appears in there only bb 64, guarded by flags_90(D) & 4 check, so exactly the check that appears in the source code - flags & DL_LOOKUP_GSCOPE_LOCK (among various other earlier checks). But in the assembly it is unconditional at the start of the function, so something must have hoisted that asm insn to the start of the function.
Ah, RTL loop_invariant. Perhaps because the inline asm is buggy? asm ("mov %%fs:%c1,%0" : "=r" (__self) : "i" (__builtin_offsetof (struct pthread, header.self))); The only input of the asm is the constant, so really nothing tells the optimizers it can't move it arbitrarily. I think it needs to have some memory input or clobber to properly model that it reads from unknown memory.
(In reply to Jakub Jelinek from comment #11) > Ah, RTL loop_invariant. Perhaps because the inline asm is buggy? > asm ("mov %%fs:%c1,%0" : "=r" (__self) : "i" (__builtin_offsetof (struct > pthread, header.self))); > The only input of the asm is the constant, so really nothing tells the > optimizers it can't move it arbitrarily. I think it needs to have some > memory input or clobber to properly model that it reads from unknown memory. Ahh, do you mean the THREAD_SELF macro? # define THREAD_SELF \ ({ struct pthread *__self; \ asm ("mov %%fs:%c1,%0" : "=r" (__self) \ : "i" (offsetof (struct pthread, header.self))); \ __self;}) I had only looked at the other macros. glibc relies on GCC optimizing away THREAD_SELF reads on x86-64 because most of the TCB access does not need it.
And no, the asm isn't marked volatile, that would have prevented it too: # define THREAD_SELF \ ({ struct pthread *__self; \ asm ("mov %%fs:%c1,%0" : "=r" (__self) \ : "i" (offsetof (struct pthread, header.self))); \ __self;}) This all boils down to: void foo (int *p) { for (int i = 0; i < 64; i++) { if (p[i]) { int *q; asm ("mov %%fs:%c1,%0" : "=r" (q) : "i" (16)); q[0]++; } } } which would hoist the inline asm to the function prologue even in GCC 4.1.
I guess with GCC 6 and later one can use: void foo (int *p) { for (int i = 0; i < 64; i++) { if (p[i]) { int *q; // asm ("mov %%fs:%c1,%0" : "=r" (q) : "i" (16)); q = *(int *__seg_fs *) 16; q[0]++; } } } instead (but I haven't done any testing beyond this testcase), for older GCC I guess you can keep using what it did because we've been lucky (dunno why).
(In reply to Jakub Jelinek from comment #13) > And no, the asm isn't marked volatile, that would have prevented it too: > # define THREAD_SELF \ > ({ struct pthread *__self; > \ > asm ("mov %%fs:%c1,%0" : "=r" (__self) > \ > : "i" (offsetof (struct pthread, header.self))); > \ > __self;}) > This all boils down to: > void > foo (int *p) > { > for (int i = 0; i < 64; i++) > { > if (p[i]) > { > int *q; > asm ("mov %%fs:%c1,%0" : "=r" (q) : "i" (16)); > q[0]++; > } > } > } > which would hoist the inline asm to the function prologue even in GCC 4.1. I suppose we'd need to mark it possibly trapping (which it actually does), but I think there's no good way to do this.
In the glibc bugzilla or on private IRC I've suggested either: diff --git a/sysdeps/x86_64/nptl/tls.h b/sysdeps/x86_64/nptl/tls.h index a08bf972de..ccb5f24d92 100644 --- a/sysdeps/x86_64/nptl/tls.h +++ b/sysdeps/x86_64/nptl/tls.h @@ -180,11 +180,16 @@ _Static_assert (offsetof (tcbhead_t, __glibc_unused2) == 0x80, assignments like pthread_descr self = thread_self(); do not get optimized away. */ -# define THREAD_SELF \ +# if __GNUC_PREREQ (6, 0) +# define THREAD_SELF \ + (*(struct pthread *__seg_fs *) offsetof (struct pthread, header.self)) +# else +# define THREAD_SELF \ ({ struct pthread *__self; \ asm ("mov %%fs:%c1,%0" : "=r" (__self) \ : "i" (offsetof (struct pthread, header.self))); \ __self;}) +# endif /* Magic for libthread_db to know how to do THREAD_SELF. */ # define DB_THREAD_SELF_INCLUDE <sys/reg.h> /* For the FS constant. */ or diff --git a/sysdeps/x86_64/nptl/tls.h b/sysdeps/x86_64/nptl/tls.h index a08bf972de..f3cbae9cba 100644 --- a/sysdeps/x86_64/nptl/tls.h +++ b/sysdeps/x86_64/nptl/tls.h @@ -180,11 +180,21 @@ _Static_assert (offsetof (tcbhead_t, __glibc_unused2) == 0x80, assignments like pthread_descr self = thread_self(); do not get optimized away. */ -# define THREAD_SELF \ +# if __GNUC_PREREQ (6, 0) +# define THREAD_SELF \ + ({ struct pthread *__self; \ + asm ("mov %%fs:%c1,%0" : "=r" (__self) \ + : "i" (offsetof (struct pthread, header.self)), \ + "m" (*(struct pthread *__seg_fs *) \ + offsetof (struct pthread, header.self))); \ + __self;}) +# else +# define THREAD_SELF \ ({ struct pthread *__self; \ asm ("mov %%fs:%c1,%0" : "=r" (__self) \ : "i" (offsetof (struct pthread, header.self))); \ __self;}) +# endif /* Magic for libthread_db to know how to do THREAD_SELF. */ # define DB_THREAD_SELF_INCLUDE <sys/reg.h> /* For the FS constant. */ With make check I'm seeing 153 glibc FAILs with trunk with the first patch, about to make check the second one, but maybe I'm doing something wrong or there are other gcc or glibc bugs unrelated to this one. I think it would be good to try these patches with GCC 10 first to see if those patches don't break stuff with known working compilers and only then look at the failed tests if any. Florian said he'll take it over on the glibc side.
Jakub's glibc test failures were due to --prefix=/usr/local, so that glibc wouldn't find the installed system libgcc_s in /usr/lib64.
This is a glibc bug.