Bug 98110 - [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
Summary: [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 11.0
: P1 critical
Target Milestone: 11.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
: 98106 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-12-03 00:10 UTC by H.J. Lu
Modified: 2022-06-02 00:42 UTC (History)
8 users (show)

See Also:
Host:
Target: x86-64
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
A testcase (50.18 KB, application/octet-stream)
2020-12-03 00:10 UTC, H.J. Lu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2020-12-03 00:10:14 UTC
Created attachment 49667 [details]
A testcase

On Linux/x86-64, r11-5029 miscompiled dl-lookup.c with

-O2 -std=gnu11 -fgnu89-inline  -O2 -g -Wall -Wwrite-strings -Wundef -Werror -fmerge-all-constants -frounding-math -fno-stack-protector -Wstrict-prototypes -Wold-style-definition -fmath-errno   -fPIC  -fno-stack-protector  -mno-mmx -fexceptions -fasynchronous-unwind-tables  -ftls-model=initial-exec

_dl_lookup_symbol_x:
.LVL199:
.LFB73:
        .loc 1 835 1 view -0
        .cfi_startproc
        .loc 1 835 1 is_stmt 0 view .LVU632
        pushq   %r15
        .cfi_def_cfa_offset 16
        .cfi_offset 15, -16
        pushq   %r14
        .cfi_def_cfa_offset 24
        .cfi_offset 14, -24
        pushq   %r13
        .cfi_def_cfa_offset 32
        .cfi_offset 13, -32
        pushq   %r12
        .cfi_def_cfa_offset 40
        .cfi_offset 12, -40
        movq    %rdi, %r12
        pushq   %rbp
        .cfi_def_cfa_offset 48
        .cfi_offset 6, -48
        movq    %rdx, %rbp
        pushq   %rbx
        .cfi_def_cfa_offset 56
        .cfi_offset 3, -56
.LBB251:
.LBB252:
.LBB253:
.LBB254:
.LBB255:
.LBB256:
        .loc 1 640 72 view .LVU633
#APP
# 640 "dl-lookup.c" 1
        mov %fs:16,%rax
# 0 "" 2
#NO_APP

Source has

  /* Make sure nobody can unload the object while we are at it.  */
  if (__glibc_unlikely (flags & DL_LOOKUP_GSCOPE_LOCK))
    {
      /* We can't just call __rtld_lock_lock_recursive (GL(dl_load_lock))
         here, that can result in ABBA deadlock.  */
      THREAD_GSCOPE_RESET_FLAG ();
      __rtld_lock_lock_recursive (GL(dl_load_lock));
      /* While MAP value won't change, after THREAD_GSCOPE_RESET_FLAG ()
         it can e.g. point to unallocated memory.  So avoid the optimizer
         treating the above read from MAP->l_serial as ensurance it
         can safely dereference it.  */

But the check is removed by GCC 11.
Comment 1 Jakub Jelinek 2020-12-03 00:35:49 UTC
*** Bug 98106 has been marked as a duplicate of this bug. ***
Comment 2 Jakub Jelinek 2020-12-03 09:58:51 UTC
I see first real code changes between r11-5028 and r11-5029 during fre3 which optimizes away some rereads of current_value.m in _dl_lookup_symbol_x, which passes its address to other functions.
   # DEBUG BEGIN_STMT
   # PT = nonlocal escaped null 
   _47 = current_value.m;
   _48 = _47->l_used;
   _49 = _48 == 0;
...
   _335 = val.m;
-  # PT = nonlocal escaped null 
-  _336 = current_value.m;
-  if (_335 != _336)
+  if (_47 != _335)
...
   <bb 204> [local count: 3147328]:
-  # PT = nonlocal escaped null 
-  _383 = current_value.m;
-  iftmp.45_384 = _383->l_map_start;
+  iftmp.45_384 = _47->l_map_start;
...
-  # PT = nonlocal escaped null 
-  _120 = current_value.m;
 
   <bb 217> [local count: 114862414]:
   # PT = nonlocal escaped null 
-  # _66 = PHI <0B(30), _116(153), _120(216)>
+  # _66 = PHI <0B(30), _116(153), _47(216)>

(plus of course since the modref pass points to info changes).
Comment 3 Martin Liška 2020-12-03 10:08:28 UTC
Do we have a simple reproducer for the miscompiled Glibc library?
Comment 4 Jakub Jelinek 2020-12-03 10:12:10 UTC
&current_value is passed to do_lookup_x call which does modify it in some cases, e.g.
       result->s = sym;
       result->m = (struct link_map *) map;
or passes it to other function, do_lookup_unique, which can perform similar changes.
But it happens only early:
  for (size_t start = i; *scope != 
# 854 "dl-lookup.c" 3 4
                                  ((void *)0)
# 854 "dl-lookup.c"
                                      ; start = 0, ++scope)
    if (do_lookup_x (undef_name, new_hash, &old_hash, *ref,
       &current_value, *scope, start, version, flags,
       skip_map, type_class, undef_map) != 0)
      break;
and then current_value is no longer modified, so unless I've missed do_lookup_x remembering the address of current_value somewhere, that optimization seems to be ok.
Comment 5 Jakub Jelinek 2020-12-03 10:14:35 UTC
(In reply to Martin Liška from comment #3)
> Do we have a simple reproducer for the miscompiled Glibc library?

All one needs to do is with glibc git trunk
CC=trunk-gcc CXX=trunk-g++ ../configure --disable-sanity-check
make -jN
elf/ld.so ./libc.so.6
Segmentation fault (core dumped)
Comment 6 Jakub Jelinek 2020-12-03 10:23:19 UTC
The crash seems to be:
=> 0x00007ffff7fdbe20 <_dl_lookup_symbol_x+0>:	push   %r15
   0x00007ffff7fdbe22 <_dl_lookup_symbol_x+2>:	push   %r14
   0x00007ffff7fdbe24 <_dl_lookup_symbol_x+4>:	push   %r13
   0x00007ffff7fdbe26 <_dl_lookup_symbol_x+6>:	push   %r12
   0x00007ffff7fdbe28 <_dl_lookup_symbol_x+8>:	mov    %rdi,%r12
   0x00007ffff7fdbe2b <_dl_lookup_symbol_x+11>:	push   %rbp
   0x00007ffff7fdbe2c <_dl_lookup_symbol_x+12>:	mov    %rdx,%rbp
   0x00007ffff7fdbe2f <_dl_lookup_symbol_x+15>:	push   %rbx
   0x00007ffff7fdbe30 <_dl_lookup_symbol_x+16>:	mov    %fs:0x10,%rax
   0x00007ffff7fdbe39 <_dl_lookup_symbol_x+25>:	sub    $0xa8,%rsp
End of assembler dump.
(gdb) bt
#0  _dl_lookup_symbol_x (undef_name=0x7ffff7ff414a "__vdso_clock_gettime", undef_map=0x7ffff7ffe850, ref=0x7fffffffd598, 
    symbol_scope=0x7ffff7ffebe8, version=0x7fffffffd5c0, type_class=0, flags=0, skip_map=0x0) at dl-lookup.c:835
#1  0x00007ffff7fd463f in dl_vdso_vsym (name=0x7ffff7ff414a "__vdso_clock_gettime") at ../sysdeps/unix/sysv/linux/dl-vdso.h:52
#2  setup_vdso_pointers () at ../sysdeps/unix/sysv/linux/dl-vdso-setup.h:30
#3  dl_main (phdr=<optimized out>, phnum=12, user_entry=<optimized out>, auxv=0x7fffffffdac8) at rtld.c:1620
#4  0x00007ffff7feb1e7 in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7fffffffd8f0, dl_main=dl_main@entry=0x7ffff7fd34e0 <dl_main>)
    at ../elf/dl-sysdep.c:252
#5  0x00007ffff7fd3015 in _dl_start_final (arg=0x7fffffffd8f0) at rtld.c:485
#6  _dl_start (arg=0x7fffffffd8f0) at rtld.c:578
#7  0x00007ffff7fd2058 in _start ()

on the mov %fs:0x10,%rax perhaps %fs isn't initialized yet?
Comment 7 Martin Liška 2020-12-03 10:24:17 UTC
Hm, adding -fno-ipa-sra does not help while -O1 does in order to remove the miscompilation.
Comment 8 Florian Weimer 2020-12-03 10:26:11 UTC
(In reply to Jakub Jelinek from comment #6)
> on the mov %fs:0x10,%rax perhaps %fs isn't initialized yet?

Yes, that's why the access is guarded by flags & DL_LOOKUP_GSCOPE_LOCK. During initial relocation, _dl_lookup_symbol_x is called without this flag.
Comment 9 Florian Weimer 2020-12-03 10:33:09 UTC
And we should not end up in the add_dependency part, either because l_type won't be lt_loaded and the DL_LOOKUP_ADD_DEPENDENCY flag hasn't been set, either.

The inline asm is marked as volatile, and that should prevent it from being moved across these checks.
Comment 10 Jakub Jelinek 2020-12-03 10:35:29 UTC
Looking at *.optimized dump, mov %%fs:%c1,%0 appears in there only bb 64, guarded by flags_90(D) & 4 check, so exactly the check that appears in the source code
- flags & DL_LOOKUP_GSCOPE_LOCK (among various other earlier checks).
But in the assembly it is unconditional at the start of the function, so something must have hoisted that asm insn to the start of the function.
Comment 11 Jakub Jelinek 2020-12-03 10:42:22 UTC
Ah, RTL loop_invariant.  Perhaps because the inline asm is buggy?
asm ("mov %%fs:%c1,%0" : "=r" (__self) : "i" (__builtin_offsetof (struct pthread, header.self)));
The only input of the asm is the constant, so really nothing tells the optimizers it can't move it arbitrarily.  I think it needs to have some memory input or clobber to properly model that it reads from unknown memory.
Comment 12 Florian Weimer 2020-12-03 10:50:35 UTC
(In reply to Jakub Jelinek from comment #11)
> Ah, RTL loop_invariant.  Perhaps because the inline asm is buggy?
> asm ("mov %%fs:%c1,%0" : "=r" (__self) : "i" (__builtin_offsetof (struct
> pthread, header.self)));
> The only input of the asm is the constant, so really nothing tells the
> optimizers it can't move it arbitrarily.  I think it needs to have some
> memory input or clobber to properly model that it reads from unknown memory.

Ahh, do you mean the THREAD_SELF macro?

# define THREAD_SELF \
  ({ struct pthread *__self;                                                  \
     asm ("mov %%fs:%c1,%0" : "=r" (__self)                                   \
          : "i" (offsetof (struct pthread, header.self)));                    \
     __self;})

I had only looked at the other macros.

glibc relies on GCC optimizing away THREAD_SELF reads on x86-64 because most of the TCB access does not need it.
Comment 13 Jakub Jelinek 2020-12-03 10:51:40 UTC
And no, the asm isn't marked volatile, that would have prevented it too:
# define THREAD_SELF \
  ({ struct pthread *__self;                                                  \
     asm ("mov %%fs:%c1,%0" : "=r" (__self)                                   \
          : "i" (offsetof (struct pthread, header.self)));                    \
     __self;})
This all boils down to:
void
foo (int *p)
{
  for (int i = 0; i < 64; i++)
    {
      if (p[i])
	{
	  int *q;
	  asm ("mov %%fs:%c1,%0" : "=r" (q) : "i" (16));
	  q[0]++;
	}
    }
}
which would hoist the inline asm to the function prologue even in GCC 4.1.
Comment 14 Jakub Jelinek 2020-12-03 10:59:09 UTC
I guess with GCC 6 and later one can use:
void
foo (int *p)
{
  for (int i = 0; i < 64; i++)
    {
      if (p[i])
	{
	  int *q;
//	  asm ("mov %%fs:%c1,%0" : "=r" (q) : "i" (16));
	  q = *(int *__seg_fs *) 16;
	  q[0]++;
	}
    }
}
instead (but I haven't done any testing beyond this testcase), for older GCC I guess you can keep using what it did because we've been lucky (dunno why).
Comment 15 Richard Biener 2020-12-03 11:56:57 UTC
(In reply to Jakub Jelinek from comment #13)
> And no, the asm isn't marked volatile, that would have prevented it too:
> # define THREAD_SELF \
>   ({ struct pthread *__self;                                                
> \
>      asm ("mov %%fs:%c1,%0" : "=r" (__self)                                 
> \
>           : "i" (offsetof (struct pthread, header.self)));                  
> \
>      __self;})
> This all boils down to:
> void
> foo (int *p)
> {
>   for (int i = 0; i < 64; i++)
>     {
>       if (p[i])
> 	{
> 	  int *q;
> 	  asm ("mov %%fs:%c1,%0" : "=r" (q) : "i" (16));
> 	  q[0]++;
> 	}
>     }
> }
> which would hoist the inline asm to the function prologue even in GCC 4.1.

I suppose we'd need to mark it possibly trapping (which it actually does),
but I think there's no good way to do this.
Comment 16 Jakub Jelinek 2020-12-03 12:01:27 UTC
In the glibc bugzilla or on private IRC I've suggested either:
diff --git a/sysdeps/x86_64/nptl/tls.h b/sysdeps/x86_64/nptl/tls.h
index a08bf972de..ccb5f24d92 100644
--- a/sysdeps/x86_64/nptl/tls.h
+++ b/sysdeps/x86_64/nptl/tls.h
@@ -180,11 +180,16 @@ _Static_assert (offsetof (tcbhead_t, __glibc_unused2) == 0x80,
    assignments like
 	pthread_descr self = thread_self();
    do not get optimized away.  */
-# define THREAD_SELF \
+# if __GNUC_PREREQ (6, 0)
+#  define THREAD_SELF \
+  (*(struct pthread *__seg_fs *) offsetof (struct pthread, header.self))
+# else
+#  define THREAD_SELF \
   ({ struct pthread *__self;						      \
      asm ("mov %%fs:%c1,%0" : "=r" (__self)				      \
 	  : "i" (offsetof (struct pthread, header.self)));	 	      \
      __self;})
+# endif
 
 /* Magic for libthread_db to know how to do THREAD_SELF.  */
 # define DB_THREAD_SELF_INCLUDE  <sys/reg.h> /* For the FS constant.  */
or
diff --git a/sysdeps/x86_64/nptl/tls.h b/sysdeps/x86_64/nptl/tls.h
index a08bf972de..f3cbae9cba 100644
--- a/sysdeps/x86_64/nptl/tls.h
+++ b/sysdeps/x86_64/nptl/tls.h
@@ -180,11 +180,21 @@ _Static_assert (offsetof (tcbhead_t, __glibc_unused2) == 0x80,
    assignments like
        pthread_descr self = thread_self();
    do not get optimized away.  */
-# define THREAD_SELF \
+# if __GNUC_PREREQ (6, 0)
+#  define THREAD_SELF \
+  ({ struct pthread *__self;                                                 \
+     asm ("mov %%fs:%c1,%0" : "=r" (__self)                                  \
+         : "i" (offsetof (struct pthread, header.self)),                     \
+           "m" (*(struct pthread *__seg_fs *)                                \
+                offsetof (struct pthread, header.self)));                    \
+     __self;})
+# else
+#  define THREAD_SELF \
   ({ struct pthread *__self;                                                 \
      asm ("mov %%fs:%c1,%0" : "=r" (__self)                                  \
          : "i" (offsetof (struct pthread, header.self)));                    \
      __self;})
+# endif
 
 /* Magic for libthread_db to know how to do THREAD_SELF.  */
 # define DB_THREAD_SELF_INCLUDE  <sys/reg.h> /* For the FS constant.  */

With make check I'm seeing 153 glibc FAILs with trunk with the first patch, about to make check the second one, but maybe I'm doing something wrong or there are other gcc or glibc bugs unrelated to this one.  I think it would be good to try these patches with GCC 10 first to see if those patches don't break stuff with known working compilers and only then look at the failed tests if any.
Florian said he'll take it over on the glibc side.
Comment 17 Florian Weimer 2020-12-03 12:26:15 UTC
Jakub's glibc test failures were due to --prefix=/usr/local, so that glibc wouldn't find the installed system libgcc_s in /usr/lib64.
Comment 18 Florian Weimer 2020-12-03 13:49:17 UTC
This is a glibc bug.