Bug 120653 - Hardened glibc (-z now) compiled with GCC 14.3 will crash when unwinding stack
Summary: Hardened glibc (-z now) compiled with GCC 14.3 will crash when unwinding stack
Status: RESOLVED MOVED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 14.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2025-06-13 23:14 UTC by Sergio Durigan Junior
Modified: 2025-06-16 00:46 UTC
CC List: 9 users

See Also:
Host:
Target: X86_64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2025-06-14 00:00:00


Attachments
rtld.i.xz (75.26 KB, application/x-xz)
2025-06-14 11:13 UTC, Sam James
rtld-unwind.tar.xz (tarball containing good/bad rtld.os) (22.85 KB, application/x-xz)
2025-06-14 11:19 UTC, Sam James
build.sh (1.55 KB, text/plain)
2025-06-14 11:26 UTC, Sam James

Description Sergio Durigan Junior 2025-06-13 23:14:47 UTC
Hi,

This is an interesting bug which took me quite some time to (partially) understand.  I decided to file this upstream report to:

- See if an upstream developer could help me fully understand what's going on, and
- Get a patch backported to GCC 14 to fix the issue.

It all started when we noticed that compiling glibc with the following hardening flags (from the OpenSSF project) would lead to an abort in certain scenarios:

====
*self_spec:
+ %{!O:%{!O1:%{!O2:%{!O3:%{!O0:%{!Os:%{!0fast:%{!0g:%{!0z:-O2}}}}}}}}} -fhardened -Wno-error=hardened -Wno-hardened %{!fdelete-null-pointer-checks:-fno-delete-null-pointer-checks} -fno-strict-overflow -fno-strict-aliasing %{!fomit-frame-pointer:-fno-omit-frame-pointer} -mno-omit-leaf-frame-pointer

*link:
+ --as-needed -O1 --sort-common -z noexecstack -z relro -z now
====

It is important to notice that:

- The bug only happens when using a glibc compiled with the "-z now" hardening flag.  If the flag is removed, then the abort doesn't occur.
- The bug only happens when using a glibc compiled with GCC 14.x (14.3 included).
- The bug *does not* happen with GCC 15.  I bisected and found a commit that fixes the problem; see below.

After some investigation, I was able to determine that the problem seemed to be happening while unwinding the stack.  For example, this is the backtrace I would get when running "python3 -c 'import matplotlib'":

#0  0x00007c43afe9972c in __pthread_kill_implementation () from /lib/libc.so.6
#1  0x00007c43afe3d8be in raise () from /lib/libc.so.6
#2  0x00007c43afe2531f in abort () from /lib/libc.so.6
#3  0x00007c43af84f79d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4  0x00007c43af86d4d8 in _Unwind_RaiseException () from /usr/lib/libgcc_s.so.1
#5  0x00007c43acac9014 in __cxxabiv1::__cxa_throw (obj=0x5b7d7f52fab0, warning: could not convert 'std::type_info' from the host encoding (ISO-8859-1) to UTF-32.
This normally should not happen, please file a bug report.
tinfo=0x7c429b6fd218 <typeinfo for pybind11::attribute_error>, dest=0x7c429b5f7f70 <pybind11::reference_cast_error::~reference_cast_error() [clone .lto_priv.0]>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:93
#6  0x00007c429b5ec3a7 in ft2font__getattr__(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) [clone .lto_priv.0] [clone .cold] () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#7  0x00007c429b62f086 in pybind11::cpp_function::initialize<pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, pybind11::name, pybind11::scope, pybind11::sibling>(pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#1}::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0] ()
   from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#8  0x00007c429b603886 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
...

and this is the backtrace I would get when running Emacs:

#0  0x00007eede329972c in __pthread_kill_implementation () from /lib/libc.so.6
#1  0x00007eede323d8be in raise () from /lib/libc.so.6
#2  0x00007eede322531f in abort () from /lib/libc.so.6
#3  0x00007eede262879d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4  0x00007eede2646e7c in _Unwind_Backtrace () from /usr/lib/libgcc_s.so.1
#5  0x00007eede3327b11 in backtrace () from /lib/libc.so.6
#6  0x000059535963a8a1 in emacs_backtrace ()
#7  0x000059535956499a in main ()

After some more debugging, it became clear that the abort was being triggered by this asm excerpt from libgcc:

...
   0x00007eede2645e82 <+146>:   call   0x7eede2644a90 <uw_frame_state_for>
   0x00007eede2645e87 <+151>:   test   %eax,%eax
   0x00007eede2645e89 <+153>:   jne    0x7eede2628798 <uw_init_context_1.cold>
...

which led me to (apologies for the Microsoft github links):

https://github.com/gcc-mirror/gcc/blob/c0be0298a9553c13af642f45628a15d833473657/libgcc/unwind-dw2.c#L1340-L1341

I also checked the contents of %eax, and it held the value of _URC_END_OF_STACK. So now we know that this is the cause of the abort.

I decided to proceed further and compile a new GCC with debuginfo so that I could analyse what uw_frame_state_for is doing. This took me down a rabbit hole, but eventually I found that the problem is that fde is NULL here:

https://github.com/gcc-mirror/gcc/blob/c0be0298a9553c13af642f45628a15d833473657/libgcc/unwind-dw2.c#L1007-L1019

Because amd64 defines MD_FALLBACK_FRAME_STATE_FOR, we end up calling it inside the if statement, but that's not really important. What's important is understanding that _Unwind_Find_FDE never returns NULL when glibc isn't hardened with "-z now", but does return NULL sometimes when glibc is hardened with "-z now".

Now, _Unwind_Find_FDE is complex and deals with CFI and .eh_frame in order to find the frame description information and properly unwind the stack. It eventually calls find_fde_tail, which does the hard job.

Here we get to the end of the line, at least for my debugging session. I was able to track down the exact spot where find_fde_tail is returning NULL:

https://github.com/gcc-mirror/gcc/blob/c0be0298a9553c13af642f45628a15d833473657/libgcc/unwind-dw2-fde-dip.c#L454-L455

When I looked at the values of pc and table[0].initial_loc + data_base, I found that the latter fell within the text section of /lib/ld-linux-x86-64.so.2, while the former fell within libgcc's text. Since the loader is mapped close to the top of the process's virtual address space, pc is lower than the table's first entry, the comparison is true, and the function returns NULL. On a non-hardened glibc, both addresses fell within libgcc's sections, so the early return was not triggered.

I was also able to come up with a simple reproducer for the problem.  We can trigger a stack unwinding by using backtrace(3):

====
#include <execinfo.h>

int
main(int argc, char *argv[])
{
  void *a[4096];
  backtrace (a, 100);
  return 0;
}
====

And as I said, I was also able to bisect GCC and found that the following commit seems to fix the issue:

====
commit 99b1daae18c095d6c94d32efb77442838e11cbfb
Author: Richard Biener <rguenther@suse.de>
Date:   Fri May 3 14:04:41 2024 +0200

    tree-optimization/114589 - remove profile based sink heuristics
====

That's the reason why I assigned this bug to the tree-optimization component, and will Cc Richard as well.  My GCC-fu only took me to this point, so I'm hoping that someone can further explain what exactly the commit is doing, and why it seems to fix the issue.

Thanks.
Comment 1 Sergio Durigan Junior 2025-06-13 23:19:38 UTC
Kudos to my friend Gabriel F. T. Gomes for helping with debugging this, btw!
Comment 2 Jakub Jelinek 2025-06-14 05:07:15 UTC
I think we need either detailed instructions on how exactly glibc was built, or details on which exact address in ld-linux-x86-64.so.2 had its FDE looked up and not found, near which function, etc.
Because none of the above backtraces actually show any address in ld-linux-x86-64.so.2.
Comment 3 Sam James 2025-06-14 07:49:37 UTC
The -fno-strict-aliasing (and some of the other options) option as a "hardening" measure is ridiculous and they shouldn't be recommending that, see https://github.com/ossf/wg-best-practices-os-developers/issues/660.

Anyway, we actually had a similar report of this from a user who decided to use -fno-strict-aliasing to build glibc: https://bugs.gentoo.org/955635.
Comment 4 Sam James 2025-06-14 09:58:14 UTC
I can reproduce in a vanilla fedora:41 container (=> no spec or default patching; should really consider that on our end too) with just build deps installed, then with the glibc-2.41 tarball:
```
$ mkdir /tmp/build && cd /tmp/build
$ /tmp/glibc-2.41/configure --prefix=/usr
$ make -j$(nproc) -l$(nproc)
$ make -j$(nproc) -l$(nproc) check
[...]
                === Summary of results ===
      5 FAIL
   6008 PASS
    122 UNSUPPORTED
     16 XFAIL
      4 XPASS
``` 

Those are harmless and related to the container.

Then:
```
$ mkdir /tmp/build-mangled && cd /tmp/build-mangled
$ /tmp/glibc-2.41/configure --prefix=/usr CFLAGS="-O2 -fno-strict-aliasing"
$ make -j$(nproc) -l$(nproc)
$ make -j$(nproc) -l$(nproc) check
[...]

                === Summary of results ===
    130 FAIL
   5883 PASS
    122 UNSUPPORTED
     16 XFAIL
      4 XPASS
$ grep ^FAIL tests.sum  | grep back
FAIL: debug/backtrace-tst
FAIL: debug/tst-backtrace2
FAIL: debug/tst-backtrace3
FAIL: debug/tst-backtrace4
FAIL: debug/tst-backtrace5
FAIL: debug/tst-backtrace6
FAIL: nptl/tst-backtrace1
```
Comment 5 Sam James 2025-06-14 11:13:04 UTC
Created attachment 61636
rtld.i.xz

Building elf/rtld.o with -fno-strict-aliasing breaks it. Attached rtld.i.xz.

It's built with:
```
gcc rtld.c -c -std=gnu11 -fgnu89-inline -O2 -Wall -Wwrite-strings -Wundef -Wimplicit-fallthrough -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -U_FORTIFY_SOURCE -Wstrict-prototypes -Wold-style-definition -fmath-errno -fPIC -fno-stack-protector -DSTACK_PROTECTOR_LEVEL=0 -mno-mmx -fno-tree-loop-distribute-patterns '-DSYSCONFDIR="/etc"' -ftls-model=initial-exec -I../include -I/tmp/build-mangled/elf -I/tmp/build-mangled -I../sysdeps/unix/sysv/linux/x86_64/64 -I../sysdeps/unix/sysv/linux/x86_64/include -I../sysdeps/unix/sysv/linux/x86_64 -I../sysdeps/unix/sysv/linux/x86/include -I../sysdeps/unix/sysv/linux/x86 -I../sysdeps/x86/nptl -I../sysdeps/unix/sysv/linux/wordsize-64 -I../sysdeps/x86_64/nptl -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux -I../sysdeps/nptl -I../sysdeps/pthread -I../sysdeps/gnu -I../sysdeps/unix/inet -I../sysdeps/unix/sysv -I../sysdeps/unix/x86_64 -I../sysdeps/unix -I../sysdeps/posix -I../sysdeps/x86_64/64 -I../sysdeps/x86_64/fpu/multiarch -I../sysdeps/x86_64/fpu -I../sysdeps/x86/fpu -I../sysdeps/x86_64/multiarch -I../sysdeps/x86_64 -I../sysdeps/x86/include -I../sysdeps/x86 -I../sysdeps/ieee754/float128 -I../sysdeps/ieee754/ldbl-96/include -I../sysdeps/ieee754/ldbl-96 -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/wordsize-64 -I../sysdeps/ieee754 -I../sysdeps/generic -I.. -I../libio -I. -D_LIBC_REENTRANT -include /tmp/build-mangled/libc-modules.h -DMODULE_NAME=rtld -include ../include/libc-symbols.h -DPIC -DSHARED -DTOP_NAMESPACE=glibc -o /tmp/build-mangled/elf/rtld.os -MD -MP -MF /tmp/build-mangled/elf/rtld.os.dt -MT /tmp/build-mangled/elf/rtld.os -save-temps
```

Adding -fno-strict-aliasing in there and using a small script to relink and build the tests makes it fail.
Comment 6 Sam James 2025-06-14 11:19:35 UTC
Created attachment 61637
rtld-unwind.tar.xz (tarball containing good/bad rtld.os)
Comment 7 Sam James 2025-06-14 11:22:45 UTC
The diff between the two is a bit noisy and the layout is a little different.

Dunno if it's interesting but in the bad object, __ehdr_start appears in .rela.data.rel.ro.local.

```
│ -Relocation section '.rela.data.rel.ro.local' at offset 0xe408 contains 1 entry:
│ +Relocation section '.rela.data.rel.ro.local' at offset 0xe668 contains 2 entries:
│      Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
│  0000000000000000  0000000900000001 R_X86_64_64            0000000000000000 .rodata.str1.1 + 4c4
│ +0000000000000018  000000c100000001 R_X86_64_64            0000000000000000 __ehdr_start + 0
```
Comment 8 Sam James 2025-06-14 11:26:23 UTC
Created attachment 61638
build.sh
Comment 9 Sam James 2025-06-14 11:46:22 UTC
optimize("no-strict-aliasing") on _dl_start_final is enough to break it.
Comment 10 Sam James 2025-06-14 11:51:43 UTC
(In reply to Sam James from comment #9)
> optimize("no-strict-aliasing") on _dl_start_final is enough to break it.

│ Relocation section '.rela.data.rel.ro.local' at offset 0xe408 contains 1 entry:
│     Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
│ -0000000000000000  0000000900000001 R_X86_64_64            0000000000000000 .rodata.str1.1 + 4c4
│ +0000000000000000  000000c100000001 R_X86_64_64            0000000000000000 __ehdr_start + 0
Comment 11 Eric Botcazou 2025-06-14 11:55:05 UTC
FWIW the *self_spec line contains typos (0fast, 0g, 0z).
Comment 12 Jakub Jelinek 2025-06-14 15:15:39 UTC
Reproduced.  It is actually _Unwind_Find_FDE not finding context->ra from libgcc_s.so.1 in the bad ld.so case and finding it in the correct case.
Comment 13 Jakub Jelinek 2025-06-14 15:29:19 UTC
In the good case,
_Unwind_Find_FDE (pc=0x7ffff7dc32cb <_Unwind_Backtrace+59>, bases=bases@entry=0x7fffffffe008) at ../../../libgcc/unwind-dw2-btree.h:860
calls _dl_find_object on that pc and it gives
(gdb) p dlfo
$5 = {dlfo_flags = 0, dlfo_map_start = 0x7ffff7da0000, dlfo_map_end = 0x7ffff7dcf3c8, dlfo_link_map = 0x55555555c2d0, dlfo_eh_frame = 0x7ffff7dc8af4, __dflo_reserved = {0, 1933, 
    140737353382200, 140737353867808, 140737488346600, 140737488346596, 0}}
which looks ok because that address is at 0x232cb from start of the library.
In the bad case we get for the same address in libgcc_s loaded at the same base
$7 = {dlfo_flags = 0x0, dlfo_map_start = 0x0, dlfo_map_end = 0x7ffff7ffe310, dlfo_link_map = 0x7ffff7ffdda0, dlfo_eh_frame = 0x7ffff7ff5c58, __dflo_reserved = {0x0, 0x78d, 
    0x7ffff7f47938, 0x7ffff7fbe220, 0x7fffffffdde8, 0x7fffffffdde4, 0x0}}
so it looks like the ld.so entry spans all addresses from 0 to end of the ld.so mapping (0x7ffff7ffe310 matches
7ffff7fc6000-7ffff7fc7000 r--p 00000000 103:03 1853108                   /tmp/build2/elf/ld.so
7ffff7fc7000-7ffff7fef000 r-xp 00001000 103:03 1853108                   /tmp/build2/elf/ld.so
7ffff7fef000-7ffff7ffb000 r--p 00029000 103:03 1853108                   /tmp/build2/elf/ld.so
7ffff7ffb000-7ffff7ffd000 r--p 00035000 103:03 1853108                   /tmp/build2/elf/ld.so
7ffff7ffd000-7ffff7ffe000 rw-p 00037000 103:03 1853108                   /tmp/build2/elf/ld.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0 
).
That is
  _dl_rtld_map.l_map_start = (ElfW(Addr)) &__ehdr_start;
  _dl_rtld_map.l_map_end = (ElfW(Addr)) _end;
__ehdr_start is for both the good and bad linker:
readelf -Wa build*/elf/ld.so | grep __ehdr_start
   354: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 __ehdr_start
   354: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 __ehdr_start
so it just seems that in the bad case the relocation for it hasn't been done.
Comment 14 Jakub Jelinek 2025-06-14 15:55:13 UTC
Seems _dl_start_final is in this configuration inlined into _dl_start and the important difference is (-fstrict-aliasing to -fno-strict-aliasing):
@@ -1206,11 +1207,8 @@ _dl_start:
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
-       leaq    __ehdr_start(%rip), %rsi
        leaq    _end(%rip), %rax
-       movq    %rsi, %xmm2
        movq    %rax, %xmm3
-       punpcklqdq      %xmm3, %xmm2
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        pushq   %r15
@@ -1225,214 +1223,176 @@ _dl_start:
        .cfi_offset 12, -48
        .cfi_offset 3, -56
        movq    %rdi, -136(%rbp)
+       movq    .LC31(%rip), %xmm2
+       punpcklqdq      %xmm3, %xmm2
        movaps  %xmm2, -128(%rbp)
        rdtsc
+       leaq    __ehdr_start(%rip), %rdi
        andb    $-33, 854+_dl_rtld_map(%rip)
-       leaq    64+_dl_rtld_map(%rip), %rcx
-       movl    $1879048191, %r8d
-       movl    $1879048233, %r9d
+       movq    %rdi, _dl_rtld_map(%rip)
        salq    $32, %rdx
-       movq    %rsi, _dl_rtld_map(%rip)
        orq     %rdx, %rax
        leaq    _DYNAMIC(%rip), %rdx
        movq    %rax, start_time(%rip)
        movq    _DYNAMIC(%rip), %rax
...
@@ -6289,9 +6296,13 @@ _rtld_global_ro:
        .globl  _rtld_local_ro
        .hidden _rtld_local_ro
        .set    _rtld_local_ro,_rtld_global_ro
+       .section        .data.rel.ro.local
+       .align 8
+.LC31:
+       .quad   __ehdr_start
        .section        .rodata.cst16,"aM",@progbits,16
        .align 16
-.LC74:
+.LC75:
        .quad   -1
        .quad   0
        .hidden __rtld_libc_freeres
The compiler has decided to vectorize the __ehdr_start and _end stores in both cases, optimized dump has similar code like:
  _359 = (long unsigned int) &_end;
  _358 = (long unsigned int) &__ehdr_start;
  _357 = {_358, _359};
...
  MEM <vector(2) long unsigned int> [(long unsigned int *)&_dl_rtld_map + 912B] = _357;
Before RA we have in both cases something like
(insn 15 3 1129 2 (set (reg/f:DI 253)
        (symbol_ref:DI ("__ehdr_start") [flags 0x42]  <var_decl 0x7fb465c3e1b0 __ehdr_start>)) 84 {*movdi_internal}
     (expr_list:REG_EQUIV (symbol_ref:DI ("__ehdr_start") [flags 0x42]  <var_decl 0x7fb465c3e1b0 __ehdr_start>)
        (nil)))
(insn 1129 15 16 2 (set (reg/f:DI 252)
        (symbol_ref:DI ("_end") [flags 0x42]  <var_decl 0x7fb465c89000 _end>)) 84 {*movdi_internal}
     (expr_list:REG_EQUIV (symbol_ref:DI ("_end") [flags 0x42]  <var_decl 0x7fb465c89000 _end>)
        (nil)))
(insn 16 1129 18 2 (set (reg:V2DI 235 [ _357 ])
        (vec_concat:V2DI (reg/f:DI 253)
            (reg/f:DI 252))) 7525 {vec_concatv2di}
     (expr_list:REG_DEAD (reg/f:DI 252)
        (expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI ("__ehdr_start") [flags 0x42]  <var_decl 0x7fb465c3e1b0 __ehdr_start>)
                (symbol_ref:DI ("_end") [flags 0x42]  <var_decl 0x7fb465c89000 _end>))
            (nil))))
i.e. set one pseudo to __ehdr_start, another to _end and do vec_concat on that.
But in the -fno-strict-aliasing case RA decides to spill __ehdr_start into memory and
load from memory:
(insn 1102 13 1107 2 (set (reg/f:DI 0 ax [258])
        (symbol_ref:DI ("_end") [flags 0x42]  <var_decl 0x7f759e489000 _end>)) 84 {*movdi_internal}
     (expr_list:REG_EQUIV (symbol_ref:DI ("_end") [flags 0x42]  <var_decl 0x7f759e489000 _end>)
        (nil)))
(insn 1107 1102 1109 2 (set (reg:DI 22 xmm2 [orig:243 _359 ] [243])
        (mem/u/c:DI (symbol_ref/u:DI ("*.LC31") [flags 0x2]) [0  S8 A64])) 84 {*movdi_internal}
     (nil))
(insn 1109 1107 14 2 (set (reg/f:DI 23 xmm3 [258])
        (reg/f:DI 0 ax [258])) 84 {*movdi_internal}
     (nil))
(insn 14 1109 1108 2 (set (reg:V2DI 22 xmm2 [orig:243 _359 ] [243])
        (vec_concat:V2DI (reg:DI 22 xmm2 [orig:243 _359 ] [243])
            (reg/f:DI 23 xmm3 [258]))) 7525 {vec_concatv2di}
     (expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI ("__ehdr_start") [flags 0x42]  <var_decl 0x7f759e43e1b0 __ehdr_start>)
            (symbol_ref:DI ("_end") [flags 0x42]  <var_decl 0x7f759e489000 _end>))
        (nil)))
while in the -fstrict-aliasing case it doesn't:
(insn 15 3 1129 2 (set (reg/f:DI 4 si [253])
        (symbol_ref:DI ("__ehdr_start") [flags 0x42]  <var_decl 0x7fb465c3e1b0 __ehdr_start>)) 84 {*movdi_internal}
     (expr_list:REG_EQUIV (symbol_ref:DI ("__ehdr_start") [flags 0x42]  <var_decl 0x7fb465c3e1b0 __ehdr_start>)
        (nil)))
(insn 1129 15 1134 2 (set (reg/f:DI 0 ax [252])
        (symbol_ref:DI ("_end") [flags 0x42]  <var_decl 0x7fb465c89000 _end>)) 84 {*movdi_internal}
     (expr_list:REG_EQUIV (symbol_ref:DI ("_end") [flags 0x42]  <var_decl 0x7fb465c89000 _end>)
        (nil)))
(insn 1134 1129 1136 2 (set (reg:DI 22 xmm2 [orig:235 _357 ] [235])
        (reg/f:DI 4 si [253])) 84 {*movdi_internal}
     (nil))
(insn 1136 1134 16 2 (set (reg/f:DI 23 xmm3 [252])
        (reg/f:DI 0 ax [252])) 84 {*movdi_internal}
     (nil))
(insn 16 1136 1135 2 (set (reg:V2DI 22 xmm2 [orig:235 _357 ] [235])
        (vec_concat:V2DI (reg:DI 22 xmm2 [orig:235 _357 ] [235])
            (reg/f:DI 23 xmm3 [252]))) 7525 {vec_concatv2di}
     (expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI ("__ehdr_start") [flags 0x42]  <var_decl 0x7fb465c3e1b0 __ehdr_start>)
            (symbol_ref:DI ("_end") [flags 0x42]  <var_decl 0x7fb465c89000 _end>))
        (nil)))

I don't really see anything wrong here on the GCC side.
Perhaps rtld.c should be compiled with -fno-tree-vectorize -fno-slp-vectorize if the compiler supports those switches, or _dl_start and _dl_start_final should use the optimize attribute to achieve the same effect, again if the compiler supports that.
Comment 15 Florian Weimer 2025-06-14 19:31:14 UTC
Jakub, thanks for the really helpful analysis!

In glibc, we do additional gymnastics to self-relocate the dynamic linker earlier, with a compiler barrier, but only for HIDDEN_VAR_NEEDS_DYNAMIC_RELOC architectures (like MIPS)—not for x86-64. I admit that this is not entirely correct because GCC can materialize constants that require relocation into global data. This happens in the Fedora 42 build as well, for the rfv constant in dl_vdso_vsym in sysdeps/unix/sysv/linux/dl-vdso.h:

  const struct r_found_version rfv = { VDSO_NAME, VDSO_HASH, 1, NULL };

  /* Search the scope of the vdso map.  */
  const ElfW (Sym) *ref = &wsym;
  lookup_t result = GLRO (dl_lookup_symbol_x) (name, map, &ref,
                                               map->l_local_scope,
                                               &rfv, 0, 0, NULL);
  return ref != NULL ? DL_SYMBOL_ADDRESS (result, ref) : NULL;

It just so happens that the first call to dl_vdso_vsym happens after the call to ELF_DYNAMIC_RELOCATE for ld.so, which is why this works. But the entire design really assumes that the ELF_DYNAMIC_RELOCATE call is a no-op.

So maybe we are just naïve, and need to treat all architectures as HIDDEN_VAR_NEEDS_DYNAMIC_RELOC? Or get real, remove the ELF_DYNAMIC_RELOCATE call for ld.so on !HIDDEN_VAR_NEEDS_DYNAMIC_RELOC architectures because it should do nothing at all, and add a test case that requires that there are no dynamic relocations whatsoever. But it looks like the latter would require a way to request that GCC does not produce global data constants (with relocations) for initializing local variables.
Comment 16 Sergio Durigan Junior 2025-06-14 19:59:24 UTC
Thank you very much for the detailed investigation; much appreciated.

I'm not sure if it's still needed, but to reply to Jakub's request, here's how glibc is being built:

https://github.com/wolfi-dev/os/blob/main/glibc.yaml

I can obtain the build logs if needed; they're not readily accessible unfortunately.

The compiler flags are the ones I listed in the description, coming directly from OpenSSF.  The recipe above currently disables the hardening flags entirely (by setting GCC_SPEC_FILE to /dev/null, on line 52), but I obviously reenabled them for my tests.

Now, it's very interesting that -fno-strict-aliasing can cause such mess.  As I said, in my tests it appeared to be "-z now".  That Gentoo bug seems to be exactly the same thing as I'm reporting here.  I'll run some tests removing -fno-strict-aliasing (but keeping "-z now") and see if it makes a difference.
Comment 17 H.J. Lu 2025-06-14 22:09:08 UTC
(In reply to Jakub Jelinek from comment #14)
> Seems _dl_start_final is in this configuration inlined into _dl_start and
> the important difference is (-fstrict-aliasing to -fno-strict-aliasing):
> @@ -1206,11 +1207,8 @@ _dl_start:
>         pushq   %rbp
>         .cfi_def_cfa_offset 16
>         .cfi_offset 6, -16
> -       leaq    __ehdr_start(%rip), %rsi

This doesn't need run-time relocation.

>         leaq    _end(%rip), %rax
> -       movq    %rsi, %xmm2
>         movq    %rax, %xmm3
> -       punpcklqdq      %xmm3, %xmm2
>         movq    %rsp, %rbp
>         .cfi_def_cfa_register 6
>         pushq   %r15
> @@ -1225,214 +1223,176 @@ _dl_start:
>         .cfi_offset 12, -48
>         .cfi_offset 3, -56
>         movq    %rdi, -136(%rbp)
> +       movq    .LC31(%rip), %xmm2
> +       punpcklqdq      %xmm3, %xmm2
>         movaps  %xmm2, -128(%rbp)
>         rdtsc
> +       leaq    __ehdr_start(%rip), %rdi
>         andb    $-33, 854+_dl_rtld_map(%rip)
> -       leaq    64+_dl_rtld_map(%rip), %rcx
> -       movl    $1879048191, %r8d
> -       movl    $1879048233, %r9d
> +       movq    %rdi, _dl_rtld_map(%rip)
>         salq    $32, %rdx
> -       movq    %rsi, _dl_rtld_map(%rip)
>         orq     %rdx, %rax
>         leaq    _DYNAMIC(%rip), %rdx
>         movq    %rax, start_time(%rip)
>         movq    _DYNAMIC(%rip), %rax
> ...
> @@ -6289,9 +6296,13 @@ _rtld_global_ro:
>         .globl  _rtld_local_ro
>         .hidden _rtld_local_ro
>         .set    _rtld_local_ro,_rtld_global_ro
> +       .section        .data.rel.ro.local
> +       .align 8
> +.LC31:
> +       .quad   __ehdr_start

This requires the run-time relocation. This is another case of PR 103762.
Comment 18 H.J. Lu 2025-06-14 22:49:38 UTC
[hjl@gnu-zen4-1 cvise-1]$ cat x.i
typedef long Elf64_Addr;
struct
{
  Elf64_Addr l_map_start, l_map_end;
} _dl_rtld_map;
extern int __ehdr_start __attribute__((visibility("hidden")));
extern int _end __attribute__((visibility("hidden")));
void
__attribute___dl_start (void)
{
  _dl_rtld_map.l_map_start = (Elf64_Addr)&__ehdr_start;
  _dl_rtld_map.l_map_end = (Elf64_Addr)&_end;
}
[hjl@gnu-zen4-1 cvise-1]$ gcc -S -O2 -fPIC x.i
[hjl@gnu-zen4-1 cvise-1]$ cat x.s
	.file	"x.i"
	.text
	.p2align 4
	.globl	__attribute___dl_start
	.type	__attribute___dl_start, @function
__attribute___dl_start:
.LFB0:
	.cfi_startproc
	movq	.LC0(%rip), %xmm0
	leaq	_end(%rip), %rax
	movq	%rax, %xmm1
	movq	_dl_rtld_map@GOTPCREL(%rip), %rax
	punpcklqdq	%xmm1, %xmm0
	movups	%xmm0, (%rax)
	ret
	.cfi_endproc
.LFE0:
	.size	__attribute___dl_start, .-__attribute___dl_start
	.globl	_dl_rtld_map
	.bss
	.align 16
	.type	_dl_rtld_map, @object
	.size	_dl_rtld_map, 16
_dl_rtld_map:
	.zero	16
	.section	.data.rel.ro.local,"aw"
	.align 8
.LC0:
	.quad	__ehdr_start
	.hidden	__ehdr_start
	.hidden	_end
	.ident	"GCC: (GNU) 15.1.1 20250521 (Red Hat 15.1.1-2)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-zen4-1 cvise-1]$ gcc -S -O1 -fPIC x.i
[hjl@gnu-zen4-1 cvise-1]$ cat x.s
	.file	"x.i"
	.text
	.globl	__attribute___dl_start
	.type	__attribute___dl_start, @function
__attribute___dl_start:
.LFB0:
	.cfi_startproc
	movq	_dl_rtld_map@GOTPCREL(%rip), %rax
	leaq	__ehdr_start(%rip), %rdx
	movq	%rdx, (%rax)
	leaq	_end(%rip), %rcx
	movq	%rcx, 8(%rax)
	ret
	.cfi_endproc
.LFE0:
	.size	__attribute___dl_start, .-__attribute___dl_start
	.globl	_dl_rtld_map
	.bss
	.align 16
	.type	_dl_rtld_map, @object
	.size	_dl_rtld_map, 16
_dl_rtld_map:
	.zero	16
	.hidden	_end
	.hidden	__ehdr_start
	.ident	"GCC: (GNU) 15.1.1 20250521 (Red Hat 15.1.1-2)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-zen4-1 cvise-1]$
Comment 19 H.J. Lu 2025-06-15 01:17:33 UTC
I opened a glibc bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=33088

and attached a patch:

https://sourceware.org/bugzilla/attachment.cgi?id=16138
Comment 20 H.J. Lu 2025-06-15 02:46:55 UTC
(In reply to Sergio Durigan Junior from comment #0)
> Hi,
> 
> This is an interesting bug which took me quite some time to (partially)
> understand.  I decided to file this upstream report to:
> 
> - See if an upstream developer could help me fully understand what's going
> one, and
> - Get a patch backported to GCC 14 to fix the issue.
> 
> It all started when we noticed that compiling a glibc using the following
> hardening flags (from the OpenSSF project) would lead to an abortion in
> certain scenarios:
> 
> ====
> *self_spec:
> + %{!O:%{!O1:%{!O2:%{!O3:%{!O0:%{!Os:%{!0fast:%{!0g:%{!0z:-O2}}}}}}}}}
> -fhardened -Wno-error=hardened -Wno-hardened
> %{!fdelete-null-pointer-checks:-fno-delete-null-pointer-checks}
> -fno-strict-overflow -fno-strict-aliasing
> %{!fomit-frame-pointer:-fno-omit-frame-pointer} -mno-omit-leaf-frame-pointer
> 
> *link:
> + --as-needed -O1 --sort-common -z noexecstack -z relro -z now
> ====
> 
> It is important to notice that:
> 
> - The bug only happens when using a glibc compiled with the "-z now"
> hardening flag.  If the flag is removed, then the abort doesn't occur.
> - The bug only happens when using a glibc compiled with GCC 14.x (14.3
> included).

Please provide the output of

# readelf -rW elf/rtld.os | grep __ehdr_start

on the bad glibc build.  Is there an entry like the following?

0000000000000000  0000000800000001 R_X86_64_64            0000000000000000 __ehdr_start + 0
Comment 21 H.J. Lu 2025-06-16 00:46:57 UTC
Moved to glibc:

https://sourceware.org/bugzilla/show_bug.cgi?id=33088