Hi,

This is an interesting bug which took me quite some time to (partially) understand. I decided to file this upstream report to:

- See if an upstream developer could help me fully understand what's going on, and
- Get a patch backported to GCC 14 to fix the issue.

It all started when we noticed that compiling a glibc using the following hardening flags (from the OpenSSF project) would lead to an abort in certain scenarios:

====
*self_spec:
+ %{!O:%{!O1:%{!O2:%{!O3:%{!O0:%{!Os:%{!0fast:%{!0g:%{!0z:-O2}}}}}}}}} -fhardened -Wno-error=hardened -Wno-hardened %{!fdelete-null-pointer-checks:-fno-delete-null-pointer-checks} -fno-strict-overflow -fno-strict-aliasing %{!fomit-frame-pointer:-fno-omit-frame-pointer} -mno-omit-leaf-frame-pointer

*link:
+ --as-needed -O1 --sort-common -z noexecstack -z relro -z now
====

It is important to note that:

- The bug only happens when using a glibc compiled with the "-z now" hardening flag. If the flag is removed, then the abort doesn't occur.
- The bug only happens when using a glibc compiled with GCC 14.x (14.3 included).
- The bug *does not* happen with GCC 15. I bisected and found a commit that fixes the problem; see below.

After some investigation, I was able to determine that the problem seemed to be happening while unwinding the stack. For example, this is the backtrace I would get when running "python3 -c 'import matplotlib'":

#0 0x00007c43afe9972c in __pthread_kill_implementation () from /lib/libc.so.6
#1 0x00007c43afe3d8be in raise () from /lib/libc.so.6
#2 0x00007c43afe2531f in abort () from /lib/libc.so.6
#3 0x00007c43af84f79d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4 0x00007c43af86d4d8 in _Unwind_RaiseException () from /usr/lib/libgcc_s.so.1
#5 0x00007c43acac9014 in __cxxabiv1::__cxa_throw (obj=0x5b7d7f52fab0, warning: could not convert 'std::type_info' from the host encoding (ISO-8859-1) to UTF-32. This normally should not happen, please file a bug report.
tinfo=0x7c429b6fd218 <typeinfo for pybind11::attribute_error>, dest=0x7c429b5f7f70 <pybind11::reference_cast_error::~reference_cast_error() [clone .lto_priv.0]>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:93
#6 0x00007c429b5ec3a7 in ft2font__getattr__(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) [clone .lto_priv.0] [clone .cold] () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#7 0x00007c429b62f086 in pybind11::cpp_function::initialize<pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, pybind11::name, pybind11::scope, pybind11::sibling>(pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#1}::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0] () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#8 0x00007c429b603886 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
...
and this is the backtrace I would get when running Emacs:

#0 0x00007eede329972c in __pthread_kill_implementation () from /lib/libc.so.6
#1 0x00007eede323d8be in raise () from /lib/libc.so.6
#2 0x00007eede322531f in abort () from /lib/libc.so.6
#3 0x00007eede262879d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4 0x00007eede2646e7c in _Unwind_Backtrace () from /usr/lib/libgcc_s.so.1
#5 0x00007eede3327b11 in backtrace () from /lib/libc.so.6
#6 0x000059535963a8a1 in emacs_backtrace ()
#7 0x000059535956499a in main ()

After some more debugging, it became clear that the abort was being triggered by this asm excerpt from libgcc:

...
0x00007eede2645e82 <+146>: call 0x7eede2644a90 <uw_frame_state_for>
0x00007eede2645e87 <+151>: test %eax,%eax
0x00007eede2645e89 <+153>: jne 0x7eede2628798 <uw_init_context_1.cold>
...

which led me to (apologies for the Microsoft github links):

https://github.com/gcc-mirror/gcc/blob/c0be0298a9553c13af642f45628a15d833473657/libgcc/unwind-dw2.c#L1340-L1341

I also checked the contents of %eax, and it held the value of _URC_END_OF_STACK. So now we know that this is the cause of the abort.

I decided to proceed further and compile a new GCC with debuginfo so that I could analyse what uw_frame_state_for is doing. This took me down a rabbit hole, but eventually I found that the problem is that fde is NULL here:

https://github.com/gcc-mirror/gcc/blob/c0be0298a9553c13af642f45628a15d833473657/libgcc/unwind-dw2.c#L1007-L1019

Because amd64 defines MD_FALLBACK_FRAME_STATE_FOR, we end up calling it inside the if statement, but that's not really important. What's important is understanding that _Unwind_Find_FDE never returns NULL when glibc isn't hardened with "-z now", but does return NULL sometimes when glibc is hardened with "-z now".

Now, _Unwind_Find_FDE is complex and deals with CFI and .eh_frame in order to find the frame description information and properly unwind the stack. It eventually calls find_fde_tail, which does the hard job.
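For anyone who wants to exercise the same libgcc path without going through backtrace(3), here is a minimal sketch that drives _Unwind_Backtrace directly. The names frame_counter, count_frame and count_unwound_frames are made up for illustration; only _Unwind_Backtrace, _Unwind_GetIP and the _URC_* codes come from <unwind.h>. On an affected system I would expect it to hit the same abort in uw_init_context_1, though I have not verified that.

```c
#include <assert.h>
#include <stdio.h>
#include <unwind.h>

/* Hypothetical helper state: how many frames were visited so far,
   and how many we are willing to visit. */
struct frame_counter {
    int frames;
    int limit;
};

/* Called by the unwinder once per frame. Returning _URC_NO_REASON
   asks it to continue; any other code stops the walk. */
static _Unwind_Reason_Code count_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct frame_counter *fc = arg;
    void *pc = (void *) _Unwind_GetIP(ctx);

    fprintf(stderr, "frame %d: pc=%p\n", fc->frames, pc);
    fc->frames++;
    return fc->frames < fc->limit ? _URC_NO_REASON : _URC_END_OF_STACK;
}

/* Walks the current call stack and returns the number of frames seen. */
int count_unwound_frames(int limit)
{
    struct frame_counter fc = { 0, limit };

    assert(limit > 0);
    _Unwind_Backtrace(count_frame, &fc);
    return fc.frames;
}
```

Calling count_unwound_frames(64) from main should print one line per frame on a healthy system.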
Here we get to the end of the line, at least for my debugging session. I was able to track down the exact spot where find_fde_tail is returning NULL:

https://github.com/gcc-mirror/gcc/blob/c0be0298a9553c13af642f45628a15d833473657/libgcc/unwind-dw2-fde-dip.c#L454-L455

When I looked at the addresses for pc and table[0].initial_loc + data_base, I found that the latter is an address that falls within /lib/ld-linux-x86-64.so.2's text section, while the former fell within libgcc's text. Since the loader is mapped close to the end of the process virtual memory, the comparison is true and the function ends up returning NULL.

On a non-hardened glibc, I noticed that both addresses would fall within libgcc's sections, which wouldn't trigger the return.

I was also able to come up with a simple reproducer for the problem. We can trigger a stack unwinding by using backtrace(3):

====
#include <execinfo.h>

int main(int argc, char *argv[])
{
  void *a[4096];
  backtrace (a, 100);
  return 0;
}
====

And as I said, I was also able to bisect GCC and found that the following commit seems to fix the issue:

====
commit 99b1daae18c095d6c94d32efb77442838e11cbfb
Author: Richard Biener <rguenther@suse.de>
Date:   Fri May 3 14:04:41 2024 +0200

    tree-optimization/114589 - remove profile based sink heuristics
====

That's the reason why I assigned this bug to the tree-optimization component, and will Cc Richard as well.

My GCC-fu only took me to this point, so I'm hoping that someone can further explain what exactly the commit is doing, and why it seems to fix the issue.

Thanks.
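To make the early return in find_fde_tail described above easier to picture, here is a simplified model of the lookup. This is not the libgcc code: struct table_entry and lookup_entry are hypothetical, and the real function also validates the FDE's address range. It only shows why a pc that lies below the first table entry yields "not found" when the table belongs to an object (here ld.so) mapped at a much higher address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry of a sorted lookup table. */
struct table_entry {
    int32_t initial_loc;  /* function start, relative to data_base */
    int32_t fde;          /* FDE location, relative to data_base (unused here) */
};

/* Returns the index of the last entry that starts at or before pc,
   or -1 if pc lies below the first entry (the early-return case). */
long lookup_entry(const struct table_entry *table, size_t count,
                  uintptr_t data_base, uintptr_t pc)
{
    size_t lo = 0, hi = count;

    assert(table != NULL && count > 0);

    /* A pc from a lower-mapped object fails this check immediately
       when the table was taken from a higher-mapped object. */
    if (pc < data_base + (uintptr_t) (intptr_t) table[0].initial_loc)
        return -1;

    /* Binary search over the sorted start addresses. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < data_base + (uintptr_t) (intptr_t) table[mid].initial_loc)
            hi = mid;
        else
            lo = mid;
    }
    return (long) lo;
}
```

With a table based at 0x70000000, a pc of 0x70000250 resolves to an entry, while a pc of 0x1000 (standing in for an address in a different, lower-mapped library) returns -1.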
Kudos to my friend Gabriel F. T. Gomes for helping with debugging this, btw!
I think we need either detailed instructions on exactly how glibc was built, or details on exactly which address's FDE was looked up in ld-linux-x86-64.so.2 and not found, around which function, etc., because none of the above backtraces actually shows any address in ld-linux-x86-64.so.2.
Recommending -fno-strict-aliasing (and some of the other options) as a "hardening" measure is ridiculous and they shouldn't be doing that, see https://github.com/ossf/wg-best-practices-os-developers/issues/660. Anyway, we actually had a similar report of this from a user who decided to use -fno-strict-aliasing to build glibc: https://bugs.gentoo.org/955635.
I can reproduce in a vanilla fedora:41 container (=> no spec or default patching; should really consider that on our end too) with just build deps installed, then with the glibc-2.41 tarball:
```
$ mkdir /tmp/build && cd /tmp/build
$ /tmp/glibc-2.41/configure --prefix=/usr
$ make -j$(nproc) -l$(nproc)
$ make -j$(nproc) -l$(nproc) check
[...]
                === Summary of results ===
      5 FAIL
   6008 PASS
    122 UNSUPPORTED
     16 XFAIL
      4 XPASS
```

Those are harmless and related to the container.

Then:
```
$ mkdir /tmp/build-mangled && cd /tmp/build-mangled
$ /tmp/glibc-2.41/configure --prefix=/usr CFLAGS="-O2 -fno-strict-aliasing"
$ make -j$(nproc) -l$(nproc)
$ make -j$(nproc) -l$(nproc) check
[...]
                === Summary of results ===
    130 FAIL
   5883 PASS
    122 UNSUPPORTED
     16 XFAIL
      4 XPASS
$ grep ^FAIL tests.sum | grep back
FAIL: debug/backtrace-tst
FAIL: debug/tst-backtrace2
FAIL: debug/tst-backtrace3
FAIL: debug/tst-backtrace4
FAIL: debug/tst-backtrace5
FAIL: debug/tst-backtrace6
FAIL: nptl/tst-backtrace1
```
Created attachment 61636 [details] rtld.i.xz Building elf/rtld.o with -fno-strict-aliasing breaks it. Attached rtld.i.xz. It's built with: ``` gcc rtld.c -c -std=gnu11 -fgnu89-inline -O2 -Wall -Wwrite-strings -Wundef -Wimplicit-fallthrough -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -U_FORTIFY_SOURCE -Wstrict-prototypes -Wold-style-definition -fmath-errno -fPIC -fno-stack-protector -DSTACK_PROTECTOR_LEVEL=0 -mno-mmx -fno-tree-loop-distribute-patterns '-DSYSCONFDIR="/etc"' -ftls-model=initial-exec -I../include -I/tmp/build-mangled/elf -I/tmp/build-mangled -I../sysdeps/unix/sysv/linux/x86_64/64 -I../sysdeps/unix/sysv/linux/x86_64/include -I../sysdeps/unix/sysv/linux/x86_64 -I../sysdeps/unix/sysv/linux/x86/include -I../sysdeps/unix/sysv/linux/x86 -I../sysdeps/x86/nptl -I../sysdeps/unix/sysv/linux/wordsize-64 -I../sysdeps/x86_64/nptl -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux -I../sysdeps/nptl -I../sysdeps/pthread -I../sysdeps/gnu -I../sysdeps/unix/inet -I../sysdeps/unix/sysv -I../sysdeps/unix/x86_64 -I../sysdeps/unix -I../sysdeps/posix -I../sysdeps/x86_64/64 -I../sysdeps/x86_64/fpu/multiarch -I../sysdeps/x86_64/fpu -I../sysdeps/x86/fpu -I../sysdeps/x86_64/multiarch -I../sysdeps/x86_64 -I../sysdeps/x86/include -I../sysdeps/x86 -I../sysdeps/ieee754/float128 -I../sysdeps/ieee754/ldbl-96/include -I../sysdeps/ieee754/ldbl-96 -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/wordsize-64 -I../sysdeps/ieee754 -I../sysdeps/generic -I.. -I../libio -I. -D_LIBC_REENTRANT -include /tmp/build-mangled/libc-modules.h -DMODULE_NAME=rtld -include ../include/libc-symbols.h -DPIC -DSHARED -DTOP_NAMESPACE=glibc -o /tmp/build-mangled/elf/rtld.os -MD -MP -MF /tmp/build-mangled/elf/rtld.os.dt -MT /tmp/build-mangled/elf/rtld.os -save-temps ``` Adding -fno-strict-aliasing in there and using a small script to relink and build the tests makes it fail.
Created attachment 61637 [details] rtld-unwind.tar.xz (tarball containing good/bad rtld.os)
The diff between the two is a bit noisy and the layout is a little different. Dunno if it's interesting but in the bad object, __ehdr_start appears in .rela.data.rel.ro.local.

```
│ -Relocation section '.rela.data.rel.ro.local' at offset 0xe408 contains 1 entry:
│ +Relocation section '.rela.data.rel.ro.local' at offset 0xe668 contains 2 entries:
│  Offset Info Type Symbol's Value Symbol's Name + Addend
│  0000000000000000 0000000900000001 R_X86_64_64 0000000000000000 .rodata.str1.1 + 4c4
│ +0000000000000018 000000c100000001 R_X86_64_64 0000000000000000 __ehdr_start + 0
```
Created attachment 61638 [details] build.sh
optimize("no-strict-aliasing") on _dl_start_final is enough to break it.
(In reply to Sam James from comment #9)
> optimize("no-strict-aliasing") on _dl_start_final is enough to break it.

│  Relocation section '.rela.data.rel.ro.local' at offset 0xe408 contains 1 entry:
│  Offset Info Type Symbol's Value Symbol's Name + Addend
│ -0000000000000000 0000000900000001 R_X86_64_64 0000000000000000 .rodata.str1.1 + 4c4
│ +0000000000000000 000000c100000001 R_X86_64_64 0000000000000000 __ehdr_start + 0
FWIW the *self_spec line contains typos (0fast, 0g, 0z).
Reproduced. It is actually _Unwind_Find_FDE not finding context->ra from libgcc_s.so.1 in the bad ld.so case and finding it in the correct case.
In the good case,
_Unwind_Find_FDE (pc=0x7ffff7dc32cb <_Unwind_Backtrace+59>, bases=bases@entry=0x7fffffffe008) at ../../../libgcc/unwind-dw2-btree.h:860
calls _dl_find_object on that pc and it gives
(gdb) p dlfo
$5 = {dlfo_flags = 0, dlfo_map_start = 0x7ffff7da0000, dlfo_map_end = 0x7ffff7dcf3c8, dlfo_link_map = 0x55555555c2d0, dlfo_eh_frame = 0x7ffff7dc8af4, __dflo_reserved = {0, 1933, 140737353382200, 140737353867808, 140737488346600, 140737488346596, 0}}
which looks ok because that address is at 0x232cb from start of the library.

In the bad case we get for the same address in libgcc_s loaded at the same base
$7 = {dlfo_flags = 0x0, dlfo_map_start = 0x0, dlfo_map_end = 0x7ffff7ffe310, dlfo_link_map = 0x7ffff7ffdda0, dlfo_eh_frame = 0x7ffff7ff5c58, __dflo_reserved = {0x0, 0x78d, 0x7ffff7f47938, 0x7ffff7fbe220, 0x7fffffffdde8, 0x7fffffffdde4, 0x0}}
so it looks like the ld.so entry spans all addresses from 0 to end of the ld.so mapping (0x7ffff7ffe310 matches
7ffff7fc6000-7ffff7fc7000 r--p 00000000 103:03 1853108 /tmp/build2/elf/ld.so
7ffff7fc7000-7ffff7fef000 r-xp 00001000 103:03 1853108 /tmp/build2/elf/ld.so
7ffff7fef000-7ffff7ffb000 r--p 00029000 103:03 1853108 /tmp/build2/elf/ld.so
7ffff7ffb000-7ffff7ffd000 r--p 00035000 103:03 1853108 /tmp/build2/elf/ld.so
7ffff7ffd000-7ffff7ffe000 rw-p 00037000 103:03 1853108 /tmp/build2/elf/ld.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
).

That is
  _dl_rtld_map.l_map_start = (ElfW(Addr)) &__ehdr_start;
  _dl_rtld_map.l_map_end = (ElfW(Addr)) _end;
__ehdr_start is for both the good and bad linker:
readelf -Wa build*/elf/ld.so | grep __ehdr_start
   354: 0000000000000000 0 NOTYPE LOCAL DEFAULT 1 __ehdr_start
   354: 0000000000000000 0 NOTYPE LOCAL DEFAULT 1 __ehdr_start
so it just seems that in the bad case the relocation for it hasn't been done.
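The same check can be done from a small test program rather than from gdb. The sketch below assumes glibc 2.35 or later, where _dl_find_object is declared in <dlfcn.h>. The helper name check_object_range is made up, and treating a zero dlfo_map_start as suspicious is only a heuristic taken from the bad dump above, not anything glibc documents.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>

/* Looks up the object containing addr and prints its mapping range.
   Returns -1 if the lookup fails, 1 if the reported range looks sane
   and contains addr, 0 otherwise. */
int check_object_range(void *addr)
{
    struct dl_find_object dlfo;
    uintptr_t pc, start, end;
    const char *name = "?";

    assert(addr != NULL);
    if (_dl_find_object(addr, &dlfo) != 0)
        return -1;

    pc = (uintptr_t) addr;
    start = (uintptr_t) dlfo.dlfo_map_start;
    end = (uintptr_t) dlfo.dlfo_map_end;
    if (dlfo.dlfo_link_map != NULL && dlfo.dlfo_link_map->l_name != NULL)
        name = dlfo.dlfo_link_map->l_name;

    fprintf(stderr, "%p -> [%p, %p) '%s'\n",
            addr, dlfo.dlfo_map_start, dlfo.dlfo_map_end, name);

    /* Heuristic: in the bad dump the ld.so entry started at address 0. */
    return start != 0 && start <= pc && pc < end;
}
```

Passing the address of any function in the program, or in a loaded library, should print the mapping that _dl_find_object attributes it to.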
Seems _dl_start_final is in this configuration inlined into _dl_start and the important difference is (-fstrict-aliasing to -fno-strict-aliasing): @@ -1206,11 +1207,8 @@ _dl_start: pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 - leaq __ehdr_start(%rip), %rsi leaq _end(%rip), %rax - movq %rsi, %xmm2 movq %rax, %xmm3 - punpcklqdq %xmm3, %xmm2 movq %rsp, %rbp .cfi_def_cfa_register 6 pushq %r15 @@ -1225,214 +1223,176 @@ _dl_start: .cfi_offset 12, -48 .cfi_offset 3, -56 movq %rdi, -136(%rbp) + movq .LC31(%rip), %xmm2 + punpcklqdq %xmm3, %xmm2 movaps %xmm2, -128(%rbp) rdtsc + leaq __ehdr_start(%rip), %rdi andb $-33, 854+_dl_rtld_map(%rip) - leaq 64+_dl_rtld_map(%rip), %rcx - movl $1879048191, %r8d - movl $1879048233, %r9d + movq %rdi, _dl_rtld_map(%rip) salq $32, %rdx - movq %rsi, _dl_rtld_map(%rip) orq %rdx, %rax leaq _DYNAMIC(%rip), %rdx movq %rax, start_time(%rip) movq _DYNAMIC(%rip), %rax ... @@ -6289,9 +6296,13 @@ _rtld_global_ro: .globl _rtld_local_ro .hidden _rtld_local_ro .set _rtld_local_ro,_rtld_global_ro + .section .data.rel.ro.local + .align 8 +.LC31: + .quad __ehdr_start .section .rodata.cst16,"aM",@progbits,16 .align 16 -.LC74: +.LC75: .quad -1 .quad 0 .hidden __rtld_libc_freeres The compiler has decided to vectorize the __ehdr_start and _end stores in both cases, optimized dump has similar code like: _359 = (long unsigned int) &_end; _358 = (long unsigned int) &__ehdr_start; _357 = {_358, _359}; ... 
MEM <vector(2) long unsigned int> [(long unsigned int *)&_dl_rtld_map + 912B] = _357; Before RA we have in both cases something like (insn 15 3 1129 2 (set (reg/f:DI 253) (symbol_ref:DI ("__ehdr_start") [flags 0x42] <var_decl 0x7fb465c3e1b0 __ehdr_start>)) 84 {*movdi_internal} (expr_list:REG_EQUIV (symbol_ref:DI ("__ehdr_start") [flags 0x42] <var_decl 0x7fb465c3e1b0 __ehdr_start>) (nil))) (insn 1129 15 16 2 (set (reg/f:DI 252) (symbol_ref:DI ("_end") [flags 0x42] <var_decl 0x7fb465c89000 _end>)) 84 {*movdi_internal} (expr_list:REG_EQUIV (symbol_ref:DI ("_end") [flags 0x42] <var_decl 0x7fb465c89000 _end>) (nil))) (insn 16 1129 18 2 (set (reg:V2DI 235 [ _357 ]) (vec_concat:V2DI (reg/f:DI 253) (reg/f:DI 252))) 7525 {vec_concatv2di} (expr_list:REG_DEAD (reg/f:DI 252) (expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI ("__ehdr_start") [flags 0x42] <var_decl 0x7fb465c3e1b0 __ehdr_start>) (symbol_ref:DI ("_end") [flags 0x42] <var_decl 0x7fb465c89000 _end>)) (nil)))) i.e. set one pseudo to __ehdr_start, another to _end and do vec_concat on that. 
But in the -fno-strict-aliasing case RA decides to spill __ehdr_start into memory and load from memory: (insn 1102 13 1107 2 (set (reg/f:DI 0 ax [258]) (symbol_ref:DI ("_end") [flags 0x42] <var_decl 0x7f759e489000 _end>)) 84 {*movdi_internal} (expr_list:REG_EQUIV (symbol_ref:DI ("_end") [flags 0x42] <var_decl 0x7f759e489000 _end>) (nil))) (insn 1107 1102 1109 2 (set (reg:DI 22 xmm2 [orig:243 _359 ] [243]) (mem/u/c:DI (symbol_ref/u:DI ("*.LC31") [flags 0x2]) [0 S8 A64])) 84 {*movdi_internal} (nil)) (insn 1109 1107 14 2 (set (reg/f:DI 23 xmm3 [258]) (reg/f:DI 0 ax [258])) 84 {*movdi_internal} (nil)) (insn 14 1109 1108 2 (set (reg:V2DI 22 xmm2 [orig:243 _359 ] [243]) (vec_concat:V2DI (reg:DI 22 xmm2 [orig:243 _359 ] [243]) (reg/f:DI 23 xmm3 [258]))) 7525 {vec_concatv2di} (expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI ("__ehdr_start") [flags 0x42] <var_decl 0x7f759e43e1b0 __ehdr_start>) (symbol_ref:DI ("_end") [flags 0x42] <var_decl 0x7f759e489000 _end>)) (nil))) while in the -fstrict-aliasing case it doesn't: (insn 15 3 1129 2 (set (reg/f:DI 4 si [253]) (symbol_ref:DI ("__ehdr_start") [flags 0x42] <var_decl 0x7fb465c3e1b0 __ehdr_start>)) 84 {*movdi_internal} (expr_list:REG_EQUIV (symbol_ref:DI ("__ehdr_start") [flags 0x42] <var_decl 0x7fb465c3e1b0 __ehdr_start>) (nil))) (insn 1129 15 1134 2 (set (reg/f:DI 0 ax [252]) (symbol_ref:DI ("_end") [flags 0x42] <var_decl 0x7fb465c89000 _end>)) 84 {*movdi_internal} (expr_list:REG_EQUIV (symbol_ref:DI ("_end") [flags 0x42] <var_decl 0x7fb465c89000 _end>) (nil))) (insn 1134 1129 1136 2 (set (reg:DI 22 xmm2 [orig:235 _357 ] [235]) (reg/f:DI 4 si [253])) 84 {*movdi_internal} (nil)) (insn 1136 1134 16 2 (set (reg/f:DI 23 xmm3 [252]) (reg/f:DI 0 ax [252])) 84 {*movdi_internal} (nil)) (insn 16 1136 1135 2 (set (reg:V2DI 22 xmm2 [orig:235 _357 ] [235]) (vec_concat:V2DI (reg:DI 22 xmm2 [orig:235 _357 ] [235]) (reg/f:DI 23 xmm3 [252]))) 7525 {vec_concatv2di} (expr_list:REG_EQUIV (vec_concat:V2DI (symbol_ref:DI ("__ehdr_start") 
[flags 0x42] <var_decl 0x7fb465c3e1b0 __ehdr_start>) (symbol_ref:DI ("_end") [flags 0x42] <var_decl 0x7fb465c89000 _end>)) (nil)))

I don't really see anything wrong here on the GCC side. Perhaps rtld.c should be compiled with -fno-tree-vectorize -fno-slp-vectorize if the compiler supports those switches, or _dl_start and _dl_start_final should use the optimize attribute to achieve the same effect, again if the compiler supports that.
Jakub, thanks for the really helpful analysis! In glibc, we do additional gymnastics to self-relocate the dynamic linker earlier, with a compiler barrier, but only for HIDDEN_VAR_NEEDS_DYNAMIC_RELOC architectures (like MIPS)—not for x86-64. I admit that this is not entirely correct because GCC can materialize constants that require relocation into global data. This happens in the Fedora 42 build as well, for the rfv constant in dl_vdso_vsym in sysdeps/unix/sysv/linux/dl-vdso.h: const struct r_found_version rfv = { VDSO_NAME, VDSO_HASH, 1, NULL }; /* Search the scope of the vdso map. */ const ElfW (Sym) *ref = &wsym; lookup_t result = GLRO (dl_lookup_symbol_x) (name, map, &ref, map->l_local_scope, &rfv, 0, 0, NULL); return ref != NULL ? DL_SYMBOL_ADDRESS (result, ref) : NULL; It just so happens that the first call to dl_vdso_vsym happens after the call to ELF_DYNAMIC_RELOCATE for ld.so, which is why this works. But the entire design really assumes that the ELF_DYNAMIC_RELOCATE call is a no-op. So maybe we are just naïve, and need to treat all architectures as HIDDEN_VAR_NEEDS_DYNAMIC_RELOC? Or get real, remove the ELF_DYNAMIC_RELOCATE call for ld.so on !HIDDEN_VAR_NEEDS_DYNAMIC_RELOC architectures because it should do nothing at all, and add a test case that requires that there are no dynamic relocations whatsoever. But it looks like the latter would require a way to request that GCC does not produce global data constants (with relocations) for initializing local variables.
Thank you very much for the detailed investigation; much appreciated.

I'm not sure if it's still needed, but to reply to Jakub's request, here's how glibc is being built:

https://github.com/wolfi-dev/os/blob/main/glibc.yaml

I can obtain the build logs if needed; they're not readily accessible unfortunately. The compiler flags are the ones I listed in the description, coming directly from OpenSSF. The recipe above currently disables the hardening flags entirely (by setting GCC_SPEC_FILE to /dev/null, on line 52), but I obviously reenabled them for my tests.

Now, it's very interesting that -fno-strict-aliasing can cause such a mess. As I said, in my tests the trigger appeared to be "-z now". That Gentoo bug seems to be exactly the same thing as I'm reporting here.

I'll run some tests removing -fno-strict-aliasing (but keeping "-z now") and see if it makes a difference.
(In reply to Jakub Jelinek from comment #14) > Seems _dl_start_final is in this configuration inlined into _dl_start and > the important difference is (-fstrict-aliasing to -fno-strict-aliasing): > @@ -1206,11 +1207,8 @@ _dl_start: > pushq %rbp > .cfi_def_cfa_offset 16 > .cfi_offset 6, -16 > - leaq __ehdr_start(%rip), %rsi This doesn't need run-time relocation. > leaq _end(%rip), %rax > - movq %rsi, %xmm2 > movq %rax, %xmm3 > - punpcklqdq %xmm3, %xmm2 > movq %rsp, %rbp > .cfi_def_cfa_register 6 > pushq %r15 > @@ -1225,214 +1223,176 @@ _dl_start: > .cfi_offset 12, -48 > .cfi_offset 3, -56 > movq %rdi, -136(%rbp) > + movq .LC31(%rip), %xmm2 > + punpcklqdq %xmm3, %xmm2 > movaps %xmm2, -128(%rbp) > rdtsc > + leaq __ehdr_start(%rip), %rdi > andb $-33, 854+_dl_rtld_map(%rip) > - leaq 64+_dl_rtld_map(%rip), %rcx > - movl $1879048191, %r8d > - movl $1879048233, %r9d > + movq %rdi, _dl_rtld_map(%rip) > salq $32, %rdx > - movq %rsi, _dl_rtld_map(%rip) > orq %rdx, %rax > leaq _DYNAMIC(%rip), %rdx > movq %rax, start_time(%rip) > movq _DYNAMIC(%rip), %rax > ... > @@ -6289,9 +6296,13 @@ _rtld_global_ro: > .globl _rtld_local_ro > .hidden _rtld_local_ro > .set _rtld_local_ro,_rtld_global_ro > + .section .data.rel.ro.local > + .align 8 > +.LC31: > + .quad __ehdr_start This requires the run-time relocation. This is another case of PR 103762.
[hjl@gnu-zen4-1 cvise-1]$ cat x.i typedef long Elf64_Addr; struct { Elf64_Addr l_map_start, l_map_end; } _dl_rtld_map; extern int __ehdr_start __attribute__((visibility("hidden"))); extern int _end __attribute__((visibility("hidden"))); void __attribute___dl_start (void) { _dl_rtld_map.l_map_start = (Elf64_Addr)&__ehdr_start; _dl_rtld_map.l_map_end = (Elf64_Addr)&_end; } [hjl@gnu-zen4-1 cvise-1]$ gcc -S -O2 -fPIC x.i [hjl@gnu-zen4-1 cvise-1]$ cat x.s .file "x.i" .text .p2align 4 .globl __attribute___dl_start .type __attribute___dl_start, @function __attribute___dl_start: .LFB0: .cfi_startproc movq .LC0(%rip), %xmm0 leaq _end(%rip), %rax movq %rax, %xmm1 movq _dl_rtld_map@GOTPCREL(%rip), %rax punpcklqdq %xmm1, %xmm0 movups %xmm0, (%rax) ret .cfi_endproc .LFE0: .size __attribute___dl_start, .-__attribute___dl_start .globl _dl_rtld_map .bss .align 16 .type _dl_rtld_map, @object .size _dl_rtld_map, 16 _dl_rtld_map: .zero 16 .section .data.rel.ro.local,"aw" .align 8 .LC0: .quad __ehdr_start .hidden __ehdr_start .hidden _end .ident "GCC: (GNU) 15.1.1 20250521 (Red Hat 15.1.1-2)" .section .note.GNU-stack,"",@progbits [hjl@gnu-zen4-1 cvise-1]$ gcc -S -O1 -fPIC x.i [hjl@gnu-zen4-1 cvise-1]$ cat x.s .file "x.i" .text .globl __attribute___dl_start .type __attribute___dl_start, @function __attribute___dl_start: .LFB0: .cfi_startproc movq _dl_rtld_map@GOTPCREL(%rip), %rax leaq __ehdr_start(%rip), %rdx movq %rdx, (%rax) leaq _end(%rip), %rcx movq %rcx, 8(%rax) ret .cfi_endproc .LFE0: .size __attribute___dl_start, .-__attribute___dl_start .globl _dl_rtld_map .bss .align 16 .type _dl_rtld_map, @object .size _dl_rtld_map, 16 _dl_rtld_map: .zero 16 .hidden _end .hidden __ehdr_start .ident "GCC: (GNU) 15.1.1 20250521 (Red Hat 15.1.1-2)" .section .note.GNU-stack,"",@progbits [hjl@gnu-zen4-1 cvise-1]$
I opened a glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33088 and attached a patch: https://sourceware.org/bugzilla/attachment.cgi?id=16138
(In reply to Sergio Durigan Junior from comment #0) > Hi, > > This is an interesting bug which took me quite some time to (partially) > understand. I decided to file this upstream report to: > > - See if an upstream developer could help me fully understand what's going > one, and > - Get a patch backported to GCC 14 to fix the issue. > > It all started when we noticed that compiling a glibc using the following > hardening flags (from the OpenSSF project) would lead to an abortion in > certain scenarios: > > ==== > *self_spec: > + %{!O:%{!O1:%{!O2:%{!O3:%{!O0:%{!Os:%{!0fast:%{!0g:%{!0z:-O2}}}}}}}}} > -fhardened -Wno-error=hardened -Wno-hardened > %{!fdelete-null-pointer-checks:-fno-delete-null-pointer-checks} > -fno-strict-overflow -fno-strict-aliasing > %{!fomit-frame-pointer:-fno-omit-frame-pointer} -mno-omit-leaf-frame-pointer > > *link: > + --as-needed -O1 --sort-common -z noexecstack -z relro -z now > ==== > > It is important to notice that: > > - The bug only happens when using a glibc compiled with the "-z now" > hardening flag. If the flag is removed, then the abort doesn't occur. > - The bug only happens when using a glibc compiled with GCC 14.x (14.3 > included). Please provide the output of # readelf -rW elf/rtld.os | grep __ehdr_start on the bad glibc build. Is there 0000000000000000 0000000800000001 R_X86_64_64 0000000000000000 __ehdr_start + 0
Moved to glibc: https://sourceware.org/bugzilla/show_bug.cgi?id=33088