GCC generates invalid TLS sequences not understood by SPARC/Solaris linker so resulting binary dumps core. The following example shows dump of .o file containing TLS sequence that is not contiguous and uses register %i2 to hold temporary value. See an example: 48: 35 00 00 00 sethi %hi(0), %i2 48: R_SPARC_TLS_GD_HI22 _ZSt15__once_callable ... 50: b4 06 a0 00 add %i2, 0, %i2 50: R_SPARC_TLS_GD_LO10 _ZSt15__once_callable ... 68: b4 05 c0 1a add %l7, %i2, %i2 68: R_SPARC_TLS_GD_ADD _ZSt15__once_callable ... 398: 40 00 00 00 call 398 <_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN4llvm10ThreadPoolC4EjEUlvE_EEEEE6_M_runEv+0x398> 398: R_SPARC_TLS_GD_CALL _ZSt15__once_callable 39c: 90 10 00 1a mov %i2, %o0 Per spec [1] GD TLS sequnce should be 4 instructions in this order: General Dynamic Model Code Sequence Initial Relocation Symbol 0x00 sethi %hi(@dtlndx(x)), %o0 R_SPARC_TLS_GD_HI22 x 0x04 add %o0,%lo(@dtlndx(x)), %o0 R_SPARC_TLS_GD_LO10 x 0x08 add %l7,%o0, %o0 R_SPARC_TLS_GD_ADD x 0x0c call __tls_get_addr R_SPARC_TLS_GD_CALL x With comment: > The code sequence must appear in the code as is. It is not possible to move > the second add instruction in the delay slot of the call instruction since > the linker would not recognize the instruction sequence. Solaris linker doesn't expect anything in the branch delay slot of the above call instruction. An optimized sequence using a delay slot of call instruction related to R_SPARC_TLS_GD_CALL relocation confuses the linker and causes it to produce invalid code. Solaris linker in GD->IE translation replaces call instruction by an 'add'. So the above example at offset 398 becomes: example by > +0x398: call +0 ----> add %g7, %o0, %o0 > +0x39c: mov %i2, %o0 ----> mov %i2, %o0 Before linking the "mov" instruction executes before "call" instruction (it is in branch delay slot). After linking the 'mov' instruction executes in program order, so the effect of 'add' instruction is lost and register %o0 value becomes corrupted leading to SIGSEGV on dereference in best case. 3952 /* Return nonzero if TRIAL can go into the call delay slot. */ 3953 3954 int 3955 eligible_for_call_delay (rtx_insn *trial) 3956 { 3957 rtx pat; 3958 3959 if (get_attr_in_branch_delay (trial) == IN_BRANCH_DELAY_FALSE) 3960 return 0; 3961 3962 /* The only problematic cases are TLS sequences with Sun as/ld. */ 3963 if ((TARGET_GNU_TLS && HAVE_GNU_LD) || !TARGET_TLS) 3964 return 1; 3965 3966 pat = PATTERN (trial); 3967 3968 /* We must reject tgd_add{32|64}, i.e. 3969 (set (reg) (plus (reg) (unspec [(reg) (symbol_ref)] UNSPEC_TLSGD))) 3970 and tldm_add{32|64}, i.e. 3971 (set (reg) (plus (reg) (unspec [(reg) (symbol_ref)] UNSPEC_TLSLDM))) 3972 for Sun as/ld. */ 3973 if (GET_CODE (pat) == SET 3974 && GET_CODE (SET_SRC (pat)) == PLUS) 3975 { 3976 rtx unspec = XEXP (SET_SRC (pat), 1); 3977 3978 if (GET_CODE (unspec) == UNSPEC 3979 && (XINT (unspec, 1) == UNSPEC_TLSGD 3980 || XINT (unspec, 1) == UNSPEC_TLSLDM)) 3981 return 0; 3982 } 3983 3984 return 1; 3985 } 3986 My hypothesis is that the block / check at line 3973 catches only 'add' instructions in TLS sequence, but not if the TLS sequence calculates a temporary value and stores it in register (intermediate code): mov 'reg_with_temp_value', %o0 <-- eligible_for_call_delay() return 1 call R_SPARC_TLS_GD_CALL the 'mov' bypasses branch delay slot eligibility check andcompiler produces .o: call R_SPARC_TLS_GD_CALL mov 'reg_with_temp_value', %o0 Then Solaris linker comes and optimizes GD -> IE like this: add %g7, %o0, %o0 mov 'reg_with_temp_value', %o0 And breaks the binary. The check in eligible_for_call_delay() on line 3963 updated in [2] is fallen through as the gcc used in example above is compiled to use GNU 'as' but Solaris linker. The decision whether the instruction is eligible for delay slot is then taken by 3973, but that check unfortunately handles simple cases only. [1] https://www.uclibc.org/docs/tls.pdf [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93704
To add some background: I've been experiencing ca. 250 test failures in the LLVM testsuite when building for sparcv9-sun-solaris2.11. As a workaround, I've posted [cmake] Don't build with -O3 -fPIC on Solaris/sparcv9 https://reviews.llvm.org/D85630 based on the observation that the SEGVs vanish if LLVM is compiled with -O -fPIC only instead of the default -O3 -fPIC. I've been using the following slightly reduced example (attached): * tqt.cpp is reduced from LLVM's llvm/unittests/Support/TaskQueueTest.cpp. * ThreadPool.cpp is LLVM 10.0.0 llvm/lib/Support/ThreadPool.cpp. They can be compiled and linked with the included tqt.sh; however that relies on LLVM 10 bundled with Solaris 11.4 for LLVM headers, libLLVMSupport and libLLVMDemangle. I haven't tried to produce a further reduced example yet.
Created attachment 49063 [details] Non-reduced test case.
It looks like the logic has been flawed since day #1 and we should be testing the insn which *has* the delay slot instead of the insn which is in the delay slot.
Fixing.
Is there any workaround for this issue? Some gcc switch to disable the new optimization which came in GCC 10? Thanks!
> Is there any workaround for this issue? Some gcc switch to disable the new > optimization which came in GCC 10? The optimization (-fdelayed-branch) has been there for years though and you can always pass the associated -fno-xxx switch.
Just to confirm that '-fno-delayed-branch' as workaround seems to work (at least based on provided test case). Probably better is to modify the code like this: --- gcc-10.2.0/gcc/config/sparc/sparc.c +++ gcc-10.2.0/gcc/config/sparc/sparc.c @@ -3963,6 +3963,8 @@ if ((TARGET_GNU_TLS && HAVE_GNU_LD) || !TARGET_TLS) return 1; + return 0; + pat = PATTERN (trial); /* We must reject tgd_add{32|64}, i.e. --- Of course proper fix is needed. Any suggestion to the condition (in code) below?
Sorry for dropping the ball, I'll get back to it momentarily.
The master branch has been updated by Eric Botcazou <ebotcazou@gcc.gnu.org>: https://gcc.gnu.org/g:294e72e9acbd0ff15ef5b18895de62cc173464ca commit r11-5445-g294e72e9acbd0ff15ef5b18895de62cc173464ca Author: Eric Botcazou <ebotcazou@adacore.com> Date: Thu Nov 26 16:38:35 2020 +0100 Fix PR target/96607 After 15 years trying to find out what can go into the delay slot of the call to __tls_get_addr with the Solaris linker, it's now time to concede defeat and consider it as not to be filled. gcc/ChangeLog: PR target/96607 * config/sparc/sparc-protos.h (eligible_for_call_delay): Delete. * config/sparc/sparc.c (eligible_for_call_delay): Likewise. * config/sparc/sparc.md (in_call_delay): Likewise. (tls_delay_slot): New attribute. (define_delay [call]): Use in_branch_delay. (tgd_call<P:mode>): Set type to call_no_delay_slot when tls_delay_slot is false. (tldm_call<P:mode>): Likewise.
The releases/gcc-10 branch has been updated by Eric Botcazou <ebotcazou@gcc.gnu.org>: https://gcc.gnu.org/g:d1fbbc13b5ac04163f8eda30834dc090349df5f7 commit r10-9091-gd1fbbc13b5ac04163f8eda30834dc090349df5f7 Author: Eric Botcazou <ebotcazou@adacore.com> Date: Thu Nov 26 16:38:35 2020 +0100 Fix PR target/96607 After 15 years trying to find out what can go into the delay slot of the call to __tls_get_addr with the Solaris linker, it's now time to concede defeat and consider it as not to be filled. gcc/ChangeLog: PR target/96607 * config/sparc/sparc-protos.h (eligible_for_call_delay): Delete. * config/sparc/sparc.c (eligible_for_call_delay): Likewise. * config/sparc/sparc.md (in_call_delay): Likewise. (tls_delay_slot): New attribute. (define_delay [call]): Use in_branch_delay. (tgd_call<P:mode>): Set type to call_no_delay_slot when tls_delay_slot is false. (tldm_call<P:mode>): Likewise.
The releases/gcc-9 branch has been updated by Eric Botcazou <ebotcazou@gcc.gnu.org>: https://gcc.gnu.org/g:2d9acb94cb78956b8451bc01cdda275926f6a1c2 commit r9-9077-g2d9acb94cb78956b8451bc01cdda275926f6a1c2 Author: Eric Botcazou <ebotcazou@adacore.com> Date: Thu Nov 26 16:38:35 2020 +0100 Fix PR target/96607 After 15 years trying to find out what can go into the delay slot of the call to __tls_get_addr with the Solaris linker, it's now time to concede defeat and consider it as not to be filled. gcc/ChangeLog: PR target/96607 * config/sparc/sparc-protos.h (eligible_for_call_delay): Delete. * config/sparc/sparc.c (eligible_for_call_delay): Likewise. * config/sparc/sparc.md (in_call_delay): Likewise. (tls_delay_slot): New attribute. (define_delay [call]): Use in_branch_delay. (tgd_call<P:mode>): Set type to call_no_delay_slot when tls_delay_slot is false. (tldm_call<P:mode>): Likewise.
Fixed on mainline, 10 and 9 branches.