Bug 96607 - GCC feeds SPARC/Solaris linker with unrecognized TLS sequences
Summary: GCC feeds SPARC/Solaris linker with unrecognized TLS sequences
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 10.1.0
: P3 normal
Target Milestone: 9.4
Assignee: Eric Botcazou
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-13 20:52 UTC by Vita Batrla
Modified: 2020-11-28 09:28 UTC (History)
2 users (show)

See Also:
Host:
Target: sparc-solaris
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-09-02 00:00:00


Attachments
Non-reduced test case. (333.03 KB, application/x-bzip)
2020-08-14 12:39 UTC, Rainer Orth
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Vita Batrla 2020-08-13 20:52:51 UTC
GCC generates invalid TLS sequences not understood by SPARC/Solaris linker so resulting binary dumps core. The following example shows dump of .o file containing TLS sequence that is not contiguous and uses register %i2 to hold temporary value. See an example:

  48:   35 00 00 00     sethi  %hi(0), %i2
  48: R_SPARC_TLS_GD_HI22 _ZSt15__once_callable
...
  50:   b4 06 a0 00     add  %i2, 0, %i2
  50: R_SPARC_TLS_GD_LO10 _ZSt15__once_callable
...
  68:   b4 05 c0 1a     add  %l7, %i2, %i2
  68: R_SPARC_TLS_GD_ADD  _ZSt15__once_callable
...
 398:   40 00 00 00     call  398 <_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN4llvm10ThreadPoolC4EjEUlvE_EEEEE6_M_runEv+0x398>
                         398: R_SPARC_TLS_GD_CALL        _ZSt15__once_callable
 39c:   90 10 00 1a     mov  %i2, %o0

Per spec [1] GD TLS sequnce should be 4 instructions in this order:

General Dynamic Model Code Sequence     Initial Relocation      Symbol
0x00 sethi %hi(@dtlndx(x)),     %o0     R_SPARC_TLS_GD_HI22     x
0x04 add   %o0,%lo(@dtlndx(x)), %o0     R_SPARC_TLS_GD_LO10     x
0x08 add   %l7,%o0,             %o0     R_SPARC_TLS_GD_ADD      x
0x0c call  __tls_get_addr               R_SPARC_TLS_GD_CALL     x

With comment:

> The code sequence must appear in the code as is. It is not possible to move
> the second add instruction in the delay slot of the call instruction since
> the linker would not recognize the instruction sequence.

Solaris linker doesn't expect anything in the branch delay slot of the above call instruction. An optimized sequence using a delay slot of call instruction related to R_SPARC_TLS_GD_CALL relocation confuses the linker and causes it to produce invalid code. Solaris linker in GD->IE translation replaces call instruction by an 'add'. So the above example at offset 398 becomes: example by 

> +0x398:   call +0        ----> add       %g7, %o0, %o0
> +0x39c:   mov  %i2, %o0  ----> mov       %i2, %o0

Before linking the "mov" instruction executes before "call" instruction (it is in branch delay slot). After linking the 'mov' instruction executes in program order, so the effect of 'add' instruction is lost and register %o0 value becomes corrupted leading to SIGSEGV on dereference in best case.

3952  /* Return nonzero if TRIAL can go into the call delay slot.  */
3953  
3954  int
3955  eligible_for_call_delay (rtx_insn *trial)
3956  {
3957    rtx pat;
3958  
3959    if (get_attr_in_branch_delay (trial) == IN_BRANCH_DELAY_FALSE)
3960      return 0;
3961  
3962    /* The only problematic cases are TLS sequences with Sun as/ld.  */
3963    if ((TARGET_GNU_TLS && HAVE_GNU_LD) || !TARGET_TLS)
3964      return 1;
3965  
3966    pat = PATTERN (trial);
3967  
3968    /* We must reject tgd_add{32|64}, i.e.
3969         (set (reg) (plus (reg) (unspec [(reg) (symbol_ref)] UNSPEC_TLSGD)))
3970       and tldm_add{32|64}, i.e.
3971         (set (reg) (plus (reg) (unspec [(reg) (symbol_ref)] UNSPEC_TLSLDM)))
3972       for Sun as/ld.  */
3973    if (GET_CODE (pat) == SET
3974        && GET_CODE (SET_SRC (pat)) == PLUS)
3975      {
3976        rtx unspec = XEXP (SET_SRC (pat), 1);
3977  
3978        if (GET_CODE (unspec) == UNSPEC
3979  	  && (XINT (unspec, 1) == UNSPEC_TLSGD
3980  	      || XINT (unspec, 1) == UNSPEC_TLSLDM))
3981  	return 0;
3982      }
3983  
3984    return 1;
3985  }
3986  

My hypothesis is that the block / check at line 3973 catches only 'add'
instructions in TLS sequence, but not if the TLS sequence calculates
a temporary value and stores it in register (intermediate code):

mov 'reg_with_temp_value', %o0 <-- eligible_for_call_delay() return 1
call R_SPARC_TLS_GD_CALL

the 'mov' bypasses branch delay slot eligibility check andcompiler produces .o:

call R_SPARC_TLS_GD_CALL
mov 'reg_with_temp_value', %o0

Then Solaris linker comes and optimizes GD -> IE like this:

add %g7, %o0, %o0
mov 'reg_with_temp_value', %o0

And breaks the binary. The check in eligible_for_call_delay() on line 3963 updated in [2] is fallen through as the gcc used in example above is compiled to use GNU 'as' but Solaris linker. The decision whether the instruction is eligible for delay slot is then taken by 3973, but that check unfortunately handles simple cases only.

[1] https://www.uclibc.org/docs/tls.pdf
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93704
Comment 1 Rainer Orth 2020-08-14 12:38:19 UTC
To add some background: I've been experiencing ca. 250 test failures in the LLVM
testsuite when building for sparcv9-sun-solaris2.11.

As a workaround, I've posted

  [cmake] Don't build with -O3 -fPIC on Solaris/sparcv9
  https://reviews.llvm.org/D85630

based on the observation that the SEGVs vanish if LLVM is compiled with -O -fPIC
only instead of the default -O3 -fPIC.

I've been using the following slightly reduced example (attached):

* tqt.cpp is reduced from LLVM's llvm/unittests/Support/TaskQueueTest.cpp.

* ThreadPool.cpp is LLVM 10.0.0 llvm/lib/Support/ThreadPool.cpp.

They can be compiled and linked with the included tqt.sh; however that relies on
LLVM 10 bundled with Solaris 11.4 for LLVM headers, libLLVMSupport and libLLVMDemangle.

I haven't tried to produce a further reduced example yet.
Comment 2 Rainer Orth 2020-08-14 12:39:18 UTC
Created attachment 49063 [details]
Non-reduced test case.
Comment 3 Eric Botcazou 2020-09-02 09:38:29 UTC
It looks like the logic has been flawed since day #1 and we should be testing the insn which *has* the delay slot instead of the insn which is in the delay slot.
Comment 4 Eric Botcazou 2020-09-02 09:38:51 UTC
Fixing.
Comment 5 Petr Sumbera 2020-10-06 11:04:29 UTC
Is there any workaround for this issue? Some gcc switch to disable the new optimization which came in GCC 10? Thanks!
Comment 6 Eric Botcazou 2020-10-06 11:26:44 UTC
> Is there any workaround for this issue? Some gcc switch to disable the new
> optimization which came in GCC 10?

The optimization (-fdelayed-branch) has been there for years though and you can always pass the associated -fno-xxx switch.
Comment 7 Petr Sumbera 2020-11-23 14:29:51 UTC
Just to confirm that '-fno-delayed-branch' as workaround seems to work (at least based on provided test case).

Probably better is to modify the code like this:

--- gcc-10.2.0/gcc/config/sparc/sparc.c
+++ gcc-10.2.0/gcc/config/sparc/sparc.c
@@ -3963,6 +3963,8 @@
   if ((TARGET_GNU_TLS && HAVE_GNU_LD) || !TARGET_TLS)
     return 1;

+  return 0;
+
   pat = PATTERN (trial);

   /* We must reject tgd_add{32|64}, i.e.

---

Of course proper fix is needed. Any suggestion to the condition (in code) below?
Comment 8 Eric Botcazou 2020-11-23 14:35:05 UTC
Sorry for dropping the ball, I'll get back to it momentarily.
Comment 9 GCC Commits 2020-11-26 15:42:47 UTC
The master branch has been updated by Eric Botcazou <ebotcazou@gcc.gnu.org>:

https://gcc.gnu.org/g:294e72e9acbd0ff15ef5b18895de62cc173464ca

commit r11-5445-g294e72e9acbd0ff15ef5b18895de62cc173464ca
Author: Eric Botcazou <ebotcazou@adacore.com>
Date:   Thu Nov 26 16:38:35 2020 +0100

    Fix PR target/96607
    
    After 15 years trying to find out what can go into the delay slot of
    the call to __tls_get_addr with the Solaris linker, it's now time to
    concede defeat and consider it as not to be filled.
    
    gcc/ChangeLog:
            PR target/96607
            * config/sparc/sparc-protos.h (eligible_for_call_delay): Delete.
            * config/sparc/sparc.c (eligible_for_call_delay): Likewise.
            * config/sparc/sparc.md (in_call_delay): Likewise.
            (tls_delay_slot): New attribute.
            (define_delay [call]): Use in_branch_delay.
            (tgd_call<P:mode>): Set type to call_no_delay_slot when
            tls_delay_slot is false.
            (tldm_call<P:mode>): Likewise.
Comment 10 GCC Commits 2020-11-28 09:26:41 UTC
The releases/gcc-10 branch has been updated by Eric Botcazou <ebotcazou@gcc.gnu.org>:

https://gcc.gnu.org/g:d1fbbc13b5ac04163f8eda30834dc090349df5f7

commit r10-9091-gd1fbbc13b5ac04163f8eda30834dc090349df5f7
Author: Eric Botcazou <ebotcazou@adacore.com>
Date:   Thu Nov 26 16:38:35 2020 +0100

    Fix PR target/96607
    
    After 15 years trying to find out what can go into the delay slot of
    the call to __tls_get_addr with the Solaris linker, it's now time to
    concede defeat and consider it as not to be filled.
    
    gcc/ChangeLog:
            PR target/96607
            * config/sparc/sparc-protos.h (eligible_for_call_delay): Delete.
            * config/sparc/sparc.c (eligible_for_call_delay): Likewise.
            * config/sparc/sparc.md (in_call_delay): Likewise.
            (tls_delay_slot): New attribute.
            (define_delay [call]): Use in_branch_delay.
            (tgd_call<P:mode>): Set type to call_no_delay_slot when
            tls_delay_slot is false.
            (tldm_call<P:mode>): Likewise.
Comment 11 GCC Commits 2020-11-28 09:27:30 UTC
The releases/gcc-9 branch has been updated by Eric Botcazou <ebotcazou@gcc.gnu.org>:

https://gcc.gnu.org/g:2d9acb94cb78956b8451bc01cdda275926f6a1c2

commit r9-9077-g2d9acb94cb78956b8451bc01cdda275926f6a1c2
Author: Eric Botcazou <ebotcazou@adacore.com>
Date:   Thu Nov 26 16:38:35 2020 +0100

    Fix PR target/96607
    
    After 15 years trying to find out what can go into the delay slot of
    the call to __tls_get_addr with the Solaris linker, it's now time to
    concede defeat and consider it as not to be filled.
    
    gcc/ChangeLog:
            PR target/96607
            * config/sparc/sparc-protos.h (eligible_for_call_delay): Delete.
            * config/sparc/sparc.c (eligible_for_call_delay): Likewise.
            * config/sparc/sparc.md (in_call_delay): Likewise.
            (tls_delay_slot): New attribute.
            (define_delay [call]): Use in_branch_delay.
            (tgd_call<P:mode>): Set type to call_no_delay_slot when
            tls_delay_slot is false.
            (tldm_call<P:mode>): Likewise.
Comment 12 Eric Botcazou 2020-11-28 09:28:01 UTC
Fixed on mainline, 10 and 9 branches.