Bug 84790 - Miscompilation for MIPS16 with -fpic and -Os or -O2
Summary: Miscompilation for MIPS16 with -fpic and -Os or -O2
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 7.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: YunQiang Su
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2018-03-09 19:40 UTC by Matthias Schiffer
Modified: 2024-05-30 02:17 UTC
CC List: 3 users

See Also:
Host:
Target: mips16
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-05-22 00:00:00


Attachments
Reproducer C code (153 bytes, text/plain)
2018-03-09 19:40 UTC, Matthias Schiffer
Proposed fix (283 bytes, patch)
2018-03-12 08:33 UTC, Felix Fietkau

Description Matthias Schiffer 2018-03-09 19:40:51 UTC
Created attachment 43609
Reproducer C code

Compiling the following piece of C code:

void ext(void);

struct A {
        unsigned v;
};


__attribute__((noinline))
void foo(void) {
        ext();
}


__attribute__((noinline))
static void bar(struct A *a) {
        if (a)
                a->v--;
}

__attribute__((noinline))
static void baz(struct A *a) {
        bar(a);
}


void test(struct A *a) {
        baz(a);
        foo();
        baz(a);
}

using the command

mips-openwrt-linux-musl-gcc -Os -mips32r2 -mtune=24kc -mips16 -fpic -Wall -Wextra -c -o reproducer.o reproducer.c

(current OpenWrt toolchain) generates the following code:

00000000 <bar>:
   0:   2403            beqz    a0,8 <bar+0x8>
   2:   9c40            lw      v0,0(a0)
   4:   4aff            addiu   v0,-1
   6:   dc40            sw      v0,0(a0)
   8:   e8a0            jrc     ra
   a:   6500            nop

0000000c <baz>:
   c:   f000 6a00       li      v0,0
                        c: R_MIPS16_HI16        _gp_disp
  10:   f000 0b00       la      v1,10 <baz+0x4>
                        10: R_MIPS16_LO16       _gp_disp
  14:   f400 3240       sll     v0,16
  18:   e269            addu    v0,v1
  1a:   64c4            save    32,ra
  1c:   659a            move    gp,v0
  1e:   d204            sw      v0,16(sp)
  20:   675c            move    v0,gp
  22:   f000 9a40       lw      v0,0(v0)
                        22: R_MIPS16_GOT16      bar
  26:   f000 4a00       addiu   v0,0
                        26: R_MIPS16_LO16       bar
  2a:   ea40            jalr    v0
  2c:   653a            move    t9,v0
  2e:   6444            restore 32,ra
  30:   e8a0            jrc     ra
  32:   6500            nop

00000034 <foo>:
  34:   f000 6a00       li      v0,0
                        34: R_MIPS16_HI16       _gp_disp
  38:   f000 0b00       la      v1,38 <foo+0x4>
                        38: R_MIPS16_LO16       _gp_disp
  3c:   f400 3240       sll     v0,16
  40:   e269            addu    v0,v1
  42:   64c4            save    32,ra
  44:   659a            move    gp,v0
  46:   d204            sw      v0,16(sp)
  48:   675c            move    v0,gp
  4a:   f000 9a40       lw      v0,0(v0)
                        4a: R_MIPS16_CALL16     ext
  4e:   ea40            jalr    v0
  50:   653a            move    t9,v0
  52:   6444            restore 32,ra
  54:   e8a0            jrc     ra
  56:   6500            nop

00000058 <test>:
  58:   f000 6a00       li      v0,0
                        58: R_MIPS16_HI16       _gp_disp
  5c:   f000 0b00       la      v1,5c <test+0x4>
                        5c: R_MIPS16_LO16       _gp_disp
  60:   f400 3240       sll     v0,16
  64:   e269            addu    v0,v1
  66:   659a            move    gp,v0
  68:   677c            move    v1,gp
  6a:   64f5            save    40,ra,s0-s1
  6c:   f000 9b00       lw      s0,0(v1)
                        6c: R_MIPS16_GOT16      baz
  70:   d204            sw      v0,16(sp)
  72:   f000 4800       addiu   s0,0
                        72: R_MIPS16_LO16       baz
  76:   6538            move    t9,s0
  78:   e840            jalr    s0
  7a:   6724            move    s1,a0
  7c:   9604            lw      a2,16(sp)
  7e:   f000 9b60       lw      v1,0(v1)
                        7e: R_MIPS16_CALL16     foo
  82:   659e            move    gp,a2
  84:   eb40            jalr    v1
  86:   653b            move    t9,v1
  88:   6791            move    a0,s1
  8a:   e840            jalr    s0
  8c:   6538            move    t9,s0
  8e:   6475            restore 40,ra,s0-s1
  90:   e8a0            jrc     ra
  92:   6500            nop
  94:   6500            nop
  96:   6500            nop
  98:   6500            nop
  9a:   6500            nop
  9c:   6500            nop
  9e:   6500            nop

This is incorrect: The GOT lookup for foo at 7e assumes that v1 still contains the gp value set at 68, even though it is not valid anymore after the baz call at 78.
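
To make the failure mode concrete, here is a toy C model of the situation. It is not GCC code and does not run MIPS instructions; the struct, the function names, and the example gp value are made up for illustration, and the `la v1` step uses a stand-in value rather than the real pc-relative computation.

```c
#include <stdint.h>

/* Toy register file: only the registers involved in this report. */
struct regs { uint32_t v0, v1, gp; };

/* Models the gp setup at the start of a callee such as baz.
   It overwrites v0 and v1 on the way to setting gp. */
void callee_gp_setup(struct regs *r, uint32_t gp_value)
{
    r->v0 = gp_value >> 16;        /* li   v0,<high part>             */
    r->v1 = gp_value & 0xffffu;    /* la   v1,... (stand-in value)    */
    r->v0 <<= 16;                  /* sll  v0,16                      */
    r->v0 += r->v1;                /* addu v0,v1                      */
    r->gp = r->v0;                 /* move gp,v0                      */
}

/* Returns the base address the caller would use for its second GOT
   lookup. reload_from_stack == 0 mimics the generated code above
   (v1 is reused); nonzero mimics reloading the saved gp instead. */
uint32_t got_base_after_call(uint32_t gp_value, int reload_from_stack)
{
    struct regs r = { 0, 0, 0 };
    uint32_t stack_slot;

    r.gp = gp_value;
    r.v1 = r.gp;                    /* 0x68: move v1,gp      */
    stack_slot = gp_value;          /* 0x70: sw   v0,16(sp)  */
    callee_gp_setup(&r, gp_value);  /* 0x78: jalr s0 (baz)   */

    return reload_from_stack ? stack_slot : r.v1;
}
```

In this model, `got_base_after_call(0x00428010, 0)` yields the leftover intermediate value from the callee rather than the gp value, while passing a nonzero second argument yields the gp value that was saved on the stack.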

We noticed this issue as it started to affect real software with GCC 7, but for this reproducer code, GCC 5.4 and 5.5 produce identical code, so it is not a regression per se.

Version: mips-openwrt-linux-musl-gcc -v
Reading specs from /home/neoraider/Devel/OpenWrt/openwrt/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/lib/gcc/mips-openwrt-linux-musl/7.3.0/specs
COLLECT_GCC=/home/neoraider/Devel/OpenWrt/openwrt/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/bin/mips-openwrt-linux-musl-gcc
COLLECT_LTO_WRAPPER=/home/neoraider/Devel/OpenWrt/openwrt/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/libexec/gcc/mips-openwrt-linux-musl/7.3.0/lto-wrapper
Target: mips-openwrt-linux-musl
Configured with: /home/neoraider/Devel/OpenWrt/openwrt/build_dir/toolchain-mips_24kc_gcc-7.3.0_musl/gcc-7.3.0/configure --with-bugurl=http://www.lede-project.org/bugs/ --with-pkgversion='OpenWrt GCC 7.3.0 r6401-edab12ec79aa' --prefix=/home/neoraider/Devel/OpenWrt/openwrt/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=mips-openwrt-linux-musl --with-gnu-ld --enable-target-optspace --disable-libgomp --disable-libmudflap --disable-multilib --disable-libmpx --disable-nls --without-isl --without-cloog --with-host-libstdcxx=-lstdc++ --with-float=soft --with-gmp=/home/neoraider/Devel/OpenWrt/openwrt/staging_dir/host --with-mpfr=/home/neoraider/Devel/OpenWrt/openwrt/staging_dir/host --with-mpc=/home/neoraider/Devel/OpenWrt/openwrt/staging_dir/host --disable-decimal-float --with-mips-plt --with-diagnostics-color=auto-if-env --disable-libssp --enable-__cxa_atexit --with-headers=/home/neoraider/Devel/OpenWrt/openwrt/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/include --disable-libsanitizer --enable-languages=c,c++ --enable-shared --enable-threads --with-slibdir=/home/neoraider/Devel/OpenWrt/openwrt/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/lib --enable-lto --with-libelf=/home/neoraider/Devel/OpenWrt/openwrt/staging_dir/host
Thread model: posix
gcc version 7.3.0 (OpenWrt GCC 7.3.0 r6401-edab12ec79aa)
Comment 1 Matthias Schiffer 2018-03-10 11:51:54 UTC
Issue still present in gcc version 8.0.1 20180310 (experimental) (GCC). Again, the output is identical to that of GCC 5 and 7.
Comment 2 Matthias Schiffer 2018-03-10 13:26:44 UTC
The problem seems to be that the gp init sequence

        li      $2,%hi(_gp_disp)
        addiu   $3,$pc,%lo(_gp_disp)
        sll     $2,16
        addu    $2,$3

is generated very late and does not appear in the RTL in any way, so optimizing passes are not aware of the $3 (and possibly $2?) clobber. I don't know enough about GCC internals for further analysis.
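
For readers unfamiliar with the sequence, the following C sketch mirrors its arithmetic, showing why two scratch registers are involved. It assumes the usual %hi/%lo convention (the high half is pre-adjusted because the low half is sign-extended when added); the `pc` parameter is a stand-in, and the exact MIPS16 pc-relative and `_gp_disp` relocation semantics are glossed over. Function names are hypothetical.

```c
#include <stdint.h>

/* %hi(x): upper 16 bits, adjusted for the sign extension of %lo(x). */
uint32_t hi16(uint32_t x)
{
    return ((x + 0x8000u) >> 16) & 0xffffu;
}

/* %lo(x): lower 16 bits, interpreted as a signed 16-bit value. */
int32_t lo16(uint32_t x)
{
    int32_t lo = (int32_t)(x & 0xffffu);
    if (lo >= 0x8000)
        lo -= 0x10000;
    return lo;
}

/* Step-by-step model of the four instructions; r2 and r3 play the
   roles of $2 and $3. Arithmetic wraps modulo 2^32 as on the target. */
uint32_t gp_from_disp(uint32_t pc, uint32_t gp_disp)
{
    uint32_t r2 = hi16(gp_disp);                 /* li    $2,%hi(_gp_disp)     */
    uint32_t r3 = pc + (uint32_t)lo16(gp_disp);  /* addiu $3,$pc,%lo(_gp_disp) */
    r2 <<= 16;                                   /* sll   $2,16                */
    r2 += r3;                                    /* addu  $2,$3                */
    return r2;                                   /* equals pc + gp_disp        */
}
```

Both `r2` and `r3` are overwritten unconditionally, which is the clobber the optimization passes do not see.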
Comment 3 Felix Fietkau 2018-03-12 08:33:15 UTC
Created attachment 43627
Proposed fix

I've hacked up a patch that seems to fix this issue, but I have no idea if the approach is correct.
Comment 4 Eric Botcazou 2018-03-12 08:47:38 UTC
> I've hacked up a patch that seems to fix this issue, but I have no idea if
> the approach is correct.

This might slightly pessimize (maybe test mips_symbol_binds_local_p?) but only a MIPS maintainer can give an informed opinion here, so CCing them.
Comment 5 YunQiang Su 2024-05-22 14:28:23 UTC
Assign it to me.
Comment 6 YunQiang Su 2024-05-25 16:37:41 UTC
The attached patch does not work now.

It is not correct; it only happened to work by luck, because the same register was allocated for these 2 instructions.
Comment 7 Matthias Schiffer 2024-05-26 00:05:39 UTC
(In reply to YunQiang Su from comment #6)
> The attached patch does not work now.
> 
> It is not correct; it only happened to work by luck, because the same
> register was allocated for these 2 instructions.

I believe this is not the case. The gp init sequence is inserted very late, and no register allocation is involved - the use of registers $2 and $3 is hardcoded: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/mips/mips.cc;h=b63d40a357b7c1f294e2c82062f0ef75fc307ba8;hb=HEAD#l12164
Comment 8 YunQiang Su 2024-05-26 03:42:41 UTC
Oh, in fact we should use $28 if TARGET_USE_GOT.

Can you help test this patch?

```
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index b63d40a357b..fe8641d3916 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -3342,7 +3342,7 @@ mips16_gp_pseudo_reg (void)
 rtx
 mips_pic_base_register (rtx temp)
 {
-  if (MIPS16_GP_LOADS ||!TARGET_MIPS16)
+  if (MIPS16_GP_LOADS || TARGET_USE_GOT ||!TARGET_MIPS16)
     return pic_offset_table_rtx;
 
   if (currently_expanding_to_rtl)
```
Comment 9 YunQiang Su 2024-05-26 03:45:01 UTC
(In reply to Matthias Schiffer from comment #7)
> (In reply to YunQiang Su from comment #6)
> > The attached patch does not work now.
> > 
> > It is not correct; it only happened to work by luck, because the same
> > register was allocated for these 2 instructions.
> 
> I believe this is not the case. The gp init sequence is inserted very late,
> and no register allocation is involved - the use of registers $2 and $3 is
> hardcoded:

Here $6(a2) is hardcoded, while $3(v1) is not.
Comment 10 Matthias Schiffer 2024-05-26 08:55:21 UTC
(In reply to YunQiang Su from comment #8)
> Oh, in fact we should use $28 if TARGET_USE_GOT.
> 
> Can you help test this patch?
> 
> ```
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index b63d40a357b..fe8641d3916 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -3342,7 +3342,7 @@ mips16_gp_pseudo_reg (void)
>  rtx
>  mips_pic_base_register (rtx temp)
>  {
> -  if (MIPS16_GP_LOADS ||!TARGET_MIPS16)
> +  if (MIPS16_GP_LOADS || TARGET_USE_GOT ||!TARGET_MIPS16)
>      return pic_offset_table_rtx;
>  
>    if (currently_expanding_to_rtl)
> ```

Testing might take a while, I haven't built GCC for some time.
Comment 11 YunQiang Su 2024-05-27 01:34:35 UTC
(In reply to YunQiang Su from comment #8)
> Oh, in fact we should use $28 if TARGET_USE_GOT.
> 
> Can you help test this patch?
> 
> ```
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index b63d40a357b..fe8641d3916 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -3342,7 +3342,7 @@ mips16_gp_pseudo_reg (void)
>  rtx
>  mips_pic_base_register (rtx temp)
>  {
> -  if (MIPS16_GP_LOADS ||!TARGET_MIPS16)
> +  if (MIPS16_GP_LOADS || TARGET_USE_GOT ||!TARGET_MIPS16)
>      return pic_offset_table_rtx;
>  
>    if (currently_expanding_to_rtl)
> ```

This patch can trigger some ICEs.
Comment 12 YunQiang Su 2024-05-27 02:10:09 UTC
You are right: the decision to use $6 is made too late.
So let's force its use in the expand pass.

```
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index b63d40a357b..84ff29cd62b 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -3318,7 +3318,11 @@ mips16_gp_pseudo_reg (void)
     {
       rtx_insn *scan;
 
-      cfun->machine->mips16_gp_pseudo_rtx = gen_reg_rtx (Pmode);
+      if (TARGET_USE_GOT)
+       cfun->machine->mips16_gp_pseudo_rtx
+               = gen_rtx_REG (Pmode, POST_CALL_TMP_REG);
+      else
+       cfun->machine->mips16_gp_pseudo_rtx = gen_reg_rtx (Pmode);
 
       push_topmost_sequence ();
 

```
Comment 13 Matthias Schiffer 2024-05-27 18:20:31 UTC
I don't think the register used matters: changing it may hide the bug in specific instances, but it does not fix the root cause.

I've now built a simpler reproducer which still seems to exhibit the same issue with your latest patch. However, I've only built a baremetal GCC with your patch and looked at the generated code; I have not actually run this example on the affected platforms, so I might be overlooking something. I will try to get a full toolchain build in the next few days.

The basic premise of the following code:

In test(), the return value `ret` must be moved from v0 to a different register temporarily for calling bar(). Using the inline asm, GCC is nudged to use v1 as this temporary register.

As GCC knows the contents of foo() and bar(), it assumes that the value of v1 is preserved across the call to bar(). This assumption is wrong because the gp setup code is inserted at the beginning of bar() after all optimization and register allocation have already happened. As mentioned before, this setup code clobbers v1.

```
unsigned ext(void);

__attribute__((noinline))
static void foo(void) {
	/* Do not let the optimizer remove foo and bar */
	asm volatile("");
}

__attribute__((noinline))
static void bar(void) {
	foo();
}

unsigned test(void)
{
	unsigned ret = ext();

	register unsigned v1 asm("v1") = ret;
	asm volatile("" :: "r"(v1));

	bar();

	return ret;
}
```

`objdump -d -r` output (built using GCC commit 05daf617ea22e1d818295ed2d037456937e23530, with "-Os -mips32r2 -mtune=24kc -mabicalls -mips16 -fpic"):

```
Disassembly of section .text:

00000000 <foo>:
   0:	e8a0      	jrc	ra
   2:	6500      	nop

00000004 <bar>:
   4:	f000 6a00 	li	v0,0
			4: R_MIPS16_HI16	_gp_disp
   8:	f000 0b00 	la	v1,8 <bar+0x4>
			8: R_MIPS16_LO16	_gp_disp
   c:	f400 3240 	sll	v0,16
  10:	e269      	addu	v0,v1
  12:	64c4      	save	32,ra
  14:	659a      	move	gp,v0
  16:	d204      	sw	v0,16(sp)
  18:	675c      	move	v0,gp
  1a:	f000 9a40 	lw	v0,0(v0)
			1a: R_MIPS16_GOT16	foo
  1e:	f000 4a00 	addiu	v0,0
			1e: R_MIPS16_LO16	foo
  22:	ea40      	jalr	v0
  24:	653a      	move	t9,v0
  26:	6444      	restore	32,ra
  28:	e8a0      	jrc	ra
  2a:	6500      	nop

0000002c <test>:
  2c:	f000 6a00 	li	v0,0
			2c: R_MIPS16_HI16	_gp_disp
  30:	f000 0b00 	la	v1,30 <test+0x4>
			30: R_MIPS16_LO16	_gp_disp
  34:	f400 3240 	sll	v0,16
  38:	e269      	addu	v0,v1
  3a:	659a      	move	gp,v0
  3c:	64e4      	save	32,ra,s0
  3e:	671c      	move	s0,gp
  40:	d204      	sw	v0,16(sp)
  42:	f000 9840 	lw	v0,0(s0)
			42: R_MIPS16_CALL16	ext
  46:	ea40      	jalr	v0
  48:	653a      	move	t9,v0
  4a:	6762      	move	v1,v0
  4c:	f000 9800 	lw	s0,0(s0)
			4c: R_MIPS16_GOT16	bar
  50:	f000 4800 	addiu	s0,0
			50: R_MIPS16_LO16	bar
  54:	e840      	jalr	s0
  56:	6538      	move	t9,s0
  58:	6464      	restore	32,ra,s0
  5a:	e820      	jr	ra
  5c:	6743      	move	v0,v1
  5e:	6500      	nop
```

At 4a, the return value is moved to v1. At 5c, it is supposed to be moved back, but v1 has been clobbered in the meantime by the gp setup code at the start of bar (address 8).
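
For anyone who wants to check this at run time rather than by reading the disassembly, a harness along the following lines might work. It is a sketch under assumptions: `ext()` is given a stand-in definition returning a known constant, `test()` is renamed `repro_test()`, the register pinning is only compiled for MIPS targets, and `ret_survives_call()` is a hypothetical helper. I have not run it on an affected toolchain; on other targets it only serves as a sanity check that the harness itself is sound.

```c
#define EXPECTED_RET 0x12345678u

/* Stand-in definition; the reproducer above only declares ext(). */
__attribute__((noinline))
unsigned ext(void)
{
    return EXPECTED_RET;
}

__attribute__((noinline))
static void foo(void)
{
    /* Do not let the optimizer remove foo and bar */
    __asm__ volatile("");
}

__attribute__((noinline))
static void bar(void)
{
    foo();
}

/* Same body as test() in the reproducer, renamed for the harness. */
__attribute__((noinline))
unsigned repro_test(void)
{
    unsigned ret = ext();

#if defined(__mips__)
    /* Nudge GCC towards keeping ret in v1 across the call to bar(). */
    register unsigned v1 __asm__("v1") = ret;
    __asm__ volatile("" :: "r"(v1));
#endif

    bar();

    return ret;
}

/* Returns 1 if the value returned by ext() survived the call to bar(). */
int ret_survives_call(void)
{
    return repro_test() == EXPECTED_RET;
}
```

With a correct compiler `ret_survives_call()` returns 1. Built with the flags above on an affected toolchain, I would expect it to return 0 if the generated code matches the disassembly, but defining `ext()` in the same file may change the generated code, so the disassembly should be checked as well.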
Comment 14 YunQiang Su 2024-05-28 07:26:27 UTC
Oh, sorry for my misunderstanding. Your patch is correct.

The real problem is that $3 is used by `mips_output_function_prologue`,
which runs in the final pass that outputs the asm source code, and thus the IRA pass
cannot be aware that $3 is used.


So we have to emit some clobbers before IRA.

We have 2 choices:

1. Your choice, i.e. emit clobbers just before the function call
2. Emit clobbers at the entrance of every function that needs to use GP

@@ -3329,6 +3331,8 @@ mips16_gp_pseudo_reg (void)
       rtx set = gen_load_const_gp (cfun->machine->mips16_gp_pseudo_rtx);
       rtx_insn *insn = emit_insn_after (set, scan);
       INSN_LOCATION (insn) = 0;
+      emit_clobber (MIPS16_PIC_TEMP);
+      emit_clobber (MIPS_PROLOGUE_TEMP (Pmode));
 
       pop_topmost_sequence ();
     }
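
Why explicit clobbers help can be sketched with a toy model in C. This is not how IRA works internally; it only illustrates the principle that a value kept live across a call must avoid every register the call is known to clobber. The register numbers for v0/v1 follow the $2/$3 used in the gp init sequence quoted earlier; the candidate order, the function names, and the "before"/"after" masks are assumptions made for illustration.

```c
#include <stdint.h>

enum { REG_V0 = 2, REG_V1 = 3, REG_S0 = 16, REG_S1 = 17 };

#define REG_BIT(r) (UINT32_C(1) << (r))

/* Toy preference order: cheap scratch registers first. */
static const int candidates[] = { REG_V1, REG_V0, REG_S0, REG_S1 };

/* Pick the first candidate that the call does not clobber.
   Returns -1 if every candidate is clobbered. */
int pick_reg_live_across_call(uint32_t call_clobbers)
{
    for (unsigned i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        if (!(call_clobbers & REG_BIT(candidates[i])))
            return candidates[i];
    return -1;
}

/* Before: for a local callee whose body looks harmless, the allocator
   sees no clobbers, because the gp setup is not in the RTL. */
uint32_t clobbers_without_fix(void)
{
    return 0;
}

/* After: the two prologue scratch registers are recorded as clobbered. */
uint32_t clobbers_with_fix(void)
{
    return REG_BIT(REG_V0) | REG_BIT(REG_V1);
}
```

In this model the allocator picks v1 when it sees no clobbers and falls back to s0 once $2 and $3 are recorded as clobbered, which is the effect the `emit_clobber` calls are meant to have on the real allocator.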
Comment 15 GCC Commits 2024-05-29 17:16:11 UTC
The master branch has been updated by YunQiang Su <syq@gcc.gnu.org>:

https://gcc.gnu.org/g:915440eed21de367cb41857afb5273aff5bcb737

commit r15-911-g915440eed21de367cb41857afb5273aff5bcb737
Author: YunQiang Su <syq@gcc.gnu.org>
Date:   Wed May 29 02:28:25 2024 +0800

    MIPS16: Mark $2/$3 as clobbered if GP is used
    
    PR Target/84790.
    The gp init sequence
            li      $2,%hi(_gp_disp)
            addiu   $3,$pc,%lo(_gp_disp)
            sll     $2,16
            addu    $2,$3
    is generated directly in `mips_output_function_prologue`, and does
    not appear in the RTL.
    
    So the IRA/IPA passes are not aware that $2/$3 have been clobbered,
    so they may be used for cross (local) function call.
    
    Let's mark $2/$3 clobber both:
      - Just after the UNSPEC_GP RTL of a function;
      - Just after a function call.
    
    Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
    Origin-Patch-by: Felix Fietkau <nbd@nbd.name>.
    
    gcc
            * config/mips/mips.cc(mips16_gp_pseudo_reg): Mark
            MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
            (mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
            MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.
Comment 16 YunQiang Su 2024-05-29 17:17:21 UTC
Fixed by https://gcc.gnu.org/g:915440eed21de367cb41857afb5273aff5bcb737
Comment 17 Matthias Schiffer 2024-05-29 17:44:40 UTC
I have now verified that replacing Felix's patch with your new patch in the OpenWrt toolchain (currently based on GCC 13.3) results in correct compilation, while GCC 13.3 without these patches exhibits the reported issue.

Thanks!
Comment 18 GCC Commits 2024-05-30 01:48:34 UTC
The releases/gcc-14 branch has been updated by YunQiang Su <syq@gcc.gnu.org>:

https://gcc.gnu.org/g:201cfa725587d13867b4dc25955434ebe90aff7b

commit r14-10260-g201cfa725587d13867b4dc25955434ebe90aff7b
Author: YunQiang Su <syq@gcc.gnu.org>
Date:   Wed May 29 02:28:25 2024 +0800

    MIPS16: Mark $2/$3 as clobbered if GP is used
    
    (cherry picked from commit 915440eed21de367cb41857afb5273aff5bcb737)
Comment 19 GCC Commits 2024-05-30 02:11:31 UTC
The releases/gcc-12 branch has been updated by YunQiang Su <syq@gcc.gnu.org>:

https://gcc.gnu.org/g:e26f16424f6279662efb210bc87c77148e956fed

commit r12-10480-ge26f16424f6279662efb210bc87c77148e956fed
Author: YunQiang Su <syq@gcc.gnu.org>
Date:   Wed May 29 02:28:25 2024 +0800

    MIPS16: Mark $2/$3 as clobbered if GP is used
    
    (cherry picked from commit 915440eed21de367cb41857afb5273aff5bcb737)
Comment 20 GCC Commits 2024-05-30 02:11:36 UTC
The releases/gcc-13 branch has been updated by YunQiang Su <syq@gcc.gnu.org>:

https://gcc.gnu.org/g:3be8fa7b19d218ca5812d71801e3e83ee2260ea0

commit r13-8809-g3be8fa7b19d218ca5812d71801e3e83ee2260ea0
Author: YunQiang Su <syq@gcc.gnu.org>
Date:   Wed May 29 02:28:25 2024 +0800

    MIPS16: Mark $2/$3 as clobbered if GP is used
    
    (cherry picked from commit 915440eed21de367cb41857afb5273aff5bcb737)
Comment 21 GCC Commits 2024-05-30 02:17:01 UTC
The releases/gcc-11 branch has been updated by YunQiang Su <syq@gcc.gnu.org>:

https://gcc.gnu.org/g:1bc4a777b21ae36b116e1842b7c482340ec929ef

commit r11-11457-g1bc4a777b21ae36b116e1842b7c482340ec929ef
Author: YunQiang Su <syq@gcc.gnu.org>
Date:   Wed May 29 02:28:25 2024 +0800

    MIPS16: Mark $2/$3 as clobbered if GP is used
    
    gcc
            * config/mips/mips.c(mips16_gp_pseudo_reg): Mark
            MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
            (mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
            MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.
    
    (cherry picked from commit 915440eed21de367cb41857afb5273aff5bcb737)