Bug 116778 - [lra][avr] Wrong code with -mlra (bitfld-lra.c)
Summary: [lra][avr] Wrong code with -mlra (bitfld-lra.c)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 15.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: ra, wrong-code
Depends on:
Blocks: avr+ra 113934
  Show dependency treegraph
 
Reported: 2024-09-19 09:38 UTC by Georg-Johann Lay
Modified: 2024-12-13 10:33 UTC (History)
3 users (show)

See Also:
Host:
Target: avr
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-09-19 00:00:00


Attachments
bitfld-lra.c: C test case (200 bytes, text/plain)
2024-09-19 09:38 UTC, Georg-Johann Lay
Details
LRA dump file (24.21 KB, text/plain)
2024-12-05 15:58 UTC, Denis Chertykov
Details
Modified test case "bf.c" (206 bytes, text/plain)
2024-12-05 16:02 UTC, Denis Chertykov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Georg-Johann Lay 2024-09-19 09:38:33 UTC
Created attachment 59145 [details]
bitfld-lra.c: C test case

$ avr-gcc -mmcu=atmega128 -dumpbase "" -save-temps -dp  -Os -mlra bitfld-lra.c -o bitfld-lra.elf

produces wrong code for the attached test case

struct
{
  unsigned long long u33 : 33;
  unsigned long long u40 : 40;
  unsigned long long u41 : 41;
} a = { 0x100000, 0, 0x100000 }, b = { 1LL << 32, 0, 0 };
long long f = 1LL << 32, g = 1LL << 32;

int main (void)
{
  if (a.u33 * a.u33 != 0 || a.u33 * a.u40 * a.u33 != 0
      || a.u41 * a.u33 != 1LL << 40)
    __builtin_abort();

  if (b.u33 + g != 1LL << 33 || b.u33 + f != 1LL << 33 || g + g != 1LL << 33)
    __builtin_abort();
}

For the first addition "b.u33 + g != 1LL << 33" the asm reads something like

	ldd r13,Y+1	 ;  660	[c=4 l=1]  movqi_insn/3
	mov r14,r13	 ;  661	[c=4 l=1]  movqi_insn/0
	lds r15,g+5	 ;  662	[c=4 l=2]  movqi_insn/3
	lds r16,g+6	 ;  663	[c=4 l=2]  movqi_insn/3
	lds r17,g+7	 ;  664	[c=4 l=2]  movqi_insn/3
	call __adddi3	 ;  665	[c=32 l=2]  *adddi3_insn

where __adddi3 performs R10[8] + R18[8].  However, R13 has a wrong value of 0x01 due to the load insns 660 and 661 so that the value in R10[8] is 0x0101000000 instead of 0x0100000000 when __adddi3 is called.

Target: avr
Configured with: ../../source/gcc-master/configure --target=avr --disable-nls --with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared --enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 15.0.0 20240918 (experimental) (GCC)
Comment 1 Georg-Johann Lay 2024-09-19 09:52:45 UTC
...and without -mlra the code is correct, so likely caused by LRA.
Comment 2 Denis Chertykov 2024-12-05 15:58:29 UTC
Created attachment 59795 [details]
LRA dump file
Comment 3 Denis Chertykov 2024-12-05 16:02:00 UTC
Created attachment 59796 [details]
Modified test case "bf.c"

I worked with a modified test case (bf.c):

struct
{
  unsigned long long u33 : 33;
  unsigned long long u40 : 40;
  unsigned long long u41 : 41;
} a = { 0x100000, 0, 0x100000 }, b = { 1LL << 32, 0, 0 };
long long f = 1LL << 32, g = 1LL << 32;

int main (void)
{
  if (a.u33 * a.u33 != 0)
    goto abrt;
  if (a.u33 * a.u40 * a.u33 != 0)
    goto abrt;
  if (a.u41 * a.u33 != 1LL << 40)
    goto abrt;
  if (b.u33 + g != 1LL << 33)
    goto abrt;
  if (b.u33 + f != 1LL << 33)
    goto abrt;
  if (g + g != 1LL << 33)
    {
    abrt:
      __builtin_abort();
    }
}
Comment 4 Denis Chertykov 2024-12-05 16:40:47 UTC
The bug appears in LRA after rematerialization pass while creating live ranges.
File lra.cc:
*************************************************************
      /* Now we know what pseudos should be spilled.  Try to
	 rematerialize them first.  */
      if (lra_remat ())
	{
	  /* We need full live info -- see the comment above.  */
	  lra_create_live_ranges (lra_reg_spill_p, true);
*************************************************************
Wrong call `lra_create_live_ranges (lra_reg_spill_p, true)'
It have to be `lra_create_live_ranges (true, true)'

Long explanation:
**********************************
int main (void)
{
  if (a.u33 * a.u33 != 0)
------^^^^^^^^^^^^^
    goto abrt;
  if (a.u33 * a.u40 * a.u33 != 0)
**********************************
The bug appears here.

Part of the expression `a.u33 * a.u33'
Before LRA:
*************************************************************
(insn 13 11 15 2 (set (reg:QI 184 [ _1+3 ])
        (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  <var_decl 0x7c866435d000 a>)
                    (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 86 {movqi_insn_split}
     (nil))
(insn 15 13 16 2 (set (reg:QI 64 [ a+4 ])
        (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  <var_decl 0x7c866435d000 a>)
                    (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 86 {movqi_insn_split}
     (nil))
(insn 16 15 20 2 (set (reg:QI 185 [ _1+4 ])
        (zero_extract:QI (reg:QI 64 [ a+4 ])
            (const_int 1 [0x1])
            (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split}
     (nil))
*************************************************************

After LRA:
*************************************************************
(insn 587 11 13 2 (set (reg:QI 24 r24 [368])
        (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  <var_decl 0x7c866435d000 a>)
                    (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 86 {movqi_insn_split}
     (nil))
(insn 13 587 15 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                (const_int 1 [0x1])) [4 %sfp+1 S1 A8])
        (reg:QI 24 r24 [368])) "bf.c":11:8 86 {movqi_insn_split}
     (nil))
(insn 15 13 16 2 (set (reg:QI 6 r6 [orig:64 a+4 ] [64])
        (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  <var_decl 0x7c866435d000 a>)
                    (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 86 {movqi_insn_split}
     (nil))
(insn 16 15 572 2 (set (reg:QI 24 r24 [orig:185 _1+4 ] [185])
        (zero_extract:QI (reg:QI 6 r6 [orig:64 a+4 ] [64])
            (const_int 1 [0x1])
            (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split}
     (nil))
(insn 572 16 20 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                (const_int 1 [0x1])) [4 %sfp+1 S1 A8])
        (reg:QI 24 r24 [orig:185 _1+4 ] [185])) "bf.c":11:8 86 {movqi_insn_split}
     (nil))
*************************************************************
Insn 13 and insn 572 use sfp+1 as a spill slot, but in IRA pass it was a two
different pseudos r184 and r185.
Insns 13 use sfp+1 as a spill slot for r184
Insns 572 use the same slot for r185. It's wrong.

Here we have a rematerialization.

Fragment from bf.c.317r.reload:
**************************************************************************************
******** Rematerialization #1: ********

df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 14 (    1)
df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 14 (    1)

Cands:
0 (nop=0, remat_regno=185, reload_regno=359):
(insn 16 15 572 2 (set (reg:QI 359 [orig:185 _1+4 ] [185])
                    (zero_extract:QI (reg:QI 64 [ a+4 ])
                        (const_int 1 [0x1])
                        (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split}
                 (nil))

**************************************************************************************
[...]
**************************************************************************************
Ranges after the compression:
 r185: [0..1]
	   Frame pointer can not be eliminated anymore
	   Spilling non-eliminable hard regs: 28 29
	 Spilling r113(28)
	 Spilling r184(29)
	 Spilling r208(29)
	 Spilling r209(28)
  Slot 0 regnos (width = 0):	 185	 209	 208	 184	 113
**************************************************************************************

The bug is here: `r185: [0..1]' wrong live range after compression.
r185 and r184 can't have the same spill slot !

Rematerialization in bf.c.317r.reload looks like:
*************************************************************
   24: r14:QI=r185:QI
    Inserting rematerialization insn before:
  581: r14:QI=zero_extract(r64:QI,0x1,0)

deleting insn with uid = 24.
         Considering alt=0 of insn 16:   (0) =r  (1) rYil  (2) n
          overall=0,losers=0,rld_nregs=0
   32: r22:QI=r185:QI
    Inserting rematerialization insn before:
  582: r22:QI=zero_extract(r64:QI,0x1,0)

deleting insn with uid = 32.
*************************************************************


It's happened because:

Fragment from lra.c (lra):
*************************************************************************
      if (! live_p)
	{
	  /* We need full live info for spilling pseudos into
	     registers instead of memory.  */
	  lra_create_live_ranges (lra_reg_spill_p, true);
	  live_p = true;
	}
      /* We should check necessity for spilling here as the above live
	 range pass can remove spilled pseudos.  */
      if (! lra_need_for_spills_p ())
	break;
      /* Now we know what pseudos should be spilled.  Try to
	 rematerialize them first.  */
      if (lra_remat ())
	{
	  /* We need full live info -- see the comment above.  */
	  lra_create_live_ranges (lra_reg_spill_p, true);
----------------------------------^^^^^^^^^^^^^^^
	  live_p = true;
*************************************************************************
The bug is here.
Rematerialization sometimes can be like spilling pseudos into registers.
  582: r22:QI=zero_extract(r64:QI,0x1,0)

So, here we need a live ranges for all pseudos.

The patch:

diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index 66c6577e5d6..8d06a3285a5 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -61,9 +61,10 @@ int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER];
 /* A global flag whose true value says to build live ranges for all
    pseudos, otherwise the live ranges only for pseudos got memory is
    build.  True value means also building copies and setting up hard
-   register preferences.  The complete info is necessary only for the
-   assignment pass.  The complete info is not needed for the
-   coalescing and spill passes.	 */
+   register preferences.  The complete info is necessary for
+   assignment, rematerialization and spill to register passes.  The
+   complete info is not needed for the coalescing and spill to memory
+   passes.  */
 static bool complete_info_p;
 
 /* Pseudos live at current point in the RTL scan.  */
diff --git a/gcc/lra.cc b/gcc/lra.cc
index bc46f56cf20..a38df0e9b7a 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -2552,7 +2552,7 @@ lra (FILE *f, int verbose)
       if (lra_remat ())
 	{
 	  /* We need full live info -- see the comment above.  */
-	  lra_create_live_ranges (lra_reg_spill_p, true);
+	  lra_create_live_ranges (true, true);
 	  live_p = true;
 	  if (! lra_need_for_spills_p ())
 	    {
Comment 5 Denis Chertykov 2024-12-05 17:40:30 UTC
Proposed patch

https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670949.html
Comment 6 GCC Commits 2024-12-07 09:47:57 UTC
The master branch has been updated by Denis Chertykov <denisc@gcc.gnu.org>:

https://gcc.gnu.org/g:279b3c71702de150eade19635bdbd26ba440b8eb

commit r15-6008-g279b3c71702de150eade19635bdbd26ba440b8eb
Author: Denis Chertykov <chertykov@gmail.com>
Date:   Sat Dec 7 13:47:04 2024 +0400

    The fix for PR116778:
    
    Brief:
    The bug appears in LRA after rematerialization pass while creating live ranges.
    File lra.cc:
    *************************************************************
          /* Now we know what pseudos should be spilled.  Try to
             rematerialize them first.  */
          if (lra_remat ())
            {
              /* We need full live info -- see the comment above.  */
              lra_create_live_ranges (lra_reg_spill_p, true);
    *************************************************************
    Wrong call `lra_create_live_ranges (lra_reg_spill_p, true)'
    It have to be `lra_create_live_ranges (true, true)'.
    
    The explanation:
    **********************************
    int main (void)
    {
      if (a.u33 * a.u33 != 0)
    ------^^^^^^^^^^^^^
        goto abrt;
      if (a.u33 * a.u40 * a.u33 != 0)
    **********************************
    The bug appears here.
    
    Part of the expression `a.u33 * a.u33'
    Before LRA:
    *************************************************************
    (insn 13 11 15 2 (set (reg:QI 184 [ _1+3 ])
            (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  <var_decl 0x7c866435d000 a>)
                        (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 86 {movqi_insn_split}
         (nil))
    (insn 15 13 16 2 (set (reg:QI 64 [ a+4 ])
            (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  <var_decl 0x7c866435d000 a>)
                        (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 86 {movqi_insn_split}
         (nil))
    (insn 16 15 20 2 (set (reg:QI 185 [ _1+4 ])
            (zero_extract:QI (reg:QI 64 [ a+4 ])
                (const_int 1 [0x1])
                (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split}
         (nil))
    *************************************************************
    
    After LRA:
    *************************************************************
    (insn 587 11 13 2 (set (reg:QI 24 r24 [368])
            (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  <var_decl 0x7c866435d000 a>)
                        (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 86 {movqi_insn_split}
         (nil))
    (insn 13 587 15 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                    (const_int 1 [0x1])) [4 %sfp+1 S1 A8])
            (reg:QI 24 r24 [368])) "bf.c":11:8 86 {movqi_insn_split}
         (nil))
    (insn 15 13 16 2 (set (reg:QI 6 r6 [orig:64 a+4 ] [64])
            (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  <var_decl 0x7c866435d000 a>)
                        (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 86 {movqi_insn_split}
         (nil))
    (insn 16 15 572 2 (set (reg:QI 24 r24 [orig:185 _1+4 ] [185])
            (zero_extract:QI (reg:QI 6 r6 [orig:64 a+4 ] [64])
                (const_int 1 [0x1])
                (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split}
         (nil))
    (insn 572 16 20 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                    (const_int 1 [0x1])) [4 %sfp+1 S1 A8])
            (reg:QI 24 r24 [orig:185 _1+4 ] [185])) "bf.c":11:8 86 {movqi_insn_split}
         (nil))
    *************************************************************
    Insn 13 and insn 572 use sfp+1 as a spill slot, but in IRA pass it was a two
    different pseudos r184 and r185.
    Insns 13 use sfp+1 as a spill slot for r184
    Insns 572 use the same slot for r185. It's wrong.
    
    Here we have a rematerialization.
    
    Fragment from bf.c.317r.reload:
    **************************************************************************************
    ******** Rematerialization #1: ********
    
    df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 14 (    1)
    df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 14 (    1)
    
    Cands:
    0 (nop=0, remat_regno=185, reload_regno=359):
    (insn 16 15 572 2 (set (reg:QI 359 [orig:185 _1+4 ] [185])
                        (zero_extract:QI (reg:QI 64 [ a+4 ])
                            (const_int 1 [0x1])
                            (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split}
                     (nil))
    
    **************************************************************************************
    [...]
    **************************************************************************************
    Ranges after the compression:
     r185: [0..1]
               Frame pointer can not be eliminated anymore
               Spilling non-eliminable hard regs: 28 29
             Spilling r113(28)
             Spilling r184(29)
             Spilling r208(29)
             Spilling r209(28)
      Slot 0 regnos (width = 0):     185     209     208     184     113
    **************************************************************************************
    
    The bug is here: `r185: [0..1]' wrong live range after compression.
    r185 and r184 can't have the same spill slot !
    
    Rematerialization in bf.c.317r.reload looks like:
    *************************************************************
       24: r14:QI=r185:QI
        Inserting rematerialization insn before:
      581: r14:QI=zero_extract(r64:QI,0x1,0)
    
    deleting insn with uid = 24.
             Considering alt=0 of insn 16:   (0) =r  (1) rYil  (2) n
              overall=0,losers=0,rld_nregs=0
       32: r22:QI=r185:QI
        Inserting rematerialization insn before:
      582: r22:QI=zero_extract(r64:QI,0x1,0)
    
    deleting insn with uid = 32.
    *************************************************************
    
    It's happened because:
    
    Fragment from lra.c (lra):
    *************************************************************************
          if (! live_p)
            {
              /* We need full live info for spilling pseudos into
                 registers instead of memory.  */
              lra_create_live_ranges (lra_reg_spill_p, true);
              live_p = true;
            }
          /* We should check necessity for spilling here as the above live
             range pass can remove spilled pseudos.  */
          if (! lra_need_for_spills_p ())
            break;
          /* Now we know what pseudos should be spilled.  Try to
             rematerialize them first.  */
          if (lra_remat ())
            {
              /* We need full live info -- see the comment above.  */
              lra_create_live_ranges (lra_reg_spill_p, true);
    ----------------------------------^^^^^^^^^^^^^^^
              live_p = true;
    *************************************************************************
    
    The bug is here.
    Rematerialization sometimes can be like spilling pseudos into registers.
      582: r22:QI=zero_extract(r64:QI,0x1,0)
    
    So, here we need a live ranges for all pseudos.
    
    PS: the patch will not affect any target with usable definition of
        TARGET_SPILL_CLASS hook.
    
            PR target/116778
    gcc/
            * lra-lives.cc (complete_info_p): Clarification of the comment.
            * lra.cc (lra): Create a full live info after rematerialization.
Comment 7 Denis Chertykov 2024-12-07 09:53:32 UTC
Committed a partial solution.

https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671030.html
Comment 8 GCC Commits 2024-12-11 20:41:27 UTC
The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>:

https://gcc.gnu.org/g:fca0ab08cd936464b152e9b45356f625eba27575

commit r15-6122-gfca0ab08cd936464b152e9b45356f625eba27575
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date:   Wed Dec 11 15:36:21 2024 -0500

    [PR116778][LRA]: Check pseudos assigned to FP after rematerialization to build live ranges
    
    This is a better fix of the PR permitting to avoid building live
    ranges after rematerialization.  It checks that FP can not be
    eliminated now and that pseudos assigned to FP will be spilled.  In
    this case we need to build live ranges after rematerialization for
    correct assignments of stack slots to spilled pseudos involved in
    rematerialization.
    
    gcc/ChangeLog:
    
            PR rtl-optimization/116778
            * ira-int.h (x_ira_class_hard_reg_index): Fix comment typo.
            * lra-eliminations.cc (lra_fp_pseudo_p): New function.
            * lra-int.h (lra_fp_pseudo_p): External declaration.
            * lra-spills.cc (lra_need_for_spills_p): Fix formatting.
            * lra.cc (lra): Use lra_fp_pseudo_p in lra_create_live_range after
            lra_remat.