Created attachment 59145 [details] bitfld-lra.c: C test case $ avr-gcc -mmcu=atmega128 -dumpbase "" -save-temps -dp -Os -mlra bitfld-lra.c -o bitfld-lra.elf produces wrong code for the attached test case struct { unsigned long long u33 : 33; unsigned long long u40 : 40; unsigned long long u41 : 41; } a = { 0x100000, 0, 0x100000 }, b = { 1LL << 32, 0, 0 }; long long f = 1LL << 32, g = 1LL << 32; int main (void) { if (a.u33 * a.u33 != 0 || a.u33 * a.u40 * a.u33 != 0 || a.u41 * a.u33 != 1LL << 40) __builtin_abort(); if (b.u33 + g != 1LL << 33 || b.u33 + f != 1LL << 33 || g + g != 1LL << 33) __builtin_abort(); } For the first addition "b.u33 + g != 1LL << 33" the asm reads something like ldd r13,Y+1 ; 660 [c=4 l=1] movqi_insn/3 mov r14,r13 ; 661 [c=4 l=1] movqi_insn/0 lds r15,g+5 ; 662 [c=4 l=2] movqi_insn/3 lds r16,g+6 ; 663 [c=4 l=2] movqi_insn/3 lds r17,g+7 ; 664 [c=4 l=2] movqi_insn/3 call __adddi3 ; 665 [c=32 l=2] *adddi3_insn where __adddi3 performs R10[8] + R18[8]. However, R13 has a wrong value of 0x01 due to the load insns 660 and 661 so that the value in R10[8] is 0x0101000000 instead of 0x0100000000 when __adddi3 is called. Target: avr Configured with: ../../source/gcc-master/configure --target=avr --disable-nls --with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared --enable-languages=c,c++ Thread model: single Supported LTO compression algorithms: zlib gcc version 15.0.0 20240918 (experimental) (GCC)
...and without -mlra the code is correct, so likely caused by LRA.
Created attachment 59795 [details] LRA dump file
Created attachment 59796 [details] Modified test case "bf.c" I worked with a modified test case (bf.c): struct { unsigned long long u33 : 33; unsigned long long u40 : 40; unsigned long long u41 : 41; } a = { 0x100000, 0, 0x100000 }, b = { 1LL << 32, 0, 0 }; long long f = 1LL << 32, g = 1LL << 32; int main (void) { if (a.u33 * a.u33 != 0) goto abrt; if (a.u33 * a.u40 * a.u33 != 0) goto abrt; if (a.u41 * a.u33 != 1LL << 40) goto abrt; if (b.u33 + g != 1LL << 33) goto abrt; if (b.u33 + f != 1LL << 33) goto abrt; if (g + g != 1LL << 33) { abrt: __builtin_abort(); } }
The bug appears in LRA after rematerialization pass while creating live ranges. File lra.cc: ************************************************************* /* Now we know what pseudos should be spilled. Try to rematerialize them first. */ if (lra_remat ()) { /* We need full live info -- see the comment above. */ lra_create_live_ranges (lra_reg_spill_p, true); ************************************************************* Wrong call `lra_create_live_ranges (lra_reg_spill_p, true)' It have to be `lra_create_live_ranges (true, true)' Long explanation: ********************************** int main (void) { if (a.u33 * a.u33 != 0) ------^^^^^^^^^^^^^ goto abrt; if (a.u33 * a.u40 * a.u33 != 0) ********************************** The bug appears here. Part of the expression `a.u33 * a.u33' Before LRA: ************************************************************* (insn 13 11 15 2 (set (reg:QI 184 [ _1+3 ]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 15 13 16 2 (set (reg:QI 64 [ a+4 ]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 16 15 20 2 (set (reg:QI 185 [ _1+4 ]) (zero_extract:QI (reg:QI 64 [ a+4 ]) (const_int 1 [0x1]) (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split} (nil)) ************************************************************* After LRA: ************************************************************* (insn 587 11 13 2 (set (reg:QI 24 r24 [368]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 13 587 15 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28) (const_int 1 [0x1])) [4 %sfp+1 S1 A8]) (reg:QI 24 r24 [368])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 15 13 16 2 (set (reg:QI 6 r6 [orig:64 a+4 ] [64]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 16 15 572 2 (set (reg:QI 24 r24 [orig:185 _1+4 ] [185]) (zero_extract:QI (reg:QI 6 r6 [orig:64 a+4 ] [64]) (const_int 1 [0x1]) (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split} (nil)) (insn 572 16 20 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28) (const_int 1 [0x1])) [4 %sfp+1 S1 A8]) (reg:QI 24 r24 [orig:185 _1+4 ] [185])) "bf.c":11:8 86 {movqi_insn_split} (nil)) ************************************************************* Insn 13 and insn 572 use sfp+1 as a spill slot, but in IRA pass it was a two different pseudos r184 and r185. Insns 13 use sfp+1 as a spill slot for r184 Insns 572 use the same slot for r185. It's wrong. Here we have a rematerialization. Fragment from bf.c.317r.reload: ************************************************************************************** ******** Rematerialization #1: ******** df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 14 ( 1) df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 14 ( 1) Cands: 0 (nop=0, remat_regno=185, reload_regno=359): (insn 16 15 572 2 (set (reg:QI 359 [orig:185 _1+4 ] [185]) (zero_extract:QI (reg:QI 64 [ a+4 ]) (const_int 1 [0x1]) (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split} (nil)) ************************************************************************************** [...] ************************************************************************************** Ranges after the compression: r185: [0..1] Frame pointer can not be eliminated anymore Spilling non-eliminable hard regs: 28 29 Spilling r113(28) Spilling r184(29) Spilling r208(29) Spilling r209(28) Slot 0 regnos (width = 0): 185 209 208 184 113 ************************************************************************************** The bug is here: `r185: [0..1]' wrong live range after compression. r185 and r184 can't have the same spill slot ! Rematerialization in bf.c.317r.reload looks like: ************************************************************* 24: r14:QI=r185:QI Inserting rematerialization insn before: 581: r14:QI=zero_extract(r64:QI,0x1,0) deleting insn with uid = 24. Considering alt=0 of insn 16: (0) =r (1) rYil (2) n overall=0,losers=0,rld_nregs=0 32: r22:QI=r185:QI Inserting rematerialization insn before: 582: r22:QI=zero_extract(r64:QI,0x1,0) deleting insn with uid = 32. ************************************************************* It's happened because: Fragment from lra.c (lra): ************************************************************************* if (! live_p) { /* We need full live info for spilling pseudos into registers instead of memory. */ lra_create_live_ranges (lra_reg_spill_p, true); live_p = true; } /* We should check necessity for spilling here as the above live range pass can remove spilled pseudos. */ if (! lra_need_for_spills_p ()) break; /* Now we know what pseudos should be spilled. Try to rematerialize them first. */ if (lra_remat ()) { /* We need full live info -- see the comment above. */ lra_create_live_ranges (lra_reg_spill_p, true); ----------------------------------^^^^^^^^^^^^^^^ live_p = true; ************************************************************************* The bug is here. Rematerialization sometimes can be like spilling pseudos into registers. 582: r22:QI=zero_extract(r64:QI,0x1,0) So, here we need a live ranges for all pseudos. The patch: diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc index 66c6577e5d6..8d06a3285a5 100644 --- a/gcc/lra-lives.cc +++ b/gcc/lra-lives.cc @@ -61,9 +61,10 @@ int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER]; /* A global flag whose true value says to build live ranges for all pseudos, otherwise the live ranges only for pseudos got memory is build. True value means also building copies and setting up hard - register preferences. The complete info is necessary only for the - assignment pass. The complete info is not needed for the - coalescing and spill passes. */ + register preferences. The complete info is necessary for + assignment, rematerialization and spill to register passes. The + complete info is not needed for the coalescing and spill to memory + passes. */ static bool complete_info_p; /* Pseudos live at current point in the RTL scan. */ diff --git a/gcc/lra.cc b/gcc/lra.cc index bc46f56cf20..a38df0e9b7a 100644 --- a/gcc/lra.cc +++ b/gcc/lra.cc @@ -2552,7 +2552,7 @@ lra (FILE *f, int verbose) if (lra_remat ()) { /* We need full live info -- see the comment above. */ - lra_create_live_ranges (lra_reg_spill_p, true); + lra_create_live_ranges (true, true); live_p = true; if (! lra_need_for_spills_p ()) {
Proposed patch https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670949.html
The master branch has been updated by Denis Chertykov <denisc@gcc.gnu.org>: https://gcc.gnu.org/g:279b3c71702de150eade19635bdbd26ba440b8eb commit r15-6008-g279b3c71702de150eade19635bdbd26ba440b8eb Author: Denis Chertykov <chertykov@gmail.com> Date: Sat Dec 7 13:47:04 2024 +0400 The fix for PR116778: Brief: The bug appears in LRA after rematerialization pass while creating live ranges. File lra.cc: ************************************************************* /* Now we know what pseudos should be spilled. Try to rematerialize them first. */ if (lra_remat ()) { /* We need full live info -- see the comment above. */ lra_create_live_ranges (lra_reg_spill_p, true); ************************************************************* Wrong call `lra_create_live_ranges (lra_reg_spill_p, true)' It have to be `lra_create_live_ranges (true, true)'. The explanation: ********************************** int main (void) { if (a.u33 * a.u33 != 0) ------^^^^^^^^^^^^^ goto abrt; if (a.u33 * a.u40 * a.u33 != 0) ********************************** The bug appears here. Part of the expression `a.u33 * a.u33' Before LRA: ************************************************************* (insn 13 11 15 2 (set (reg:QI 184 [ _1+3 ]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 15 13 16 2 (set (reg:QI 64 [ a+4 ]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 16 15 20 2 (set (reg:QI 185 [ _1+4 ]) (zero_extract:QI (reg:QI 64 [ a+4 ]) (const_int 1 [0x1]) (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split} (nil)) ************************************************************* After LRA: ************************************************************* (insn 587 11 13 2 (set (reg:QI 24 r24 [368]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 13 587 15 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28) (const_int 1 [0x1])) [4 %sfp+1 S1 A8]) (reg:QI 24 r24 [368])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 15 13 16 2 (set (reg:QI 6 r6 [orig:64 a+4 ] [64]) (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2] <var_decl 0x7c866435d000 a>) (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 86 {movqi_insn_split} (nil)) (insn 16 15 572 2 (set (reg:QI 24 r24 [orig:185 _1+4 ] [185]) (zero_extract:QI (reg:QI 6 r6 [orig:64 a+4 ] [64]) (const_int 1 [0x1]) (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split} (nil)) (insn 572 16 20 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28) (const_int 1 [0x1])) [4 %sfp+1 S1 A8]) (reg:QI 24 r24 [orig:185 _1+4 ] [185])) "bf.c":11:8 86 {movqi_insn_split} (nil)) ************************************************************* Insn 13 and insn 572 use sfp+1 as a spill slot, but in IRA pass it was a two different pseudos r184 and r185. Insns 13 use sfp+1 as a spill slot for r184 Insns 572 use the same slot for r185. It's wrong. Here we have a rematerialization. Fragment from bf.c.317r.reload: ************************************************************************************** ******** Rematerialization #1: ******** df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 14 ( 1) df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 14 ( 1) Cands: 0 (nop=0, remat_regno=185, reload_regno=359): (insn 16 15 572 2 (set (reg:QI 359 [orig:185 _1+4 ] [185]) (zero_extract:QI (reg:QI 64 [ a+4 ]) (const_int 1 [0x1]) (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split} (nil)) ************************************************************************************** [...] ************************************************************************************** Ranges after the compression: r185: [0..1] Frame pointer can not be eliminated anymore Spilling non-eliminable hard regs: 28 29 Spilling r113(28) Spilling r184(29) Spilling r208(29) Spilling r209(28) Slot 0 regnos (width = 0): 185 209 208 184 113 ************************************************************************************** The bug is here: `r185: [0..1]' wrong live range after compression. r185 and r184 can't have the same spill slot ! Rematerialization in bf.c.317r.reload looks like: ************************************************************* 24: r14:QI=r185:QI Inserting rematerialization insn before: 581: r14:QI=zero_extract(r64:QI,0x1,0) deleting insn with uid = 24. Considering alt=0 of insn 16: (0) =r (1) rYil (2) n overall=0,losers=0,rld_nregs=0 32: r22:QI=r185:QI Inserting rematerialization insn before: 582: r22:QI=zero_extract(r64:QI,0x1,0) deleting insn with uid = 32. ************************************************************* It's happened because: Fragment from lra.c (lra): ************************************************************************* if (! live_p) { /* We need full live info for spilling pseudos into registers instead of memory. */ lra_create_live_ranges (lra_reg_spill_p, true); live_p = true; } /* We should check necessity for spilling here as the above live range pass can remove spilled pseudos. */ if (! lra_need_for_spills_p ()) break; /* Now we know what pseudos should be spilled. Try to rematerialize them first. */ if (lra_remat ()) { /* We need full live info -- see the comment above. */ lra_create_live_ranges (lra_reg_spill_p, true); ----------------------------------^^^^^^^^^^^^^^^ live_p = true; ************************************************************************* The bug is here. Rematerialization sometimes can be like spilling pseudos into registers. 582: r22:QI=zero_extract(r64:QI,0x1,0) So, here we need a live ranges for all pseudos. PS: the patch will not affect any target with usable definition of TARGET_SPILL_CLASS hook. PR target/116778 gcc/ * lra-lives.cc (complete_info_p): Clarification of the comment. * lra.cc (lra): Create a full live info after rematerialization.
Committed a partial solution. https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671030.html
The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>: https://gcc.gnu.org/g:fca0ab08cd936464b152e9b45356f625eba27575 commit r15-6122-gfca0ab08cd936464b152e9b45356f625eba27575 Author: Vladimir N. Makarov <vmakarov@redhat.com> Date: Wed Dec 11 15:36:21 2024 -0500 [PR116778][LRA]: Check pseudos assigned to FP after rematerialization to build live ranges This is a better fix of the PR permitting to avoid building live ranges after rematerialization. It checks that FP can not be eliminated now and that pseudos assigned to FP will be spilled. In this case we need to build live ranges after rematerialization for correct assignments of stack slots to spilled pseudos involved in rematerialization. gcc/ChangeLog: PR rtl-optimization/116778 * ira-int.h (x_ira_class_hard_reg_index): Fix comment typo. * lra-eliminations.cc (lra_fp_pseudo_p): New function. * lra-int.h (lra_fp_pseudo_p): External declaration. * lra-spills.cc (lra_need_for_spills_p): Fix formatting. * lra.cc (lra): Use lra_fp_pseudo_p in lra_create_live_range after lra_remat.