This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)


On Mon, Nov 19, 2012 at 9:34 PM, Steven Bosscher wrote:
> On Mon, Nov 19, 2012 at 12:24 PM, Eric Botcazou wrote:
>>> Yes, I'll be looking into this soon.
>>
>> We have a related regression on SPARC:
>>
>> FAIL: gfortran.dg/minmaxloc_5.f90  -O3 -fomit-frame-pointer -funroll-loops
>> execution test
>> FAIL: gfortran.dg/minmaxloc_5.f90  -O3 -fomit-frame-pointer -funroll-all-loops
>> -finline-functions  execution test
>> FAIL: gfortran.dg/minmaxloc_6.f90  -O3 -fomit-frame-pointer -funroll-loops
>> execution test
>> FAIL: gfortran.dg/minmaxloc_6.f90  -O3 -fomit-frame-pointer -funroll-all-loops
>> -finline-functions  execution test
>>
>> whose failure mode is exactly the same as for Honza's testcase.  I even have a
>> more reduced testcase (6 lines, attached) in case you'd prefer working on it.
>>
>> Reproducible with a cross to sparc-linux with -mcpu=v8 -O3 -funroll-loops and
>> grepping for mem:SF (const_int 0 [0]) in the RTL dumps.
>
> Thanks for this reduced test case, that's saving me a lot of work!
>
> Can you please try and see if the following C test case also fails?
>
> It looks like the memcpy is expanded to a loop that is unrolled, and
> then mis-optimized. I haven't found out yet why...

It's another case of an invalid REG_EQUAL note. Shown below is an
excerpt from the C test case .loop2_done dump (after calling
df_analyze to update liveness). The note on insn 172 uses (reg:SI 132)
which is not live on entry to basic block 25.

;; basic block 25, loop depth 0, count 0, freq 2485, maybe hot
;;  prev block 20, next block 26, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:       11 [50.5%]
;; bb 25 artificial_defs: { }
;; bb 25 artificial_uses: { u-1(14){ }u-1(30){ }u-1(101){ }}
;; lr  in        14 [%sp] 30 [%fp] 31 [%i7] 101 [%sfp] 138 158 165
;; lr  use       14 [%sp] 30 [%fp] 101 [%sfp] 138 158
;; lr  def       131 133
;; live  in      14 [%sp] 101 [%sfp] 138 158 165
;; live  gen     131 133
;; live  kill

(code_label 179 149 173 25 22 "" [1 uses])
(note 173 179 171 25 [bb 25] NOTE_INSN_BASIC_BLOCK)
(insn 171 173 172 25 (set (reg/v:SI 133 [ idsD.1386 ])
        (reg/v:SI 138 [ idsD.1386 ])) 40 {*movsi_insn}
     (expr_list:REG_UNUSED (reg/v:SI 133 [ idsD.1386 ])
        (nil)))
(insn 172 171 176 25 (set (reg/v:SF 131 [ limit3D.1388 ])
        (reg:SF 158 [ limit3D.1388 ])) t.c:62 -1
     (expr_list:REG_EQUAL (mem:SF (reg:SI 132 [ ivtmp.7D.1419 ]) [2
MEM[base: _23, offset: 0B]+0 S4 A32])
        (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ])
            (nil))))
;; lr  out       14 [%sp] 30 [%fp] 31 [%i7] 101 [%sfp] 131 133 138 165
;; live  out     14 [%sp] 101 [%sfp] 131 133 138 165


The use of the dead reg is renamed in the webizer:

(insn 172 171 176 25 (set (reg/v:SF 131 [ limit3D.1388 ])
        (reg:SF 172 [ limit3D.1388 ])) t.c:62 66 {*movsf_insn}
     (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2
MEM[base: _23, offset: 0B]+0 S4 A32])
        (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ])
            (nil))))


Next, cse2 thinks it's a good idea to actually use the REG_EQUAL note,
turning the EQ-use of the uninitialized reg 174 into a real use:

(insn 172 171 176 17 (set (reg/v:SF 131 [ limit3D.1388 ])
        (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23,
offset: 0B]+0 S4 A32])) t.c:62 66 {*movsf_insn}
     (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2
MEM[base: _23, offset: 0B]+0 S4 A32])
        (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ])
            (nil))))


Then init-regs concludes that reg 174 is used ininitialized:

adding initialization in ga4076 of reg 174 at in block 16 for insn 172.
(insn 206 171 172 16 (set (reg:SI 174 [ ivtmp.7D.1419 ])
        (const_int 0 [0])) t.c:62 -1
     (nil))
(insn 172 206 176 16 (set (reg/v:SF 131 [ limit3D.1388 ])
        (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23,
offset: 0B]+0 S4 A32])) t.c:62 66 {*movsf_insn}
     (expr_list:REG_DEAD (reg:SI 174 [ ivtmp.7D.1419 ])
        (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2
MEM[base: _23, offset: 0B]+0 S4 A32])
            (nil))))


And finally, combine merges the above to yield the final result:
Trying 206 -> 172:
Successfully matched this instruction:
(set (reg/v:SF 131 [ limit3D.1388 ])
    (mem:SF (const_int 0 [0]) [2 MEM[base: _23, offset: 0B]+0 S4 A32]))


The root cause is the bad REG_EQUAL note. I think the most robust
solution is to make the webizer re-compute notes before renaming.
Patch for that is attached.

Ciao!
Steven


        PR rtl-optimization/55006
        * web.c (web_main): Add the DF_NOTE problem, and explain whatfor.

Index: web.c
===================================================================
--- web.c       (revision 193454)
+++ web.c       (working copy)
@@ -335,9 +335,16 @@ web_main (void)
   unsigned int uses_num = 0;
   rtx insn;

+  /* Add the flags and problems we need.  The DF_EQ_NOTES flag is set so
+     that uses of registers in REG_EQUAL and REG_EQUIV notes are included
+     in the web that their DEF belongs to, so that these uses are also
+     properly renamed.  The DF_NOTE problem is added to make sure that
+     all notes are up-to-date and valid: Re-computing the notes problem
+     also cleans up all dead REG_EQUAL notes.  */
   df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES);
   df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
   df_chain_add_problem (DF_UD_CHAIN);
+  df_note_add_problem ();
   df_analyze ();
   df_set_flags (DF_DEFER_INSN_RESCAN);


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]