This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC PATCH, i386]: Improve STV pass by correcting the cost of moves to/from XMM reg


On Fri, 23 Aug 2019, Uros Bizjak wrote:

> On Fri, Aug 23, 2019 at 1:52 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> >
> > > This is currently a heads-up patch that removes the minimum limitation
> > > of cost of moves to/from XMM reg. The immediate benefit is the removal
> > > of mismatched spills, caused by subreg usage.
> > >
> > > *If* the patch proves to be beneficial (as in "doesn't regress
> > > important benchmarks"), then we should be able to un-hide the
> > > inter-regset moves from RA and allow it to collapse some moves. As an
> > > example, patched compiler removes a movd in gcc.target/i386/minmax-6.c
> > > and still avoids mismatched spill.
> > >
> > > 2019-08-23  Uroš Bizjak  <ubizjak@gmail.com>
> > >
> > >     * config/i386/i386.c (ix86_register_move_cost): Do not
> > >     limit the cost of moves to/from XMM register to minimum 8.
> > >     * config/i386/i386-features.c
> > >     (general_scalar_chain::make_vector_copies): Do not generate
> > >     zeroing move from GPR to XMM register, use gen_move_insn
> > >     instead of gen_gpr_to_xmm_move_src.
> > >     (general_scalar_chain::convert_op): Ditto.
> > >     (gen_gpr_to_xmm_move_src): Remove.
> > >
> > > The patch was bootstrapped and regression tested on x86_64-linux-gnu
> > > {,-m32}, configured w/ and w/o -with-arch=ivybridge.
> > >
> > > The patch regresses PR80481 scan-asm-not (where the compiler generates
> > > unrelated XMM spill on register starved x86_32). However, during the
> > > analysis, I found that the original issue is not fixed, and is still
> > > visible without -funrol-loops [1].
> > >
> > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80481#c10
> > >
> > > So, I'd wait for the HJ's benchmark results of the cost to/from XMM
> > > change before proceeding with the patch.
> >
> > Otherwise looks good to me, it might clash (whitespace wise)
> > with my STV "rewrite" though.
> >
> > We might want to adjust ix86_register_move_cost separately from
> > the STV change to use regular moves though?  Just to make bisection
> > point to either of the two - STV "fallout"" is probably minor
> > compared to fallout elsewhere...
> 
> Yes, this is also my plan.

Btw, when testing w/ the costing disabled I run into

(insn 31 7 8 2 (set (subreg:V4SI (reg:SI 107) 0)
        (vec_merge:V4SI (vec_duplicate:V4SI (mem/c:SI (const:DI (plus:DI 
(symbol_ref:DI ("peakbuf") [flags 0x2]  <var_decl 0x7efe8dbfeab0 peakbuf>)
                            (const_int 400020000 [0x17d7d220]))) [1 
peakbuf.PeaksInBuf+0 S4 A64]))
            (const_vector:V4SI [
                    (const_int 0 [0]) repeated x4
                ])
            (const_int 1 [0x1])))

being not recognized (for the large immediate I guess).  So when
just doing

  (set (reg:SI 107) (mem:SI ...))

here we expect IRA to be able to allocate 107 into xmm, realizing
it needs to reload the RHS first?

For current code, is testing x86_64_general_operand like in the
following the correct thing to do?

Thanks,
Richard.

Index: config/i386/i386-features.c
===================================================================
--- config/i386/i386-features.c (revision 274926)
+++ config/i386/i386-features.c (working copy)
@@ -812,9 +812,15 @@ general_scalar_chain::convert_op (rtx *o
   else if (MEM_P (*op))
     {
       rtx tmp = gen_reg_rtx (GET_MODE (*op));
+      rtx tmp2 = *op;
 
+      if (!x86_64_general_operand (*op, smode))
+       {
+         tmp2 = gen_reg_rtx (smode);
+         emit_insn_before (gen_move_insn (tmp2, *op), insn);
+       }
       emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
-                                    gen_gpr_to_xmm_move_src (vmode, 
*op)),
+                                    gen_gpr_to_xmm_move_src (vmode, 
tmp2)),
                        insn);
       *op = gen_rtx_SUBREG (vmode, tmp, 0);
 

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]