This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

From: Alan Lawrence <alan dot lawrence at arm dot com>
Cc: Segher Boessenkool <segher at kernel dot crashing dot org>, Michael Meissner <meissner at linux dot vnet dot ibm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, David Edelsohn <dje dot gcc at gmail dot com>
Date: Wed, 12 Nov 2014 18:49:52 +0000
Subject: Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal
References: <544A3E0B dot 2000803 at arm dot com> <544A40D1 dot 1040605 at arm dot com> <20141110223624 dot GA19330 at ibm-tiger dot the-meissners dot org> <20141111071001 dot GA15842 at gate dot crashing dot org> <54635096 dot 1040508 at arm dot com>

Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) using this snippet on top of the original patch; no regressions.


Alan Lawrence wrote:

So I'm no expert on RS6000 here, but following on from Segher's observation about the change in pattern... so the difference in 'expand' is exactly that: a vsx_reduc_splus_v2df followed by a vec_extract to DF becomes a vsx_reduc_splus_v2df_scalar, as I expected the combiner to produce by combining the two previous insns.
However, inspecting the logs from -fdump-rtl-combine-all, *without* my patch, when the combiner tries to put those two together, I see:
Trying 30 -> 31:
Failed to match this instruction:
(set (reg:DF 179 [ stmp_s_5.7D.2196 ])
     (vec_select:DF (plus:V2DF (vec_select:V2DF (reg:V2DF 173 [ vect_s_5.6D.2195 ])
                 (parallel [
                         (const_int 1 [0x1])
                         (const_int 0 [0])
                     ]))
             (reg:V2DF 173 [ vect_s_5.6D.2195 ]))
         (parallel [
                 (const_int 1 [0x1])
             ])))
That is, it looks like combine_simplify_rtx has transformed the (vec_concat (vec_select ... 1) (vec_select ... 0)) from the vsx_reduc_plus_v2df insn into a single vec_select, which does not match the vsx_reduc_plus_v2df_scalar insn.
So despite the comment (in vsx.md):

;; Combiner patterns with the vector reduction patterns that knows we can get
;; to the top element of the V2DF array without doing an extract.
it looks like the code generation prior to my patch, which was considered better, came about because the combiner didn't actually use the pattern?
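To make the shape of the rejected expression concrete, here is a rough scalar model in C of what it computes on a two-lane vector. This is only an illustration, not GCC code: the v2df struct and all three function names are hypothetical. The vec_concat form in the insn and the single vec_select with (parallel [1 0]) both amount to swapping the lanes, so they compute the same value; the point in the text is only that the simplified form no longer matches the insn pattern syntactically.

```c
/* Hypothetical scalar model of a V2DF value: two double lanes. */
typedef struct { double lane[2]; } v2df;

/* Models (vec_select:V2DF v (parallel [1 0])): swap the two lanes. */
static v2df swap_lanes(v2df v)
{
    v2df r = { { v.lane[1], v.lane[0] } };
    return r;
}

/* Models (plus:V2DF a b): lane-wise addition. */
static v2df add_v2df(v2df a, v2df b)
{
    v2df r = { { a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] } };
    return r;
}

/* Models the expression combine built in the dump above:
   (vec_select:DF (plus:V2DF (swap v) v) (parallel [1])).  */
static double reduc_plus_then_extract(v2df v)
{
    return add_v2df(swap_lanes(v), v).lane[1];
}
```

After the lane-wise add, both lanes hold the same sum, which is why extracting either lane gives the reduction result.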
In that case, whilst you may want to dig into register allocation, cannot_change_mode_class, etc., for other reasons, I think the best fix for migrating to reduc_plus_scal... is simply to avoid using the "Combiner" patterns and just emit two insns, the old pattern followed by a vec_extract. The attached snippet does this (I won't call it a patch yet, and it applies on top of the previous patch - I went the route of calling the two gen functions rather than copying their RTL sequences, but could do the latter if that were preferable?), and restores code generation to the original form on your example above; it bootstraps OK but I'm still running check-gcc on the Compile Farm...
However, again on your example above, I note that if I *remove* the reduc_plus_scal_v2df pattern altogether, I get:
.sum:
         li 10,512        # 52   *movdi_internal64/4     [length = 4]
         ld 9,.LC2@toc(2)         # 20   *movdi_internal64/2     [length = 4]
         xxlxor 0,0,0     # 17   *vsx_movv2df/12 [length = 4]
         mtctr 10         # 48   *movdi_internal64/11    [length = 4]
         .align 4
.L2:
         lxvd2x 12,0,9    # 23   *vsx_movv2df/2  [length = 4]
         addi 9,9,16      # 25   *adddi3_internal1/2     [length = 4]
         xvadddp 0,0,12   # 24   *vsx_addv2df3/1 [length = 4]
         bdnz .L2         # 47   *ctrdi_internal1/1      [length = 4]
         xxsldwi 12,0,0,2         # 30   vsx_xxsldwi_v2df        [length = 4]
         xvadddp 1,0,12   # 31   *vsx_addv2df3/1 [length = 4]
         nop      # 37   *vsx_extract_v2df_internal2/1   [length = 4]
         blr      # 55   return  [length = 4]
This is presumably using gcc's scalar reduction code, but (to my untrained eye on powerpc!) it looks even better than the first form above (the same in the loop, and in the reduction, an xxpermdi is replaced by a nop!)...
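For readers without the earlier message to hand, the assembly above is the kind of code one might expect from a loop of roughly the following shape. This is a sketch under assumptions, not the actual test case: the array name, the element count N (guessed from 512 iterations of 16-byte loads) and the function names are all hypothetical. The second function models by hand what the vectorized code does: a two-lane accumulator in the loop, then one final add across the lanes.

```c
#define N 1024          /* assumed: 512 iterations x 2 doubles each */

double a[N];            /* hypothetical input array */

/* Plain scalar reduction, as one would write it in source. */
double sum_scalar(void)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* Hand-written model of the vectorized form: each lane accumulates
   every other element, then the two lanes are added at the end.  */
double sum_two_lanes(void)
{
    double acc0 = 0.0, acc1 = 0.0;
    for (int i = 0; i < N; i += 2) {
        acc0 += a[i];       /* lane 0 of the vector add in the loop */
        acc1 += a[i + 1];   /* lane 1 of the vector add in the loop */
    }
    return acc0 + acc1;     /* final cross-lane reduction */
}
```

The two functions add the elements in a different order, so for general floating-point input they can differ in the last bits; that is why the compiler only vectorizes such a loop when reassociation is permitted (for example with -ffast-math).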
--Alan


Segher Boessenkool wrote:
On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote:
However, the double pattern is completely broken.  This cannot go in.
[snip]
It is unacceptable to have to do the inner loop doing a load, vector add, and
store in the loop.
Before the patch, the final reduction used *vsx_reduc_splus_v2df; after
the patch, it is *vsx_reduc_plus_v2df_scalar.  The former does a vector
add, the latter a float add.  And it uses the same pseudoregister for the
accumulator throughout.  IRA decides a register is more expensive than
memory for this, I suppose because it wants both V2DF and DF?  It doesn't
seem to like the subreg very much.

The new code does look nicer otherwise :-)


Segher
