[PATCH, rs6000] Add expansions for min/max vector reductions

Fri Sep 18 13:54:00 GMT 2015

On Fri, 2015-09-18 at 15:15 +0200, Richard Biener wrote:
> On Fri, 18 Sep 2015, Bill Schmidt wrote:
> 
> > On Fri, 2015-09-18 at 10:38 +0200, Richard Biener wrote:
> > > On Thu, 17 Sep 2015, Segher Boessenkool wrote:
> > > 
> > > > On Thu, Sep 17, 2015 at 09:18:42AM -0500, Bill Schmidt wrote:
> > > > > On Thu, 2015-09-17 at 09:39 +0200, Richard Biener wrote:
> > > > > > So just to clarify - you need to reduce the vector with max to a scalar
> > > > > > but want the (same) result in all vector elements?
> > > > > 
> > > > > Yes.  Alan Hayward's cond-reduction patch is set up to perform a
> > > > > reduction to scalar, followed by a scalar broadcast to get the value
> > > > > into all positions.  It happens that our most efficient expansion to
> > > > > reduce to scalar will naturally produce the value in all positions.
> > > > 
> > > > It also is many insns after expand, so relying on combine to combine
> > > > all that plus the following splat (as Richard suggests below) is not
> > > > really going to work.
> > > > 
> > > > If there also are targets where the _scal version is cheaper, maybe
> > > > we should keep both, and have expand expand to whatever the target
> > > > supports?
> > > 
> > > Wait .. so you don't actually have an instruction to do, say,
> > > REDUC_MAX_EXPR (neither to scalar nor to vector)?  Then it's better
> > > to _not_ define such pattern and let the vectorizer generate
> > > its fallback code.  If the fallback code isn't "best" then better
> > > think of a way to make it choose the best variant out of its
> > > available ones (and maybe add another).  I think it tests
> > > availability of the building blocks for the variants and simply
> > > picks the first that works without checking the cost model.
> > 
> > That's what we were considering per Alan Lawrence's suggestion elsewhere
> > in this thread, but there isn't currently a way to represent a
> > whole-vector rotate in gimple.  So we'd either have to add that or fall
> > back to an inferior code sequence, I believe.
> 
> A whole-vector rotate is just a VEC_PERM with a proper constant mask.
> Of course the target would have to detect these cases and use
> vector rotate instructions (x86 does that for example).

Hm, yes, that's right.  And we should already have those special-permute
recognitions in place; I had just forgotten about them.  Ok, I agree
this is probably the best approach, then.
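
To make the VEC_PERM idea concrete, here is a rough sketch in plain C
of a whole-vector rotate expressed as a two-input permute with a
constant mask, and of a max reduction built from log2(N) rotate+max
steps that leaves the result in every element.  This is only a scalar
model for illustration, not GCC code: the names vec_perm, vec_rotate,
reduc_max_all_lanes and NELTS are made up here, and the four-element
int vector is an assumption.

```c
#include <assert.h>

#define NELTS 4  /* Illustrative vector length; must be a power of two.  */

/* Scalar model of a two-input permute: each mask element selects from
   the 2*NELTS-element concatenation of A and B.  */
static void
vec_perm (const int *a, const int *b, const unsigned *mask, int *out)
{
  for (int i = 0; i < NELTS; i++)
    {
      unsigned sel = mask[i];
      assert (sel < 2 * NELTS);
      out[i] = sel < NELTS ? a[sel] : b[sel - NELTS];
    }
}

/* Whole-vector rotate by K elements (K < NELTS), written as a permute
   of V with itself using the constant mask {K, K+1, ..., K+NELTS-1}.  */
static void
vec_rotate (const int *v, unsigned k, int *out)
{
  unsigned mask[NELTS];
  for (unsigned i = 0; i < NELTS; i++)
    mask[i] = i + k;
  vec_perm (v, v, mask, out);
}

/* Max reduction via rotate+max steps with rotate amounts NELTS/2,
   NELTS/4, ..., 1.  Every element of OUT ends up holding the maximum.  */
static void
reduc_max_all_lanes (const int *in, int *out)
{
  int cur[NELTS], rot[NELTS];
  for (int i = 0; i < NELTS; i++)
    cur[i] = in[i];
  for (unsigned k = NELTS / 2; k >= 1; k /= 2)
    {
      vec_rotate (cur, k, rot);
      for (int i = 0; i < NELTS; i++)
        cur[i] = cur[i] > rot[i] ? cur[i] : rot[i];
    }
  for (int i = 0; i < NELTS; i++)
    out[i] = cur[i];
}
```

With a rotate-based sequence like this no separate broadcast is needed
afterwards, since the reduced value is already in all positions.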

I'll have to refresh my memory on Alan H's patch, but we may need to add
logic to do this sort of epilogue expansion for his new reduction.  I
think right now it is just giving up if the REDUC_MAX_EXPR isn't
supported.

Thanks,
Bill

> 
> Richard.
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>