This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
- From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- To: Alan Lawrence <alan dot lawrence at arm dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>, Richard Sandiford <Richard dot Sandiford at arm dot com>, Alan Hayward <Alan dot Hayward at arm dot com>
- Date: Mon, 14 Sep 2015 09:10:43 -0500
- References: <D217578B dot 7FE4%alan dot hayward at arm dot com> <1441923254 dot 4772 dot 37 dot camel at oc8801110288 dot ibm dot com> <D2184E16 dot 8003%alan dot hayward at arm dot com> <1441977591 dot 2795 dot 11 dot camel at gnopaine> <55F69799 dot 8010901 at arm dot com>
On Mon, 2015-09-14 at 10:47 +0100, Alan Lawrence wrote:
> On 11/09/15 14:19, Bill Schmidt wrote:
> >
> > A secondary concern for powerpc is that REDUC_MAX_EXPR produces a scalar
> > that has to be broadcast back to a vector, and the best way to implement
> > it for us already has the max value in all positions of a vector. But
> > that is something we should be able to fix with simplify-rtx in the back
> > end.
>
> Reading this thread again, this bit stands out as unaddressed. Yes PowerPC can
> "fix" this with simplify-rtx, but the vector cost model will not take this into
> account - it will think that the broadcast-back-to-a-vector requires an extra
> operation after the reduction, whereas in fact it will not.
>
> Does that suggest we should have a new entry in vect_cost_for_stmt for
> vec_to_scalar-and-back-to-vector (that defaults to vec_to_scalar+scalar_to_vec,
> but on some architectures e.g. PowerPC would be the same as vec_to_scalar)?
Ideally I think we need to do something for that, yeah. The back ends
could try to patch up the cost when finishing costs for the loop body,
epilogue, etc., but that would be somewhat of a guess; it would be
better to just be up-front that we're doing a reduction to a vector.
As part of this, I dislike the term "vec_to_scalar", which is somewhat
vague about what's going on (it sounds like it could mean a vector
extract operation, which is more of an inverse of "scalar_to_vec" than a
reduction is). GIMPLE calls it a reduction, and the optabs call it a
reduction, so we ought to call it a reduction in the vectorizer cost
model, too.
To cover our bases for PowerPC and AArch32, we probably need:

  plus_reduc_to_scalar
  plus_reduc_to_vector
  minmax_reduc_to_scalar
  minmax_reduc_to_vector

although I think plus_reduc_to_vector wouldn't be used yet, so could be
omitted. If we go this route, then at that time we would change your
code to use minmax_reduc_to_vector and let the back ends determine
whether that requires a scalar reduction followed by a broadcast, or
whether it would be performed directly.
Using direct reduction to vector for MIN and MAX on PowerPC would be a
big cost savings over scalar reduction/broadcast.
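
To make the proposal concrete, here is a rough sketch of how such a cost
entry could default to reduction-plus-broadcast while letting a back end
report the cheaper direct form. Everything in it is hypothetical: the
sketch_* names, the struct, and the cost numbers are made up for
illustration and are not the existing vect_cost_for_stmt enum or any real
target hook.

```c
/* Hypothetical cost kinds, modelled loosely on the entries proposed above.
   None of these names exist in GCC.  */
enum sketch_cost_kind
{
  SKETCH_SCALAR_TO_VEC,           /* broadcast a scalar to all lanes */
  SKETCH_MINMAX_REDUC_TO_SCALAR,  /* min/max reduction yielding a scalar */
  SKETCH_MINMAX_REDUC_TO_VECTOR   /* min/max reduction yielding a vector */
};

/* Hypothetical per-target parameters; the values are chosen by the caller.  */
struct sketch_target
{
  int reduc_to_scalar_cost;   /* cost of reducing to a scalar */
  int scalar_to_vec_cost;     /* cost of broadcasting a scalar */
  int direct_reduc_to_vector; /* nonzero if the reduction already leaves
                                 the result in every lane */
};

/* Return the assumed cost of KIND on target T.  */
int
sketch_cost (const struct sketch_target *t, enum sketch_cost_kind kind)
{
  switch (kind)
    {
    case SKETCH_SCALAR_TO_VEC:
      return t->scalar_to_vec_cost;

    case SKETCH_MINMAX_REDUC_TO_SCALAR:
      return t->reduc_to_scalar_cost;

    case SKETCH_MINMAX_REDUC_TO_VECTOR:
      /* A target that reduces straight to a vector pays no broadcast.  */
      if (t->direct_reduc_to_vector)
        return t->reduc_to_scalar_cost;
      /* Default: scalar reduction followed by a broadcast.  */
      return t->reduc_to_scalar_cost + t->scalar_to_vec_cost;
    }
  return 0;
}
```

With illustrative costs of 2 for the reduction and 1 for the broadcast, a
target that sets direct_reduc_to_vector is charged 2 for
SKETCH_MINMAX_REDUC_TO_VECTOR and one that does not is charged 3, which is
the difference the vectorizer currently cannot see.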
Thanks,
Bill
>
> (I agree that if that's the limit of how "different" conditional reductions may
> be between architectures, then we should not have a vect_cost_for_stmt for a
> whole conditional reduction.)
>
> Cheers, Alan
>