Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)


Ramana Radhakrishnan <ramana.gcc@googlemail.com> writes:
> On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
>> Hi Alan,
>>
>> I probably wasn't clear enough.  The implementation in the vectorizer is
>> fine and I'm not asking that to change per target.  What I'm objecting
>> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
>> with vec_to_scalar.  This assumes that the back end will implement a
>> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
>> But those back ends should be free to model the cost of the
>> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
>> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
>> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
>> not be; for powerpc, it certainly will not be.
>>
>> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
>> expansion, and therefore it is not correct for us to explode this in
>> tree-vect-generic.  This would expand the code size without providing
>> any significant optimization opportunity, and could reduce the ability
>> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
>> gimple vectorizers.
>>
>> I apologize if my loose use of language confused the issue.  It isn't
>> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
>> that are used by it.
>>
>> (The costs in powerpc won't be enormous, but they are definitely
>> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
>> instructions, where n is the number of elements in the mode being
>> vectorized.)
>
> IIUC, on AArch64 a reduc_max_expr maps to a single reduction
> operation, but on AArch32 Neon a reduc_smax gets implemented as a
> sequence of vpmax instructions, which sounds similar to the PowerPC
> example. Thus mapping a reduc_smax expression to the cost of a
> vec_to_scalar is probably not right in this particular situation.

But AIUI vec_to_scalar exists to represent reduction operations.
(I see it was also used for strided stores.)  So for better or worse,
I think the interface that Alan's patch uses is the defined interface
for measuring the cost of a reduction.
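
For what it's worth, the hook behind those numbers is already told the
vector type, so a target can make vec_to_scalar as mode-dependent as it
likes today.  Purely as a sketch with invented costs (not any target's
actual implementation), something along these lines would model the
2*log(n) sequence Bill describes:

    /* Illustrative sketch only: charge a mode-dependent cost for the
       epilogue reduction that vec_to_scalar represents, roughly two
       instructions per halving step of a tree reduction.  */
    static int
    sketch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
                                       tree vectype, int misalign ATTRIBUTE_UNUSED)
    {
      switch (type_of_cost)
        {
        case vec_to_scalar:
          if (vectype)
            return 2 * exact_log2 (TYPE_VECTOR_SUBPARTS (vectype));
          return 2;
        /* ... all other vect_cost_for_stmt values elided ...  */
        default:
          return 1;
        }
    }

The numbers are invented; the point is only that the cost hook already
sees the vectype, so the cost of a vec_to_scalar need not be flat.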

If a backend implemented reduc_umax_scal_optab in current sources,
without Alan's patch, then that optab would be used for a "natural"
unsigned max reduction (i.e. a reduction of a MAX_EXPR with unsigned
inputs).  vec_to_scalar would be used to weigh the cost of the epilogue
reduction statement in that case.
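
For concreteness, by a "natural" unsigned max reduction I mean a plain
loop along these lines (a made-up example, not one from the PR):

    unsigned int
    umax (const unsigned int *a, int n)
    {
      unsigned int m = 0;
      for (int i = 0; i < n; i++)
        m = a[i] > m ? a[i] : m;   /* should be recognised as an unsigned MAX_EXPR */
      return m;
    }

On a target providing reduc_umax_scal_optab this should already
vectorize, with the epilogue reduction costed as a vec_to_scalar.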

So if defining a new Power pattern might cause Alan's patch to trigger
in cases where the transformation is actually too expensive, I would
expect the same to be true for a natural umax without Alan's patch.
The two cases ought to underestimate the true cost by the same degree.
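
(With made-up numbers: if the epilogue reduction really needs
2*log2(4) = 4 instructions for a four-element mode but is costed as a
single vec_to_scalar, both the conditional reduction and the natural
umax come out about three instructions too cheap.)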

In other words, whether the cost interface is flexible enough is
definitely interesting but seems orthogonal to this patch.

Thanks,
Richard

