[PATCH GCC]New vectorization pattern turning cond_expr into max/min and plus/minus

Bin.Cheng amker.cheng@gmail.com
Wed Oct 12 08:30:00 GMT 2016


On Wed, Oct 12, 2016 at 9:12 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Oct 11, 2016 at 5:03 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> Given below test case,
>> int foo (unsigned short a[], unsigned int x)
>> {
>>   unsigned int i;
>>   for (i = 0; i < 1000; i++)
>>     {
>>       x = a[i];
>>       a[i] = (unsigned short)(x >= 32768 ? x - 32768 : 0);
>>     }
>>   return x;
>> }
>>
>> it now can be vectorized on AArch64, but generated assembly is way from optimal:
>> .L4:
>>         ldr     q4, [x3, x1]
>>         add     w2, w2, 1
>>         cmp     w2, w0
>>         ushll   v1.4s, v4.4h, 0
>>         ushll2  v0.4s, v4.8h, 0
>>         add     v3.4s, v1.4s, v6.4s
>>         add     v2.4s, v0.4s, v6.4s
>>         cmhi    v1.4s, v1.4s, v5.4s
>>         cmhi    v0.4s, v0.4s, v5.4s
>>         and     v1.16b, v3.16b, v1.16b
>>         and     v0.16b, v2.16b, v0.16b
>>         xtn     v2.4h, v1.4s
>>         xtn2    v2.8h, v0.4s
>>         str     q2, [x3, x1]
>>         add     x1, x1, 16
>>         bcc     .L4
>>
>> The vectorized loop has 15 instructions, which can be greatly simplified by turning cond_expr into max_expr, as below:
>> .L4:
>>         ldr     q1, [x3, x1]
>>         add     w2, w2, 1
>>         cmp     w2, w0
>>         umax    v0.8h, v1.8h, v2.8h
>>         add     v0.8h, v0.8h, v2.8h
>>         str     q0, [x3, x1]
>>         add     x1, x1, 16
>>         bcc     .L4
>>
>> This patch addresses the issue by adding new vectorization pattern.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> So the COND_EXPRs are generated this way by if-conversion, right?  I
Though ?: is used in source code, yes, it is if-conv regenerated COND_EXPR.
> believe that
> the MAX/MIN_EXPR form is always preferrable and thus it looks like if-conversion
> might want to either directly generate it or make sure to fold the
> introduced stmts
> (and have a match.pd pattern catching this).
Hmm, I also noticed saturation cases which should be better
transformed before vectorization in scalar optimizers.  But this case
is a bit different because there is additional computation involved
other than type conversion.  We need to prove the computation can be
done in either large or small types.  It is quite specific case and I
don't see good (general) solution in if-conv.  Vect-pattern looks like
a natural place doing this.  I am also looking at general saturation
cases, but this one is different?

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>>
>> 2016-10-11  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * tree-vect-patterns.c (vect_recog_min_max_modify_pattern): New.
>>         (vect_vect_recog_func_ptrs): New element for above pattern.
>>         * tree-vectorizer.h (NUM_PATTERNS): Increase by 1.
>>
>> gcc/testsuite/ChangeLog
>> 2016-10-11  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * gcc.dg/vect/vect-umax-modify-pattern.c: New test.
>>         * gcc.dg/vect/vect-umin-modify-pattern.c: New test.



More information about the Gcc-patches mailing list