[PATCH GCC]New vectorization pattern turning cond_expr into max/min and plus/minus
Bin.Cheng
amker.cheng@gmail.com
Thu Oct 20 16:32:00 GMT 2016
On Wed, Oct 12, 2016 at 9:50 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Oct 12, 2016 at 10:29 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Wed, Oct 12, 2016 at 9:12 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Oct 11, 2016 at 5:03 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>> Hi,
>>>> Given below test case,
>>>> int foo (unsigned short a[], unsigned int x)
>>>> {
>>>> unsigned int i;
>>>> for (i = 0; i < 1000; i++)
>>>> {
>>>> x = a[i];
>>>> a[i] = (unsigned short)(x >= 32768 ? x - 32768 : 0);
>>>> }
>>>> return x;
>>>> }
>>>>
>>>> it now can be vectorized on AArch64, but generated assembly is way from optimal:
>>>> .L4:
>>>> ldr q4, [x3, x1]
>>>> add w2, w2, 1
>>>> cmp w2, w0
>>>> ushll v1.4s, v4.4h, 0
>>>> ushll2 v0.4s, v4.8h, 0
>>>> add v3.4s, v1.4s, v6.4s
>>>> add v2.4s, v0.4s, v6.4s
>>>> cmhi v1.4s, v1.4s, v5.4s
>>>> cmhi v0.4s, v0.4s, v5.4s
>>>> and v1.16b, v3.16b, v1.16b
>>>> and v0.16b, v2.16b, v0.16b
>>>> xtn v2.4h, v1.4s
>>>> xtn2 v2.8h, v0.4s
>>>> str q2, [x3, x1]
>>>> add x1, x1, 16
>>>> bcc .L4
>>>>
>>>> The vectorized loop has 15 instructions, which can be greatly simplified by turning cond_expr into max_expr, as below:
>>>> .L4:
>>>> ldr q1, [x3, x1]
>>>> add w2, w2, 1
>>>> cmp w2, w0
>>>> umax v0.8h, v1.8h, v2.8h
>>>> add v0.8h, v0.8h, v2.8h
>>>> str q0, [x3, x1]
>>>> add x1, x1, 16
>>>> bcc .L4
>>>>
>>>> This patch addresses the issue by adding new vectorization pattern.
>>>> Bootstrap and test on x86_64 and AArch64. Is it OK?
>>>
>>> So the COND_EXPRs are generated this way by if-conversion, right? I
>> Though ?: is used in source code, yes, it is if-conv regenerated COND_EXPR.
>>> believe that
>>> the MAX/MIN_EXPR form is always preferrable and thus it looks like if-conversion
>>> might want to either directly generate it or make sure to fold the
>>> introduced stmts
>>> (and have a match.pd pattern catching this).
>> Hmm, I also noticed saturation cases which should be better
>> transformed before vectorization in scalar optimizers. But this case
>> is a bit different because there is additional computation involved
>> other than type conversion. We need to prove the computation can be
>> done in either large or small types. It is quite specific case and I
>> don't see good (general) solution in if-conv. Vect-pattern looks like
>> a natural place doing this. I am also looking at general saturation
>> cases, but this one is different?
>
> (vect-patterns should go away ...)
>
> But as if-conversion results may also prevail for scalar code doing the
> pattern in match.pd would be better - that is, "apply" the pattern
> already during if-conversion.
>
> Yes, if-conversion fails to fold the stmts it generates (it only uses
> generic folding on the trees it builds - it can need some TLC here).
Hi,
Sorry for being slow in replying, I looked into match.pd and can
transform simpler cond_expr into minmax expr successfully, but this
one is more complicated. It transforms 3 gimple statements into 2
result statements, but result of match&simplify pattern is an
expression. How should I write the pattern outputing two gimple
statement as result? Hmm, now I see the transform looks more like
gimple combine...
Thanks,
bin
More information about the Gcc-patches
mailing list