[Vectorizer] Support masking fold left reductions
Alejandro Martinez Vicente
Alejandro.MartinezVicente@arm.com
Wed Jun 12 15:23:00 GMT 2019
Hi,
This patch adds support in the vectorizer for masked fold-left reductions.
Applying the loop mask to the reduction itself avoids the need to first
insert a conditional assignment that replaces the inactive lanes with an
identity value.
For example, this C code:
double
f (double *restrict x, int n)
{
  double res = 0.0;
  for (int i = 0; i < n; i++)
    {
      res += x[i];
    }
  return res;
}
Previously produced this for SVE (note the sel against the zero vector z2
and the all-true predicate p1 on fadda):
0000000000000000 <f>:
0: 2f00e400 movi d0, #0x0
4: 7100003f cmp w1, #0x0
8: 5400018d b.le 38 <f+0x38>
c: d2800002 mov x2, #0x0 // #0
10: 93407c21 sxtw x1, w1
14: 25f8c002 mov z2.d, #0
18: 25e11fe0 whilelo p0.d, xzr, x1
1c: 25d8e3e1 ptrue p1.d
20: a5e24001 ld1d {z1.d}, p0/z, [x0, x2, lsl #3]
24: 04f0e3e2 incd x2
28: 05e2c021 sel z1.d, p0, z1.d, z2.d
2c: 25e11c40 whilelo p0.d, x2, x1
30: 65d82420 fadda d0, p1, d0, z1.d
34: 54ffff61 b.ne 20 <f+0x20> // b.any
38: d65f03c0 ret
And now I get this, with fadda predicated directly on the loop mask p0:
0000000000000000 <f>:
0: 2f00e400 movi d0, #0x0
4: 7100003f cmp w1, #0x0
8: 5400012d b.le 2c <f+0x2c>
c: d2800002 mov x2, #0x0 // #0
10: 93407c21 sxtw x1, w1
14: 25e11fe0 whilelo p0.d, xzr, x1
18: a5e24001 ld1d {z1.d}, p0/z, [x0, x2, lsl #3]
1c: 04f0e3e2 incd x2
20: 65d82020 fadda d0, p0, d0, z1.d
24: 25e11c40 whilelo p0.d, x2, x1
28: 54ffff81 b.ne 18 <f+0x18> // b.any
2c: d65f03c0 ret
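As a rough scalar model of the difference between the two sequences (this
is only an illustrative sketch, not code from the patch; VL, make_mask,
load_masked, sum_with_select, sum_masked and check_same_result are made-up
names, and the fixed lane count is an assumption since the real SVE vector
length is only known at run time):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed lane count, for illustration only.  */
#define VL 4

/* Model of "whilelo": lane I is active while BASE + I < N.  */
static void
make_mask (bool *mask, size_t base, size_t n)
{
  for (size_t i = 0; i < VL; i++)
    mask[i] = base + i < n;
}

/* Model of a zeroing predicated load ("ld1d ..., p0/z"): inactive lanes
   are not read from memory and become 0.0.  */
static void
load_masked (double *vec, const bool *mask, const double *x, size_t base)
{
  for (size_t i = 0; i < VL; i++)
    vec[i] = mask[i] ? x[base + i] : 0.0;
}

/* Old sequence: select the identity into the inactive lanes, then fold
   every lane, in order, into the accumulator.  */
double
sum_with_select (const double *x, int n)
{
  double res = 0.0;
  bool mask[VL];
  double vec[VL];
  if (n <= 0)
    return res;
  for (size_t base = 0; base < (size_t) n; base += VL)
    {
      make_mask (mask, base, (size_t) n);
      load_masked (vec, mask, x, base);
      for (size_t i = 0; i < VL; i++)
        {
          double lane = mask[i] ? vec[i] : 0.0;  /* sel with zero vector */
          res += lane;                           /* fadda, all lanes */
        }
    }
  return res;
}

/* New sequence: fold only the active lanes, in order; no select needed.  */
double
sum_masked (const double *x, int n)
{
  double res = 0.0;
  bool mask[VL];
  double vec[VL];
  if (n <= 0)
    return res;
  for (size_t base = 0; base < (size_t) n; base += VL)
    {
      make_mask (mask, base, (size_t) n);
      load_masked (vec, mask, x, base);
      for (size_t i = 0; i < VL; i++)
        if (mask[i])
          res += vec[i];                         /* fadda under loop mask */
    }
  return res;
}

/* Both models should agree for inputs that contain no NaNs.  */
void
check_same_result (const double *x, int n)
{
  assert (sum_with_select (x, n) == sum_masked (x, n));
}
```

Both functions add the elements strictly left to right, so they match the
scalar loop; the masked form simply drops the per-iteration select.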
I've added a new test and run the regression tests. OK for trunk?
Alejandro
2019-06-12  Alejandro Martinez  <alejandro.martinezvicente@arm.com>

gcc/
	* config/aarch64/aarch64-sve.md (mask_fold_left_plus_<mode>): Renamed
	from "*fold_left_plus_<mode>", updated operands order.
	* doc/md.texi (mask_fold_left_plus_@var{m}): Documented new optab.
	* internal-fn.c (mask_fold_left_direct): New define.
	(expand_mask_fold_left_optab_fn): Likewise.
	(direct_mask_fold_left_optab_supported_p): Likewise.
	* internal-fn.def (MASK_FOLD_LEFT_PLUS): New internal function.
	* optabs.def (mask_fold_left_plus_optab): New optab.
	* tree-vect-loop.c (mask_fold_left_plus_optab): New function to get a
	masked internal_fn for a reduction ifn.
	(vectorize_fold_left_reduction): Add support for masking reductions.

gcc/testsuite/
	* gcc.target/aarch64/sve/fadda_1.c: New test.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mask_fold_left_v3.patch
Type: application/octet-stream
Size: 7602 bytes
Desc: mask_fold_left_v3.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20190612/83c5cf53/attachment.obj>