[Vectorizer] Support masking fold left reductions

Alejandro Martinez Vicente Alejandro.MartinezVicente@arm.com
Wed Jun 12 15:23:00 GMT 2019


Hi,

This patch adds support in the vectorizer for masked fold-left reductions.
Passing the loop mask to the reduction itself avoids the need to first
insert a conditional assignment that replaces the inactive lanes with an
identity value.

For example, this C code:

double
f (double *restrict x, int n)
{
  double res = 0.0;
  for (int i = 0; i < n; i++)
    {
      res += x[i];
    }
  return res;
}

previously produced this for SVE:

0000000000000000 <f>:
   0:   2f00e400    movi    d0, #0x0
   4:   7100003f    cmp w1, #0x0
   8:   5400018d    b.le    38 <f+0x38>
   c:   d2800002    mov x2, #0x0                    // #0
  10:   93407c21    sxtw    x1, w1
  14:   25f8c002    mov z2.d, #0
  18:   25e11fe0    whilelo p0.d, xzr, x1
  1c:   25d8e3e1    ptrue   p1.d
  20:   a5e24001    ld1d    {z1.d}, p0/z, [x0, x2, lsl #3]
  24:   04f0e3e2    incd    x2
  28:   05e2c021    sel z1.d, p0, z1.d, z2.d
  2c:   25e11c40    whilelo p0.d, x2, x1
  30:   65d82420    fadda   d0, p1, d0, z1.d
  34:   54ffff61    b.ne    20 <f+0x20>  // b.any
  38:   d65f03c0    ret

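With the patch, the sel, the ptrue and the zero vector are gone, and the
fadda is predicated directly on the loop mask p0: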

0000000000000000 <f>:
   0:   2f00e400    movi    d0, #0x0
   4:   7100003f    cmp w1, #0x0
   8:   5400012d    b.le    2c <f+0x2c>
   c:   d2800002    mov x2, #0x0                    // #0
  10:   93407c21    sxtw    x1, w1
  14:   25e11fe0    whilelo p0.d, xzr, x1
  18:   a5e24001    ld1d    {z1.d}, p0/z, [x0, x2, lsl #3]
  1c:   04f0e3e2    incd    x2
  20:   65d82020    fadda   d0, p0, d0, z1.d
  24:   25e11c40    whilelo p0.d, x2, x1
  28:   54ffff81    b.ne    18 <f+0x18>  // b.any
  2c:   d65f03c0    ret
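
As a rough scalar model of the difference between the two listings, here is
a sketch in plain C.  It is not the vectorizer code: VL, fold_left_select and
fold_left_masked are names made up for illustration, and VL stands in for
the (runtime) number of double lanes in an SVE vector.

```c
#include <stdbool.h>

#define VL 4	/* Hypothetical lane count; the real SVE length is not fixed.  */

/* Old scheme: inactive lanes are replaced with the identity 0.0 (the "sel"),
   then every lane is folded in order under an all-true predicate.  */
double
fold_left_select (const double *x, int n)
{
  double res = 0.0;
  for (int i = 0; i < n; i += VL)
    {
      double lanes[VL];
      for (int l = 0; l < VL; l++)
	{
	  bool active = i + l < n;		/* Models whilelo.  */
	  lanes[l] = active ? x[i + l] : 0.0;	/* Models ld1d + sel.  */
	}
      for (int l = 0; l < VL; l++)
	res += lanes[l];			/* Models fadda under ptrue.  */
    }
  return res;
}

/* New scheme: the loop mask is given to the reduction, so inactive lanes
   are skipped and no identity value is needed.  */
double
fold_left_masked (const double *x, int n)
{
  double res = 0.0;
  for (int i = 0; i < n; i += VL)
    for (int l = 0; l < VL; l++)
      if (i + l < n)				/* Models fadda under p0.  */
	res += x[i + l];
  return res;
}
```

Both functions add the elements strictly in order, which is what a fold-left
reduction requires; the second one just does less work per iteration.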

I've added a new test and run the regression tests. OK for trunk?

Alejandro

2019-06-12  Alejandro Martinez  <alejandro.martinezvicente@arm.com>

gcc/
	* config/aarch64/aarch64-sve.md (mask_fold_left_plus_<mode>): Renamed
	from "*fold_left_plus_<mode>", updated operands order.
	* doc/md.texi (mask_fold_left_plus_@var{m}): Documented new optab.
	* internal-fn.c (mask_fold_left_direct): New define.
	(expand_mask_fold_left_optab_fn): Likewise.
	(direct_mask_fold_left_optab_supported_p): Likewise.
	* internal-fn.def (MASK_FOLD_LEFT_PLUS): New internal function.
	* optabs.def (mask_fold_left_plus_optab): New optab.
	* tree-vect-loop.c (mask_fold_left_plus_optab): New function to get a
	masked internal_fn for a reduction ifn.
	(vectorize_fold_left_reduction): Add support for masking reductions.

gcc/testsuite/
	* gcc.target/aarch64/sve/fadda_1.c: New test.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mask_fold_left_v3.patch
Type: application/octet-stream
Size: 7602 bytes
Desc: mask_fold_left_v3.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20190612/83c5cf53/attachment.obj>

