This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: Allow inner-loop reductions with variable-length vectors
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: gcc-patches at gcc dot gnu dot org,Richard Sandiford <richard dot sandiford at arm dot com>
- Date: Thu, 09 Aug 2018 17:42:22 +0200
- Subject: Re: Allow inner-loop reductions with variable-length vectors
- References: <87tvo31u3a.fsf@arm.com>
On August 9, 2018 4:40:41 PM GMT+02:00, Richard Sandiford <richard.sandiford@arm.com> wrote:
>While working on PR 86871, I noticed we were being overly restrictive
>when handling variable-length vectors. For:
>
>  for (i : ...)
>    {
>      res = ...;
>      for (j : ...)
>        res op= ...;
>      a[i] = res;
>    }
>
>we don't need a reduction operation (although we do for double
>reductions like:
>
>  res = ...;
>  for (i : ...)
>    for (j : ...)
>      res op= ...;
>  a[i] = res;
>
>which must still be rejected).
>
>Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf and
>x86_64-linux-gnu. OK to install?
OK.
Richard.
>Richard
>
>
>2018-08-09 Richard Sandiford <richard.sandiford@arm.com>
>
>gcc/
> * tree-vect-loop.c (vectorizable_reduction): Allow inner-loop
> reductions for variable-length vectors.
>
>gcc/testsuite/
> * gcc.target/aarch64/sve/reduc_8.c: New test.
>
>Index: gcc/tree-vect-loop.c
>===================================================================
>--- gcc/tree-vect-loop.c 2018-08-01 16:14:50.227052736 +0100
>+++ gcc/tree-vect-loop.c 2018-08-09 15:38:35.230258362 +0100
>@@ -6711,6 +6711,7 @@ vectorizable_reduction (stmt_vec_info st
>     }
>
>   if (reduction_type != EXTRACT_LAST_REDUCTION
>+      && (!nested_cycle || double_reduc)
>       && reduc_fn == IFN_LAST
>       && !nunits_out.is_constant ())
>     {
>Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c
>===================================================================
>--- /dev/null 2018-07-26 10:26:13.137955424 +0100
>+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c 2018-08-09 15:38:35.230258362 +0100
>@@ -0,0 +1,20 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -ftree-vectorize" } */
>+
>+int
>+reduc (int *restrict a, int *restrict b, int *restrict c)
>+{
>+  for (int i = 0; i < 100; ++i)
>+    {
>+      int res = 0;
>+      for (int j = 0; j < 100; ++j)
>+        if (b[i + j] != 0)
>+          res = c[i + j];
>+      a[i] = res;
>+    }
>+}
>+
>+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-9]+\.s, } 1 } } */
>+/* We ought to use the CMPNE result for the SEL too. */
>+/* { dg-final { scan-assembler-not {\tcmpeq\tp[0-9]+\.s, } { xfail *-*-* } } } */
>+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.s, } 1 } } */
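
For readers who want to see the two loop shapes from the patch description as compilable code, here is a minimal sketch in plain C. It is not part of the patch: the function names inner_reduc and double_reduc, the size parameters n and m, and the choice of "+=" for the "op=" placeholder are all assumptions made for illustration. As I read the description, the first shape needs no final reduction operation because each outer iteration owns an independent accumulator, which is presumably what lets each vector lane carry the result for a different i when the outer loop is vectorized. The second shape folds everything into one scalar, so the lanes would have to be combined at the end.

```c
#include <stddef.h>

/* Inner-loop reduction (hypothetical example): res is reinitialised for
   every i, so the accumulators of different outer iterations never mix.
   b must hold at least n + m - 1 elements.  */
void
inner_reduc (int *restrict a, const int *restrict b, size_t n, size_t m)
{
  for (size_t i = 0; i < n; ++i)
    {
      int res = 0;
      for (size_t j = 0; j < m; ++j)
        res += b[i + j];
      a[i] = res;
    }
}

/* Double reduction (hypothetical example): a single accumulator lives
   across both loops, so a vectorized version must combine all lanes into
   one scalar at the end.  b must hold at least n + m - 1 elements.  */
int
double_reduc (const int *restrict b, size_t n, size_t m)
{
  int res = 0;
  for (size_t i = 0; i < n; ++i)
    for (size_t j = 0; j < m; ++j)
      res += b[i + j];
  return res;
}
```

With b = {1, 2, 3, 4}, n = 3 and m = 2, inner_reduc stores 3, 5 and 7 into a, while double_reduc returns their total, 15.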