
[PATCH][vectorizer][2/2] PR 65951: Hook up mult synthesis logic into vectorisation of mult-by-constant


Hi all,

This patch allows the vectoriser to synthesize multiplications by an integer constant using
the algorithms determined by choose_mult_variant from expmed.c.
choose_mult_variant returns an algorithm structure describing the sequence of steps needed
to synthesize an integer multiplication by a given constant using shifts, adds, subs, and negation.
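
As a scalar illustration of the kind of step sequences choose_mult_variant
describes (a hand-written sketch, not output of the patch):

/* x * 5 as one shift plus one add (cf. alg_add_t_m2).  */
long long
mult_by_5 (long long x)
{
  long long t = x << 2;   /* t = 4x.  */
  return x + t;           /* x + 4x = 5x.  */
}

/* x * -6 with a negation fixup at the end (cf. negate_variant).  */
long long
mult_by_minus_6 (long long x)
{
  long long t = (x << 1) + x;  /* t = 3x.  */
  t <<= 1;                     /* t = 6x.  */
  return -t;                   /* -(6x) = -6x.  */
}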

The new function vect_synth_mult_by_constant, which does all the hard work, is very similar in structure
to expand_mult_const from expmed.c, but it operates on GIMPLE SSA rather than RTL.
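
For the *= 5 loop below, the pattern sequence it builds looks roughly like
this in the vect dump (SSA names are illustrative):

  patt_1 = a_2 << 2;
  patt_2 = a_2 + patt_1;

where the last statement is returned as the pattern statement that replaces
the original multiplication.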

Note that we only synthesize the multiplication if the target does not support a vector multiply
in the vector mode we're processing. For aarch64 this effectively means V2DI (aarch64
has vector multiply instructions for the narrower inner modes).

This allows us to vectorise more 64-bit multiplications on aarch64. For example:
void
foo (long long *arr)
{
  for (int i = 0; i < N; i++)
    arr[i] *= 5;
}

will now generate:
.L3:
        ldr     q0, [x3, x1]
        add     w2, w2, 1
        cmp     w2, w4
        shl     v1.2d, v0.2d, 2
        add     v0.2d, v0.2d, v1.2d
        str     q0, [x3, x1]
        add     x1, x1, 16
        bcc     .L3

in the vectorised loop, whereas before we would not vectorise this and would generate:
.L2:
        ldr     x1, [x0]
        add     x1, x1, x1, lsl 2
        str     x1, [x0], 8
        cmp     x0, x2
        bne     .L2


A multiplication with a more complex immediate also works.
For example:
void
foo (long long *arr)
{
  for (int i = 0; i < N; i++)
    arr[i] *= 19594LL;
}

produces a sequence of 9 vector add/shift instructions, but the sequence is rejected by the vector cost model.
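
For reference, here is one possible 9-operation chain for this constant
(a hand-worked sketch, not necessarily the exact sequence
choose_mult_variant picks):

long long
mult_by_19594 (long long x)
{
  long long t = (x << 3) + x;   /* t = 9x.  */
  t = (t << 4) + t;             /* t = 153x.  */
  t = (t << 4) + x;             /* t = 2449x.  */
  t = (t << 2) + x;             /* t = 9797x.  */
  return t << 1;                /* 19594x: four shift+add pairs plus a shift.  */
}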

I've added a couple of execute testcases, but restricted the scan-dump checks to aarch64 because
I expect these testcases to synthesize the 64-bit vector multiplication only on targets
that don't have a 64-bit vector multiply.  I can't use ! vect_int_mult
because aarch64 does have vector multiplication, just not for 64-bit elements.
The testcases don't vectorise on arm with NEON because V2DI shifts are disabled in neon.md:
; TODO: V2DI shifts are current disabled because there are bugs in the
; generic vectorizer code.  It ends up creating a V2DI constructor with
; SImode elements.

(define_insn "vashl<mode>3"

That's something separate to look at another time.

What do you think of this approach?

Bootstrapped and tested on arm, aarch64, x86_64.

This code didn't trigger much in SPEC2006: the place where converting mults into shifts
mattered was hmmer, and Venkat had already implemented that transformation for constants
that are powers of two.  This patch just extends that functionality to arbitrary constants.
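
A sketch of those existing power-of-two cases, which the code removed by
this patch handled with a shift, or a shift followed by a negation:

long long mult_by_8 (long long x)       { return x << 3; }
long long mult_by_minus_8 (long long x) { return -(x << 3); }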

Thanks,
Kyrill

2016-06-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    PR target/65951
    * tree-vect-patterns.c: Include mult-synthesis.h.
    (target_has_vecop_for_code): New function.
    (target_supports_mult_synth_alg): Likewise.
    (apply_binop_and_append_stmt): Likewise.
    (vect_synth_mult_by_constant): Likewise.
    (vect_recog_mult_pattern): Use above functions to synthesize vector
    multiplication by integer constants.

2016-06-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
    * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e5dba82d7fa955a6a37a0eabf980127e464ac77b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 123;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 123 != i)
+	__builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..83019c96910b866e364a7c2e00261a1ded13cb53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= -19594LL;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / -19594LL != i)
+	__builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 712b34c39c3db3adbf25208cb926ec5663929296..2439b5b6d617622a803dc4fb3daa64252520e465 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "mult-synthesis.h"
 
 /* Pattern recognition functions  */
 static gimple *vect_recog_widen_sum_pattern (vec<gimple *> *, tree *,
@@ -2115,32 +2116,204 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
   return pattern_stmt;
 }
 
-/* Detect multiplication by constant which are postive or negatives of power 2,
-   and convert them to shift patterns.
+/* Return true iff the target has a vector optab implementing the operation
+   CODE on type VECTYPE.  */
 
-   Mult with constants that are postive power of two.
-   type a_t;
-   type b_t
-   S1: b_t = a_t * n
+static bool
+target_has_vecop_for_code (tree_code code, tree vectype)
+{
+  optab voptab = optab_for_tree_code (code, vectype, optab_vector);
+  return voptab
+	 && optab_handler (voptab, TYPE_MODE (vectype)) != CODE_FOR_nothing;
+}
 
-   or
+/* Verify that the target has vector optabs on VECTYPE to perform all the
+   steps needed by the multiplication-by-immediate synthesis algorithm
+   described by ALG and VAR.  Return true iff that is the case.  */
+
+static bool
+target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
+				 tree vectype)
+{
+  if (alg->op[0] != alg_zero && alg->op[0] != alg_m)
+    return false;
+
+  /* All synthesis algorithms require shifts, so bail out early if the
+     target cannot vectorize them.  */
+  if (!target_has_vecop_for_code (LSHIFT_EXPR, vectype))
+    return false;
+
+  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
+  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
+
+  if (var == negate_variant
+      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
+    return false;
+
+  if (var == add_variant && !supports_vplus)
+    return false;
 
-   Mult with constants that are negative power of two.
-   S2: b_t = a_t * -n
+  for (int i = 1; i < alg->ops; i++)
+    {
+      switch (alg->op[i])
+	{
+	case alg_shift:
+	  break;
+	case alg_add_t_m2:
+	case alg_add_t2_m:
+	case alg_add_factor:
+	  if (!supports_vplus)
+	    return false;
+	  break;
+	case alg_sub_t_m2:
+	case alg_sub_t2_m:
+	case alg_sub_factor:
+	  if (!supports_vminus)
+	    return false;
+	  break;
+	case alg_unknown:
+	case alg_m:
+	case alg_zero:
+	case alg_impossible:
+	  return false;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  return true;
+}
+
+/* Helper for vect_synth_mult_by_constant.  Apply a binary operation
+   CODE to operands OP1 and OP2, creating a new temporary SSA var in
+   the process.  Append the resulting assignment statement to the sequence
+   in STMT_VINFO.  Return the newly created SSA variable which is the result
+   of the binary operation.  */
+
+static tree
+apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
+			     stmt_vec_info stmt_vinfo)
+{
+  tree itype = TREE_TYPE (op1);
+  tree tmp_var = vect_recog_temp_ssa_var (itype, NULL);
+  gimple *stmt = gimple_build_assign (tmp_var, code, op1, op2);
+  append_pattern_def_seq (stmt_vinfo, stmt);
+  return tmp_var;
+}
+
+/* Synthesize a multiplication of OP by an INTEGER_CST VAL using shifts
+   and simple arithmetic operations to be vectorized.  Record the statements
+   produced in STMT_VINFO and return the last statement in the sequence, or
+   NULL if it's not possible to synthesize such a multiplication.
+   This function mirrors the behavior of expand_mult_const in expmed.c but
+   works on tree-ssa form.  */
+
+static gimple *
+vect_synth_mult_by_constant (tree op, tree val,
+			     stmt_vec_info stmt_vinfo)
+{
+  tree itype = TREE_TYPE (op);
+  machine_mode mode = TYPE_MODE (itype);
+  struct algorithm alg;
+  mult_variant variant;
+  if (!tree_fits_shwi_p (val))
+    return NULL;
+
+  HOST_WIDE_INT hwval = tree_to_shwi (val);
+  /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
+     The vectorizer's benefit analysis will decide whether it's beneficial
+     to do this.  */
+  bool possible = choose_mult_variant (mode, hwval, &alg,
+					&variant, MAX_COST);
+  if (!possible)
+    return NULL;
+
+  tree vectype = get_vectype_for_scalar_type (itype);
+  if (!vectype || !target_supports_mult_synth_alg (&alg, variant, vectype))
+    return NULL;
+
+  tree accumulator = alg.op[0] == alg_zero ? build_int_cst (itype, 0) : op;
+  gimple *stmt = NULL;
+  bool needs_fixup = (variant == negate_variant)
+		      || (variant == add_variant);
+
+  /* Clear out the sequence of statements so we can populate it below.  */
+  STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo) = NULL;
+  for (int i = 1; i < alg.ops; i++)
+    {
+      tree shft_log = build_int_cst (itype, alg.log[i]);
+      tree accum_tmp = vect_recog_temp_ssa_var (itype, NULL);
+      tree tmp_var = NULL_TREE;
+
+      switch (alg.op[i])
+	{
+	case alg_shift:
+	  stmt = gimple_build_assign (accum_tmp, LSHIFT_EXPR, accumulator,
+				       shft_log);
+	  break;
+	case alg_add_t_m2:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, op,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_sub_t_m2:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, op,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_add_t2_m:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, tmp_var, op);
+	  break;
+	case alg_sub_t2_m:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var, op);
+	  break;
+	case alg_add_factor:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				       tmp_var);
+	  break;
+	case alg_sub_factor:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator,
+						  shft_log, stmt_vinfo);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var,
+				      accumulator);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      /* We don't want to append the last stmt in the sequence to stmt_vinfo
+	 but rather return it directly.  */
+      if ((i < alg.ops - 1) || needs_fixup)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+      accumulator = accum_tmp;
+    }
+  if (variant == negate_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (itype, NULL);
+      stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, accumulator);
+    }
+  else if (variant == add_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (itype, NULL);
+      stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator, op);
+    }
+  return stmt;
+}
+
+/* Detect multiplication by a constant and convert it into a sequence of
+   shifts, additions, subtractions and negations.  We reuse the
+   choose_mult_variant algorithms from expmed.c.
 
    Input/Output:
 
    STMTS: Contains a stmt from which the pattern search begins,
-   i.e. the mult stmt.  Convert the mult operation to LSHIFT if
-   constant operand is a power of 2.
-   type a_t, b_t
-   S1': b_t = a_t << log2 (n)
-
-   Convert the mult operation to LSHIFT and followed by a NEGATE
-   if constant operand is a negative power of 2.
-   type a_t, b_t, res_T;
-   S2': b_t = a_t << log2 (n)
-   S3': res_T  = - (b_t)
+   i.e. the mult stmt.
 
  Output:
 
@@ -2148,8 +2321,8 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
 
   * TYPE_OUT: The type of the output of this pattern.
 
-  * Return value: A new stmt that will be used to replace the multiplication
-    S1 or S2 stmt.  */
+  * Return value: A new stmt that will be used to replace
+    the multiplication.  */
 
 static gimple *
 vect_recog_mult_pattern (vec<gimple *> *stmts,
@@ -2157,11 +2330,8 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 {
   gimple *last_stmt = stmts->pop ();
   tree oprnd0, oprnd1, vectype, itype;
-  gimple *pattern_stmt, *def_stmt;
-  optab optab;
+  gimple *pattern_stmt;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
-  int power2_val, power2_neg_val;
-  tree shift;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -2185,52 +2355,17 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 
   /* If the target can handle vectorized multiplication natively,
      don't attempt to optimize this.  */
-  optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
-  if (optab != unknown_optab)
+  optab mul_optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
+  if (mul_optab != unknown_optab)
     {
       machine_mode vec_mode = TYPE_MODE (vectype);
-      int icode = (int) optab_handler (optab, vec_mode);
+      int icode = (int) optab_handler (mul_optab, vec_mode);
       if (icode != CODE_FOR_nothing)
-	return NULL;
+	return NULL;
     }
 
-  /* If target cannot handle vector left shift then we cannot
-     optimize and bail out.  */
-  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
-  if (!optab
-      || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-    return NULL;
-
-  power2_val = wi::exact_log2 (oprnd1);
-  power2_neg_val = wi::exact_log2 (wi::neg (oprnd1));
-
-  /* Handle constant operands that are postive or negative powers of 2.  */
-  if (power2_val != -1)
-    {
-      shift = build_int_cst (itype, power2_val);
-      pattern_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
-    }
-  else if (power2_neg_val != -1)
-    {
-      /* If the target cannot handle vector NEGATE then we cannot
-	 do the optimization.  */
-      optab = optab_for_tree_code (NEGATE_EXPR, vectype, optab_vector);
-      if (!optab
-	  || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-	return NULL;
-
-      shift = build_int_cst (itype, power2_neg_val);
-      def_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
-      new_pattern_def_seq (stmt_vinfo, def_stmt);
-      pattern_stmt
-	 = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-				NEGATE_EXPR, gimple_assign_lhs (def_stmt));
-    }
-  else
+  pattern_stmt = vect_synth_mult_by_constant (oprnd0, oprnd1, stmt_vinfo);
+  if (!pattern_stmt)
     return NULL;
 
   /* Pattern detected.  */
