This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: Add support for in-order addition reduction using SVE FADDA


Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Nov 20, 2017 at 1:54 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Fri, Nov 17, 2017 at 5:53 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch adds support for in-order floating-point addition reductions,
>>>> which are suitable even in strict IEEE mode.
>>>>
>>>> Previously vect_is_simple_reduction would reject any cases that forbid
>>>> reassociation.  The idea is instead to tentatively accept them as
>>>> "FOLD_LEFT_REDUCTIONs" and only fail later if there is no target
>>>> support for them.  Although this patch only handles the particular
>>>> case of plus and minus on floating-point types, there's no reason in
>>>> principle why targets couldn't handle other cases.
>>>>
>>>> The vect_force_simple_reduction change makes it simpler for parloops
>>>> to read the type of reduction.
>>>>
>>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>>> and powerpc64le-linux-gnu.  OK to install?
>>>
>>> I don't like that you add a new tree code for this.  A new IFN looks more
>>> suitable to me.
>>
>> OK.
>
> Thanks.  I'd like to eventually get rid of other vectorizer tree codes as well,
> like the REDUC_*_EXPR, DOT_PROD_EXPR and SAD_EXPR.  IFNs
> are now really the way to go for "target instructions on GIMPLE".

Glad you said that.  I ended up having to convert REDUC_*_EXPRs too,
since it was too ugly trying to support some reductions based on tree
codes and some on internal functions.  (I did try using code_helper,
but even then...)
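
To illustrate the shape of the change (the SSA names here are made up,
not taken from a real dump): the epilogue reduction that was previously
emitted as a unary tree-code assignment such as

    sum_5 = REDUC_PLUS_EXPR <vect_sum_4>;

is now emitted as a call to an internal function:

    sum_5 = .REDUC_PLUS (vect_sum_4);

The optab (reduc_plus_scal_optab, or the signed/unsigned max/min
variants selected via DEF_INTERNAL_SIGNED_OPTAB_FN) is then looked up
when the call is expanded.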

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Thanks,
Richard

PS. This applies at the same point in the series as the FADDA patch.
I can rejig it to apply onto current trunk if that seems better.
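
As a sketch of the constant-folding side (the values are hypothetical):
the new fold_const_reduction folds the internal functions in the same
way const_unop used to fold the tree codes, so a call on a constant
vector such as

    .REDUC_PLUS ({ 1, 2, 3, 4 })

still folds to 10, by applying the corresponding binary operation
(PLUS_EXPR in this case) element by element.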


2017-11-21  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR)
	(REDUC_AND_EXPR, REDUC_IOR_EXPR, REDUC_XOR_EXPR): Delete.
	* doc/generic.texi (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR)
	(REDUC_AND_EXPR, REDUC_IOR_EXPR, REDUC_XOR_EXPR): Delete.
	* cfgexpand.c (expand_debug_expr): Remove handling for them.
	* expr.c (expand_expr_real_2): Likewise.
	* fold-const.c (const_unop): Likewise.
	* optabs-tree.c (optab_for_tree_code): Likewise.
	* tree-cfg.c (verify_gimple_assign_unary): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	(op_code_prio): Likewise.
	(op_symbol_code): Likewise.
	* internal-fn.def (DEF_INTERNAL_SIGNED_OPTAB_FN): Define.
	(IFN_REDUC_PLUS, IFN_REDUC_MAX, IFN_REDUC_MIN, IFN_REDUC_AND)
	(IFN_REDUC_IOR, IFN_REDUC_XOR): New internal functions.
	* internal-fn.c (direct_internal_fn_optab): New function.
	(direct_internal_fn_array, direct_internal_fn_supported_p)
	(internal_fn_expanders): Handle DEF_INTERNAL_SIGNED_OPTAB_FN.
	* fold-const-call.c (fold_const_reduction): New function.
	(fold_const_call): Handle CFN_REDUC_PLUS, CFN_REDUC_MAX, CFN_REDUC_MIN,
	CFN_REDUC_AND, CFN_REDUC_IOR and CFN_REDUC_XOR.
	* tree-vect-loop.c (reduction_code_for_scalar_code): Rename to...
	(reduction_fn_for_scalar_code): ...this and return an internal
	function.
	(vect_model_reduction_cost): Take an internal_fn rather than
	a tree_code.
	(vect_create_epilog_for_reduction): Likewise.  Build calls rather
	than assignments.
	(vectorizable_reduction): Use internal functions rather than tree
	codes for the reduction operation.  Update calls to the functions
	above.
	* config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin):
	Use calls to internal functions rather than REDUC tree codes.
	* config/aarch64/aarch64-simd.md: Update comment accordingly.

Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2017-11-21 16:31:28.695326387 +0000
+++ gcc/tree.def	2017-11-21 16:31:49.729927809 +0000
@@ -1287,21 +1287,6 @@ DEFTREECODE (OMP_CLAUSE, "omp_clause", t
    Operand 0: BODY: contains body of the transaction.  */
 DEFTREECODE (TRANSACTION_EXPR, "transaction_expr", tcc_expression, 1)
 
-/* Reduction operations.
-   Operations that take a vector of elements and "reduce" it to a scalar
-   result (e.g. summing the elements of the vector, finding the minimum over
-   the vector elements, etc).
-   Operand 0 is a vector.
-   The expression returns a scalar, with type the same as the elements of the
-   vector, holding the result of the reduction of all elements of the operand.
-   */
-DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_AND_EXPR, "reduc_and_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_IOR_EXPR, "reduc_ior_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_XOR_EXPR, "reduc_xor_expr", tcc_unary, 1)
-
 /* Widening dot-product.
    The first two arguments are of type t1.
    The third argument and the result are of type t2, such that t2 is at least
Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi	2017-11-21 16:31:28.695326387 +0000
+++ gcc/doc/generic.texi	2017-11-21 16:31:49.723928786 +0000
@@ -1740,12 +1740,6 @@ a value from @code{enum annot_expr_kind}
 @tindex VEC_PACK_FIX_TRUNC_EXPR
 @tindex VEC_COND_EXPR
 @tindex SAD_EXPR
-@tindex REDUC_MAX_EXPR
-@tindex REDUC_MIN_EXPR
-@tindex REDUC_PLUS_EXPR
-@tindex REDUC_AND_EXPR
-@tindex REDUC_IOR_EXPR
-@tindex REDUC_XOR_EXPR
 
 @table @code
 @item VEC_DUPLICATE_EXPR
@@ -1846,21 +1840,6 @@ must have the same type.  The size of th
 operand must be at lease twice of the size of the vector element of the
 first and second one.  The SAD is calculated between the first and second
 operands, added to the third operand, and returned.
-
-@item REDUC_MAX_EXPR
-@itemx REDUC_MIN_EXPR
-@itemx REDUC_PLUS_EXPR
-@itemx REDUC_AND_EXPR
-@itemx REDUC_IOR_EXPR
-@itemx REDUC_XOR_EXPR
-These nodes represent operations that take a vector input and repeatedly
-apply a binary operator on pairs of elements until only one scalar remains.
-For example, @samp{REDUC_PLUS_EXPR <@var{x}>} returns the sum of
-the elements in @var{x} and @samp{REDUC_MAX_EXPR <@var{x}>} returns
-the maximum element in @var{x}.  The associativity of the operation
-is unspecified; for example, @samp{REDUC_PLUS_EXPR <@var{x}>} could
-sum floating-point @var{x} in forward order, in reverse order,
-using a tree, or in some other way.
 @end table
 
 
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-11-21 16:31:28.695326387 +0000
+++ gcc/cfgexpand.c	2017-11-21 16:31:49.722928949 +0000
@@ -5066,12 +5066,6 @@ expand_debug_expr (tree exp)
 
     /* Vector stuff.  For most of the codes we don't have rtl codes.  */
     case REALIGN_LOAD_EXPR:
-    case REDUC_MAX_EXPR:
-    case REDUC_MIN_EXPR:
-    case REDUC_PLUS_EXPR:
-    case REDUC_AND_EXPR:
-    case REDUC_IOR_EXPR:
-    case REDUC_XOR_EXPR:
     case VEC_COND_EXPR:
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_PACK_SAT_EXPR:
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-11-21 16:31:28.695326387 +0000
+++ gcc/expr.c	2017-11-21 16:31:49.724928624 +0000
@@ -9440,29 +9440,6 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
         return target;
       }
 
-    case REDUC_MAX_EXPR:
-    case REDUC_MIN_EXPR:
-    case REDUC_PLUS_EXPR:
-    case REDUC_AND_EXPR:
-    case REDUC_IOR_EXPR:
-    case REDUC_XOR_EXPR:
-      {
-        op0 = expand_normal (treeop0);
-        this_optab = optab_for_tree_code (code, type, optab_default);
-        machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
-
-	struct expand_operand ops[2];
-	enum insn_code icode = optab_handler (this_optab, vec_mode);
-
-	create_output_operand (&ops[0], target, mode);
-	create_input_operand (&ops[1], op0, vec_mode);
-	expand_insn (icode, 2, ops);
-	target = ops[0].value;
-	if (GET_MODE (target) != mode)
-	  return gen_lowpart (tmode, target);
-	return target;
-      }
-
     case VEC_UNPACK_HI_EXPR:
     case VEC_UNPACK_LO_EXPR:
       {
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-11-21 16:31:28.695326387 +0000
+++ gcc/fold-const.c	2017-11-21 16:31:49.725928461 +0000
@@ -1866,42 +1866,6 @@ const_unop (enum tree_code code, tree ty
 	return build_vector (type, elts);
       }
 
-    case REDUC_MIN_EXPR:
-    case REDUC_MAX_EXPR:
-    case REDUC_PLUS_EXPR:
-    case REDUC_AND_EXPR:
-    case REDUC_IOR_EXPR:
-    case REDUC_XOR_EXPR:
-      {
-	unsigned int nelts, i;
-	enum tree_code subcode;
-
-	if (TREE_CODE (arg0) != VECTOR_CST)
-	  return NULL_TREE;
-	nelts = VECTOR_CST_NELTS (arg0);
-
-	switch (code)
-	  {
-	  case REDUC_MIN_EXPR: subcode = MIN_EXPR; break;
-	  case REDUC_MAX_EXPR: subcode = MAX_EXPR; break;
-	  case REDUC_PLUS_EXPR: subcode = PLUS_EXPR; break;
-	  case REDUC_AND_EXPR: subcode = BIT_AND_EXPR; break;
-	  case REDUC_IOR_EXPR: subcode = BIT_IOR_EXPR; break;
-	  case REDUC_XOR_EXPR: subcode = BIT_XOR_EXPR; break;
-	  default: gcc_unreachable ();
-	  }
-
-	tree res = VECTOR_CST_ELT (arg0, 0);
-	for (i = 1; i < nelts; i++)
-	  {
-	    res = const_binop (subcode, res, VECTOR_CST_ELT (arg0, i));
-	    if (res == NULL_TREE || !CONSTANT_CLASS_P (res))
-	      return NULL_TREE;
-	  }
-
-	return res;
-      }
-
     case VEC_DUPLICATE_EXPR:
       if (CONSTANT_CLASS_P (arg0))
 	return build_vector_from_val (type, arg0);
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2017-11-21 16:31:28.695326387 +0000
+++ gcc/optabs-tree.c	2017-11-21 16:31:49.726928298 +0000
@@ -146,26 +146,6 @@ optab_for_tree_code (enum tree_code code
     case FMA_EXPR:
       return fma_optab;
 
-    case REDUC_MAX_EXPR:
-      return TYPE_UNSIGNED (type)
-	     ? reduc_umax_scal_optab : reduc_smax_scal_optab;
-
-    case REDUC_MIN_EXPR:
-      return TYPE_UNSIGNED (type)
-	     ? reduc_umin_scal_optab : reduc_smin_scal_optab;
-
-    case REDUC_PLUS_EXPR:
-      return reduc_plus_scal_optab;
-
-    case REDUC_AND_EXPR:
-      return reduc_and_scal_optab;
-
-    case REDUC_IOR_EXPR:
-      return reduc_ior_scal_optab;
-
-    case REDUC_XOR_EXPR:
-      return reduc_xor_scal_optab;
-
     case VEC_WIDEN_MULT_HI_EXPR:
       return TYPE_UNSIGNED (type) ?
 	vec_widen_umult_hi_optab : vec_widen_smult_hi_optab;
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-11-21 16:31:28.695326387 +0000
+++ gcc/tree-cfg.c	2017-11-21 16:31:49.727928135 +0000
@@ -3774,21 +3774,6 @@ verify_gimple_assign_unary (gassign *stm
 
         return false;
       }
-    case REDUC_MAX_EXPR:
-    case REDUC_MIN_EXPR:
-    case REDUC_PLUS_EXPR:
-    case REDUC_AND_EXPR:
-    case REDUC_IOR_EXPR:
-    case REDUC_XOR_EXPR:
-      if (!VECTOR_TYPE_P (rhs1_type)
-	  || !useless_type_conversion_p (lhs_type, TREE_TYPE (rhs1_type)))
-        {
-	  error ("reduction should convert from vector to element type");
-	  debug_generic_expr (lhs_type);
-	  debug_generic_expr (rhs1_type);
-	  return true;
-	}
-      return false;
 
     case VEC_UNPACK_HI_EXPR:
     case VEC_UNPACK_LO_EXPR:
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2017-11-21 16:31:28.695326387 +0000
+++ gcc/tree-inline.c	2017-11-21 16:31:49.727928135 +0000
@@ -3875,12 +3875,6 @@ estimate_operator_cost (enum tree_code c
 
     case REALIGN_LOAD_EXPR:
 
-    case REDUC_MAX_EXPR:
-    case REDUC_MIN_EXPR:
-    case REDUC_PLUS_EXPR:
-    case REDUC_AND_EXPR:
-    case REDUC_IOR_EXPR:
-    case REDUC_XOR_EXPR:
     case WIDEN_SUM_EXPR:
     case WIDEN_MULT_EXPR:
     case DOT_PROD_EXPR:
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2017-11-21 16:31:28.695326387 +0000
+++ gcc/tree-pretty-print.c	2017-11-21 16:31:49.727928135 +0000
@@ -3252,12 +3252,6 @@ dump_generic_node (pretty_printer *pp, t
       break;
 
     case VEC_DUPLICATE_EXPR:
-    case REDUC_MAX_EXPR:
-    case REDUC_MIN_EXPR:
-    case REDUC_PLUS_EXPR:
-    case REDUC_AND_EXPR:
-    case REDUC_IOR_EXPR:
-    case REDUC_XOR_EXPR:
       pp_space (pp);
       for (str = get_tree_code_name (code); *str; str++)
 	pp_character (pp, TOUPPER (*str));
@@ -3628,9 +3622,6 @@ op_code_prio (enum tree_code code)
     case ABS_EXPR:
     case REALPART_EXPR:
     case IMAGPART_EXPR:
-    case REDUC_MAX_EXPR:
-    case REDUC_MIN_EXPR:
-    case REDUC_PLUS_EXPR:
     case VEC_UNPACK_HI_EXPR:
     case VEC_UNPACK_LO_EXPR:
     case VEC_UNPACK_FLOAT_HI_EXPR:
@@ -3749,9 +3740,6 @@ op_symbol_code (enum tree_code code)
     case PLUS_EXPR:
       return "+";
 
-    case REDUC_PLUS_EXPR:
-      return "r+";
-
     case WIDEN_SUM_EXPR:
       return "w+";
 
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2017-11-21 16:31:19.983714206 +0000
+++ gcc/internal-fn.def	2017-11-21 16:31:49.726928298 +0000
@@ -30,6 +30,8 @@ along with GCC; see the file COPYING3.
 
      DEF_INTERNAL_FN (NAME, FLAGS, FNSPEC)
      DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
+     DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SIGNED_OPTAB,
+				   UNSIGNED_OPTAB, TYPE)
      DEF_INTERNAL_COND_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
      DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE)
      DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
@@ -57,6 +59,12 @@ along with GCC; see the file COPYING3.
 
    - cond_binary: a conditional binary optab, such as add<mode>cc
 
+   DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
+   maps to one of two optabs, depending on the signedness of an input.
+   SIGNED_OPTAB and UNSIGNED_OPTAB are the optabs for signed and
+   unsigned inputs respectively, both without the trailing "_optab".
+   SELECTOR says which type in the tree_pair determines the signedness.
+
    DEF_INTERNAL_COND_OPTAB_FN defines a conditional function COND_<NAME>,
    with optab cond_<OPTAB> and type cond_<TYPE>.  All these functions
    are predicated and take the predicate as the first argument.
@@ -87,6 +95,12 @@ along with GCC; see the file COPYING3.
   DEF_INTERNAL_FN (NAME, FLAGS | ECF_LEAF, NULL)
 #endif
 
+#ifndef DEF_INTERNAL_SIGNED_OPTAB_FN
+#define DEF_INTERNAL_SIGNED_OPTAB_FN(NAME, FLAGS, SELECTOR, SIGNED_OPTAB, \
+				     UNSIGNED_OPTAB, TYPE) \
+  DEF_INTERNAL_FN (NAME, FLAGS | ECF_LEAF, NULL)
+#endif
+
 #define DEF_INTERNAL_COND_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
   DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##OPTAB, cond_##TYPE)
 
@@ -142,6 +156,19 @@ DEF_INTERNAL_COND_OPTAB_FN (XOR, ECF_CON
 
 DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
 
+DEF_INTERNAL_OPTAB_FN (REDUC_PLUS, ECF_CONST | ECF_NOTHROW,
+		       reduc_plus_scal, unary)
+DEF_INTERNAL_SIGNED_OPTAB_FN (REDUC_MAX, ECF_CONST | ECF_NOTHROW, first,
+			      reduc_smax_scal, reduc_umax_scal, unary)
+DEF_INTERNAL_SIGNED_OPTAB_FN (REDUC_MIN, ECF_CONST | ECF_NOTHROW, first,
+			      reduc_smin_scal, reduc_umin_scal, unary)
+DEF_INTERNAL_OPTAB_FN (REDUC_AND, ECF_CONST | ECF_NOTHROW,
+		       reduc_and_scal, unary)
+DEF_INTERNAL_OPTAB_FN (REDUC_IOR, ECF_CONST | ECF_NOTHROW,
+		       reduc_ior_scal, unary)
+DEF_INTERNAL_OPTAB_FN (REDUC_XOR, ECF_CONST | ECF_NOTHROW,
+		       reduc_xor_scal, unary)
+
 /* Extract the last active element from a vector.  */
 DEF_INTERNAL_OPTAB_FN (EXTRACT_LAST, ECF_CONST | ECF_NOTHROW,
 		       extract_last, cond_unary)
@@ -290,5 +317,6 @@ DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
 #undef DEF_INTERNAL_COND_OPTAB_FN
+#undef DEF_INTERNAL_SIGNED_OPTAB_FN
 #undef DEF_INTERNAL_OPTAB_FN
 #undef DEF_INTERNAL_FN
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2017-11-21 16:31:19.983714206 +0000
+++ gcc/internal-fn.c	2017-11-21 16:31:49.726928298 +0000
@@ -96,6 +96,8 @@ #define fold_extract_direct { 2, 2, fals
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
 #define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) TYPE##_direct,
+#define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
+				     UNSIGNED_OPTAB, TYPE) TYPE##_direct,
 #include "internal-fn.def"
   not_direct
 };
@@ -2921,6 +2923,30 @@ #define direct_mask_store_lanes_optab_su
 #define direct_while_optab_supported_p convert_optab_supported_p
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
 
+/* Return the optab used by internal function FN.  */
+
+static optab
+direct_internal_fn_optab (internal_fn fn, tree_pair types)
+{
+  switch (fn)
+    {
+#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
+    case IFN_##CODE: break;
+#define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) \
+    case IFN_##CODE: return OPTAB##_optab;
+#define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
+				     UNSIGNED_OPTAB, TYPE)		\
+    case IFN_##CODE: return (TYPE_UNSIGNED (types.SELECTOR)		\
+			     ? UNSIGNED_OPTAB ## _optab			\
+			     : SIGNED_OPTAB ## _optab);
+#include "internal-fn.def"
+
+    case IFN_LAST:
+      break;
+    }
+  gcc_unreachable ();
+}
+
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
    the "type0" and "type1" fields of FN's direct_internal_fn_info
@@ -2938,6 +2964,16 @@ #define DEF_INTERNAL_OPTAB_FN(CODE, FLAG
     case IFN_##CODE: \
       return direct_##TYPE##_optab_supported_p (OPTAB##_optab, types, \
 						opt_type);
+#define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
+				     UNSIGNED_OPTAB, TYPE)		\
+    case IFN_##CODE:							\
+      {									\
+	optab which_optab = (TYPE_UNSIGNED (types.SELECTOR)		\
+			     ? UNSIGNED_OPTAB ## _optab			\
+			     : SIGNED_OPTAB ## _optab);			\
+	return direct_##TYPE##_optab_supported_p (which_optab, types,	\
+						  opt_type);		\
+      }
 #include "internal-fn.def"
 
     case IFN_LAST:
@@ -2977,6 +3013,15 @@ #define DEF_INTERNAL_OPTAB_FN(CODE, FLAG
   {							\
     expand_##TYPE##_optab_fn (fn, stmt, OPTAB##_optab);	\
   }
+#define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
+				     UNSIGNED_OPTAB, TYPE)		\
+  static void								\
+  expand_##CODE (internal_fn fn, gcall *stmt)				\
+  {									\
+    tree_pair types = direct_internal_fn_types (fn, stmt);		\
+    optab which_optab = direct_internal_fn_optab (fn, types);		\
+    expand_##TYPE##_optab_fn (fn, stmt, which_optab);			\
+  }
 #include "internal-fn.def"
 
 /* Routines to expand each internal function, indexed by function number.
Index: gcc/fold-const-call.c
===================================================================
--- gcc/fold-const-call.c	2017-11-01 08:07:13.156996103 +0000
+++ gcc/fold-const-call.c	2017-11-21 16:31:49.724928624 +0000
@@ -583,6 +583,25 @@ fold_const_builtin_nan (tree type, tree
   return NULL_TREE;
 }
 
+/* Fold a call to IFN_REDUC_<CODE> (ARG), returning a value of type TYPE.  */
+
+static tree
+fold_const_reduction (tree type, tree arg, tree_code code)
+{
+  if (TREE_CODE (arg) != VECTOR_CST)
+    return NULL_TREE;
+
+  tree res = VECTOR_CST_ELT (arg, 0);
+  unsigned int nelts = VECTOR_CST_NELTS (arg);
+  for (unsigned int i = 1; i < nelts; i++)
+    {
+      res = const_binop (code, type, res, VECTOR_CST_ELT (arg, i));
+      if (res == NULL_TREE || !CONSTANT_CLASS_P (res))
+	return NULL_TREE;
+    }
+  return res;
+}
+
 /* Try to evaluate:
 
       *RESULT = FN (*ARG)
@@ -1148,6 +1167,24 @@ fold_const_call (combined_fn fn, tree ty
     CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NANS):
       return fold_const_builtin_nan (type, arg, false);
 
+    case CFN_REDUC_PLUS:
+      return fold_const_reduction (type, arg, PLUS_EXPR);
+
+    case CFN_REDUC_MAX:
+      return fold_const_reduction (type, arg, MAX_EXPR);
+
+    case CFN_REDUC_MIN:
+      return fold_const_reduction (type, arg, MIN_EXPR);
+
+    case CFN_REDUC_AND:
+      return fold_const_reduction (type, arg, BIT_AND_EXPR);
+
+    case CFN_REDUC_IOR:
+      return fold_const_reduction (type, arg, BIT_IOR_EXPR);
+
+    case CFN_REDUC_XOR:
+      return fold_const_reduction (type, arg, BIT_XOR_EXPR);
+
     default:
       return fold_const_call_1 (fn, type, arg);
     }
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-11-21 16:31:28.695326387 +0000
+++ gcc/tree-vect-loop.c	2017-11-21 16:31:49.728927972 +0000
@@ -2574,52 +2574,51 @@ vect_analyze_loop (struct loop *loop, lo
 }
 
 
-/* Function reduction_code_for_scalar_code
+/* Function reduction_fn_for_scalar_code
 
    Input:
    CODE - tree_code of a reduction operations.
 
    Output:
-   REDUC_CODE - the corresponding tree-code to be used to reduce the
-      vector of partial results into a single scalar result, or ERROR_MARK
+   REDUC_FN - the corresponding internal function to be used to reduce the
+      vector of partial results into a single scalar result, or IFN_LAST
       if the operation is a supported reduction operation, but does not have
-      such a tree-code.
+      such an internal function.
 
    Return FALSE if CODE currently cannot be vectorized as reduction.  */
 
 static bool
-reduction_code_for_scalar_code (enum tree_code code,
-                                enum tree_code *reduc_code)
+reduction_fn_for_scalar_code (enum tree_code code, internal_fn *reduc_fn)
 {
   switch (code)
     {
       case MAX_EXPR:
-        *reduc_code = REDUC_MAX_EXPR;
+        *reduc_fn = IFN_REDUC_MAX;
         return true;
 
       case MIN_EXPR:
-        *reduc_code = REDUC_MIN_EXPR;
+        *reduc_fn = IFN_REDUC_MIN;
         return true;
 
       case PLUS_EXPR:
-        *reduc_code = REDUC_PLUS_EXPR;
+        *reduc_fn = IFN_REDUC_PLUS;
         return true;
 
       case BIT_AND_EXPR:
-	*reduc_code = REDUC_AND_EXPR;
+	*reduc_fn = IFN_REDUC_AND;
 	return true;
 
       case BIT_IOR_EXPR:
-	*reduc_code = REDUC_IOR_EXPR;
+	*reduc_fn = IFN_REDUC_IOR;
 	return true;
 
       case BIT_XOR_EXPR:
-	*reduc_code = REDUC_XOR_EXPR;
+	*reduc_fn = IFN_REDUC_XOR;
 	return true;
 
       case MULT_EXPR:
       case MINUS_EXPR:
-        *reduc_code = ERROR_MARK;
+        *reduc_fn = IFN_LAST;
         return true;
 
       default:
@@ -4029,7 +4028,7 @@ have_whole_vector_shift (machine_mode mo
    the loop, and the epilogue code that must be generated.  */
 
 static void
-vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
+vect_model_reduction_cost (stmt_vec_info stmt_info, internal_fn reduc_fn,
 			   int ncopies)
 {
   int prologue_cost = 0, epilogue_cost = 0, inside_cost;
@@ -4097,7 +4096,7 @@ vect_model_reduction_cost (stmt_vec_info
 
   if (!loop || !nested_in_vect_loop_p (loop, orig_stmt))
     {
-      if (reduc_code != ERROR_MARK)
+      if (reduc_fn != IFN_LAST)
 	{
 	  if (reduction_type == COND_REDUCTION)
 	    {
@@ -4581,7 +4580,7 @@ get_initial_defs_for_reduction (slp_tree
      we have to generate more than one vector stmt - i.e - we need to "unroll"
      the vector stmt by a factor VF/nunits.  For more details see documentation
      in vectorizable_operation.
-   REDUC_CODE is the tree-code for the epilog reduction.
+   REDUC_FN is the internal function for the epilog reduction.
    REDUCTION_PHIS is a list of the phi-nodes that carry the reduction 
      computation.
    REDUC_INDEX is the index of the operand in the right hand side of the 
@@ -4599,7 +4598,7 @@ get_initial_defs_for_reduction (slp_tree
       The loop-latch argument is taken from VECT_DEFS - the vector of partial 
       sums.
    2. "Reduces" each vector of partial results VECT_DEFS into a single result,
-      by applying the operation specified by REDUC_CODE if available, or by 
+      by calling the function specified by REDUC_FN if available, or by
       other means (whole-vector shifts or a scalar loop).
       The function also creates a new phi node at the loop exit to preserve
       loop-closed form, as illustrated below.
@@ -4634,7 +4633,7 @@ get_initial_defs_for_reduction (slp_tree
 static void
 vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple *stmt,
 				  gimple *reduc_def_stmt,
-				  int ncopies, enum tree_code reduc_code,
+				  int ncopies, internal_fn reduc_fn,
 				  vec<gimple *> reduction_phis,
                                   bool double_reduc, 
 				  slp_tree slp_node,
@@ -4885,7 +4884,7 @@ vect_create_epilog_for_reduction (vec<tr
         step 3: adjust the scalar result (s_out3) if needed.
 
         Step 1 can be accomplished using one the following three schemes:
-          (scheme 1) using reduc_code, if available.
+          (scheme 1) using reduc_fn, if available.
           (scheme 2) using whole-vector shifts, if available.
           (scheme 3) using a scalar loop. In this case steps 1+2 above are
                      combined.
@@ -4965,7 +4964,7 @@ vect_create_epilog_for_reduction (vec<tr
   exit_gsi = gsi_after_labels (exit_bb);
 
   /* 2.2 Get the relevant tree-code to use in the epilog for schemes 2,3
-         (i.e. when reduc_code is not available) and in the final adjustment
+         (i.e. when reduc_fn is not available) and in the final adjustment
 	 code (if needed).  Also get the original scalar reduction variable as
          defined in the loop.  In case STMT is a "pattern-stmt" (i.e. - it
          represents a reduction pattern), the tree-code and scalar-def are
@@ -5017,7 +5016,7 @@ vect_create_epilog_for_reduction (vec<tr
 
   /* True if we should implement SLP_REDUC using native reduction operations
      instead of scalar operations.  */
-  direct_slp_reduc = (reduc_code != ERROR_MARK
+  direct_slp_reduc = (reduc_fn != IFN_LAST
 		      && slp_reduc
 		      && !TYPE_VECTOR_SUBPARTS (vectype).is_constant ());
 
@@ -5077,7 +5076,7 @@ vect_create_epilog_for_reduction (vec<tr
     new_phi_result = PHI_RESULT (new_phis[0]);
 
   if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
-      && reduc_code != ERROR_MARK)
+      && reduc_fn != IFN_LAST)
     {
       /* For condition reductions, we have a vector (NEW_PHI_RESULT) containing
 	 various data values where the condition matched and another vector
@@ -5115,8 +5114,9 @@ vect_create_epilog_for_reduction (vec<tr
 
       /* Find maximum value from the vector of found indexes.  */
       tree max_index = make_ssa_name (index_scalar_type);
-      gimple *max_index_stmt = gimple_build_assign (max_index, REDUC_MAX_EXPR,
-						    induction_index);
+      gcall *max_index_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
+							  1, induction_index);
+      gimple_call_set_lhs (max_index_stmt, max_index);
       gsi_insert_before (&exit_gsi, max_index_stmt, GSI_SAME_STMT);
 
       /* Vector of {max_index, max_index, max_index,...}.  */
@@ -5171,13 +5171,9 @@ vect_create_epilog_for_reduction (vec<tr
 
       /* Reduce down to a scalar value.  */
       tree data_reduc = make_ssa_name (scalar_type_unsigned);
-      optab ot = optab_for_tree_code (REDUC_MAX_EXPR, vectype_unsigned,
-				      optab_default);
-      gcc_assert (optab_handler (ot, TYPE_MODE (vectype_unsigned))
-		  != CODE_FOR_nothing);
-      gimple *data_reduc_stmt = gimple_build_assign (data_reduc,
-						     REDUC_MAX_EXPR,
-						     vec_cond_cast);
+      gcall *data_reduc_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
+							   1, vec_cond_cast);
+      gimple_call_set_lhs (data_reduc_stmt, data_reduc);
       gsi_insert_before (&exit_gsi, data_reduc_stmt, GSI_SAME_STMT);
 
       /* Convert the reduced value back to the result type and set as the
@@ -5189,9 +5185,9 @@ vect_create_epilog_for_reduction (vec<tr
       scalar_results.safe_push (new_temp);
     }
   else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
-	   && reduc_code == ERROR_MARK)
+	   && reduc_fn == IFN_LAST)
     {
-      /* Condition redution without supported REDUC_MAX_EXPR.  Generate
+      /* Condition redution without supported IFN_REDUC_MAX.  Generate
 	 idx = 0;
          idx_val = induction_index[0];
 	 val = data_reduc[0];
@@ -5264,7 +5260,7 @@ vect_create_epilog_for_reduction (vec<tr
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the 
          vector (without reducing them), so we use scalar shifts.  */
-  else if (reduc_code != ERROR_MARK && !slp_reduc)
+  else if (reduc_fn != IFN_LAST && !slp_reduc)
     {
       tree tmp;
       tree vec_elem_type;
@@ -5279,22 +5275,27 @@ vect_create_epilog_for_reduction (vec<tr
       vec_elem_type = TREE_TYPE (TREE_TYPE (new_phi_result));
       if (!useless_type_conversion_p (scalar_type, vec_elem_type))
 	{
-          tree tmp_dest =
-	      vect_create_destination_var (scalar_dest, vec_elem_type);
-	  tmp = build1 (reduc_code, vec_elem_type, new_phi_result);
-	  epilog_stmt = gimple_build_assign (tmp_dest, tmp);
+	  tree tmp_dest
+	    = vect_create_destination_var (scalar_dest, vec_elem_type);
+	  epilog_stmt = gimple_build_call_internal (reduc_fn, 1,
+						    new_phi_result);
+	  gimple_set_lhs (epilog_stmt, tmp_dest);
 	  new_temp = make_ssa_name (tmp_dest, epilog_stmt);
-	  gimple_assign_set_lhs (epilog_stmt, new_temp);
+	  gimple_set_lhs (epilog_stmt, new_temp);
 	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
 
-	  tmp = build1 (NOP_EXPR, scalar_type, new_temp);
+	  epilog_stmt = gimple_build_assign (new_scalar_dest, NOP_EXPR,
+					     new_temp);
 	}
       else
-	tmp = build1 (reduc_code, scalar_type, new_phi_result);
+	{
+	  epilog_stmt = gimple_build_call_internal (reduc_fn, 1,
+						    new_phi_result);
+	  gimple_set_lhs (epilog_stmt, new_scalar_dest);
+	}
 
-      epilog_stmt = gimple_build_assign (new_scalar_dest, tmp);
       new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
-      gimple_assign_set_lhs (epilog_stmt, new_temp);
+      gimple_set_lhs (epilog_stmt, new_temp);
       gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
 
       if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
@@ -5383,8 +5384,10 @@ vect_create_epilog_for_reduction (vec<tr
 				   sel, new_phi_result, vector_identity);
 
 	  /* Do the reduction and convert it to the appropriate type.  */
-	  tree scalar = gimple_build (&seq, reduc_code,
-				      TREE_TYPE (vectype), vec);
+	  gcall *call = gimple_build_call_internal (reduc_fn, 1, vec);
+	  tree scalar = make_ssa_name (TREE_TYPE (vectype));
+	  gimple_call_set_lhs (call, scalar);
+	  gimple_seq_add_stmt (&seq, call);
 	  scalar = gimple_convert (&seq, scalar_type, scalar);
 	  scalar_results.safe_push (scalar);
 	}
@@ -5992,10 +5995,11 @@ vectorizable_reduction (gimple *stmt, gi
   tree vectype_in = NULL_TREE;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  enum tree_code code, orig_code, epilog_reduc_code;
+  enum tree_code code, orig_code;
+  internal_fn reduc_fn;
   machine_mode vec_mode;
   int op_type;
-  optab optab, reduc_optab;
+  optab optab;
   tree new_temp = NULL_TREE;
   gimple *def_stmt;
   enum vect_def_type dt, cond_reduc_dt = vect_unknown_def_type;
@@ -6552,31 +6556,23 @@ vectorizable_reduction (gimple *stmt, gi
         double_reduc = true;
     }
 
-  epilog_reduc_code = ERROR_MARK;
+  reduc_fn = IFN_LAST;
 
   if (reduction_type == TREE_CODE_REDUCTION
       || reduction_type == INTEGER_INDUC_COND_REDUCTION
       || reduction_type == CONST_COND_REDUCTION)
     {
-      if (reduction_code_for_scalar_code (orig_code, &epilog_reduc_code))
+      if (reduction_fn_for_scalar_code (orig_code, &reduc_fn))
 	{
-	  reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype_out,
-                                         optab_default);
-	  if (!reduc_optab)
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "no optab for reduction.\n");
-
-	      epilog_reduc_code = ERROR_MARK;
-	    }
-	  else if (optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
+	  if (reduc_fn != IFN_LAST
+	      && !direct_internal_fn_supported_p (reduc_fn, vectype_out,
+						  OPTIMIZE_FOR_SPEED))
 	    {
 	      if (dump_enabled_p ())
 		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 				 "reduc op not supported by target.\n");
 
-	      epilog_reduc_code = ERROR_MARK;
+	      reduc_fn = IFN_LAST;
 	    }
 	}
       else
@@ -6599,15 +6595,13 @@ vectorizable_reduction (gimple *stmt, gi
       cr_index_vector_type = build_vector_type (cr_index_scalar_type,
 						nunits_out);
 
-      optab = optab_for_tree_code (REDUC_MAX_EXPR, cr_index_vector_type,
-				   optab_default);
-      if (optab_handler (optab, TYPE_MODE (cr_index_vector_type))
-	  != CODE_FOR_nothing)
-	epilog_reduc_code = REDUC_MAX_EXPR;
+      if (direct_internal_fn_supported_p (IFN_REDUC_MAX, cr_index_vector_type,
+					  OPTIMIZE_FOR_SPEED))
+	reduc_fn = IFN_REDUC_MAX;
     }
 
   if (reduction_type != EXTRACT_LAST_REDUCTION
-      && epilog_reduc_code == ERROR_MARK
+      && reduc_fn == IFN_LAST
       && !nunits_out.is_constant ())
     {
       if (dump_enabled_p ())
@@ -6804,7 +6798,7 @@ vectorizable_reduction (gimple *stmt, gi
   if (!vec_stmt) /* transformation not required.  */
     {
       if (first_p)
-	vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies);
+	vect_model_reduction_cost (stmt_info, reduc_fn, ncopies);
       if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
 	{
 	  if (cond_fn == IFN_LAST
@@ -7008,8 +7002,7 @@ vectorizable_reduction (gimple *stmt, gi
     vect_defs[0] = gimple_get_lhs (*vec_stmt);
 
   vect_create_epilog_for_reduction (vect_defs, stmt, reduc_def_stmt,
-				    epilog_copies,
-                                    epilog_reduc_code, phis,
+				    epilog_copies, reduc_fn, phis,
 				    double_reduc, slp_node, slp_node_instance,
 				    neutral_op);
 
Index: gcc/config/aarch64/aarch64-builtins.c
===================================================================
--- gcc/config/aarch64/aarch64-builtins.c	2017-11-21 16:30:57.913175994 +0000
+++ gcc/config/aarch64/aarch64-builtins.c	2017-11-21 16:31:49.722928949 +0000
@@ -1601,24 +1601,27 @@ aarch64_gimple_fold_builtin (gimple_stmt
 			? gimple_call_arg_ptr (stmt, 0)
 			: &error_mark_node);
 
-	  /* We use gimple's REDUC_(PLUS|MIN|MAX)_EXPRs for float, signed int
+	  /* We use gimple's IFN_REDUC_(PLUS|MIN|MAX)s for float, signed int
 	     and unsigned int; it will distinguish according to the types of
 	     the arguments to the __builtin.  */
 	  switch (fcode)
 	    {
 	      BUILTIN_VALL (UNOP, reduc_plus_scal_, 10)
-	        new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
-						REDUC_PLUS_EXPR, args[0]);
+	        new_stmt = gimple_build_call_internal (IFN_REDUC_PLUS,
+						       1, args[0]);
+		gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
 		break;
 	      BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10)
 	      BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10)
-		new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
-						REDUC_MAX_EXPR, args[0]);
+	        new_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
+						       1, args[0]);
+		gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
 		break;
 	      BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10)
 	      BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10)
-		new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
-						REDUC_MIN_EXPR, args[0]);
+	        new_stmt = gimple_build_call_internal (IFN_REDUC_MIN,
+						       1, args[0]);
+		gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
 		break;
 	      BUILTIN_GPF (BINOP, fmulx, 0)
 		{
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md	2017-11-21 16:30:57.914175832 +0000
+++ gcc/config/aarch64/aarch64-simd.md	2017-11-21 16:31:49.723928786 +0000
@@ -2338,7 +2338,7 @@ (define_insn "popcount<mode>2"
 ;; 'across lanes' max and min ops.
 
 ;; Template for outputting a scalar, so we can create __builtins which can be
-;; gimple_fold'd to the REDUC_(MAX|MIN)_EXPR tree code.  (This is FP smax/smin).
+;; gimple_fold'd to the IFN_REDUC_(MAX|MIN) function.  (This is FP smax/smin).
 (define_expand "reduc_<maxmin_uns>_scal_<mode>"
   [(match_operand:<VEL> 0 "register_operand")
    (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]

