This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.


Hi,

This patch starts a series introducing scalar masks support in the vectorizer.  It was discussed on the recent Cauldron and changes overiew is available here: https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf.  Here is shortly a list of changes introduced by this series:

 - Add new tree expr to produce scalar masks in a vectorized code
 - Fix-up if-conversion to use bool predicates instead of integer masks
 - Disable some bool patterns to avoid bool to int conversion where masks can be used
 - Support bool operands in vectorization factor computation
 - Support scalar masks in MASK_LOAD, MASK_STORE and VEC_COND_EXPR by adding new optabs
 - Support vectorization for statements which are now not transformed by bool patterns
 - Add target support (hooks, optabs, expands)

This patch introduces GEN_MASK_EXPR code.  Intitially I wanted to use a comparison as an operand for it directly mapping it into AVX-512 comparison instruction.  But a feedback was to simplify new code's semantics and use it for converting vectors into scalar masks.  Therefore if we want to compare two vectors into a scalar masks we use two statements:

  vect.18_87 = vect__5.13_81 > vect__6.16_86;
  mask__ifc__23.17_88 = GEN_MASK <vect.18_87>;
 
Trying it in practice I found it producing worse code. The problem is that on target first comparison is expanded into two instructions: cmp with mask result + masked move to get a vector. GEN_MASK is then expanded into another comparison with zero vector.  Thus I get two comparisons + move instead of a single comparison and have to optimize this out on a target side (current optimizers can't handle it).  That's actually what I wanted to avoid.  For now I changed GEN_MASK_EXPR to get a vector value as an operand but didn't change expand pattern which has four opernads: two vectors to compare + cmp operator + result.  On expand I try to detect GEN_MASK uses a result of comparison and thus avoid double comparison generation.

Patch series is not actually fully finished yet.  I still have several type conversion tests not being vectorized and it wasn't widely tested.  That's what I'm working on now.

Will be glad to any comments.

Thanks,
Ilya
--
2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* expr.c (expand_expr_real_2): Support GEN_MASK_EXPR.
	* gimple-pretty-print.c (dump_unary_rhs): Likewise.
	* gimple.c (get_gimple_rhs_num_ops): Likewise.
	* optabs.c: Include gimple.h.
	(vector_compare_rtx): Add OPNO arg.
	(get_gen_mask_icode): New.
	(expand_gen_mask_expr_p): New.
	(expand_gen_mask_expr): New.
	(expand_vec_cond_expr): Adjust vector_compare_rtx call.
	* optabs.def (gen_mask_optab): New.
	(gen_masku_optab): New.
	* optabs.h (expand_gen_mask_expr_p): New.
	(expand_gen_mask_expr): New.
	* tree-cfg.c (verify_gimple_assign_unary): Support GEN_MASK_EXPR.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	* tree-ssa-operands.c (get_expr_operands): Likewise.
	* tree.def (GEN_MASK_EXPR): New.


diff --git a/gcc/expr.c b/gcc/expr.c
index 31b4573..8af5926 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9180,6 +9180,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
       }
 
+    case GEN_MASK_EXPR:
+      target = expand_gen_mask_expr (type, treeop0, target);
+      return target;
+
     case VEC_COND_EXPR:
       target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
       return target;
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 53900dd..ac25b79 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -300,6 +300,12 @@ dump_unary_rhs (pretty_printer *buffer, gassign *gs, int spc, int flags)
       pp_greater (buffer);
       break;
 
+    case GEN_MASK_EXPR:
+      pp_string (buffer, "GEN_MASK <");
+      dump_generic_node (buffer, rhs, spc, flags, false);
+      pp_greater (buffer);
+      break;
+
     default:
       if (TREE_CODE_CLASS (rhs_code) == tcc_declaration
 	  || TREE_CODE_CLASS (rhs_code) == tcc_constant
diff --git a/gcc/gimple.c b/gcc/gimple.c
index cca328a..93caf01 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2005,7 +2005,8 @@ get_gimple_rhs_num_ops (enum tree_code code)
    : ((SYM) == TRUTH_AND_EXPR						    \
       || (SYM) == TRUTH_OR_EXPR						    \
       || (SYM) == TRUTH_XOR_EXPR) ? GIMPLE_BINARY_RHS			    \
-   : (SYM) == TRUTH_NOT_EXPR ? GIMPLE_UNARY_RHS				    \
+   : ((SYM) == TRUTH_NOT_EXPR						    \
+      || (SYM) == GEN_MASK_EXPR) ? GIMPLE_UNARY_RHS			    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == WIDEN_MULT_PLUS_EXPR					    \
       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
diff --git a/gcc/optabs.c b/gcc/optabs.c
index a6ca706..bf466ca 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "recog.h"
 #include "reload.h"
 #include "target.h"
+#include "gimple.h"
 
 struct target_optabs default_target_optabs;
 struct target_libfuncs default_target_libfuncs;
@@ -6488,11 +6489,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
 }
 
 /* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
-   unsigned operators. Do not generate compare instruction.  */
+   unsigned operators.  OPNO holds an index of the first comparison
+   operand in insn with code ICODE.  Do not generate compare instruction.  */
 
 static rtx
 vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
-		    bool unsignedp, enum insn_code icode)
+		    bool unsignedp, enum insn_code icode,
+		    unsigned int opno)
 {
   struct expand_operand ops[2];
   rtx rtx_op0, rtx_op1;
@@ -6518,7 +6521,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
 
   create_input_operand (&ops[0], rtx_op0, m0);
   create_input_operand (&ops[1], rtx_op1, m1);
-  if (!maybe_legitimize_operands (icode, 4, 2, ops))
+  if (!maybe_legitimize_operands (icode, opno, 2, ops))
     gcc_unreachable ();
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
@@ -6789,6 +6792,77 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
   return tmp;
 }
 
+/* Return insn code for a comparison operator with MODE,
+   unsigned if UNS is true.  */
+
+static inline enum insn_code
+get_gen_mask_icode (machine_mode mode, bool uns)
+{
+  optab tab = uns ? gen_masku_optab : gen_mask_optab;
+  return optab_handler (tab, mode);
+}
+
+/* Return TRUE if appropriate vector insn is available
+   for vector comparison expr with vector type VALUE_TYPE.  */
+
+bool
+expand_gen_mask_expr_p (tree value_type)
+{
+  enum insn_code icode = get_gen_mask_icode (TYPE_MODE (value_type),
+					     TYPE_UNSIGNED (value_type));
+  return (icode != CODE_FOR_nothing);
+}
+
+/* Generate insns for a GEN_MASK_EXPR, given its TYPE and operand.  */
+
+rtx
+expand_gen_mask_expr (tree type, tree op0, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  rtx comparison;
+  machine_mode mode = TYPE_MODE (type);
+  machine_mode cmp_op_mode;
+  bool unsignedp;
+  tree op0a, op0b;
+  enum tree_code tcode;
+  gimple def_stmt;
+
+  /* Avoid double comparison.  */
+  if (TREE_CODE (op0) == SSA_NAME
+      && (def_stmt = SSA_NAME_DEF_STMT (op0))
+      && is_gimple_assign (def_stmt)
+      && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) == tcc_comparison)
+    {
+      op0a = gimple_assign_rhs1 (def_stmt);
+      op0b = gimple_assign_rhs2 (def_stmt);
+      tcode = gimple_assign_rhs_code (def_stmt);
+    }
+  else
+    {
+      op0a = op0;
+      op0b = build_zero_cst (TREE_TYPE (op0));
+      tcode = NE_EXPR;
+    }
+
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
+
+  gcc_assert (GET_MODE_BITSIZE (mode) >= GET_MODE_NUNITS (cmp_op_mode));
+
+  icode = get_gen_mask_icode (cmp_op_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
+  create_output_operand (&ops[0], target, mode);
+  create_fixed_operand (&ops[1], comparison);
+  create_fixed_operand (&ops[2], XEXP (comparison, 0));
+  create_fixed_operand (&ops[3], XEXP (comparison, 1));
+  expand_insn (icode, 4, ops);
+  return ops[0].value;
+}
+
 /* Return insn code for a conditional operator with a comparison in
    mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE.  */
 
@@ -6861,7 +6935,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode);
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 888b21c..dc664d1 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -266,6 +266,8 @@ OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
 OPTAB_D (maskload_optab, "maskload$a")
 OPTAB_D (maskstore_optab, "maskstore$a")
+OPTAB_D (gen_mask_optab, "gen_mask$a")
+OPTAB_D (gen_masku_optab, "gen_masku$a")
 OPTAB_D (vec_extract_optab, "vec_extract$a")
 OPTAB_D (vec_init_optab, "vec_init$a")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 95f5cbc..d7b7fb0 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -495,6 +495,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *);
 /* Generate code for VEC_PERM_EXPR.  */
 extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
 
+/* Return tree if target supports vector operations for comparison.  */
+extern bool expand_gen_mask_expr_p (tree);
+
+/* Generate code for GEN_MASK_EXPR.  */
+extern rtx expand_gen_mask_expr (tree, tree, rtx);
+
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, tree);
 
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 588ab69..5bae411 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3646,6 +3646,16 @@ verify_gimple_assign_unary (gassign *stmt)
     case CONJ_EXPR:
       break;
 
+    case GEN_MASK_EXPR:
+	if (!INTEGRAL_TYPE_P (lhs_type) || !VECTOR_TYPE_P (rhs1_type))
+	  {
+	    error ("invalid types in gen_mask");
+	    debug_generic_expr (lhs_type);
+	    debug_generic_expr (rhs1_type);
+	    return true;
+	  }
+	return false;
+
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index e1ceea4..052c055 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3850,6 +3850,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case COMPLEX_EXPR:
     case PAREN_EXPR:
     case VIEW_CONVERT_EXPR:
+    case GEN_MASK_EXPR:
       return 0;
 
     /* Assign cost of 1 to usual operations.
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 7cd1fe7..5b6cdf0 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -2485,6 +2485,12 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, int flags,
       pp_greater (pp);
       break;
 
+    case GEN_MASK_EXPR:
+      pp_string (pp, " GEN_MASK_EXPR < ");
+      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (pp, " > ");
+      break;
+
     case VEC_COND_EXPR:
       pp_string (pp, " VEC_COND_EXPR < ");
       dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c
index b1e3f99..7151ad1 100644
--- a/gcc/tree-ssa-operands.c
+++ b/gcc/tree-ssa-operands.c
@@ -837,6 +837,7 @@ get_expr_operands (struct function *fn, gimple stmt, tree *expr_p, int flags)
 	gimple_set_has_volatile_ops (stmt, true);
       /* FALLTHRU */
 
+    case GEN_MASK_EXPR:
     case VIEW_CONVERT_EXPR:
     do_unary:
       get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags);
diff --git a/gcc/tree.def b/gcc/tree.def
index 56580af..42984ca 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -536,6 +536,10 @@ DEFTREECODE (TARGET_EXPR, "target_expr", tcc_expression, 4)
    1 and 2 are NULL.  The operands are then taken from the cfg edges. */
 DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
 
+/* Generate a scalar bitmask for a vector value.  Mask gets bits set
+   for nonzero vector elements.  */
+DEFTREECODE (GEN_MASK_EXPR, "gen_mask_expr", tcc_expression, 1)
+
 /* Vector conditional expression. It is like COND_EXPR, but with
    vector operands.
 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]