[RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.

Richard Biener richard.guenther@gmail.com
Tue Aug 18 10:47:00 GMT 2015


On Mon, Aug 17, 2015 at 6:22 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> Hi,
>
> This patch starts a series introducing scalar masks support in the vectorizer.  It was discussed on the recent Cauldron and changes overiew is available here: https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf.  Here is shortly a list of changes introduced by this series:
>
>  - Add new tree expr to produce scalar masks in a vectorized code
>  - Fix-up if-conversion to use bool predicates instead of integer masks
>  - Disable some bool patterns to avoid bool to int conversion where masks can be used
>  - Support bool operands in vectorization factor computation
>  - Support scalar masks in MASK_LOAD, MASK_STORE and VEC_COND_EXPR by adding new optabs
>  - Support vectorization for statements which are now not transformed by bool patterns
>  - Add target support (hooks, optabs, expands)
>
> This patch introduces GEN_MASK_EXPR code.  Intitially I wanted to use a comparison as an operand for it directly mapping it into AVX-512 comparison instruction.  But a feedback was to simplify new code's semantics and use it for converting vectors into scalar masks.  Therefore if we want to compare two vectors into a scalar masks we use two statements:
>
>   vect.18_87 = vect__5.13_81 > vect__6.16_86;
>   mask__ifc__23.17_88 = GEN_MASK <vect.18_87>;
>
> Trying it in practice I found it producing worse code. The problem is that on target first comparison is expanded into two instructions: cmp with mask result + masked move to get a vector. GEN_MASK is then expanded into another comparison with zero vector.  Thus I get two comparisons + move instead of a single comparison and have to optimize this out on a target side (current optimizers can't handle it).  That's actually what I wanted to avoid.  For now I changed GEN_MASK_EXPR to get a vector value as an operand but didn't change expand pattern which has four opernads: two vectors to compare + cmp operator + result.  On expand I try to detect GEN_MASK uses a result of comparison and thus avoid double comparison generation.
>
> Patch series is not actually fully finished yet.  I still have several type conversion tests not being vectorized and it wasn't widely tested.  That's what I'm working on now.
>
> Will be glad to any comments.

You need to expand the documentation of GEN_MASK_EXPR to say something
about which bits correspond
to which vector elements.  You also should add constant-folding of
GEN_MASK_EXPR to const_unop in fold-const.c
(which would also document the above).

I agree that you need to combine the two GIMPLE stmts at expand time.
How do intrinsics expose
those masks btw?

One other idea that crossed my mind is that we might want to support
vector<bool> and map that to
an integer mode thus effectively allowing vect1 > vect2 generating a
vector<bool> directly.  But I'm not
sure what kind of issues you run into with that approach.

> Thanks,
> Ilya
> --
> 2015-08-17  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * expr.c (expand_expr_real_2): Support GEN_MASK_EXPR.
>         * gimple-pretty-print.c (dump_unary_rhs): Likewise.
>         * gimple.c (get_gimple_rhs_num_ops): Likewise.
>         * optabs.c: Include gimple.h.
>         (vector_compare_rtx): Add OPNO arg.
>         (get_gen_mask_icode): New.
>         (expand_gen_mask_expr_p): New.
>         (expand_gen_mask_expr): New.
>         (expand_vec_cond_expr): Adjust vector_compare_rtx call.
>         * optabs.def (gen_mask_optab): New.
>         (gen_masku_optab): New.
>         * optabs.h (expand_gen_mask_expr_p): New.
>         (expand_gen_mask_expr): New.
>         * tree-cfg.c (verify_gimple_assign_unary): Support GEN_MASK_EXPR.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         * tree-ssa-operands.c (get_expr_operands): Likewise.
>         * tree.def (GEN_MASK_EXPR): New.
>
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 31b4573..8af5926 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -9180,6 +9180,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
>         return temp;
>        }
>
> +    case GEN_MASK_EXPR:
> +      target = expand_gen_mask_expr (type, treeop0, target);
> +      return target;
> +
>      case VEC_COND_EXPR:
>        target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
>        return target;
> diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
> index 53900dd..ac25b79 100644
> --- a/gcc/gimple-pretty-print.c
> +++ b/gcc/gimple-pretty-print.c
> @@ -300,6 +300,12 @@ dump_unary_rhs (pretty_printer *buffer, gassign *gs, int spc, int flags)
>        pp_greater (buffer);
>        break;
>
> +    case GEN_MASK_EXPR:
> +      pp_string (buffer, "GEN_MASK <");
> +      dump_generic_node (buffer, rhs, spc, flags, false);
> +      pp_greater (buffer);
> +      break;
> +
>      default:
>        if (TREE_CODE_CLASS (rhs_code) == tcc_declaration
>           || TREE_CODE_CLASS (rhs_code) == tcc_constant
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index cca328a..93caf01 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -2005,7 +2005,8 @@ get_gimple_rhs_num_ops (enum tree_code code)
>     : ((SYM) == TRUTH_AND_EXPR                                              \
>        || (SYM) == TRUTH_OR_EXPR                                                    \
>        || (SYM) == TRUTH_XOR_EXPR) ? GIMPLE_BINARY_RHS                      \
> -   : (SYM) == TRUTH_NOT_EXPR ? GIMPLE_UNARY_RHS                                    \
> +   : ((SYM) == TRUTH_NOT_EXPR                                              \
> +      || (SYM) == GEN_MASK_EXPR) ? GIMPLE_UNARY_RHS                        \
>     : ((SYM) == COND_EXPR                                                   \
>        || (SYM) == WIDEN_MULT_PLUS_EXPR                                     \
>        || (SYM) == WIDEN_MULT_MINUS_EXPR                                            \
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index a6ca706..bf466ca 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "recog.h"
>  #include "reload.h"
>  #include "target.h"
> +#include "gimple.h"
>
>  struct target_optabs default_target_optabs;
>  struct target_libfuncs default_target_libfuncs;
> @@ -6488,11 +6489,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
>  }
>
>  /* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
> -   unsigned operators. Do not generate compare instruction.  */
> +   unsigned operators.  OPNO holds an index of the first comparison
> +   operand in insn with code ICODE.  Do not generate compare instruction.  */
>
>  static rtx
>  vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
> -                   bool unsignedp, enum insn_code icode)
> +                   bool unsignedp, enum insn_code icode,
> +                   unsigned int opno)
>  {
>    struct expand_operand ops[2];
>    rtx rtx_op0, rtx_op1;
> @@ -6518,7 +6521,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
>
>    create_input_operand (&ops[0], rtx_op0, m0);
>    create_input_operand (&ops[1], rtx_op1, m1);
> -  if (!maybe_legitimize_operands (icode, 4, 2, ops))
> +  if (!maybe_legitimize_operands (icode, opno, 2, ops))
>      gcc_unreachable ();
>    return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
>  }
> @@ -6789,6 +6792,77 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
>    return tmp;
>  }
>
> +/* Return insn code for a comparison operator with MODE,
> +   unsigned if UNS is true.  */
> +
> +static inline enum insn_code
> +get_gen_mask_icode (machine_mode mode, bool uns)
> +{
> +  optab tab = uns ? gen_masku_optab : gen_mask_optab;

Aww, ok - now you need two optabs of course.  I suppose combine
wasn't able to combine the two compares?

> +  return optab_handler (tab, mode);
> +}
> +
> +/* Return TRUE if appropriate vector insn is available
> +   for vector comparison expr with vector type VALUE_TYPE.  */
> +
> +bool
> +expand_gen_mask_expr_p (tree value_type)
> +{
> +  enum insn_code icode = get_gen_mask_icode (TYPE_MODE (value_type),
> +                                            TYPE_UNSIGNED (value_type));
> +  return (icode != CODE_FOR_nothing);
> +}
> +
> +/* Generate insns for a GEN_MASK_EXPR, given its TYPE and operand.  */
> +
> +rtx
> +expand_gen_mask_expr (tree type, tree op0, rtx target)
> +{
> +  struct expand_operand ops[4];
> +  enum insn_code icode;
> +  rtx comparison;
> +  machine_mode mode = TYPE_MODE (type);
> +  machine_mode cmp_op_mode;
> +  bool unsignedp;
> +  tree op0a, op0b;
> +  enum tree_code tcode;
> +  gimple def_stmt;
> +
> +  /* Avoid double comparison.  */
> +  if (TREE_CODE (op0) == SSA_NAME
> +      && (def_stmt = SSA_NAME_DEF_STMT (op0))
> +      && is_gimple_assign (def_stmt)
> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) == tcc_comparison)

You need to use get_gimple_for_ssa_name (or related functions) here, you can't
simply re-use the def stmts operands because of eventual coalescing
that happened
making their ops no longer valid.

> +    {
> +      op0a = gimple_assign_rhs1 (def_stmt);
> +      op0b = gimple_assign_rhs2 (def_stmt);
> +      tcode = gimple_assign_rhs_code (def_stmt);
> +    }
> +  else
> +    {
> +      op0a = op0;
> +      op0b = build_zero_cst (TREE_TYPE (op0));
> +      tcode = NE_EXPR;
> +    }
> +
> +  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
> +  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
> +
> +  gcc_assert (GET_MODE_BITSIZE (mode) >= GET_MODE_NUNITS (cmp_op_mode));
> +
> +  icode = get_gen_mask_icode (cmp_op_mode, unsignedp);
> +  if (icode == CODE_FOR_nothing)
> +    return 0;
> +
> +  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
> +  create_output_operand (&ops[0], target, mode);
> +  create_fixed_operand (&ops[1], comparison);
> +  create_fixed_operand (&ops[2], XEXP (comparison, 0));
> +  create_fixed_operand (&ops[3], XEXP (comparison, 1));
> +  expand_insn (icode, 4, ops);
> +  return ops[0].value;
> +}
> +
>  /* Return insn code for a conditional operator with a comparison in
>     mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE.  */
>
> @@ -6861,7 +6935,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
>    if (icode == CODE_FOR_nothing)
>      return 0;
>
> -  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode);
> +  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
>    rtx_op1 = expand_normal (op1);
>    rtx_op2 = expand_normal (op2);
>
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 888b21c..dc664d1 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -266,6 +266,8 @@ OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
>  OPTAB_D (maskload_optab, "maskload$a")
>  OPTAB_D (maskstore_optab, "maskstore$a")
> +OPTAB_D (gen_mask_optab, "gen_mask$a")
> +OPTAB_D (gen_masku_optab, "gen_masku$a")
>  OPTAB_D (vec_extract_optab, "vec_extract$a")
>  OPTAB_D (vec_init_optab, "vec_init$a")
>  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index 95f5cbc..d7b7fb0 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -495,6 +495,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *);
>  /* Generate code for VEC_PERM_EXPR.  */
>  extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
>
> +/* Return tree if target supports vector operations for comparison.  */
> +extern bool expand_gen_mask_expr_p (tree);
> +
> +/* Generate code for GEN_MASK_EXPR.  */
> +extern rtx expand_gen_mask_expr (tree, tree, rtx);
> +
>  /* Return tree if target supports vector operations for COND_EXPR.  */
>  bool expand_vec_cond_expr_p (tree, tree);
>
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 588ab69..5bae411 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -3646,6 +3646,16 @@ verify_gimple_assign_unary (gassign *stmt)
>      case CONJ_EXPR:
>        break;
>
> +    case GEN_MASK_EXPR:
> +       if (!INTEGRAL_TYPE_P (lhs_type) || !VECTOR_TYPE_P (rhs1_type))
> +         {
> +           error ("invalid types in gen_mask");

Does the LHS need to have at least as many bits as the RHS has elements?
Also what about non-integral vector types on the RHS?  I suppose only
integral ones are valid?

> +           debug_generic_expr (lhs_type);
> +           debug_generic_expr (rhs1_type);
> +           return true;
> +         }
> +       return false;
> +
>      default:
>        gcc_unreachable ();
>      }
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index e1ceea4..052c055 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -3850,6 +3850,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
>      case COMPLEX_EXPR:
>      case PAREN_EXPR:
>      case VIEW_CONVERT_EXPR:
> +    case GEN_MASK_EXPR:

eh...  is that important here?  After all expansion might fail to combine with
a previous compare.  I'd rather use the default.

>        return 0;
>
>      /* Assign cost of 1 to usual operations.
> diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
> index 7cd1fe7..5b6cdf0 100644
> --- a/gcc/tree-pretty-print.c
> +++ b/gcc/tree-pretty-print.c
> @@ -2485,6 +2485,12 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, int flags,
>        pp_greater (pp);
>        break;
>
> +    case GEN_MASK_EXPR:
> +      pp_string (pp, " GEN_MASK_EXPR < ");
> +      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> +      pp_string (pp, " > ");
> +      break;
> +
>      case VEC_COND_EXPR:
>        pp_string (pp, " VEC_COND_EXPR < ");
>        dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c
> index b1e3f99..7151ad1 100644
> --- a/gcc/tree-ssa-operands.c
> +++ b/gcc/tree-ssa-operands.c
> @@ -837,6 +837,7 @@ get_expr_operands (struct function *fn, gimple stmt, tree *expr_p, int flags)
>         gimple_set_has_volatile_ops (stmt, true);
>        /* FALLTHRU */
>
> +    case GEN_MASK_EXPR:
>      case VIEW_CONVERT_EXPR:
>      do_unary:
>        get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags);
> diff --git a/gcc/tree.def b/gcc/tree.def
> index 56580af..42984ca 100644
> --- a/gcc/tree.def
> +++ b/gcc/tree.def
> @@ -536,6 +536,10 @@ DEFTREECODE (TARGET_EXPR, "target_expr", tcc_expression, 4)
>     1 and 2 are NULL.  The operands are then taken from the cfg edges. */
>  DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
>
> +/* Generate a scalar bitmask for a vector value.  Mask gets bits set
> +   for nonzero vector elements.  */
> +DEFTREECODE (GEN_MASK_EXPR, "gen_mask_expr", tcc_expression, 1)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
>     vector operands.
>



More information about the Gcc-patches mailing list