This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Fold VEC_COND_EXPRs to IFN_COND_* where possible


On Thu, May 24, 2018 at 11:28 AM Richard Sandiford <
richard.sandiford@linaro.org> wrote:

> This patch adds the folds:

>    (vec_cond COND (foo A B) C) -> (IFN_COND_FOO COND A B C)
>    (vec_cond COND C (foo A B)) -> (IFN_COND_FOO (!COND) A B C)

> with the usual implicit restriction that the target must support
> the produced IFN_COND_FOO.

> The results of these folds don't have identical semantics, since
> the reverse transform would be invalid if (FOO A[i] B[i]) faults when
> COND[i] is false.  But this direction is OK since we're simply dropping
> faults for operations whose results aren't needed.

> The new gimple_resimplify4 doesn't try to do any constant folding
> on the IFN_COND_*s.  This is because a later patch will handle it
> by folding the associated unconditional operation.

> Doing this in gimple is better than doing it in .md patterns,
> since the second form (with the inverted condition) is much more
> common than the first, and it's better to fold away the inversion
> in gimple and optimise the result before entering expand.

> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

> Richard


> 2018-05-24  Richard Sandiford  <richard.sandiford@linaro.org>

> gcc/
>          * doc/sourcebuild.texi (vect_double_cond_arith: Document.
>          * gimple-match.h (gimple_match_op::MAX_NUM_OPS): Bump to 4.
>          (gimple_match_op::gimple_match_op): Add an overload for 4
operands.
>          (gimple_match_op::set_op): Likewise.
>          (gimple_resimplify4): Declare.
>          * genmatch.c (commutative_op): Handle CFN_COND_* functions.

^^^  you don't seem to use that and I don't see how those are commutative
in operands 1 and 2 without inverting operand 0.  So w/o adjusting the
parsing part I think that people can write (cond_foo:c ...) and likely
be surprised that it isn't rejected.  It is of course required to make :C
work.

The patch is ok if you drop this hunk for now.  You can re-introduce it
as followup if you make sure to make :c error on those IFNs.

Thanks,
Richard.

>          (get_operand_type, expr::gen_transform): Likewise.
>          (decision_tree::gen): Generate a simplification routine for 4
operands.
>          * gimple-match-head.c (gimple_simplify): Add an overload for
>          4 operands.  In the top-level function, handle up to 4 call
>          arguments and call gimple_resimplify4.
>          (gimple_resimplify4): New function.
>          (build_call_internal): Pass a fourth operand.
>          (maybe_push_to_seq): Likewise.
>          * match.pd (UNCOND_BINARY, COND_BINARY): New operator lists.
>          Fold VEC_COND_EXPRs of an operation and a default value into
>          an IFN_COND_* function if possible.
>          * config/aarch64/iterators.md (UNSPEC_COND_MAX, UNSPEC_COND_MIN):
>          New unspecs.
>          (SVE_COND_FP_BINARY): Include them.
>          (optab, sve_fp_op): Handle them.
>          (SVE_INT_BINARY_REV): New code iterator.
>          (SVE_COND_FP_BINARY_REV): New int iterator.
>          (commutative): New int attribute.
>          * config/aarch64/aarch64-protos.h
(aarch64_sve_prepare_conditional_op):
>          Declare.
>          * config/aarch64/aarch64.c (aarch64_sve_prepare_conditional_op):
New
>          function.
>          * config/aarch64/aarch64-sve.md (cond_<optab><mode>): Use it.
>          (*cond_<optab><mode>): New patterns for reversed operands.

> gcc/testsuite/
>          * lib/target-supports.exp
>          (check_effective_target_vect_double_cond_arith): New proc.
>          * gcc.dg/vect/vect-cond-arith-1.c: New test.
>          * gcc.target/aarch64/sve/vcond_8.c: Likewise.
>          * gcc.target/aarch64/sve/vcond_8_run.c: Likewise.
>          * gcc.target/aarch64/sve/vcond_9.c: Likewise.
>          * gcc.target/aarch64/sve/vcond_9_run.c: Likewise.
>          * gcc.target/aarch64/sve/vcond_12.c: Likewise.
>          * gcc.target/aarch64/sve/vcond_12_run.c: Likewise.

> Index: gcc/doc/sourcebuild.texi
> ===================================================================
> --- gcc/doc/sourcebuild.texi    2018-05-24 09:02:24.987538940 +0100
> +++ gcc/doc/sourcebuild.texi    2018-05-24 09:54:37.508451387 +0100
> @@ -1425,6 +1425,10 @@ have different type from the value opera
>   @item vect_double
>   Target supports hardware vectors of @code{double}.

> +@item vect_double_cond_arith
> +Target supports conditional addition, subtraction, minimum and maximum
> +on vectors of @code{double}, via the @code{cond_} optabs.
> +
>   @item vect_element_align_preferred
>   The target's preferred vector alignment is the same as the element
>   alignment.
> Index: gcc/gimple-match.h
> ===================================================================
> --- gcc/gimple-match.h  2018-05-24 09:02:28.764328414 +0100
> +++ gcc/gimple-match.h  2018-05-24 09:54:37.509451356 +0100
> @@ -49,17 +49,19 @@ struct gimple_match_op
>     gimple_match_op (code_helper, tree, tree);
>     gimple_match_op (code_helper, tree, tree, tree);
>     gimple_match_op (code_helper, tree, tree, tree, tree);
> +  gimple_match_op (code_helper, tree, tree, tree, tree, tree);

>     void set_op (code_helper, tree, unsigned int);
>     void set_op (code_helper, tree, tree);
>     void set_op (code_helper, tree, tree, tree);
>     void set_op (code_helper, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree);
>     void set_value (tree);

>     tree op_or_null (unsigned int) const;

>     /* The maximum value of NUM_OPS.  */
> -  static const unsigned int MAX_NUM_OPS = 3;
> +  static const unsigned int MAX_NUM_OPS = 4;

>     /* The operation being performed.  */
>     code_helper code;
> @@ -113,6 +115,17 @@ gimple_match_op::gimple_match_op (code_h
>     ops[2] = op2;
>   }

> +inline
> +gimple_match_op::gimple_match_op (code_helper code_in, tree type_in,
> +                                 tree op0, tree op1, tree op2, tree op3)
> +  : code (code_in), type (type_in), num_ops (4)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +}
> +
>   /* Change the operation performed to CODE_IN, the type of the result to
>      TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
>      to set the operands itself.  */
> @@ -160,6 +173,19 @@ gimple_match_op::set_op (code_helper cod
>     ops[2] = op2;
>   }

> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +                        tree op0, tree op1, tree op2, tree op3)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 4;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +}
> +
>   /* Set the "operation" to be the single value VALUE, such as a constant
>      or SSA_NAME.  */

> @@ -196,6 +222,7 @@ bool gimple_simplify (gimple *, gimple_m
>   bool gimple_resimplify1 (gimple_seq *, gimple_match_op *, tree
(*)(tree));
>   bool gimple_resimplify2 (gimple_seq *, gimple_match_op *, tree
(*)(tree));
>   bool gimple_resimplify3 (gimple_seq *, gimple_match_op *, tree
(*)(tree));
> +bool gimple_resimplify4 (gimple_seq *, gimple_match_op *, tree
(*)(tree));
>   tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *,
>                              tree res = NULL_TREE);
>   void maybe_build_generic_op (gimple_match_op *);
> Index: gcc/genmatch.c
> ===================================================================
> --- gcc/genmatch.c      2018-05-24 09:02:28.763328469 +0100
> +++ gcc/genmatch.c      2018-05-24 09:54:37.508451387 +0100
> @@ -485,6 +485,15 @@ commutative_op (id_base *id)
>         case CFN_FNMS:
>          return 0;

> +      case CFN_COND_ADD:
> +      case CFN_COND_SUB:
> +      case CFN_COND_MAX:
> +      case CFN_COND_MIN:
> +      case CFN_COND_AND:
> +      case CFN_COND_IOR:
> +      case CFN_COND_XOR:
> +       return 1;
> +
>         default:
>          return -1;
>         }
> @@ -2370,6 +2379,18 @@ get_operand_type (id_base *op, unsigned
>     else if (*op == COND_EXPR
>             && pos == 0)
>       return "boolean_type_node";
> +  else if (strncmp (op->id, "CFN_COND_", 9) == 0)
> +    {
> +      /* IFN_COND_* operands 1 and later by default have the same type
> +        as the result.  The type of operand 0 needs to be specified
> +        explicitly.  */
> +      if (pos > 0 && expr_type)
> +       return expr_type;
> +      else if (pos > 0 && in_type)
> +       return in_type;
> +      else
> +       return NULL;
> +    }
>     else
>       {
>         /* Otherwise all types should match - choose one in order of
> @@ -2429,7 +2450,8 @@ expr::gen_transform (FILE *f, int indent
>         in_type = NULL;
>       }
>     else if (*opr == COND_EXPR
> -          || *opr == VEC_COND_EXPR)
> +          || *opr == VEC_COND_EXPR
> +          || strncmp (opr->id, "CFN_COND_", 9) == 0)
>       {
>         /* Conditions are of the same type as their first alternative.  */
>         sprintf (optype, "TREE_TYPE (ops%d[1])", depth);
> @@ -3737,7 +3759,7 @@ decision_tree::gen (FILE *f, bool gimple
>       }
>     fprintf (stderr, "removed %u duplicate tails\n", rcnt);

> -  for (unsigned n = 1; n <= 3; ++n)
> +  for (unsigned n = 1; n <= 4; ++n)
>       {
>         /* First generate split-out functions.  */
>         for (unsigned i = 0; i < root->kids.length (); i++)
> Index: gcc/gimple-match-head.c
> ===================================================================
> --- gcc/gimple-match-head.c     2018-05-24 09:02:28.764328414 +0100
> +++ gcc/gimple-match-head.c     2018-05-24 09:54:37.509451356 +0100
> @@ -51,6 +51,8 @@ static bool gimple_simplify (gimple_matc
>                               code_helper, tree, tree, tree);
>   static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree
(*)(tree),
>                               code_helper, tree, tree, tree, tree);
> +static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree
(*)(tree),
> +                            code_helper, tree, tree, tree, tree, tree);

>   const unsigned int gimple_match_op::MAX_NUM_OPS;

> @@ -215,6 +217,30 @@ gimple_resimplify3 (gimple_seq *seq, gim
>     return canonicalized;
>   }

> +/* Helper that matches and simplifies the toplevel result from
> +   a gimple_simplify run (where we don't want to build
> +   a stmt in case it's used in in-place folding).  Replaces
> +   RES_OP with a simplified and/or canonicalized result and
> +   returns whether any change was made.  */
> +
> +bool
> +gimple_resimplify4 (gimple_seq *seq, gimple_match_op *res_op,
> +                   tree (*valueize)(tree))
> +{
> +  /* No constant folding is defined for four-operand functions.  */
> +
> +  gimple_match_op res_op2 (*res_op);
> +  if (gimple_simplify (&res_op2, seq, valueize,
> +                      res_op->code, res_op->type,
> +                      res_op->ops[0], res_op->ops[1], res_op->ops[2],
> +                      res_op->ops[3]))
> +    {
> +      *res_op = res_op2;
> +      return true;
> +    }
> +
> +  return false;
> +}

>   /* If in GIMPLE the operation described by RES_OP should be single-rhs,
>      build a GENERIC tree for that expression and update RES_OP
accordingly.  */
> @@ -256,7 +282,8 @@ build_call_internal (internal_fn fn, gim
>     return gimple_build_call_internal (fn, res_op->num_ops,
>                                       res_op->op_or_null (0),
>                                       res_op->op_or_null (1),
> -                                    res_op->op_or_null (2));
> +                                    res_op->op_or_null (2),
> +                                    res_op->op_or_null (3));
>   }

>   /* Push the exploded expression described by RES_OP as a statement to
> @@ -343,7 +370,8 @@ maybe_push_res_to_seq (gimple_match_op *
>            new_stmt = gimple_build_call (decl, num_ops,
>                                          res_op->op_or_null (0),
>                                          res_op->op_or_null (1),
> -                                       res_op->op_or_null (2));
> +                                       res_op->op_or_null (2),
> +                                       res_op->op_or_null (3));
>          }
>         if (!res)
>          {
> @@ -654,7 +682,7 @@ gimple_simplify (gimple *stmt, gimple_ma
>         /* ???  This way we can't simplify calls with side-effects.  */
>         if (gimple_call_lhs (stmt) != NULL_TREE
>            && gimple_call_num_args (stmt) >= 1
> -         && gimple_call_num_args (stmt) <= 3)
> +         && gimple_call_num_args (stmt) <= 4)
>          {
>            bool valueized = false;
>            combined_fn cfn;
> @@ -697,6 +725,9 @@ gimple_simplify (gimple *stmt, gimple_ma
>              case 3:
>                return (gimple_resimplify3 (seq, res_op, valueize)
>                        || valueized);
> +           case 4:
> +             return (gimple_resimplify4 (seq, res_op, valueize)
> +                     || valueized);
>              default:
>               gcc_unreachable ();
>              }
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd        2018-05-24 09:31:41.564005930 +0100
> +++ gcc/match.pd        2018-05-24 09:54:37.509451356 +0100
> @@ -74,6 +74,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (FLOOR)
>   DEFINE_INT_AND_FLOAT_ROUND_FN (CEIL)
>   DEFINE_INT_AND_FLOAT_ROUND_FN (ROUND)
>   DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> +
> +/* Binary operations and their associated IFN_COND_* function.  */
> +(define_operator_list UNCOND_BINARY
> +  plus minus
> +  min max
> +  bit_and bit_ior bit_xor)
> +(define_operator_list COND_BINARY
> +  IFN_COND_ADD IFN_COND_SUB
> +  IFN_COND_MIN IFN_COND_MAX
> +  IFN_COND_AND IFN_COND_IOR IFN_COND_XOR)

>   /* As opposed to convert?, this still creates a single pattern, so
>      it is not a suitable replacement for convert? in all cases.  */
> @@ -4760,3 +4770,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     (negate (IFN_FNMS@3 @0 @1 @2))
>     (if (single_use (@3))
>      (IFN_FMA @0 @1 @2))))
> +
> +/* Simplify:
> +
> +     a = a1 op a2
> +     r = c ? a : b;
> +
> +   to:
> +
> +     r = c ? a1 op a2 : b;
> +
> +   if the target can do it in one go.  This makes the operation
conditional
> +   on c, so could drop potentially-trapping arithmetic, but that's a
valid
> +   simplification if the result of the operation isn't needed.  */
> +(for uncond_op (UNCOND_BINARY)
> +     cond_op (COND_BINARY)
> + (simplify
> +  (vec_cond @0 (view_convert? (uncond_op@4 @1 @2)) @3)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3))))))
> + (simplify
> +  (vec_cond @0 @1 (view_convert? (uncond_op@4 @2 @3)))
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type
@1)))))))
> Index: gcc/config/aarch64/iterators.md
> ===================================================================
> --- gcc/config/aarch64/iterators.md     2018-05-24 09:32:10.521816556
+0100
> +++ gcc/config/aarch64/iterators.md     2018-05-24 09:54:37.508451387
+0100
> @@ -464,6 +464,8 @@ (define_c_enum "unspec"
>       UNSPEC_UMUL_HIGHPART ; Used in aarch64-sve.md.
>       UNSPEC_COND_ADD    ; Used in aarch64-sve.md.
>       UNSPEC_COND_SUB    ; Used in aarch64-sve.md.
> +    UNSPEC_COND_MAX    ; Used in aarch64-sve.md.
> +    UNSPEC_COND_MIN    ; Used in aarch64-sve.md.
>       UNSPEC_COND_LT     ; Used in aarch64-sve.md.
>       UNSPEC_COND_LE     ; Used in aarch64-sve.md.
>       UNSPEC_COND_EQ     ; Used in aarch64-sve.md.
> @@ -1203,6 +1205,8 @@ (define_code_iterator SVE_FP_UNARY [neg
>   (define_code_iterator SVE_INT_BINARY [plus minus smax umax smin umin
>                                        and ior xor])

> +(define_code_iterator SVE_INT_BINARY_REV [minus])
> +
>   (define_code_iterator SVE_INT_BINARY_SD [div udiv])

>   ;; SVE integer comparisons.
> @@ -1535,7 +1539,10 @@ (define_int_iterator UNPACK_UNSIGNED [UN

>   (define_int_iterator MUL_HIGHPART [UNSPEC_SMUL_HIGHPART
UNSPEC_UMUL_HIGHPART])

> -(define_int_iterator SVE_COND_FP_BINARY [UNSPEC_COND_ADD
UNSPEC_COND_SUB])
> +(define_int_iterator SVE_COND_FP_BINARY [UNSPEC_COND_ADD UNSPEC_COND_SUB
> +                                        UNSPEC_COND_MAX UNSPEC_COND_MIN])
> +
> +(define_int_iterator SVE_COND_FP_BINARY_REV [UNSPEC_COND_SUB])

>   (define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
>                                        UNSPEC_COND_EQ UNSPEC_COND_NE
> @@ -1565,7 +1572,9 @@ (define_int_attr optab [(UNSPEC_ANDF "an
>                          (UNSPEC_IORV "ior")
>                          (UNSPEC_XORV "xor")
>                          (UNSPEC_COND_ADD "add")
> -                       (UNSPEC_COND_SUB "sub")])
> +                       (UNSPEC_COND_SUB "sub")
> +                       (UNSPEC_COND_MAX "smax")
> +                       (UNSPEC_COND_MIN "smin")])

>   (define_int_attr  maxmin_uns [(UNSPEC_UMAXV "umax")
>                                (UNSPEC_UMINV "umin")
> @@ -1777,4 +1786,11 @@ (define_int_attr cmp_op [(UNSPEC_COND_LT
>                           (UNSPEC_COND_GT "gt")])

>   (define_int_attr sve_fp_op [(UNSPEC_COND_ADD "fadd")
> -                           (UNSPEC_COND_SUB "fsub")])
> +                           (UNSPEC_COND_SUB "fsub")
> +                           (UNSPEC_COND_MAX "fmaxnm")
> +                           (UNSPEC_COND_MIN "fminnm")])
> +
> +(define_int_attr commutative [(UNSPEC_COND_ADD "true")
> +                             (UNSPEC_COND_SUB "false")
> +                             (UNSPEC_COND_MIN "true")
> +                             (UNSPEC_COND_MAX "true")])
> Index: gcc/config/aarch64/aarch64-protos.h
> ===================================================================
> --- gcc/config/aarch64/aarch64-protos.h 2018-05-24 09:02:25.112531972
+0100
> +++ gcc/config/aarch64/aarch64-protos.h 2018-05-24 09:54:37.505451481
+0100
> @@ -513,6 +513,7 @@ bool aarch64_gen_adjusted_ldpstp (rtx *,
>   void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx);
>   bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
>   void aarch64_expand_sve_vcond (machine_mode, machine_mode, rtx *);
> +void aarch64_sve_prepare_conditional_op (rtx *, unsigned int, bool);
>   #endif /* RTX_CODE */

>   void aarch64_init_builtins (void);
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c        2018-05-24 09:02:25.114531861
+0100
> +++ gcc/config/aarch64/aarch64.c        2018-05-24 09:54:37.507451418
+0100
> @@ -16025,6 +16025,54 @@ aarch64_expand_sve_vcond (machine_mode d
>     emit_set_insn (ops[0], gen_rtx_UNSPEC (data_mode, vec, UNSPEC_SEL));
>   }

> +/* Prepare a cond_<optab><mode> operation that has the operands
> +   given by OPERANDS, where:
> +
> +   - operand 0 is the destination
> +   - operand 1 is a predicate
> +   - operands 2 to NOPS - 2 are the operands to an operation that is
> +     performed for active lanes
> +   - operand NOPS - 1 specifies the values to use for inactive lanes.
> +
> +   COMMUTATIVE_P is true if operands 2 and 3 are commutative.  In that
case,
> +   no pattern is provided for a tie between operands 3 and NOPS - 1.  */
> +
> +void
> +aarch64_sve_prepare_conditional_op (rtx *operands, unsigned int nops,
> +                                   bool commutative_p)
> +{
> +  /* We can do the operation directly if the "else" value matches one
> +     of the other inputs.  */
> +  for (unsigned int i = 2; i < nops - 1; ++i)
> +    if (rtx_equal_p (operands[i], operands[nops - 1]))
> +      {
> +       if (i == 3 && commutative_p)
> +         std::swap (operands[2], operands[3]);
> +       return;
> +      }
> +
> +  /* If the "else" value is different from the other operands, we have
> +     the choice of doing a SEL on the output or a SEL on an input.
> +     Neither choice is better in all cases, but one advantage of
> +     selecting the input is that it can avoid a move when the output
> +     needs to be distinct from the inputs.  E.g. if operand N maps to
> +     register N, selecting the output would give:
> +
> +       MOVPRFX Z0.S, Z2.S
> +       ADD Z0.S, P1/M, Z0.S, Z3.S
> +       SEL Z0.S, P1, Z0.S, Z4.S
> +
> +     whereas selecting the input avoids the MOVPRFX:
> +
> +       SEL Z0.S, P1, Z2.S, Z4.S
> +       ADD Z0.S, P1/M, Z0.S, Z3.S.  */
> +  machine_mode mode = GET_MODE (operands[0]);
> +  rtx temp = gen_reg_rtx (mode);
> +  rtvec vec = gen_rtvec (3, operands[1], operands[2], operands[nops -
1]);
> +  emit_set_insn (temp, gen_rtx_UNSPEC (mode, vec, UNSPEC_SEL));
> +  operands[2] = operands[nops - 1] = temp;
> +}
> +
>   /* Implement TARGET_MODES_TIEABLE_P.  In principle we should always
return
>      true.  However due to issues with register allocation it is preferable
>      to avoid tieing integer scalar and FP scalar modes.  Executing integer
> Index: gcc/config/aarch64/aarch64-sve.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-sve.md   2018-05-24 09:32:10.521816556
+0100
> +++ gcc/config/aarch64/aarch64-sve.md   2018-05-24 09:54:37.506451449
+0100
> @@ -1799,7 +1799,8 @@ (define_expand "cond_<optab><mode>"
>            UNSPEC_SEL))]
>     "TARGET_SVE"
>   {
> -  gcc_assert (rtx_equal_p (operands[2], operands[4]));
> +  bool commutative_p = (GET_RTX_CLASS (<CODE>) == RTX_COMM_ARITH);
> +  aarch64_sve_prepare_conditional_op (operands, 5, commutative_p);
>   })

>   ;; Predicated integer operations.
> @@ -1816,6 +1817,20 @@ (define_insn "*cond_<optab><mode>"
>     "<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
>   )

> +;; Predicated integer operations with the operands reversed.
> +(define_insn "*cond_<optab><mode>"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w")
> +       (unspec:SVE_I
> +         [(match_operand:<VPRED> 1 "register_operand" "Upl")
> +          (SVE_INT_BINARY_REV:SVE_I
> +            (match_operand:SVE_I 2 "register_operand" "w")
> +            (match_operand:SVE_I 3 "register_operand" "0"))
> +          (match_dup 3)]
> +         UNSPEC_SEL))]
> +  "TARGET_SVE"
> +  "<sve_int_op>r\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>"
> +)
> +
>   ;; Set operand 0 to the last active element in operand 3, or to tied
>   ;; operand 1 if no elements are active.
>   (define_insn "fold_extract_last_<mode>"
> @@ -2597,7 +2612,7 @@ (define_expand "cond_<optab><mode>"
>            UNSPEC_SEL))]
>     "TARGET_SVE"
>   {
> -  gcc_assert (rtx_equal_p (operands[2], operands[4]));
> +  aarch64_sve_prepare_conditional_op (operands, 5, <commutative>);
>   })

>   ;; Predicated floating-point operations.
> @@ -2616,6 +2631,22 @@ (define_insn "*cond_<optab><mode>"
>     "<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
>   )

> +;; Predicated floating-point operations with the operands reversed.
> +(define_insn "*cond_<optab><mode>"
> +  [(set (match_operand:SVE_F 0 "register_operand" "=w")
> +       (unspec:SVE_F
> +         [(match_operand:<VPRED> 1 "register_operand" "Upl")
> +          (unspec:SVE_F
> +            [(match_dup 1)
> +             (match_operand:SVE_F 2 "register_operand" "w")
> +             (match_operand:SVE_F 3 "register_operand" "0")]
> +            SVE_COND_FP_BINARY)
> +          (match_dup 3)]
> +         UNSPEC_SEL))]
> +  "TARGET_SVE"
> +  "<sve_fp_op>r\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>"
> +)
> +
>   ;; Shift an SVE vector left and insert a scalar into element 0.
>   (define_insn "vec_shl_insert_<mode>"
>     [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
> Index: gcc/testsuite/lib/target-supports.exp
> ===================================================================
> --- gcc/testsuite/lib/target-supports.exp       2018-05-24
09:02:24.729553320 +0100
> +++ gcc/testsuite/lib/target-supports.exp       2018-05-24
09:54:37.511451293 +0100
> @@ -5590,6 +5590,13 @@ proc check_effective_target_vect_double
>       return $et_vect_double_saved($et_index)
>   }

> +# Return 1 if the target supports conditional addition, subtraction,
minimum
> +# and maximum on vectors of double, via the cond_ optabs.  Return 0
otherwise.
> +
> +proc check_effective_target_vect_double_cond_arith { } {
> +    return [check_effective_target_aarch64_sve]
> +}
> +
>   # Return 1 if the target supports hardware vectors of long long, 0
otherwise.
>   #
>   # This won't change for different subtargets so cache the result.
> Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c       2018-05-24
09:54:37.509451356 +0100
> @@ -0,0 +1,58 @@
> +/* { dg-additional-options "-fdump-tree-optimized -fno-trapping-math
-ffinite-math-only" } */
> +
> +#include "tree-vect.h"
> +
> +#define N (VECTOR_BITS * 11 / 64 + 3)
> +
> +#define add(A, B) ((A) + (B))
> +#define sub(A, B) ((A) - (B))
> +
> +#define DEF(OP)                                                        \
> +  void __attribute__ ((noipa))                                 \
> +  f_##OP (double *restrict a, double *restrict b, double x)    \
> +  {                                                            \
> +    for (int i = 0; i < N; ++i)                                        \
> +      {                                                                \
> +       double truev = OP (b[i], x);                            \
> +       a[i] = b[i] < 100 ? truev : b[i];                       \
> +      }                                                                \
> +  }
> +
> +#define TEST(OP)                                       \
> +  {                                                    \
> +    f_##OP (a, b, 10);                                 \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       int bval = (i % 17) * 10;                       \
> +       int truev = OP (bval, 10);                      \
> +       if (a[i] != (bval < 100 ? truev : bval))        \
> +       __builtin_abort ();                             \
> +       asm volatile ("" ::: "memory");                 \
> +      }                                                        \
> +  }
> +
> +#define FOR_EACH_OP(T)                         \
> +  T (add)                                      \
> +  T (sub)                                      \
> +  T (__builtin_fmax)                           \
> +  T (__builtin_fmin)
> +
> +FOR_EACH_OP (DEF)
> +
> +int
> +main (void)
> +{
> +  double a[N], b[N];
> +  for (int i = 0; i < N; ++i)
> +    {
> +      b[i] = (i % 17) * 10;
> +      asm volatile ("" ::: "memory");
> +    }
> +  FOR_EACH_OP (TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target
vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target
vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_MAX} "optimized" { target
vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_MIN} "optimized" { target
vect_double_cond_arith } } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_8.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_8.c      2018-05-24
09:54:37.510451324 +0100
> @@ -0,0 +1,119 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-trapping-math
-ffinite-math-only" } */
> +
> +#include <stdint.h>
> +
> +#define add(A, B) ((A) + (B))
> +#define sub(A, B) ((A) - (B))
> +#define max(A, B) ((A) > (B) ? (A) : (B))
> +#define min(A, B) ((A) < (B) ? (A) : (B))
> +#define and(A, B) ((A) & (B))
> +#define ior(A, B) ((A) | (B))
> +#define xor(A, B) ((A) ^ (B))
> +
> +#define DEF_LOOP(TYPE, CMPTYPE, OP)                            \
> +  void __attribute__((noipa))                                  \
> +  f_##OP##_##TYPE (TYPE *restrict dest, CMPTYPE *restrict cond,        \
> +                  CMPTYPE limit, TYPE *restrict src,           \
> +                  TYPE val, unsigned int n)                    \
> +  {                                                            \
> +    for (unsigned int i = 0; i < n; ++i)                       \
> +      {                                                                \
> +       TYPE truev = OP (src[i], val);                          \
> +       dest[i] = cond[i] < limit ? truev : src[i];             \
> +      }                                                                \
> +  }
> +
> +#define FOR_EACH_INT_TYPE(T, TYPE) \
> +  T (TYPE, TYPE, add) \
> +  T (TYPE, TYPE, sub) \
> +  T (TYPE, TYPE, max) \
> +  T (TYPE, TYPE, min) \
> +  T (TYPE, TYPE, and) \
> +  T (TYPE, TYPE, ior) \
> +  T (TYPE, TYPE, xor)
> +
> +#define FOR_EACH_FP_TYPE(T, TYPE, CMPTYPE, SUFFIX) \
> +  T (TYPE, CMPTYPE, add) \
> +  T (TYPE, CMPTYPE, sub) \
> +  T (TYPE, CMPTYPE, __builtin_fmax##SUFFIX) \
> +  T (TYPE, CMPTYPE, __builtin_fmin##SUFFIX)
> +
> +#define FOR_EACH_LOOP(T) \
> +  FOR_EACH_INT_TYPE (T, int8_t) \
> +  FOR_EACH_INT_TYPE (T, int16_t) \
> +  FOR_EACH_INT_TYPE (T, int32_t) \
> +  FOR_EACH_INT_TYPE (T, int64_t) \
> +  FOR_EACH_INT_TYPE (T, uint8_t) \
> +  FOR_EACH_INT_TYPE (T, uint16_t) \
> +  FOR_EACH_INT_TYPE (T, uint32_t) \
> +  FOR_EACH_INT_TYPE (T, uint64_t) \
> +  FOR_EACH_FP_TYPE (T, _Float16, uint16_t, f16) \
> +  FOR_EACH_FP_TYPE (T, float, float, f32) \
> +  FOR_EACH_FP_TYPE (T, double, double, f64)
> +
> +FOR_EACH_LOOP (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
> +
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.h, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.s, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.d, p[0-7]/m,} 1
} } */
> +
> +/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.h, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.s, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.d, p[0-7]/m,} 1
} } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_8_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_8_run.c  2018-05-24
09:54:37.510451324 +0100
> @@ -0,0 +1,32 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-trapping-math
-ffinite-math-only" } */
> +
> +#include "vcond_8.c"
> +
> +#define N 187
> +
> +#define TEST_LOOP(TYPE, CMPTYPE, OP)                           \
> +  {                                                            \
> +    TYPE dest[N], src[N];                                      \
> +    CMPTYPE cond[N];                                           \
> +    for (unsigned int i = 0; i < N; ++i)                       \
> +      {                                                                \
> +        src[i] = i * 3;                                                \
> +       cond[i] = i % 5;                                        \
> +      }                                                                \
> +    f_##OP##_##TYPE (dest, cond, 3, src, 77, N);               \
> +    for (unsigned int i = 0; i < N; ++i)                       \
> +      {                                                                \
> +        TYPE if_false = i * 3;                                 \
> +       TYPE if_true = OP (if_false, (TYPE) 77);                \
> +       if (dest[i] != (i % 5 < 3 ? if_true : if_false))        \
> +         __builtin_abort ();                                   \
> +      }                                                                \
> +  }
> +
> +int __attribute__ ((optimize (1)))
> +main (void)
> +{
> +  FOR_EACH_LOOP (TEST_LOOP);
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_9.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_9.c      2018-05-24
09:54:37.510451324 +0100
> @@ -0,0 +1,119 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-trapping-math
-ffinite-math-only" } */
> +
> +#include <stdint.h>
> +
> +#define add(A, B) ((A) + (B))
> +#define sub(A, B) ((A) - (B))
> +#define max(A, B) ((A) > (B) ? (A) : (B))
> +#define min(A, B) ((A) < (B) ? (A) : (B))
> +#define and(A, B) ((A) & (B))
> +#define ior(A, B) ((A) | (B))
> +#define xor(A, B) ((A) ^ (B))
> +
> +#define DEF_LOOP(TYPE, CMPTYPE, OP)                            \
> +  void __attribute__((noipa))                                  \
> +  f_##OP##_##TYPE (TYPE *restrict dest, CMPTYPE *restrict cond,        \
> +                  CMPTYPE limit, TYPE *restrict src1,          \
> +                  TYPE *restrict src2, unsigned int n)         \
> +  {                                                            \
> +    for (unsigned int i = 0; i < n; ++i)                       \
> +      {                                                                \
> +       TYPE truev = OP (src1[i], src2[i]);                     \
> +       dest[i] = cond[i] < limit ? truev : src2[i];            \
> +      }                                                                \
> +  }
> +
> +#define FOR_EACH_INT_TYPE(T, TYPE) \
> +  T (TYPE, TYPE, add) \
> +  T (TYPE, TYPE, sub) \
> +  T (TYPE, TYPE, max) \
> +  T (TYPE, TYPE, min) \
> +  T (TYPE, TYPE, and) \
> +  T (TYPE, TYPE, ior) \
> +  T (TYPE, TYPE, xor)
> +
> +#define FOR_EACH_FP_TYPE(T, TYPE, CMPTYPE, SUFFIX) \
> +  T (TYPE, CMPTYPE, add) \
> +  T (TYPE, CMPTYPE, sub) \
> +  T (TYPE, CMPTYPE, __builtin_fmax##SUFFIX) \
> +  T (TYPE, CMPTYPE, __builtin_fmin##SUFFIX)
> +
> +#define FOR_EACH_LOOP(T) \
> +  FOR_EACH_INT_TYPE (T, int8_t) \
> +  FOR_EACH_INT_TYPE (T, int16_t) \
> +  FOR_EACH_INT_TYPE (T, int32_t) \
> +  FOR_EACH_INT_TYPE (T, int64_t) \
> +  FOR_EACH_INT_TYPE (T, uint8_t) \
> +  FOR_EACH_INT_TYPE (T, uint16_t) \
> +  FOR_EACH_INT_TYPE (T, uint32_t) \
> +  FOR_EACH_INT_TYPE (T, uint64_t) \
> +  FOR_EACH_FP_TYPE (T, _Float16, uint16_t, f16) \
> +  FOR_EACH_FP_TYPE (T, float, float, f32) \
> +  FOR_EACH_FP_TYPE (T, double, double, f64)
> +
> +FOR_EACH_LOOP (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
> +
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tsubr\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tsubr\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tsubr\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tsubr\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tfsubr\tz[0-9]+\.h, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfsubr\tz[0-9]+\.s, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfsubr\tz[0-9]+\.d, p[0-7]/m,} 1
} } */
> +
> +/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.h, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.s, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.d, p[0-7]/m,} 1
} } */
> +
> +/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.h, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.s, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.d, p[0-7]/m,} 1
} } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_9_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_9_run.c  2018-05-24
09:54:37.510451324 +0100
> @@ -0,0 +1,34 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-trapping-math
-ffinite-math-only" } */
> +
> +#include "vcond_9.c"
> +
> +#define N 187
> +
> +#define TEST_LOOP(TYPE, CMPTYPE, OP)                           \
> +  {                                                            \
> +    TYPE dest[N], src1[N], src2[N];                            \
> +    CMPTYPE cond[N];                                           \
> +    for (unsigned int i = 0; i < N; ++i)                       \
> +      {                                                                \
> +        src1[i] = i * 4 - i % 7;                               \
> +        src2[i] = i * 3 + 1;                                   \
> +       cond[i] = i % 5;                                        \
> +      }                                                                \
> +    f_##OP##_##TYPE (dest, cond, 3, src1, src2, N);            \
> +    for (unsigned int i = 0; i < N; ++i)                       \
> +      {                                                                \
> +       TYPE src1v = i * 4 - i % 7;                             \
> +        TYPE src2v = i * 3 + 1;                                        \
> +       TYPE if_true = OP (src1v, src2v);                       \
> +       if (dest[i] != (i % 5 < 3 ? if_true : src2v))           \
> +         __builtin_abort ();                                   \
> +      }                                                                \
> +  }
> +
> +int __attribute__ ((optimize (1)))
> +main (void)
> +{
> +  FOR_EACH_LOOP (TEST_LOOP);
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_12.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_12.c     2018-05-24
09:54:37.510451324 +0100
> @@ -0,0 +1,125 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
> +
> +#include <stdint.h>
> +
> +#define add(A, B) ((A) + (B))
> +#define sub(A, B) ((A) - (B))
> +#define max(A, B) ((A) > (B) ? (A) : (B))
> +#define min(A, B) ((A) < (B) ? (A) : (B))
> +#define and(A, B) ((A) & (B))
> +#define ior(A, B) ((A) | (B))
> +#define xor(A, B) ((A) ^ (B))
> +
> +#define N 121
> +
> +#define DEF_LOOP(TYPE, CMPTYPE, OP)                            \
> +  void __attribute__((noipa))                                  \
> +  f_##OP##_##TYPE (TYPE *restrict dest, CMPTYPE *restrict cond,        \
> +                  CMPTYPE limit, TYPE src2v, TYPE elsev)       \
> +  {                                                            \
> +    TYPE induc = 0;                                            \
> +    for (unsigned int i = 0; i < N; ++i, induc += 1)           \
> +      {                                                                \
> +       TYPE truev = OP (induc, src2v);                         \
> +       dest[i] = cond[i] < limit ? truev : elsev;              \
> +      }                                                                \
> +  }
> +
> +#define FOR_EACH_INT_TYPE(T, TYPE) \
> +  T (TYPE, TYPE, add) \
> +  T (TYPE, TYPE, sub) \
> +  T (TYPE, TYPE, max) \
> +  T (TYPE, TYPE, min) \
> +  T (TYPE, TYPE, and) \
> +  T (TYPE, TYPE, ior) \
> +  T (TYPE, TYPE, xor)
> +
> +#define FOR_EACH_FP_TYPE(T, TYPE, CMPTYPE, SUFFIX) \
> +  T (TYPE, CMPTYPE, add) \
> +  T (TYPE, CMPTYPE, sub) \
> +  T (TYPE, CMPTYPE, __builtin_fmax##SUFFIX) \
> +  T (TYPE, CMPTYPE, __builtin_fmin##SUFFIX)
> +
> +#define FOR_EACH_LOOP(T) \
> +  FOR_EACH_INT_TYPE (T, int8_t) \
> +  FOR_EACH_INT_TYPE (T, int16_t) \
> +  FOR_EACH_INT_TYPE (T, int32_t) \
> +  FOR_EACH_INT_TYPE (T, int64_t) \
> +  FOR_EACH_INT_TYPE (T, uint8_t) \
> +  FOR_EACH_INT_TYPE (T, uint16_t) \
> +  FOR_EACH_INT_TYPE (T, uint32_t) \
> +  FOR_EACH_INT_TYPE (T, uint64_t) \
> +  FOR_EACH_FP_TYPE (T, _Float16, uint16_t, f16) \
> +  FOR_EACH_FP_TYPE (T, float, float, f32) \
> +  FOR_EACH_FP_TYPE (T, double, double, f64)
> +
> +FOR_EACH_LOOP (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
> +
> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.b,} 14 } } */
> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.h,} 18 } } */
> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.s,} 18 } } */
> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.d,} 18 } } */
> +
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmax\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.b, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tumin\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.b, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 2 }
} */
> +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 2 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.h, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 1 }
} */
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 1 }
} */
> +
> +/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.h, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.s, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.d, p[0-7]/m,} 1
} } */
> +
> +/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.h, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.s, p[0-7]/m,} 1
} } */
> +/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.d, p[0-7]/m,} 1
} } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_12_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_12_run.c 2018-05-24
09:54:37.510451324 +0100
> @@ -0,0 +1,30 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
> +
> +#include "vcond_12.c"
> +
> +#define TEST_LOOP(TYPE, CMPTYPE, OP)                           \
> +  {                                                            \
> +    TYPE dest[N];                                              \
> +    CMPTYPE cond[N];                                           \
> +    for (unsigned int i = 0; i < N; ++i)                       \
> +      cond[i] = i % 5;                                         \
> +    TYPE src2v = 14;                                           \
> +    TYPE elsev = 17;                                           \
> +    f_##OP##_##TYPE (dest, cond, 3, src2v, elsev);             \
> +    TYPE induc = 0;                                            \
> +    for (unsigned int i = 0; i < N; ++i)                       \
> +      {                                                                \
> +       TYPE if_true = OP (induc, src2v);                       \
> +       if (dest[i] != (i % 5 < 3 ? if_true : elsev))           \
> +         __builtin_abort ();                                   \
> +       induc += 1;                                             \
> +      }                                                                \
> +  }
> +
> +int __attribute__ ((optimize (1)))
> +main (void)
> +{
> +  FOR_EACH_LOOP (TEST_LOOP);
> +  return 0;
> +}


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]