[PATCH][RFC] Add x86 subadd SLP pattern

Uros Bizjak ubizjak@gmail.com
Thu Jun 17 09:54:19 GMT 2021


On Thu, Jun 17, 2021 at 11:44 AM Richard Biener <rguenther@suse.de> wrote:
>
> This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{d,s} v0, v1
> instructions which compute { v0[0] - v1[0], v0[1] + v1[1], ... },
> i.e. subtract and add alternating across lanes, starting with subtract.
>
> It adds a corresponding optab and direct internal function,
> vec_subadd$a3, and for the moment, to keep the i386 backend changes
> "obvious", duplicates the existing avx_addsubv4df3 pattern under
> the new canonical name (CODE_FOR_* and gen_* are used throughout the
> intrinsic code, so actually renaming all existing patterns
> would be quite a bit more churn).  I expect some bike-shedding on
> subadd vs. addsub, so I'm delaying that change ;)
>
> ARC seems to have both addsub{V2HI,V2SI,V4HI} and
> subadd{V2HI,V2SI,V4HI}.  ARM iwmmxt has wsubaddx on V4HI but
> I didn't dare to decipher whether it's -, + or +, -.  bfin
> has both variants as well plus a saturating variant of both (on v2hi).
> But none of the major vector ISAs besides x86 seem to have it.
> Still, the fact that some targets have both variants and opt for the
> naming I chose makes it a better fit IMHO.
>
> Uros/Hongtao - would a mass-renaming of the x86 addsub patterns
> to subadd be OK?  (not touching the FMA ones, which have both)

We can add a define_expand (this is how we approached the problem in
the past for x86).  If there are not too many of them, then this is the
preferred approach (see below) ...

... until someone comes along and mass-renames everything to a generic name.

Uros.

>
> The SLP pattern matches the exact alternating lane sequence rather
> than trying to be clever and anticipating incoming permutes - we
> could permute the two input vectors to the needed lane alternation,
> do the subadd and then permute the result vector back but that's
> only profitable in case the two input or the output permute will
> vanish - something Tamar's refactoring of SLP pattern recog should
> make possible (I hope ;)).
>
> So as said, the patch is incomplete on the x86 backend (and test
> coverage) side, but the vectorizer part should be finalized before
> Tamar posts his work.
>
> Bootstrapped / tested on x86_64-unknown-linux-gnu.
>
> 2021-06-17  Richard Biener  <rguenther@suse.de>
>
>         * config/i386/sse.md (vec_subaddv4df3): Duplicate from
>         avx_addsubv4df3.
>         * internal-fn.def (VEC_SUBADD): New internal optab fn.
>         * optabs.def (vec_subadd_optab): New optab.
>         * tree-vect-slp-patterns.c (class subadd_pattern): New.
>         (slp_patterns): Add subadd_pattern.
>         * tree-vectorizer.h (vect_pattern::vect_pattern): Make
>         m_ops optional.
>
>         * gcc.target/i386/vect-subadd-1.c: New testcase.
> ---
>  gcc/config/i386/sse.md                        | 16 +++
>  gcc/internal-fn.def                           |  1 +
>  gcc/optabs.def                                |  1 +
>  gcc/testsuite/gcc.target/i386/vect-subadd-1.c | 35 +++++++
>  gcc/tree-vect-slp-patterns.c                  | 98 +++++++++++++++++++
>  gcc/tree-vectorizer.h                         |  3 +-
>  6 files changed, 153 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-subadd-1.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 94296bc773b..73238c9874a 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -2396,6 +2396,22 @@
>     (set_attr "prefix" "<round_saeonly_scalar_prefix>")
>     (set_attr "mode" "<ssescalarmode>")])
>
> +/* ???  Rename avx_addsubv4df3, but its name is used throughout the backend,
> +   so avoid this churn for the moment.  */
> +(define_insn "vec_subaddv4df3"
> +  [(set (match_operand:V4DF 0 "register_operand" "=x")
> +       (vec_merge:V4DF
> +         (minus:V4DF
> +           (match_operand:V4DF 1 "register_operand" "x")
> +           (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
> +         (plus:V4DF (match_dup 1) (match_dup 2))
> +         (const_int 5)))]
> +  "TARGET_AVX"
> +  "vaddsubpd\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "sseadd")
> +   (set_attr "prefix" "vex")
> +   (set_attr "mode" "V4DF")])

define_expand "vec_subaddv4df3"
  [(set (match_operand:V4DF 0 "register_operand")
       (vec_merge:V4DF
         (minus:V4DF
           (match_operand:V4DF 1 "register_operand")
           (match_operand:V4DF 2 "nonimmediate_operand"))
         (plus:V4DF (match_dup 1) (match_dup 2))
         (const_int 5)))]
  "TARGET_AVX")

>  (define_insn "avx_addsubv4df3"
>    [(set (match_operand:V4DF 0 "register_operand" "=x")
>         (vec_merge:V4DF
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index b2f414d2131..a4578a0293e 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -281,6 +281,7 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
>  DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
> +DEF_INTERNAL_OPTAB_FN (VEC_SUBADD, ECF_CONST, vec_subadd, binary)
>
>
>  /* FP scales.  */
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index b192a9d070b..cacbd02cfe1 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -407,6 +407,7 @@ OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
>  OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
>  OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
>  OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> +OPTAB_D (vec_subadd_optab, "vec_subadd$a3")
>
>  OPTAB_D (sync_add_optab, "sync_add$I$a")
>  OPTAB_D (sync_and_optab, "sync_and$I$a")
> diff --git a/gcc/testsuite/gcc.target/i386/vect-subadd-1.c b/gcc/testsuite/gcc.target/i386/vect-subadd-1.c
> new file mode 100644
> index 00000000000..a79e5ba92e2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-subadd-1.c
> @@ -0,0 +1,35 @@
> +/* { dg-do run { target avx_runtime } } */
> +/* { dg-options "-O3 -mavx2 -fdump-tree-slp2" } */
> +
> +double x[4], y[4], z[4];
> +void __attribute__((noipa)) foo ()
> +{
> +  x[0] = y[0] - z[0];
> +  x[1] = y[1] + z[1];
> +  x[2] = y[2] - z[2];
> +  x[3] = y[3] + z[3];
> +}
> +void __attribute__((noipa)) bar ()
> +{
> +  x[0] = y[0] + z[0];
> +  x[1] = y[1] - z[1];
> +  x[2] = y[2] + z[2];
> +  x[3] = y[3] - z[3];
> +}
> +int main()
> +{
> +  for (int i = 0; i < 4; ++i)
> +    {
> +      y[i] = i + 1;
> +      z[i] = 2 * i + 1;
> +    }
> +  foo ();
> +  if (x[0] != 0 || x[1] != 5 || x[2] != -2 || x[3] != 11)
> +    __builtin_abort ();
> +  bar ();
> +  if (x[0] != 2 || x[1] != -1 || x[2] != 8 || x[3] != -3)
> +    __builtin_abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "SUBADD" 1 "slp2" } } */
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 2ed49cd9edc..61732c53ad7 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -1490,6 +1490,103 @@ complex_operations_pattern::build (vec_info * /* vinfo */)
>    gcc_unreachable ();
>  }
>
> +
> +/* The subadd_pattern.  */
> +
> +class subadd_pattern : public vect_pattern
> +{
> +  public:
> +    subadd_pattern (slp_tree *node)
> +       : vect_pattern (node, NULL, IFN_VEC_SUBADD) {};
> +
> +    void build (vec_info *);
> +
> +    static vect_pattern*
> +    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +};
> +
> +vect_pattern *
> +subadd_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
> +{
> +  slp_tree node = *node_;
> +  if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> +      || SLP_TREE_CHILDREN (node).length () != 2)
> +    return NULL;
> +
> +  /* Match a blend of a plus and a minus op with the same number of plus and
> +     minus lanes on the same operands.  */
> +  slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> +  slp_tree add = SLP_TREE_CHILDREN (node)[1];
> +  bool swapped_p = false;
> +  if (vect_match_expression_p (sub, PLUS_EXPR))
> +    {
> +      std::swap (add, sub);
> +      swapped_p = true;
> +    }
> +  if (!(vect_match_expression_p (add, PLUS_EXPR)
> +       && vect_match_expression_p (sub, MINUS_EXPR)))
> +    return NULL;
> +  if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0]
> +        && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1])
> +       || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1]
> +           && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0])))
> +    return NULL;
> +
> +  for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i)
> +    {
> +      std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i];
> +      if (swapped_p)
> +       perm.first = perm.first == 0 ? 1 : 0;
> +      /* It has to be alternating -, +, -, ...
> +        While we could permute the .SUBADD inputs and the .SUBADD output
> +        that's only profitable over the add + sub + blend if at least
> +        one of the permutes is optimized, which we can't determine here.  */
> +      if (perm.first != (i & 1)
> +         || perm.second != i)
> +       return NULL;
> +    }
> +
> +  if (!vect_pattern_validate_optab (IFN_VEC_SUBADD, node))
> +    return NULL;
> +
> +  return new subadd_pattern (node_);
> +}
> +
> +void
> +subadd_pattern::build (vec_info *vinfo)
> +{
> +  slp_tree node = *m_node;
> +
> +  slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> +  slp_tree add = SLP_TREE_CHILDREN (node)[1];
> +  if (vect_match_expression_p (sub, PLUS_EXPR))
> +    std::swap (add, sub);
> +
> +  /* Modify the blend node in-place.  */
> +  SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> +  SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> +  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> +  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> +
> +  /* Build IFN_VEC_SUBADD from the sub representative operands.  */
> +  stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> +  gcall *call = gimple_build_call_internal (IFN_VEC_SUBADD, 2,
> +                                           gimple_assign_rhs1 (rep->stmt),
> +                                           gimple_assign_rhs2 (rep->stmt));
> +  gimple_call_set_lhs (call, make_ssa_name
> +                              (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> +  gimple_call_set_nothrow (call, true);
> +  gimple_set_bb (call, gimple_bb (rep->stmt));
> +  SLP_TREE_REPRESENTATIVE (node) = vinfo->add_pattern_stmt (call, rep);
> +  STMT_VINFO_RELEVANT (SLP_TREE_REPRESENTATIVE (node)) = vect_used_in_scope;
> +  STMT_SLP_TYPE (SLP_TREE_REPRESENTATIVE (node)) = pure_slp;
> +  SLP_TREE_CODE (node) = ERROR_MARK;
> +  SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> +  vect_free_slp_tree (sub);
> +  vect_free_slp_tree (add);
> +}
> +
>  /*******************************************************************************
>   * Pattern matching definitions
>   ******************************************************************************/
> @@ -1502,6 +1599,7 @@ vect_pattern_decl_t slp_patterns[]
>       overlap in what they can detect.  */
>
>    SLP_PATTERN (complex_operations_pattern),
> +  SLP_PATTERN (subadd_pattern)
>  };
>  #undef SLP_PATTERN
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 04c20f8bd0f..b9824623ad9 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2100,7 +2100,8 @@ class vect_pattern
>        this->m_ifn = ifn;
>        this->m_node = node;
>        this->m_ops.create (0);
> -      this->m_ops.safe_splice (*m_ops);
> +      if (m_ops)
> +       this->m_ops.safe_splice (*m_ops);
>      }
>
>    public:
> --
> 2.26.2

