[PATCH] Improve { x, x + 3, x + 6, x + 9 } expansion (take 2)
Richard Biener
rguenther@suse.de
Thu Nov 21 13:29:00 GMT 2013
On Thu, 21 Nov 2013, Jakub Jelinek wrote:
> On Thu, Nov 21, 2013 at 07:43:35AM +1000, Richard Henderson wrote:
> > On 11/20/2013 07:44 PM, Jakub Jelinek wrote:
> > > On Wed, Nov 20, 2013 at 10:31:38AM +0100, Richard Biener wrote:
> > >> Aww ;) Nice improvement. Generally when I see this I always wonder
> > >> whether we want to do this kind of stuff pre RTL expansion.
> > >> 1st to not rely on being able to TER, 2nd to finally eventually
> > >> get rid of TER.
> > >>
> > >> These patches are unfortunately a step backward for #2.
> > >>
> > >> As of the patch, do we have a way to query whether the target
> > >> can efficiently broadcast? If so this IMHO belongs in generic
> > >
> > > We don't. Perhaps if we'd add optab for vec_dup<mode> and mentioned
> > > clearly in the documentation that it should be used only if it is reasonably
> > > efficient. But still, even with optab, it would probably better to do it
> > > in the veclower* passes than in the vectorizer itself.
> >
> > I think we can assume that broadcast is relatively efficient, whether or not
> > vec_dup is present. I'd lean to making the transformation generic to start
> > with, so that you don't need extra handling in the i386 backend.
>
> Ok, here is a generic veclower implementation without looking at any optabs,
> so far only handles PLUS_EXPR, what operation other than MULT_EXPR would
> make sense here? Though, handling MULT_EXPR also would complicate the code
> slightly (it would need to handle say:
> _2 = _1(D) + 1;
> _3 = _2 + 2;
> _4 = _3 * 2;
> _5 = _4 * 3;
> _6 = { _3, _4, _5, _4 };
> where we could start thinking first the operation is PLUS_EXPR, but it
> actually is MULT_EXPR with _3 as base). Also, for MULT_EXPR, supposedly
> we could handle some values to be constant 0, like in:
> _2 = _1(D) * 5;
> _3 = _2 * 2;
> _4 = _1(D) * 10;
> _5 = { _3, 0, _4, _2, _1(D), 0, _4, _2 };
>
> Bootstrap/regtest pending, ok at least for this for the start and can be
> improved later on?
Ok, this should catch most of the vectorizer cases.
Zero could also be handled for PLUS_EXPR, likewise one for MULT_EXPR.
I think for induction it's common to have { base, base + 1, base + 2, ...
}
more comments below
> 2013-11-21 Jakub Jelinek <jakub@redhat.com>
>
> * tree-vect-generic.c (optimize_vector_constructor): New function.
> (expand_vector_operations_1): Call it.
>
> * gcc.dg/vect/vect-124.c: New test.
>
> --- gcc/tree-vect-generic.c.jj 2013-11-19 21:56:40.000000000 +0100
> +++ gcc/tree-vect-generic.c 2013-11-21 11:17:55.146118161 +0100
> @@ -988,6 +988,84 @@ expand_vector_operation (gimple_stmt_ite
> gimple_assign_rhs1 (assign),
> gimple_assign_rhs2 (assign), code);
> }
> +
> +/* Try to optimize
> + a_5 = { b_7, b_7 + 3, b_7 + 6, b_7 + 9 };
> + style stmts into:
> + _9 = { b_7, b_7, b_7, b_7 };
> + a_5 = _9 + { 0, 3, 6, 9 };
> + because vector splat operation is usually more efficient
> + than piecewise initialization of the vector. */
> +
> +static void
> +optimize_vector_constructor (gimple_stmt_iterator *gsi)
> +{
> + gimple stmt = gsi_stmt (*gsi);
> + tree lhs = gimple_assign_lhs (stmt);
> + tree rhs = gimple_assign_rhs1 (stmt);
> + tree type = TREE_TYPE (rhs);
> + unsigned int i, j, nelts = TYPE_VECTOR_SUBPARTS (type);
> + bool all_same = true;
> + constructor_elt *elt;
> + tree *cst;
> + gimple g;
> + tree base = NULL_TREE;
> +
> + if (nelts <= 2 || CONSTRUCTOR_NELTS (rhs) != nelts)
> + return;
> + FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (rhs), i, elt)
> + if (TREE_CODE (elt->value) != SSA_NAME
> + || TREE_CODE (TREE_TYPE (elt->value)) == VECTOR_TYPE)
> + return;
> + else
> + {
> + tree this_base = elt->value;
> + if (this_base != CONSTRUCTOR_ELT (rhs, 0)->value)
> + all_same = false;
> + for (j = 0; j < nelts + 1; j++)
> + {
> + g = SSA_NAME_DEF_STMT (this_base);
> + if (is_gimple_assign (g)
> + && gimple_assign_rhs_code (g) == PLUS_EXPR
> + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST
> + && TREE_CODE (gimple_assign_rhs1 (g)) == SSA_NAME
> + && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_assign_rhs1 (g)))
> + this_base = gimple_assign_rhs1 (g);
> + else
> + break;
> + }
why loop here? Do you want to catch base + 1 + 2? I think that's
hiding a missed optimization elsewhere for no good reason.
> + if (i == 0)
> + base = this_base;
> + else if (this_base != base)
> + return;
> + }
> + if (all_same)
> + return;
> + cst = XALLOCAVEC (tree, nelts);
> + for (i = 0; i < nelts; i++)
> + {
> + tree this_base = CONSTRUCTOR_ELT (rhs, i)->value;;
> + cst[i] = build_zero_cst (TREE_TYPE (base));
> + while (this_base != base)
> + {
> + g = SSA_NAME_DEF_STMT (this_base);
> + cst[i] = fold_binary (PLUS_EXPR, TREE_TYPE (base),
> + cst[i], gimple_assign_rhs2 (g));
> + if (cst[i] == NULL_TREE
> + || TREE_CODE (cst[i]) != INTEGER_CST
> + || TREE_OVERFLOW (cst[i]))
> + return;
> + this_base = gimple_assign_rhs1 (g);
> + }
Likewise.
I'd be more comfortable with leaving this magic out.
Ok with that change.
Thanks,
Richard.
> + }
> + for (i = 0; i < nelts; i++)
> + CONSTRUCTOR_ELT (rhs, i)->value = base;
> + g = gimple_build_assign (make_ssa_name (type, NULL), rhs);
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> + g = gimple_build_assign_with_ops (PLUS_EXPR, lhs, gimple_assign_lhs (g),
> + build_vector (type, cst));
> + gsi_replace (gsi, g, false);
> +}
>
> /* Return a type for the widest vector mode whose components are of type
> TYPE, or NULL_TREE if none is found. */
> @@ -1278,6 +1356,17 @@ expand_vector_operations_1 (gimple_stmt_
> expand_vector_condition (gsi);
> return;
> }
> +
> + if (code == CONSTRUCTOR
> + && TREE_CODE (lhs) == SSA_NAME
> + && VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (lhs)))
> + && !gimple_clobber_p (stmt)
> + && optimize)
> + {
> + optimize_vector_constructor (gsi);
> + return;
> + }
> +
> if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
> return;
>
> --- gcc/testsuite/gcc.dg/vect/vect-124.c.jj 2013-11-20 09:27:35.760798649 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-124.c 2013-11-20 09:29:32.219199794 +0100
> @@ -0,0 +1,28 @@
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 64
> +#endif
> +
> +int a[N];
> +
> +__attribute__((noinline, noclone)) void
> +foo (int x)
> +{
> + int i;
> + for (i = 0; i < N; i++, x += 3)
> + a[i] = x;
> +}
> +
> +int
> +main ()
> +{
> + int i;
> +
> + check_vect ();
> + foo (6);
> + for (i = 0; i < N; i++)
> + if (a[i] != i * 3 + 6)
> + abort ();
> + return 0;
> +}
>
>
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
More information about the Gcc-patches
mailing list