This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [SLP] SLP vectorization: vectorize vector constructors
- From: Richard Biener <rguenther at suse dot de>
- To: Joel Hutton <Joel dot Hutton at arm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Tue, 15 Oct 2019 14:11:38 +0200 (CEST)
- Subject: Re: [SLP] SLP vectorization: vectorize vector constructors
- References: <5edb0b00-4ae2-41c0-80ec-76de15d0b110@arm.com>
On Fri, 11 Oct 2019, Joel Hutton wrote:
> Hi Richard,
>
> Thanks for your help, I've reworked my SLP RFC based on your feedback.
> > I think a better place for the loop searching for CONSTRUCTORs is
> > vect_slp_analyze_bb_1 where I'd put it before the check you remove,
> > and I'd simply append found CONSTRUCTORs to the grouped_stores
> > array
> I've moved this check into a separate function and called it from
> vect_slp_analyze_bb_1
>
> > The fixup you do in vectorizable_operation doesn't
> > belong there either, I'd add a new field to the SLP instance
> > structure refering to the CONSTRUCTOR stmt and do the fixup
> > in vect_schedule_slp_instance instead where you can simply
> > replace the CONSTRUCTOR with the vectorized SSA name then.
>
> Done.
>
> > + /* Check that the constructor elements are unique. */
> > + FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
> > + {
> > + tree prev_val;
> > + int j;
> > + FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), j,
> > prev_val)
> > + {
> > + if (val == prev_val && i!=j)
> >
> > why's that necessary? (it looks incomplete, also doesn't catch
> > [duplicate] constants)
>
> The thinking was that there was no benefit in vectorizing a constructor of
> duplicates, or a vector of constants, although now you mention it that
> thinking
> may be flawed. I've removed it
>
> > You miss to check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS
> > (we can have omitted trailing zeros).
^^^
I don't see this being handled? You give up on non-SSA names
but not on the omitted trailing zeros.
> ...
> > What happens if you have a vector constructor that is twice
> > as large as the machine supports? The vectorizer will happily
> > produce a two vector SSA name vectorized result but your
> > CONSTRUCTOR replacement doesn't work here. I think this should
> > be made work correctly (not give up on that case).
>
> I've reworked the patch to account for this, by checking that the
> vectorized version
> has one vectorized stmt at the root of the SLP tree. I'm not sure how to
> handle
> a vector constructor twice as large as the machine supports, as far as I
> can see,
> when only analyzing a within a basic block, the SSA name of the
> constructor has to
> be maintained.
You build a CONSTRUCTOR of vectors, thus
_orig_ssa_1 = { vect_part1_2, vect_part2_3, ... };
+
+ if (constructor)
+ {
+ SLP_INSTANCE_ROOT_STMT (new_instance) = stmt_info->stmt;
+ }
+ else
+ SLP_INSTANCE_ROOT_STMT (new_instance) = NULL;
+
too much vertical space, no {} around single-stmt if clauses
@@ -2725,6 +2760,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
stmt_vec_info use_stmt_info = vinfo->lookup_stmt
(use_stmt);
if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
{
+ /* Check this is not a constructor that will be
vectorized
+ away. */
+ if (BB_VINFO_GROUPED_STORES (vinfo).contains
(use_stmt_info))
+ continue;
hmm, so why not set the slp type on SLP_INSTANCE_ROOT_STMT instead?
In theory the stmt should be part of the SLP tree itself but that's
probably too awkward to be made work at the moment ;)
vect_ssa_use_outside_bb and vect_slp_check_for_constructors are new
functions which need comments.
+ /* For vector constructors, the same SSA name must be used to maintain
data
+ flow into other basic blocks. */
+ if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
+ && SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1
+ && SLP_TREE_VEC_STMTS (node).exists ())
+ {
it should read
/* Vectorize the instance root. */
and be in vect_schedule_slp after the vect_schedule_slp_instance.
As said above you need to handle SLP_TREE_NUMBER_OF_VEC_STMTS > 1,
you also cannot simply do "nothing" here, "failing" vectorization
(well, you probably can - DCE will remove the vectorized code - but
at least if there were calls in the SLP tree they will be mangled
by vectorization so the scalar code is wrecked). SO it should be
if (SLP_INSTANCE_ROOT_STMT (instance))
.. you may not fail to replace the scalar root stmt here ..
+ if (CONSTRUCTOR_NELTS (rhs) == 0)
+ vectorizable = false;
+
if you use continue; you can elide the 'vectorizable' variable.
+ if (!vect_ssa_use_outside_bb (gimple_assign_lhs (stmt)))
+ vectorizable = false;
+
why's that? no comments that clarify ;) The vector may be
used as argument to a call or as source of a store. So I'd simply
remove this check (and the function).
Thanks,
Richard.
> Currently SLP vectorization can build SLP trees starting from reductions or
> from group stores. This patch adds a third starting point: vector
> constructors.
>
> For the following test case (compiled with -O3):
>
> char g_d[1024], g_s1[1024], g_s2[1024];
> void test_loop(void)
> {
> char d = g_d, s1 = g_s1, *s2 = g_s2;
> for ( int y = 0; y < 128; y++ )
>
> {
> for ( int x = 0; x < 16; x++ )
> d[x] = s1[x] + s2[x];
> d += 16;
> }
> }
>
> before patch:
> test_loop:
> .LFB0:
> .cfi_startproc
> adrp x0, g_s1
> adrp x2, g_s2
> add x3, x0, :lo12:g_s1
> add x4, x2, :lo12:g_s2
> ldrb w7, [x2, #:lo12:g_s2]
> ldrb w1, [x0, #:lo12:g_s1]
> adrp x0, g_d
> ldrb w6, [x4, 1]
> add x0, x0, :lo12:g_d
> ldrb w5, [x3, 1]
> add w1, w1, w7
> fmov s0, w1
> ldrb w7, [x4, 2]
> add w5, w5, w6
> ldrb w1, [x3, 2]
> ldrb w6, [x4, 3]
> add x2, x0, 2048
> ins v0.b[1], w5
> add w1, w1, w7
> ldrb w7, [x3, 3]
> ldrb w5, [x4, 4]
> add w7, w7, w6
> ldrb w6, [x3, 4]
> ins v0.b[2], w1
> ldrb w8, [x4, 5]
> add w6, w6, w5
> ldrb w5, [x3, 5]
> ldrb w9, [x4, 6]
> add w5, w5, w8
> ldrb w1, [x3, 6]
> ins v0.b[3], w7
> ldrb w8, [x4, 7]
> add w1, w1, w9
> ldrb w11, [x3, 7]
> ldrb w7, [x4, 8]
> add w11, w11, w8
> ldrb w10, [x3, 8]
> ins v0.b[4], w6
> ldrb w8, [x4, 9]
> add w10, w10, w7
> ldrb w9, [x3, 9]
> ldrb w7, [x4, 10]
> add w9, w9, w8
> ldrb w8, [x3, 10]
> ins v0.b[5], w5
> ldrb w6, [x4, 11]
> add w8, w8, w7
> ldrb w7, [x3, 11]
> ldrb w5, [x4, 12]
> add w7, w7, w6
> ldrb w6, [x3, 12]
> ins v0.b[6], w1
> ldrb w12, [x4, 13]
> add w6, w6, w5
> ldrb w5, [x3, 13]
> ldrb w1, [x3, 14]
> add w5, w5, w12
> ldrb w13, [x4, 14]
> ins v0.b[7], w11
> ldrb w12, [x4, 15]
> add w4, w1, w13
> ldrb w1, [x3, 15]
> add w1, w1, w12
> ins v0.b[8], w10
> ins v0.b[9], w9
> ins v0.b[10], w8
> ins v0.b[11], w7
> ins v0.b[12], w6
> ins v0.b[13], w5
> ins v0.b[14], w4
> ins v0.b[15], w1
> .p2align 3,,7
> .L2:
> str q0, [x0], 16
> cmp x2, x0
> bne .L2
> ret
> .cfi_endproc
> .LFE0:
>
> After patch:
>
> test_loop:
> .LFB0:
> .cfi_startproc
> adrp x3, g_s1
> adrp x2, g_s2
> add x3, x3, :lo12:g_s1
> add x2, x2, :lo12:g_s2
> adrp x0, g_d
> add x0, x0, :lo12:g_d
> add x1, x0, 2048
> ldr q1, [x2]
> ldr q0, [x3]
> add v0.16b, v0.16b, v1.16b
> .p2align 3,,7
> .L2:
> str q0, [x0], 16
> cmp x0, x1
> bne .L2
> ret
> .cfi_endproc
> .LFE0:
>
>
> 2019-10-11 Joel Hutton Joel.Hutton@arm.com
>
> * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector
> constructors.
> (vect_bb_slp_scalar_cost): Likewise.
> (vect_ssa_use_outside_bb): New function.
> (vect_slp_check_for_constructors): New function.
> (vect_slp_analyze_bb_1): Add check for vector constructors.
> (vect_schedule_slp_instance): Add case to fixup vector constructor
> stmt.
> * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
>
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-11 Joel Hutton Joel.Hutton@arm.com
>
> * gcc.dg/vect/bb-slp-40.c: New test.
>
> bootstrapped and regression tested on aarch64-none-linux-gnu
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)