[SVE] PR86753

Richard Sandiford <richard.sandiford@arm.com>
Wed Aug 14 21:22:00 GMT 2019


Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Aug 14, 2019 at 6:49 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Wed, Aug 14, 2019 at 5:06 PM Prathamesh Kulkarni
>> <prathamesh.kulkarni@linaro.org> wrote:
>> >
>> > Hi,
>> > The attached patch tries to fix PR86753.
>> >
>> > For following test:
>> > void
>> > f1 (int *restrict x, int *restrict y, int *restrict z)
>> > {
>> >   for (int i = 0; i < 100; ++i)
>> >     x[i] = y[i] ? z[i] : 10;
>> > }
>> >
>> > vect dump shows:
>> >   vect_cst__42 = { 0, ... };
>> >   vect_cst__48 = { 0, ... };
>> >
>> >   vect__4.7_41 = .MASK_LOAD (vectp_y.5_38, 4B, loop_mask_40);
>> >   _4 = *_3;
>> >   _5 = z_12(D) + _2;
>> >   mask__35.8_43 = vect__4.7_41 != vect_cst__42;
>> >   _35 = _4 != 0;
>> >   vec_mask_and_46 = mask__35.8_43 & loop_mask_40;
>> >   vect_iftmp.11_47 = .MASK_LOAD (vectp_z.9_44, 4B, vec_mask_and_46);
>> >   iftmp.0_13 = 0;
>> >   vect_iftmp.12_50 = VEC_COND_EXPR <vect__4.7_41 != vect_cst__48,
>> > vect_iftmp.11_47, vect_cst__49>;
>> >
>> > and following code-gen:
>> > L2:
>> >         ld1w    z0.s, p2/z, [x1, x3, lsl 2]
>> >         cmpne   p1.s, p3/z, z0.s, #0
>> >         cmpne   p0.s, p2/z, z0.s, #0
>> >         ld1w    z0.s, p0/z, [x2, x3, lsl 2]
>> >         sel     z0.s, p1, z0.s, z1.s
>> >
>> > We could reuse vec_mask_and_46 in vec_cond_expr since the conditions
>> > vect__4.7_41 != vect_cst__48 and vect__4.7_41 != vect_cst__42
>> > are equivalent, and vect_iftmp.11_47 depends on vect__4.7_41 != vect_cst__48.
>> >
>> > I suppose that, in general, for vec_cond_expr <C, T, E>, if T comes from
>> > a masked load that is conditional on C, then we could reuse the load's
>> > mask in the vec_cond_expr?
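[Illustrative note: with the dump above, the reuse would amount to
rewriting the VEC_COND_EXPR so that it takes the already-computed
combined mask as its condition, i.e. something like:

  vect_iftmp.12_50 = VEC_COND_EXPR <vec_mask_and_46, vect_iftmp.11_47, vect_cst__49>;

This is a sketch reusing the SSA names from the dump above, not output
from an actual build.]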
>> >
>> > The patch maintains a hash_map cond_to_vec_mask
>> > from <cond, loop_mask> -> vec_mask (with the loop predicate applied).
>> > In prepare_load_store_mask, we record <cond, loop_mask> -> vec_mask & loop_mask,
>> > and in vectorizable_condition, we check if <cond, loop_mask> exists in
>> > cond_to_vec_mask and, if found, the corresponding vec_mask is used as the
>> > first operand of the vec_cond_expr.
>> >
>> > <cond, loop_mask> is represented by cond_vmask_key, and the patch
>> > adds tree_cond_ops to represent the condition's operator and operands,
>> > coming either from a cond_expr or a gimple comparison stmt. If the stmt
>> > is not a comparison, it returns <ne_expr, lhs, 0> and inserts that into
>> > cond_to_vec_mask.
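[Illustrative note: a rough standalone sketch of the caching scheme
described above, using the standard library instead of GCC's hash_map
and tree types; cond_key, record_vec_mask and find_vec_mask are made-up
stand-ins for the patch's cond_vmask_key/tree_cond_ops machinery, not
the actual patch code:

#include <map>
#include <optional>
#include <string>
#include <tuple>

/* Stand-in for a comparison code (a tree code in the real patch).  */
enum cmp_code { CMP_NE, CMP_EQ };

/* Stand-in for <cond, loop_mask>: the comparison code, its two operands
   and the loop mask, all identified by their SSA names here.  */
typedef std::tuple<cmp_code, std::string, std::string, std::string> cond_key;

/* Maps <cond, loop_mask> to the name of "vec_mask & loop_mask".  */
static std::map<cond_key, std::string> cond_to_vec_mask;

/* Record the combined mask when masking a load or store
   (the prepare_load_store_mask side of the scheme).  */
void
record_vec_mask (cmp_code code, const std::string &op0,
                 const std::string &op1, const std::string &loop_mask,
                 const std::string &vec_mask_and)
{
  cond_to_vec_mask[cond_key (code, op0, op1, loop_mask)] = vec_mask_and;
}

/* Look the condition up again when emitting the VEC_COND_EXPR
   (the vectorizable_condition side of the scheme).  */
std::optional<std::string>
find_vec_mask (cmp_code code, const std::string &op0,
               const std::string &op1, const std::string &loop_mask)
{
  auto it = cond_to_vec_mask.find (cond_key (code, op0, op1, loop_mask));
  if (it != cond_to_vec_mask.end ())
    return it->second;
  return std::nullopt;
}

For the f1 example, masking the load of z would record the entry for
(CMP_NE, vect__4.7_41, vect_cst__42, loop_mask_40) -> vec_mask_and_46,
and the VEC_COND_EXPR would find that entry and use vec_mask_and_46 as
its condition.]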
>> >
>> > With the patch, the redundant p1 is eliminated and sel uses p0 for the above test.
>> >
>> > For following test:
>> > void
>> > f2 (int *restrict x, int *restrict y, int *restrict z, int fallback)
>> > {
>> >   for (int i = 0; i < 100; ++i)
>> >     x[i] = y[i] ? z[i] : fallback;
>> > }
>> >
>> > the input to the vectorizer has the operands swapped in the cond_expr:
>> >   _36 = _4 != 0;
>> >   iftmp.0_14 = .MASK_LOAD (_5, 32B, _36);
>> >   iftmp.0_8 = _4 == 0 ? fallback_12(D) : iftmp.0_14;
>> >
>> > So we need to check for the inverted condition in cond_to_vec_mask,
>> > and swap the operands.
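[Illustrative note, continuing the sketch above (again not the actual
patch): the inverted-condition case could be handled by also trying the
inverse comparison on lookup and telling the caller to swap the
VEC_COND_EXPR operands:

/* Invert a comparison code (only the NE/EQ pair needed here).  */
static cmp_code
invert_cmp (cmp_code code)
{
  return code == CMP_NE ? CMP_EQ : CMP_NE;
}

/* Try the condition as-is, then its inverse; if only the inverse matches,
   the caller must swap the then/else operands of the VEC_COND_EXPR.  */
std::optional<std::string>
find_vec_mask_or_inverse (cmp_code code, const std::string &op0,
                          const std::string &op1,
                          const std::string &loop_mask, bool *swap_operands)
{
  *swap_operands = false;
  if (std::optional<std::string> mask
        = find_vec_mask (code, op0, op1, loop_mask))
    return mask;
  if (std::optional<std::string> mask
        = find_vec_mask (invert_cmp (code), op0, op1, loop_mask))
    {
      *swap_operands = true;
      return mask;
    }
  return std::nullopt;
}

So for f2, the cond_expr "_4 == 0 ? fallback_12(D) : iftmp.0_14" could
reuse the mask recorded for the _4 != 0 condition, with the two arms
swapped.]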
>> > Does the patch look OK so far?
>> >
>> > One major issue remaining with the patch is value numbering.
>> > Currently, it does value numbering for the entire function using sccvn
>> > at the start of the vect pass, which is too expensive since we only
>> > need block-based VN. I am looking into that.
>>
>> Why do you need it at all?  We run VN on the if-converted loop bodies btw.

This was my suggestion, but with the idea being to do the numbering
per-statement as we vectorise.  We'll then see pattern statements too.

That's important because we use pattern statements to set the right
vector boolean type (e.g. vect_recog_mask_conversion_pattern).
So some of the masks we care about don't exist after if conversion.

> Also I can't trivially see the equality of the masks and probably so
> can't VN.  Is it that we just don't bother to apply loop_mask to
> VEC_COND but there's no harm if we do?

Yeah.  The idea of the optimisation is to decide when it's more profitable
to apply the loop mask, even though doing so isn't necessary.  It would
be hard to do after vectorisation because the masks aren't equivalent.
We're relying on knowledge of how the vectoriser uses the result.

Thanks,
Richard


