This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [SVE] PR86753
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Prathamesh Kulkarni <prathamesh dot kulkarni at linaro dot org>
- Cc: gcc Patches <gcc-patches at gcc dot gnu dot org>, Richard Sandiford <richard dot sandiford at arm dot com>
- Date: Wed, 14 Aug 2019 18:50:54 +0200
- Subject: Re: [SVE] PR86753
- References: <CAAgBjMnTn_3hiTxFPW-=QBp3=Pq0oCx1OhUbdRKwN9bWkGQ_UQ@mail.gmail.com> <CAFiYyc0=MQ4p7974RsQawcgQNEU8p8w10MB6Q24A_1ZS5KKHsg@mail.gmail.com>
On Wed, Aug 14, 2019 at 6:49 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Aug 14, 2019 at 5:06 PM Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > Hi,
> > The attached patch tries to fix PR86753.
> >
> > For the following test:
> > void
> > f1 (int *restrict x, int *restrict y, int *restrict z)
> > {
> >   for (int i = 0; i < 100; ++i)
> >     x[i] = y[i] ? z[i] : 10;
> > }
> >
> >
> > vect dump shows:
> > vect_cst__42 = { 0, ... };
> > vect_cst__48 = { 0, ... };
> >
> > vect__4.7_41 = .MASK_LOAD (vectp_y.5_38, 4B, loop_mask_40);
> > _4 = *_3;
> > _5 = z_12(D) + _2;
> > mask__35.8_43 = vect__4.7_41 != vect_cst__42;
> > _35 = _4 != 0;
> > vec_mask_and_46 = mask__35.8_43 & loop_mask_40;
> > vect_iftmp.11_47 = .MASK_LOAD (vectp_z.9_44, 4B, vec_mask_and_46);
> > iftmp.0_13 = 0;
> > vect_iftmp.12_50 = VEC_COND_EXPR <vect__4.7_41 != vect_cst__48,
> > vect_iftmp.11_47, vect_cst__49>;
> >
> > and the following code-gen:
> > L2:
> > ld1w z0.s, p2/z, [x1, x3, lsl 2]
> > cmpne p1.s, p3/z, z0.s, #0
> > cmpne p0.s, p2/z, z0.s, #0
> > ld1w z0.s, p0/z, [x2, x3, lsl 2]
> > sel z0.s, p1, z0.s, z1.s
> >
> > We could reuse vec_mask_and_46 in vec_cond_expr since the conditions
> > vect__4.7_41 != vect_cst__48 and vect__4.7_41 != vect_cst__42
> > are equivalent, and vect_iftmp.11_47 depends on vect__4.7_41 != vect_cst__48.
> >
> > I suppose in general, for vec_cond_expr <C, T, E>, if T comes from a masked
> > load that is conditional on C, then we could reuse the load's mask in the
> > vec_cond_expr?
> >
> > The patch maintains a hash_map cond_to_vec_mask
> > from <cond, loop_mask> -> vec_mask (with the loop predicate applied).
> > In prepare_load_store_mask, we record <cond, loop_mask> -> vec_mask & loop_mask,
> > and in vectorizable_condition, we check if <cond, loop_mask> exists in
> > cond_to_vec_mask
> > and if found, the corresponding vec_mask is used as 1st operand of
> > vec_cond_expr.
> >
> > <cond, loop_mask> is represented with cond_vmask_key, and the patch
> > adds tree_cond_ops to represent the condition's operator and operands,
> > coming either from a cond_expr or a gimple comparison stmt. If the stmt
> > is not a comparison, it returns <ne_expr, lhs, 0> and inserts that into
> > cond_to_vec_mask.
> >
> > With the patch, the redundant p1 is eliminated and sel uses p0 for the above test.
> >
> > For the following test:
> > void
> > f2 (int *restrict x, int *restrict y, int *restrict z, int fallback)
> > {
> >   for (int i = 0; i < 100; ++i)
> >     x[i] = y[i] ? z[i] : fallback;
> > }
> >
> >
> > the input to the vectorizer has the operands swapped in the cond_expr:
> > _36 = _4 != 0;
> > iftmp.0_14 = .MASK_LOAD (_5, 32B, _36);
> > iftmp.0_8 = _4 == 0 ? fallback_12(D) : iftmp.0_14;
> >
> > So we need to check for the inverted condition in cond_to_vec_mask,
> > and swap the operands.
> > Does the patch look OK so far ?
> >
> > One major issue remaining with the patch is value numbering.
> > Currently, it does value numbering for entire function using sccvn
> > during start of vect pass, which is too expensive since we only need
> > block based VN. I am looking into that.
>
> Why do you need it at all? We run VN on the if-converted loop bodies btw.
Also, I can't trivially see the equality of the masks, and probably VN can't either.
Is it that we just don't bother to apply loop_mask to the VEC_COND, but there's
no harm if we do?
Richard.
> Richard.
>
> >
> > Thanks,
> > Prathamesh