This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [SVE] PR86753
- From: Prathamesh Kulkarni <prathamesh dot kulkarni at linaro dot org>
- To: Richard Sandiford <richard dot sandiford at arm dot com>
- Cc: Richard Biener <richard dot guenther at gmail dot com>, gcc Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 25 Sep 2019 09:17:28 -0700
- Subject: Re: [SVE] PR86753
- References: <CAAgBjMnTn_3hiTxFPW-=QBp3=Pq0oCx1OhUbdRKwN9bWkGQ_UQ@mail.gmail.com> <CAAgBjM=usye_qrYR58sKxjFdSJwD4hEiZ_2FbF48NPMHQKyupg@mail.gmail.com> <CAFiYyc3ksU2GMcZ9vytPx-4DUhxxhva7mCci4ZKog+KdcAh8Bg@mail.gmail.com> <CAAgBjMm=_L9VoE3mDhFAtemz7_2MDbRiY-10=9yb=7GX9=ZOuA@mail.gmail.com> <mptsgpsxd6j.fsf@arm.com> <CAAgBjMkX786wQ2CgtBOTTY_ejD1Zp=KfmAnT08NFkDtNe3ZLJA@mail.gmail.com> <mptwof4vujx.fsf@arm.com> <CAAgBjMntcR5nBs96aD6_FJPGSjiCJVGsbGEk_WBxhrhgPAOQBA@mail.gmail.com> <CAFiYyc10BkstnK2f6X91S3JGZUX5QDz5zF_WDEiJdLxRRFEvwQ@mail.gmail.com> <CAAgBjM=0MN8j_uC2pp7+S6C2BfQArvtW6u2WDug2pEA4hUAzmw@mail.gmail.com> <mptblwbszdr.fsf@arm.com> <CAFiYyc2YRrjG3onU9BuEFvYCy6Mo=o=9XPdSNFEaswebV0xe_A@mail.gmail.com> <mpt8srests5.fsf@arm.com> <CAAgBjM=oZjX18-DkRXxPpU0U2Yx3iJwL0NkPN91L52yg4b7PSg@mail.gmail.com> <mpt1rx6r4tz.fsf@arm.com> <CAAgBjMkL7LKekotcEFzDPej-wvoOAZ+e4EF1J_1qyYWs_B1RsA@mail.gmail.com> <mpt1rx5psqw.fsf@arm.com> <CAFiYyc3FJMy67yFfkBaJuKXuKFrsBOAM3EVn+GSYARYbarHuQg@mail.gmail.com> <CAAgBjMknbQBT-CU-We_mZak5Yn+GLpQqbYs4+pkxLNO57kMh4A@mail.gmail.com> <mpt7e6nce43.fsf@arm.com> <CAAgBjM=BNtAkcj__EhBWqd5FYZgQJiMO_gR4aq49ixdE=_aAJQ@mail.gmail.com> <mpth85laffc.fsf@arm.com> <CAAgBjM=XJ0J9XAARUdX=EFLw67+BOo6ogTQjZuYP-0ryd2+uCQ@mail.gmail.com> <CAAgBjMnrtthG81ERVKWtvDJba9Z0JU_YYa-HY07JSr92+PvNYA@mail.gmail.com>
On Mon, 16 Sep 2019 at 08:54, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Mon, 9 Sep 2019 at 09:36, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Mon, 9 Sep 2019 at 16:45, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > >
> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > > > With the patch, only the following FAIL remains for aarch64-sve.exp:
> > > > > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> > > > > scan-assembler-times \\tmovprfx\\t 6
> > > > > which now contains 14.
> > > > > Should I adjust the test, assuming the change isn't a regression?
> > >
> > > Well, it is kind-of a regression, but it really just means that the
> > > integer code is now consistent with the floating-point code in having
> > > an unnecessary MOVPRFX. So I think adjusting the count is fine.
> > > Presumably any future fix for the existing redundant MOVPRFXs will
> > > apply to the new ones as well.
> > >
> > > The patch looks good to me, just some very minor nits:
> > >
> > > > @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
> > > >
> > > > /* Record that a fully-masked version of LOOP_VINFO would need MASKS to
> > > > contain a sequence of NVECTORS masks that each control a vector of type
> > > > - VECTYPE. */
> > > > + VECTYPE. SCALAR_MASK if non-null, represents the mask used for corresponding
> > > > + load/store stmt. */
> > >
> > > Should be two spaces between sentences. Maybe:
> > >
> > > VECTYPE. If SCALAR_MASK is nonnull, the fully-masked loop would AND
> > > these vector masks with the vector version of SCALAR_MASK. */
> > >
> > > since the mask isn't necessarily for a load or store statement.
> > >
> > > > [...]
> > > > @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree, tree, stmt_vec_info,
> > > > says how the load or store is going to be implemented and GROUP_SIZE
> > > > is the number of load or store statements in the containing group.
> > > > If the access is a gather load or scatter store, GS_INFO describes
> > > > - its arguments.
> > > > + its arguments. SCALAR_MASK is the scalar mask used for corresponding
> > > > + load or store stmt.
> > >
> > > Maybe:
> > >
> > > its arguments. If the load or store is conditional, SCALAR_MASK is the
> > > condition under which it occurs.
> > >
> > > since SCALAR_MASK can be null here too.
> > >
> > > > [...]
> > > > @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > > > /* Handle cond expr. */
> > > > for (j = 0; j < ncopies; j++)
> > > > {
> > > > + tree loop_mask = NULL_TREE;
> > > > + bool swap_cond_operands = false;
> > > > +
> > > > + if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > > > + {
> > > > + scalar_cond_masked_key cond (cond_expr, ncopies);
> > > > + if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > + {
> > > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > + loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> > > > + }
> > > > + else
> > > > + {
> > > > + cond.code = invert_tree_comparison (cond.code,
> > > > + HONOR_NANS (TREE_TYPE (cond.op0)));
> > >
> > > Long line. Maybe just split it out into a separate assignment:
> > >
> > > bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > > cond.code = invert_tree_comparison (cond.code, honor_nans);
> > >
> > > > + if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > + {
> > > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > + loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> > >
> > > Long line here too.
> > >
> > > > [...]
> > > > @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > > > }
> > > > }
> > > > }
> > > > +
> > > > + if (loop_mask)
> > > > + {
> > > > + if (COMPARISON_CLASS_P (vec_compare))
> > > > + {
> > > > + tree tmp = make_ssa_name (vec_cmp_type);
> > > > + gassign *g = gimple_build_assign (tmp,
> > > > + TREE_CODE (vec_compare),
> > > > + TREE_OPERAND (vec_compare, 0),
> > > > > + TREE_OPERAND (vec_compare, 1));
> > >
> > > Two long lines.
> > >
> > > > + vect_finish_stmt_generation (stmt_info, g, gsi);
> > > > + vec_compare = tmp;
> > > > + }
> > > > +
> > > > + tree tmp2 = make_ssa_name (vec_cmp_type);
> > > > + gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask);
> > >
> > > Long line here too.
> > >
> > > > [...]
> > > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> > > > index dc181524744..c4b2d8e8647 100644
> > > > --- a/gcc/tree-vectorizer.c
> > > > +++ b/gcc/tree-vectorizer.c
> > > > @@ -1513,3 +1513,39 @@ make_pass_ipa_increase_alignment (gcc::context *ctxt)
> > > > {
> > > > return new pass_ipa_increase_alignment (ctxt);
> > > > }
> > > > +
> > > > +/* If code(T) is comparison op or def of comparison stmt,
> > > > > + extract its operands.
> > > > + Else return <NE_EXPR, T, 0>. */
> > > > +
> > > > +void
> > > > +scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
> > > > +{
> > > > + if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison)
> > > > + {
> > > > + this->code = TREE_CODE (t);
> > > > + this->op0 = TREE_OPERAND (t, 0);
> > > > + this->op1 = TREE_OPERAND (t, 1);
> > > > + return;
> > > > + }
> > > > +
> > > > + if (TREE_CODE (t) == SSA_NAME)
> > > > + {
> > > > + gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t));
> > > > + if (stmt)
> > > > + {
> > >
> > > Might as well do this as:
> > >
> > > if (TREE_CODE (t) == SSA_NAME)
> > > if (gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t)))
> > > {
> > >
> > > The patch (as hoped) introduces some XPASSes:
> > >
> > > XPASS: gcc.target/aarch64/sve/cond_cnot_2.c scan-assembler-not \\tsel\\t
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmuo\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 252
> > > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmuo\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 180
> > > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
> > > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
> > > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
> > > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
> > > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
> > > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
> > > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
> > > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
> > >
> > > Could you remove the associated xfails (and comments above them where
> > > appropriate)?
> > >
> > > OK with those changes from my POV, but please give Richi a day or so
> > > to object.
> > >
> > > Thanks for doing this.
> > Thanks for the suggestions, I have updated the patch accordingly.
> > Bootstrap+test in progress on x86_64-unknown-linux-gnu and aarch64-linux-gnu.
> > Richi, does the patch look OK to you ?
> ping https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00573.html
ping * 2: https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00573.html
Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Richard
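
[Editorial sketch, not part of the original message.] The vectorizable_condition hunk quoted above looks up the scalar condition in scalar_cond_masked_set and, if it is absent, retries with the inverted comparison code. A minimal model of that two-step lookup is sketched below. Everything in it is an assumption made for illustration: CmpCode, CondKey, invert and find_masked_cond are hypothetical names, operands are modelled as strings, and the inversion covers integer comparisons only (the real code passes HONOR_NANS to invert_tree_comparison, which this sketch ignores). The swap_operands field borrows its name from the quoted swap_cond_operands variable; how the real patch sets and uses that flag is not shown in the quoted hunks.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-in for comparison tree codes; not GCC's enum.
enum class CmpCode { LT, GE, GT, LE, EQ, NE };

// Simplified model of a scalar condition: a code and two operand names.
struct CondKey {
  CmpCode code;
  std::string op0, op1;
  bool operator<(const CondKey &o) const {
    return std::tie(code, op0, op1) < std::tie(o.code, o.op0, o.op1);
  }
};

// Integer-only inversion. Floating-point comparisons that honor NaNs
// need more care and are deliberately out of scope here.
CmpCode invert(CmpCode c) {
  switch (c) {
    case CmpCode::LT: return CmpCode::GE;
    case CmpCode::GE: return CmpCode::LT;
    case CmpCode::GT: return CmpCode::LE;
    case CmpCode::LE: return CmpCode::GT;
    case CmpCode::EQ: return CmpCode::NE;
    case CmpCode::NE: return CmpCode::EQ;
  }
  return c;  // Not reached for valid enumerators.
}

struct Lookup {
  bool found;          // The condition or its inverse is in the set.
  bool swap_operands;  // Only the inverse was found.
};

// Try the condition as written first, then its inverse.
Lookup find_masked_cond(const std::set<CondKey> &masked, CondKey cond) {
  if (masked.count(cond))
    return {true, false};
  cond.code = invert(cond.code);
  if (masked.count(cond))
    return {true, true};
  return {false, false};
}

// Small usage example covering the three possible outcomes.
void demo_masked_lookup() {
  std::set<CondKey> masked;
  masked.insert(CondKey{CmpCode::LT, "a", "b"});

  Lookup direct = find_masked_cond(masked, CondKey{CmpCode::LT, "a", "b"});
  assert(direct.found && !direct.swap_operands);

  Lookup inverse = find_masked_cond(masked, CondKey{CmpCode::GE, "a", "b"});
  assert(inverse.found && inverse.swap_operands);

  Lookup miss = find_masked_cond(masked, CondKey{CmpCode::EQ, "a", "b"});
  assert(!miss.found && !miss.swap_operands);
}
```

The point of the second lookup is that a condition recorded as a < b can still be matched when the statement being vectorized tests a >= b, provided the caller compensates for the inversion.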