Re: [SVE] PR86753
- From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
- To: Richard Sandiford <richard.sandiford@arm.com>
- Cc: Richard Biener <richard.guenther@gmail.com>, gcc Patches <gcc-patches@gcc.gnu.org>
- Date: Thu, 17 Oct 2019 22:14:49 -0700
- Subject: Re: [SVE] PR86753
On Wed, 16 Oct 2019 at 04:19, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Tue, Oct 15, 2019 at 8:07 AM Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> >>
> >> On Wed, 9 Oct 2019 at 08:14, Prathamesh Kulkarni
> >> <prathamesh.kulkarni@linaro.org> wrote:
> >> >
> >> > On Tue, 8 Oct 2019 at 13:21, Richard Sandiford
> >> > <richard.sandiford@arm.com> wrote:
> >> > >
> >> > > Leaving the main review to Richard, just some comments...
> >> > >
> >> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> >> > > > @@ -9774,6 +9777,10 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
> >> > > >
> >> > > > When STMT_INFO is vectorized as a nested cycle, for_reduction is true.
> >> > > >
> >> > > > + For COND_EXPR<C, T, E> if T comes from masked load, and is conditional
> >> > > > + on C, we apply loop mask to result of vector comparison, if it's present.
> >> > > > + Similarly for E, if it is conditional on !C.
> >> > > > +
> >> > > > Return true if STMT_INFO is vectorizable in this way. */
> >> > > >
> >> > > > bool
> >> > >
> >> > > I think this is a bit misleading. But IMO it'd be better not to have
> >> > > a comment here and just rely on the one in the main function body.
> >> > > This optimisation isn't really changing the vectorisation strategy,
> >> > > and the comment could easily get forgotten if things change in future.
> >> > >
> >> > > > [...]
> >> > > > @@ -9999,6 +10006,35 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >> > > > /* Handle cond expr. */
> >> > > > for (j = 0; j < ncopies; j++)
> >> > > > {
> >> > > > + tree loop_mask = NULL_TREE;
> >> > > > + bool swap_cond_operands = false;
> >> > > > +
> >> > > > + /* Look up if there is a loop mask associated with the
> >> > > > + scalar cond, or its inverse. */
> >> > >
> >> > > Maybe:
> >> > >
> >> > > See whether another part of the vectorized code applies a loop
> >> > > mask to the condition, or to its inverse.
> >> > >
> >> > > > +
> >> > > > + if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> >> > > > + {
> >> > > > + scalar_cond_masked_key cond (cond_expr, ncopies);
> >> > > > + if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> >> > > > + {
> >> > > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> >> > > > + loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> >> > > > + }
> >> > > > + else
> >> > > > + {
> >> > > > + bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> >> > > > + cond.code = invert_tree_comparison (cond.code, honor_nans);
> >> > > > + if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> >> > > > + {
> >> > > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> >> > > > + loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
> >> > > > + vectype, j);
> >> > > > + cond_code = cond.code;
> >> > > > + swap_cond_operands = true;
> >> > > > + }
> >> > > > + }
> >> > > > + }
> >> > > > +
> >> > > > stmt_vec_info new_stmt_info = NULL;
> >> > > > if (j == 0)
> >> > > > {
> >> > > > @@ -10114,6 +10153,47 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >> > > > }
> >> > > > }
> >> > > > }
> >> > > > +
> >> > > > + /* If loop mask is present, then AND it with
> >> > >
> >> > > Maybe "If we decided to apply a loop mask, ..."
> >> > >
> >> > > > + result of vec comparison, so later passes (fre4)
> >> > >
> >> > > Probably better not to name the pass -- could easily change in future.
> >> > >
> >> > > > + will reuse the same condition used in masked load.
> >> > >
> >> > > Could be a masked store, or potentially other things too.
> >> > > So maybe just "will reuse the masked condition"?
> >> > >
> >> > > > +
> >> > > > + For example:
> >> > > > + for (int i = 0; i < 100; ++i)
> >> > > > + x[i] = y[i] ? z[i] : 10;
> >> > > > +
> >> > > > + results in the following optimized GIMPLE:
> >> > > > +
> >> > > > + mask__35.8_43 = vect__4.7_41 != { 0, ... };
> >> > > > + vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
> >> > > > + _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 0B];
> >> > > > + vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
> >> > > > + vect_iftmp.12_52 = VEC_COND_EXPR <vec_mask_and_46,
> >> > > > + vect_iftmp.11_47, { 10, ... }>;
> >> > > > +
> >> > > > + instead of recomputing vec != { 0, ... } in vec_cond_expr */
> >> > >
> >> > > That's true, but gives the impression that avoiding the vec != { 0, ... }
> >> > > is the main goal, whereas we could do that just by forcing a three-operand
> >> > > COND_EXPR. It's really more about making sure that vec != { 0, ... }
> >> > > and its masked form aren't both live at the same time. So maybe:
> >> > >
> >> > > instead of using masked and unmasked forms of
> >> > > vect__4.7_41 != { 0, ... } (masked in the MASK_LOAD,
> >> > > unmasked in the VEC_COND_EXPR). */
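> >> > >
> >> > > As a sketch (adapted from the dump above rather than taken from a
> >> > > real dump), the form we want to avoid would be:
> >> > >
> >> > >   mask__35.8_43 = vect__4.7_41 != { 0, ... };
> >> > >   vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
> >> > >   vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
> >> > >   vect_iftmp.12_52 = VEC_COND_EXPR <mask__35.8_43,
> >> > >                                     vect_iftmp.11_47, { 10, ... }>;
> >> > >
> >> > > where the MASK_LOAD uses the masked condition but the
> >> > > VEC_COND_EXPR uses the unmasked one, so both predicates have to
> >> > > stay live at the same time.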
> >> > >
> >> > Hi Richard,
> >> > Thanks for the suggestions, I have updated comments in the attached patch.
> >> Hi,
> >> The attached patch is rebased on trunk, and after the PR91532 fix,
> >> the hunk for fmla_2.c is no longer required.
> >
> > Hmm. So we already record some mask info - you just add in addition
> > to that the scalar predicate representing the mask. I wonder if you can
> > integrate that into the existing vec_loop_masks vector instead of
> > adding another data structure on the side? Not that I am understanding
> > the existing fully masked code at all (or specifically what it computes
> > as nscalars_per_iter, etc. ... :/).
>
> We can AND several different scalar conditions with the same loop
> mask (that's relatively common), and could even AND the same scalar
> condition with different loop masks (although that's less likely).
> So I think having separate info makes sense.
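>
> For instance (an illustrative C sketch, not the PR testcase), in a
> fully-masked loop like
>
>   for (int i = 0; i < n; ++i)
>     x[i] = (a[i] > 0 ? b[i] : 0) + (c[i] < 10 ? d[i] : 0);
>
> both a[i] > 0 and c[i] < 10 would be ANDed with the same loop mask,
> while with ncopies > 1 each copy of a single condition would be ANDed
> with a different loop mask.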
>
> > At least add the new vinfo member right next to the other
> > mask-related fields.
>
> Agree that would be better.
>
> > @@ -10122,6 +10157,48 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > }
> > }
> > }
> > +
> > + /* If we decided to apply a loop mask to result of vec
> > + comparison, so later passes will reuse the same condition.
>
> Maybe:
>
> /* If we decided to apply a loop mask to the result of the vector
> comparison, AND the comparison with the mask now. Later passes
> should then be able to reuse the AND results between multiple
> vector statements.
>
> > + For example:
> > + for (int i = 0; i < 100; ++i)
> > + x[i] = y[i] ? z[i] : 10;
> > +
> > + results in the following optimized GIMPLE:
> > +
> > + mask__35.8_43 = vect__4.7_41 != { 0, ... };
> > + vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
> > + _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 0B];
> > + vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
> > + vect_iftmp.12_52 = VEC_COND_EXPR <vec_mask_and_46,
> > + vect_iftmp.11_47, { 10, ... }>;
> > +
> > + instead of using masked and unmasked forms of
> > + vec != { 0, ... } (masked in the MASK_LOAD,
> > + unmasked in the VEC_COND_EXPR). */
>
> The last paragraph uses some space rather than tab indentation.
>
> > +/* If code(T) is comparison op or def of comparison stmt,
> > + extract its operands.
> > + Else return <NE_EXPR, T, 0>. */
> > +
> > +void
> > +scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
> > +{
>
> Maybe:
>
> /* If the condition represented by T is a comparison or the SSA name
> result of a comparison, extract the comparison's operands. Represent
> T as NE_EXPR <T, 0> otherwise. */
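>
> For reference, a minimal sketch of that behaviour in terms of the
> usual tree/gimple accessors (an illustration, not necessarily the
> patch's exact code):
>
>   void
>   scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
>   {
>     /* A bare comparison: record its code and operands directly.  */
>     if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison)
>       {
>         this->code = TREE_CODE (t);
>         this->op0 = TREE_OPERAND (t, 0);
>         this->op1 = TREE_OPERAND (t, 1);
>         return;
>       }
>
>     /* An SSA name defined by a comparison: look through the
>        defining assignment.  */
>     if (TREE_CODE (t) == SSA_NAME)
>       if (gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t)))
>         {
>           tree_code code = gimple_assign_rhs_code (stmt);
>           if (TREE_CODE_CLASS (code) == tcc_comparison)
>             {
>               this->code = code;
>               this->op0 = gimple_assign_rhs1 (stmt);
>               this->op1 = gimple_assign_rhs2 (stmt);
>               return;
>             }
>         }
>
>     /* Otherwise represent T as NE_EXPR <T, 0>.  */
>     this->code = NE_EXPR;
>     this->op0 = t;
>     this->op1 = build_zero_cst (TREE_TYPE (t));
>   }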
>
> OK with those changes and the one Richard asked for.
Thanks! Committed in r277141.
Thanks,
Prathamesh
>
> Thanks,
> Richard