[SVE] PR86753

Richard Biener richard.guenther@gmail.com
Tue Oct 15 11:40:00 GMT 2019


On Tue, Oct 15, 2019 at 8:07 AM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Wed, 9 Oct 2019 at 08:14, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Tue, 8 Oct 2019 at 13:21, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > >
> > > Leaving the main review to Richard, just some comments...
> > >
> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > > @@ -9774,6 +9777,10 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
> > > >
> > > >     When STMT_INFO is vectorized as a nested cycle, for_reduction is true.
> > > >
> > > > +   For COND_EXPR<C, T, E> if T comes from masked load, and is conditional
> > > > +   on C, we apply loop mask to result of vector comparison, if it's present.
> > > > +   Similarly for E, if it is conditional on !C.
> > > > +
> > > >     Return true if STMT_INFO is vectorizable in this way.  */
> > > >
> > > >  bool
> > >
> > > I think this is a bit misleading.  But IMO it'd be better not to have
> > > a comment here and just rely on the one in the main function body.
> > > This optimisation isn't really changing the vectorisation strategy,
> > > and the comment could easily get forgotten if things change in future.
> > >
> > > > [...]
> > > > @@ -9999,6 +10006,35 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > > >    /* Handle cond expr.  */
> > > >    for (j = 0; j < ncopies; j++)
> > > >      {
> > > > +      tree loop_mask = NULL_TREE;
> > > > +      bool swap_cond_operands = false;
> > > > +
> > > > +      /* Look up if there is a loop mask associated with the
> > > > +      scalar cond, or its inverse.  */
> > >
> > > Maybe:
> > >
> > >    See whether another part of the vectorized code applies a loop
> > >    mask to the condition, or to its inverse.
> > >
> > > > +
> > > > +      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > > > +     {
> > > > +       scalar_cond_masked_key cond (cond_expr, ncopies);
> > > > +       if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > +         {
> > > > +           vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +           loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> > > > +         }
> > > > +       else
> > > > +         {
> > > > +           bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > > > +           cond.code = invert_tree_comparison (cond.code, honor_nans);
> > > > +           if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > +             {
> > > > +               vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +               loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
> > > > +                                               vectype, j);
> > > > +               cond_code = cond.code;
> > > > +               swap_cond_operands = true;
> > > > +             }
> > > > +         }
> > > > +     }
> > > > +
> > > >        stmt_vec_info new_stmt_info = NULL;
> > > >        if (j == 0)
> > > >       {
> > > > @@ -10114,6 +10153,47 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > > >                   }
> > > >               }
> > > >           }
> > > > +
> > > > +       /* If loop mask is present, then AND it with
> > >
> > > Maybe "If we decided to apply a loop mask, ..."
> > >
> > > > +          result of vec comparison, so later passes (fre4)
> > >
> > > Probably better not to name the pass -- could easily change in future.
> > >
> > > > +          will reuse the same condition used in masked load.
> > >
> > > Could be a masked store, or potentially other things too.
> > > So maybe just "will reuse the masked condition"?
> > >
> > > > +
> > > > +          For example:
> > > > +          for (int i = 0; i < 100; ++i)
> > > > +            x[i] = y[i] ? z[i] : 10;
> > > > +
> > > > +          results in following optimized GIMPLE:
> > > > +
> > > > +          mask__35.8_43 = vect__4.7_41 != { 0, ... };
> > > > +          vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
> > > > +          _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 0B];
> > > > +          vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
> > > > +          vect_iftmp.12_52 = VEC_COND_EXPR <vec_mask_and_46,
> > > > +                                            vect_iftmp.11_47, { 10, ... }>;
> > > > +
> > > > +          instead of recomputing vec != { 0, ... } in vec_cond_expr  */
> > >
> > > That's true, but gives the impression that avoiding the vec != { 0, ... }
> > > is the main goal, whereas we could do that just by forcing a three-operand
> > > COND_EXPR.  It's really more about making sure that vec != { 0, ... }
> > > and its masked form aren't both live at the same time.  So maybe:
> > >
> > >              instead of using masked and unmasked forms of
> > >              vect__4.7_41 != { 0, ... } (masked in the MASK_LOAD,
> > >              unmasked in the VEC_COND_EXPR).  */
> > >
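As a sketch for concreteness (adapted from the GIMPLE example above, not
taken from an actual dump): if the VEC_COND_EXPR kept using the unmasked
comparison, the IL would look like

   mask__35.8_43 = vect__4.7_41 != { 0, ... };
   vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
   _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 0B];
   vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
   vect_iftmp.12_52 = VEC_COND_EXPR <mask__35.8_43,
                                     vect_iftmp.11_47, { 10, ... }>;

where the MASK_LOAD consumes the masked form (vec_mask_and_46) while the
VEC_COND_EXPR consumes the unmasked form (mask__35.8_43), so both
predicates are live at the same time.  Rewriting the VEC_COND_EXPR to
use vec_mask_and_46 lets the unmasked form die.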
> > Hi Richard,
> > Thanks for the suggestions; I have updated the comments in the attached patch.
> Hi,
> The attached patch is rebased on trunk, and after the PR91532 fix, the
> hunk for fmla_2.c is no longer required.

Hmm.  So we already record some mask info - in addition to that, you
add the scalar predicate representing the mask.  I wonder if you can
integrate that into the existing vec_loop_masks vector instead of
adding another data structure on the side?  Not that I understand the
existing fully masked code at all (or specifically what it computes
as nscalars_per_iter, etc. ... :/).  At least add the new vinfo member
right next to the other mask-related fields.
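For readers without the sources open, a rough C++ sketch of the two
data structures under discussion.  The shape of scalar_cond_masked_key
is inferred from the patch hunks above, and rgroup_masks /
vec_loop_masks from the fully-masked loop support of this era; the
exact field names and layout are assumptions, not quotes from the tree:

  /* Assumed shape of the key for the new side table: the scalar
     condition (comparison code plus operands) and the number of
     vector copies it is vectorized with.  */
  struct scalar_cond_masked_key
  {
    tree_code code;        /* comparison code, e.g. LT_EXPR  */
    tree op0, op1;         /* scalar operands of the comparison  */
    unsigned int ncopies;  /* vector copies per scalar statement  */
  };

  /* Assumed shape of the existing per-loop mask record: one rgroup
     per distinct number of scalars controlled per iteration, each
     carrying the masks for its vector copies.  */
  struct rgroup_masks
  {
    unsigned int max_nscalars_per_iter;  /* scalar values controlled
                                            per iteration  */
    tree mask_type;                      /* vector boolean type of
                                            each mask  */
    vec<tree> masks;                     /* one mask per vector copy  */
  };
  typedef auto_vec<rgroup_masks> vec_loop_masks;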

I still fail to understand this in full, so I'm deferring to Richard,
who added all this stuff.

Richard.

> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > > Thanks,
> > > Richard


