[SLP] SLP vectorization: vectorize vector constructors

Richard Biener rguenther@suse.de
Mon Nov 4 16:27:00 GMT 2019


On November 4, 2019 4:30:57 PM GMT+01:00, Joel Hutton <Joel.Hutton@arm.com> wrote:
>First copy bounced from mailing list.
> > > On 30/10/2019 13:49, Richard Biener wrote:
> > >  >>
>> >  >>          * expr.c (store_constructor): Add case for constructor
>of
> > > vectors.
> > >  > Why do you need this?  The vectorizer already creates such 
>CTORs.  Any
> > >  > testcase that you can show?
> > >  >
> > >  > typedef long v2di __attribute__((vector_size(16)));
> > >  > v2di v;
> > >  > void
> > >  > foo()
> > >  > {
> > >  >    v = (v2di){v[1], v[0]};
> > >  > }
>> > There is a special case for single element vectors, where the
>vector
> > > mode and
> > > the element mode are the same.
>> > I've changed this slightly, as your solution of using VECTOR_MODE_P
>
>didn't
> > > work for my case where:
> > >    emode = E_DImode
> > >    eltmode = E_DImode
> > > On aarch64. As E_DImode is not a vector mode.
>> > Based on some checks I found in verify_gimple, I set the type of
>the
> > > replacement
>> > constructor to the same as the original constructor, as
>verify_gimple
> > > seems to
>> > expect flattened types. i.e. a vector of vectors of ints should
>have
> > > type vector
> > > of ints.
> >
> > Huh.  On aarch64 for the above testcase I see mode V2DImode and
> > emode = eltmode = DImode.  That's just a regular CTOR with
> > non-vector elements so not sure why you need any adjustment at all
> > here?
> >
> > It looks like your vectorization of the CTOR introduces a
> > V2DImode CTOR of vector(1) long elements which incidentially
> > have DImode.  That's because somehow V2DI vectorization isn't
> > profitable but V1DI one is.  In the end it's a noop transform
> > but the testcase shows that for V2DI vectorization we fail
> > to cost the CTOR in the scalar cost computation (you have to
> > include the pattern root there I guess).
> >
>Yes, sorry, I was unclear about this, it's the new constructor that the
>
>change is needed for.
>
> > Technically we feed valid GIMPLE to the expander so we indeed
> > shouldn't ICE.  So I'd like to have the earlier keying on
> > vec_vec_init_p match the later assert we run into, thus
> > sth like
> >
> > Index: gcc/expr.c
> > ===================================================================
> > --- gcc/expr.c  (revision 277765)
> > +++ gcc/expr.c  (working copy)
> > @@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target,
> >             && n_elts.is_constant (&const_n_elts))
> >           {
> >             machine_mode emode = eltmode;
> > +           bool vector_typed_elts_p = false;
> >
> >             if (CONSTRUCTOR_NELTS (exp)
> >                 && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp,
> > 0)->value))
> > @@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target,
>>                                       * TYPE_VECTOR_SUBPARTS (etype),
> >                                       n_elts));
> >                 emode = TYPE_MODE (etype);
> > +               vector_typed_elts_p = true;
> >               }
>>             icode = convert_optab_handler (vec_init_optab, mode,
>emode);
> >             if (icode != CODE_FOR_nothing)
> >               {
> >                 unsigned int n = const_n_elts;
> >
> > -               if (emode != eltmode)
> > +               if (vector_typed_elts_p)
> >                   {
> >                     n = CONSTRUCTOR_NELTS (exp);
> >                     vec_vec_init_p = true;
> >
> >
>I've used this, thank you.
>
> > >  > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
> > >  >        if (is_a <loop_vec_info> (vinfo))
> > >  >
>> >  > the ChangeLog doesn't mention this.  I guess this isn't
>necessary
> > >  > unless you fail vectorizing late which you shouldn't.
> > >  >
> > > It's necessary to avoid:
> > >
> > >          removing stmts twice for constructors of the form 
>{_1,_1,_1,_1}
>> >          removing stmts that define ssa names used elsewhere (which
>> > previously wasn't a consideration because the scalar_stmts were
>stores
> > > to memory, not assignments to ssa names)
> >
> > OK, but the code you are patching is just supposed to remove stores
> > from the final scalar stmt seeds - it doesn't expect any loads there
> > which is probably what happens.  I'd instead add a
> >
> >    /* Do not CSE the stmts feeding a CTOR, they might have uses
> >       outside of the vectorized stmts.  */
> >    if (SLP_INSTANCE_ROOT_STMT (instance))
> >      continue;
> >
> > before the loop over SLP_TREE_SCALAR_STMTS.
>Done.
>
> > +         if (!subparts.is_constant () || !(subparts.to_constant ()
>> +                                           == CONSTRUCTOR_NELTS
>(rhs)))
> > +           continue;
> >
> > can be better written as if (maybe_ne (subparts, CONSTRUCTOR_NELTS 
>(rhs)))
>Done.
>
> > +/* Vectorize the instance root.  Return success or failure. */
> > +
> > +void
> >
> > since it's now void the comment has to be adjusted.
>Done.
>
> > +      vect_schedule_slp_instance (node,
> >                                   instance, bst_map);
> >
> > this now fits in one line.
>Done.
>
> > OK with those changes.
>Tested and bootstrapped on aarch64.
>OK?

Ok. 

Thanks, 
Richard. 

>2019-11-04  Joel Hutton  <Joel.Hutton@arm.com>
>
>          * expr.c (store_constructor): Modify to handle single element
>vectors.
>          * tree-vect-slp.c (vect_analyze_slp_instance): Add case for
>vector constructors.
>          (vect_slp_check_for_constructors): New function.
>         (vect_slp_analyze_bb_1): Call new function to check for vector
>constructors.
>          (vectorize_slp_instance_root_stmt): New function.
>          (vect_schedule_slp): Call new function to vectorize root stmt
>of vector constructors.
>          * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
>
>gcc/testsuite/ChangeLog:
>
>2019-11-04  Joel Hutton  <Joel.Hutton@arm.com>
>
>          * gcc.dg/vect/bb-slp-40.c: New test.
>          * gcc.dg/vect/bb-slp-41.c: New test.



More information about the Gcc-patches mailing list