[PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

Tue Jun 30 13:04:29 GMT 2020

On Tue, 30 Jun 2020, Richard Sandiford wrote:

> Richard Biener <richard.guenther@gmail.com> writes:
> > On Tue, Jun 30, 2020 at 2:18 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> "Yangfei (Felix)" <felix.yang@huawei.com> writes:
> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >> > index e050db1a2e4..ea39fcac0e0 100644
> >> > --- a/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
> >> > @@ -1,6 +1,7 @@
> >> >  /* { dg-do compile } */
> >> >  /* { dg-additional-options "-O3" } */
> >> >  /* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
> >> > +/* { dg-additional-options "-march=armv8.2-a+sve -fno-vect-cost-model" { target aarch64*-*-* } } */
> >> >
> >> >  typedef struct {
> >> >      unsigned short mprr_2[5][16][16];
> >>
> >> This test is useful for Advanced SIMD too, so I think we should continue
> >> to test it with whatever options the person running the testsuite chose.
> >> Instead we could duplicate the test in gcc.target/aarch64/sve with
> >> appropriate options.
> >>
> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> >> > index eb8288e7a85..b30a7d8a3bb 100644
> >> > --- a/gcc/tree-vect-data-refs.c
> >> > +++ b/gcc/tree-vect-data-refs.c
> >> > @@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> >> >               {
> >> >                 poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
> >> >                                         ? vf * DR_GROUP_SIZE (stmt_info) : vf);
> >> > -               possible_npeel_number
> >> > -                 = vect_get_num_vectors (nscalars, vectype);
> >> > +               if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
> >> > +                 possible_npeel_number = 0;
> >> > +               else
> >> > +                 possible_npeel_number
> >> > +                   = vect_get_num_vectors (nscalars, vectype);
> >> >
> >> >                 /* NPEEL_TMP is 0 when there is no misalignment, but also
> >> >                    allow peeling NELEMENTS.  */
> >>
> >> OK, so this is coming from:
> >>
> >>   int s[16][2];
> >>   …
> >>   … =s[j][1];
> >>
> >> and an SLP node containing 16 instances of “s[j][1]”.  The DR_GROUP_SIZE
> >> is 2 because that's the inner dimension of “s”.
> >>
> >> I don't think maybe_lt is right here though.  The same problem could in
> >> principle happen for cases in which NSCALARS > TYPE_VECTOR_SUBPARTS,
> >> e.g. for different inner dimensions of “s”.
> >>
> >> I think the testcase shows that using DR_GROUP_SIZE in this calculation
> >> is flawed.  I'm not sure whether we can really do better given the current
> >> representation though.  This is one case where having a separate dr_vec_info
> >> per SLP node would help.
> >
> > I guess what the code likes to know is what we now have in SLP_TREE_LANES
> > (or formerly group_size) but that's not necessarily connected to DR_GROUP_SIZE.
> > Given we only see a stmt_info here and there's no 1:1 mapping of SLP node
> > to stmt_info (and the reverse mapping doesn't even exist) I do not have
> > any good idea either.
> >
> > Honestly I don't really see what this code tries to do ... doesn't it
> > simply want to set possible_npeel_number to TYPE_VECTOR_SUBPARTS (vectype)?!
> 
> I think it's trying to set possible_npeel_number to the number of
> vector stmts per iteration (i.e. ncopies for non-SLP stuff):
> 
>               /* For multiple types, it is possible that the bigger type access
>                  will have more than one peeling option.  E.g., a loop with two
>                  types: one of size (vector size / 4), and the other one of
>                  size (vector size / 8).  Vectorization factor will 8.  If both
>                  accesses are misaligned by 3, the first one needs one scalar
>                  iteration to be aligned, and the second one needs 5.  But the
>                  first one will be aligned also by peeling 5 scalar
>                  iterations, and in that case both accesses will be aligned.
>                  Hence, except for the immediate peeling amount, we also want
>                  to try to add full vector size, while we don't exceed
>                  vectorization factor.
>                  We do this automatically for cost model, since we calculate
>                  cost for every peeling option.  */
> 
> So for this example, possible_npeel_number==2 for the first access
> (letting us try two peeling values for it) and possible_npeel_number==1
> for the second.

Yeah, but npeel is _scalar_ lanes, since we always peel scalar iterations.
So it seems odd to somehow put in the number of vectors...  so to me
it would have made sense if it did

  possible_npeel_number = lower_bound (nscalars);

or whateveris necessary to make the polys happy.  Thus simply elide
the vect_get_num_vectors call?  But it's been very long since I've
dived into the alignment peeling code...

Richard.