This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?


On Mon, Oct 15, 2018 at 11:11 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Oct 15, 2018 at 10:55:26AM +0200, Richard Biener wrote:
> > Yeah.  Note this still makes the IVs not analyzable, since i now effectively
> > wraps in the inner loop.  For some special values we might get away
> > with a wrapping CHREC in a bit-precision type, but we cannot represent
> > wrapping at some (possibly non-constant) value.
> >
> > So, collapsing loops is a bad idea.  Why is that done anyway?
>
> Because the standards (both OpenMP and OpenACC) mandate it if one uses
> collapse(2) or more.  The semantics are that that many nested loops then
> form a single larger iteration space, which is handled according to the
> rules of the particular construct.  Sometimes this is very beneficial,
> sometimes less so, but e.g. with OpenMP the user has the option to say what
> they want.  For example, they can do:
>   #pragma omp distribute
>   for (int i = 0; i < M; i++)
>     #pragma omp parallel for
>     for (int j = 0; j < N; j++)
>       #pragma omp simd
>       for (int k = 0; k < O; k++)
>         do_something (i, j, k);
> and that way distribute the outermost loop, parallelize the middle one and
> vectorize the innermost one, or they can do:
>   #pragma omp distribute parallel for simd collapse (3)
>   for (int i = 0; i < M; i++)
>     for (int j = 0; j < N; j++)
>       for (int k = 0; k < O; k++)
>         do_something (i, j, k);
> and let the implementation split the M x N x O iteration space itself (or
> use clauses to say exactly how it is done).  Say, if O is very large, N is
> small and there are many cores, it might be more beneficial to parallelize
> more of it, etc.
> If we come up with some way to help the vectorizer with the collapsed loop,
> whether in the form of loop flags, internal fns or whatever, I'm all for
> it.
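For readers following the subject line, here is a minimal sketch of what collapse(2) does to the first two loops of such a nest. It is an illustration under stated assumptions, not GCC's actual expansion: `do_something` is a hypothetical body that only records the visited (i, j) pairs, variant A recovers both indices from one linear IV with truncating division and modulo (the TRUNC_DIV_EXPR / TRUNC_MOD_EXPR of the subject), and variant B is the incremental form in which the inner index wraps at the runtime bound N, i.e. the wrapping at a possibly non-constant value that a CHREC cannot represent.

```c
#include <string.h>

#define MAX_ITERS 256

/* Hypothetical loop body: records the (i, j) pairs it is called with. */
typedef struct { int n; int i[MAX_ITERS]; int j[MAX_ITERS]; } trace;

static void do_something(trace *t, int i, int j) {
  if (t->n < MAX_ITERS) {
    t->i[t->n] = i;
    t->j[t->n] = j;
  }
  t->n++;
}

/* The original nest: i and j are simple affine IVs. */
static void nested(trace *t, int M, int N) {
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
      do_something(t, i, j);
}

/* collapse(2), variant A: one linear IV over the M x N space, with i and j
   recovered by truncating division and modulo. */
static void collapsed_divmod(trace *t, int M, int N) {
  long total = (long) M * N;
  for (long idx = 0; idx < total; idx++)
    do_something(t, (int) (idx / N), (int) (idx % N));
}

/* collapse(2), variant B: no div/mod in the body; j is stepped and reset
   to 0 when it reaches N, so it wraps at a runtime value. */
static void collapsed_wrapping(trace *t, int M, int N) {
  long total = (long) M * N;
  int i = 0, j = 0;
  for (long idx = 0; idx < total; idx++) {
    do_something(t, i, j);
    if (++j == N) {
      j = 0;
      i++;
    }
  }
}

/* Returns 1 if all three forms visit the same (i, j) sequence. */
static int same_order(int M, int N) {
  static trace a, b, c;
  memset(&a, 0, sizeof a);
  memset(&b, 0, sizeof b);
  memset(&c, 0, sizeof c);
  nested(&a, M, N);
  collapsed_divmod(&b, M, N);
  collapsed_wrapping(&c, M, N);
  return a.n == b.n && a.n == c.n
         && memcmp(&a, &b, sizeof a) == 0
         && memcmp(&a, &c, sizeof a) == 0;
}
```

For example, `same_order (3, 4)` returns 1: all three forms visit (0,0), (0,1), ..., (2,3) in the same order. The point of the comparison is that neither collapsed body gives the inner index a simple affine evolution, which is what makes the IVs hard to analyze.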

But isn't _actual_ collapsing an implementation detail?  That is, isn't it
enough to interpret clauses in terms of the collapse result?

That is, can we delay the actual collapsing until after vectorization,
for example?

Richard.

>
>         Jakub
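Richard's question can be made concrete with a small sketch, assuming (hypothetically) that a worksharing schedule hands each thread a contiguous chunk [lo, hi) of the collapsed 0 .. M*N-1 space. The clauses are still interpreted in terms of the collapsed space, but the chunk is executed as a nest: one division/modulo pair locates each end of the chunk, and the innermost loop over j is an ordinary counted loop with an affine IV. The names `chunk_divmod`, `chunk_nested` and `same_chunk` are illustrative only, and this is not a description of GCC's actual lowering of collapse.

```c
#include <string.h>

#define MAX_ITERS 256

/* Hypothetical loop body: records the (i, j) pairs it is called with. */
typedef struct { int n; int i[MAX_ITERS]; int j[MAX_ITERS]; } trace;

static void do_something(trace *t, int i, int j) {
  if (t->n < MAX_ITERS) {
    t->i[t->n] = i;
    t->j[t->n] = j;
  }
  t->n++;
}

/* Reference: walk the chunk [lo, hi) of the collapsed space directly,
   recovering (i, j) with div/mod on every iteration. */
static void chunk_divmod(trace *t, long lo, long hi, int N) {
  for (long idx = lo; idx < hi; idx++)
    do_something(t, (int) (idx / N), (int) (idx % N));
}

/* Same chunk executed as a nest.  Assumes 0 <= lo <= hi <= M * N.
   Div/mod is used only at the two chunk boundaries; the inner j loop
   never wraps, so its IV stays affine. */
static void chunk_nested(trace *t, long lo, long hi, int N) {
  if (lo >= hi)
    return;
  int i_first = (int) (lo / N), j_first = (int) (lo % N);
  int i_last = (int) ((hi - 1) / N), j_last = (int) ((hi - 1) % N);
  for (int i = i_first; i <= i_last; i++) {
    int j_begin = (i == i_first) ? j_first : 0;
    int j_end = (i == i_last) ? j_last + 1 : N;
    for (int j = j_begin; j < j_end; j++)
      do_something(t, i, j);
  }
}

/* Returns 1 if both forms visit the same (i, j) sequence. */
static int same_chunk(long lo, long hi, int N) {
  static trace a, b;
  memset(&a, 0, sizeof a);
  memset(&b, 0, sizeof b);
  chunk_divmod(&a, lo, hi, N);
  chunk_nested(&b, lo, hi, N);
  return a.n == b.n && memcmp(&a, &b, sizeof a) == 0;
}
```

For example, `same_chunk (5, 11, 4)` returns 1: with N = 4 the chunk covers (1,1) through (2,2), i.e. the tail of row 1 and the head of row 2. The design choice being illustrated is that only the chunk boundaries need division and modulo, so nothing inside the innermost loop wraps.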

