This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

From: Jakub Jelinek <jakub at redhat dot com>
To: Richard Biener <richard dot guenther at gmail dot com>
Cc: Thomas Schwinge <thomas at codesourcery dot com>, GCC Development <gcc at gcc dot gnu dot org>, Sebastian Pop <sebpop at gmail dot com>, "Stubbs, Andrew" <ams at codesourcery dot com>
Date: Mon, 15 Oct 2018 11:45:47 +0200
Subject: Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
References: <yxfpefcvjcfm.fsf@hertz.schwinge.homeip.net> <20181012195213.GM11625@tucnak> <CAFiYyc0aj1qFrtXwdBO6SEiu8n3ACpM4T+jAoywPVr6NeaXGaw@mail.gmail.com> <20181015091113.GP11625@tucnak> <CAFiYyc29YGNpQ915FM9c6MuxwC3tmw+q3g389eG7ybfG+KePJw@mail.gmail.com>
Reply-to: Jakub Jelinek <jakub at redhat dot com>

On Mon, Oct 15, 2018 at 11:30:56AM +0200, Richard Biener wrote:
> But isn't _actual_ collapsing an implementation detail?

No, it is required by the standard and in many cases it is very much
observable.
#pragma omp parallel for schedule(nonmonotonic: static, 23) collapse (2)
for (int i = 0; i < 64; i++)
  for (int j = 0; j < 16; j++)
    a[i][j] = omp_get_thread_num ();
The standard says that, of the logical iteration space of 64 x 16 iterations,
the first 23 iterations go to the first thread (i.e. i=0, j=0..15 and i=1,
j=0..6), then the next 23 iterations go to the second thread, etc.
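To make the chunk arithmetic concrete, here is a minimal sketch that computes which thread owns a given (i, j) under the reading above: iterations are numbered in the collapsed 64 x 16 space and 23-iteration chunks are handed out round-robin in thread-number order. The helper name and the thread count passed to it are illustrative assumptions, not anything from GCC or libgomp.

```c
#include <assert.h>

/* Dimensions and chunk size from the example loop nest above. */
enum { NI = 64, NJ = 16, CHUNK = 23 };

/* Hypothetical helper: thread that executes logical iteration (i, j)
   of the collapsed NI x NJ space under schedule(static, CHUNK) with
   nthreads threads, assuming round-robin chunk assignment. */
static int owner_thread (int i, int j, int nthreads)
{
  assert (nthreads > 0 && i >= 0 && i < NI && j >= 0 && j < NJ);
  int k = i * NJ + j;        /* logical iteration number */
  int chunk = k / CHUNK;     /* index of the 23-iteration chunk */
  return chunk % nthreads;   /* round-robin chunk -> thread */
}
```

With 4 threads, (1, 6) is logical iteration 22 and still belongs to thread 0, while (1, 7) is iteration 23 and starts thread 1's first chunk. The mapping is fixed by the collapsed numbering, which is why the collapse is observable.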
In other constructs, e.g. the new loop construct, the directive is a request
to distribute, parallelize and vectorize as much as possible, with an optional
guarantee of no cross-iteration dependencies at all.  Even in that case,
using the source loops might not always be a win: the loop nest could be
5 loops deep, and the iteration space might be diagonal or otherwise not
exactly rectangular.
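As an illustration of a non-rectangular case, the sketch below uses a hypothetical triangular nest (j runs from 0 to i). It counts the logical iterations and recovers (i, j) from the logical iteration number k. Unlike the rectangular 64 x 16 example, a single division and modulo by a constant cannot do this. The function names are made up for the example.

```c
#include <assert.h>

/* Triangular nest assumed for illustration:
     for (i = 0; i < n; i++)
       for (j = 0; j <= i; j++)
   Row i has i + 1 iterations, so the nest has n * (n + 1) / 2. */
static long tri_count (long n)
{
  assert (n >= 0);
  return n * (n + 1) / 2;
}

/* Recover (i, j) from logical iteration number k by subtracting the
   row lengths 1, 2, 3, ...  The linear walk is only for clarity; k can
   also be inverted in closed form using a square root. */
static void tri_unlinearize (long n, long k, long *i, long *j)
{
  assert (k >= 0 && k < tri_count (n));
  long row = 0;
  while (k > row)   /* k does not fit in row 'row' (row + 1 slots) */
    {
      k -= row + 1;
      row++;
    }
  *i = row;
  *j = k;
}
```

For n = 4 there are 10 iterations. k = 2 is (1, 1) and k = 9 is (3, 3).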

> That is, can we delay the actual collapsing until after vectorization
> for example?

No.  We can come up with some way to propagate some of the original
information to the vectorizer if it helps (or teach the vectorizer to
recognize whatever we produce), but the mandatory transformation needs to be
done immediately, before optimizations make it impossible.
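For the collapse(2) example earlier in this message, the transformation amounts roughly to the source-level picture below. There is one loop over the logical iteration number k, and i and j are recovered by truncating division and modulo. These are the TRUNC_DIV_EXPR / TRUNC_MOD_EXPR from the subject line. This is only a sketch: GCC's actual lowering happens in GIMPLE and also splits k into per-thread chunks. Storing k stands in for omp_get_thread_num () so the sketch runs without OpenMP.

```c
#include <assert.h>

/* Bounds of the example nest: i < 64, j < 16. */
enum { N_I = 64, N_J = 16 };

/* Hypothetical collapsed form of the two-level nest.  Each element
   records its logical iteration number k. */
static void fill_collapsed (int a[N_I][N_J])
{
  for (int k = 0; k < N_I * N_J; k++)
    {
      int i = k / N_J;   /* truncating division */
      int j = k % N_J;   /* truncating modulo */
      assert (i * N_J + j == k);   /* the decomposition is exact */
      a[i][j] = k;
    }
}
```

The vectorizer and scalar evolution then see the single k loop with division and modulo in the subscripts, rather than the original two induction variables.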

	Jakub

Follow-Ups:
- Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
  - From: Richard Biener

References:
- Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
  - From: Thomas Schwinge
- Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
  - From: Jakub Jelinek
- Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
  - From: Richard Biener
- Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
  - From: Jakub Jelinek
- Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
  - From: Richard Biener
