This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

From: Sebastian Pop <sebpop at gmail dot com>
To: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
Cc: Thomas Schwinge <thomas at codesourcery dot com>, ams at codesourcery dot com, Jakub Jelinek <jakub at redhat dot com>
Date: Mon, 15 Oct 2018 15:22:29 -0500
Subject: Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
References: <yxfpefcvjcfm.fsf@hertz.schwinge.homeip.net> <alpine.DEB.2.21.1810122051520.11775@stedding.saclay.inria.fr>

On Fri, Oct 12, 2018 at 2:14 PM Marc Glisse <marc.glisse@inria.fr> wrote:

> On Fri, 12 Oct 2018, Thomas Schwinge wrote:
>
> > Hmm, and without any OpenACC/OpenMP etc., actually the same problem is
> > also present when running the following code through the vectorizer:
> >
> >    for (int tmp = 0; tmp < N_J * N_I; ++tmp)
> >      {
> >        int j = tmp / N_I;
> >        int i = tmp % N_I;
> >        a[j][i] = 0;
> >      }
> >
> > ... whereas the following variant (obviously) does vectorize:
> >
> >    int a[N_J * N_I];
> >
> >    for (int tmp = 0; tmp < N_J * N_I; ++tmp)
> >      a[tmp] = 0;
>
> I had a quick look at the difference, and a[j][i] remains in this form
> throughout optimization. If I write instead *((*(a+j))+i) = 0; I get
>
>    j_10 = tmp_17 / 1025;
>    i_11 = tmp_17 % 1025;
>    _1 = (long unsigned int) j_10;
>    _2 = _1 * 1025;
>    _3 = (sizetype) i_11;
>    _4 = _2 + _3;
>
> or for a power of 2
>
>    j_10 = tmp_17 >> 10;
>    i_11 = tmp_17 & 1023;
>    _1 = (long unsigned int) j_10;
>    _2 = _1 * 1024;
>    _3 = (sizetype) i_11;
>    _4 = _2 + _3;
>
> and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least
> I think that's true).
>
>
If this folding is correct, the dependence analysis would no longer
have to handle array accesses containing div and mod, and it could
classify the loop as parallel, which would enable vectorization.
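
As a quick sanity check of the folding, here is a minimal sketch in plain C
(not GCC code). The sizes N_I = 1025 and N_J = 7 and all helper names are
made up for illustration. It mirrors the two GIMPLE sequences quoted above
and compares the computed offset against (size_t) tmp over the iteration
range of the loop:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { N_I = 1025, N_J = 7 };

/* Offset as computed in the div/mod form: j * N_I + i. */
static size_t linear_index(int tmp)
{
    assert(tmp >= 0);            /* the loop only produces non-negative tmp */
    int j = tmp / N_I;
    int i = tmp % N_I;
    return (size_t) j * N_I + (size_t) i;
}

/* Same computation for a power-of-two inner dimension (1024),
   written with the shift and mask seen in the second GIMPLE dump. */
static size_t linear_index_pow2(int tmp)
{
    assert(tmp >= 0);
    int j = tmp >> 10;
    int i = tmp & 1023;
    return (size_t) j * 1024 + (size_t) i;
}

/* Returns 1 if both forms equal (size_t) tmp for every tmp in
   [0, N_J * N_I), 0 otherwise. */
static int folding_holds(void)
{
    for (int tmp = 0; tmp < N_J * N_I; ++tmp) {
        if (linear_index(tmp) != (size_t) tmp)
            return 0;
        if (linear_index_pow2(tmp) != (size_t) tmp)
            return 0;
    }
    return 1;
}
```

This only exercises the identity on one range of values; it does not show
that a general match.pd rule would be valid for all types and ranges.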


> So there are missing match.pd transformations in addition to whatever
> scev/ivdep/other work is needed.
>
> --
> Marc Glisse
>

References:
- Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
  - From: Thomas Schwinge
- Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
  - From: Marc Glisse
