This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [gomp4] Some progress on #pragma omp simd


On Thu, Jun 13, 2013 at 03:15:45PM -0500, Aldy Hernandez wrote:
> 
> >it.  Also, not sure what to do for lastprivate, probably use the magic
> >arrays and just in the epilogue of the loop compute which of the array items
> >belonged to the last iteration somehow.
> 
> Can't you do (for lastprivate(abc)) something like:
> 
> 	if (i == 1024) {
> 		abc = magic_abc[__builtin_GOMP.simd_lane (1)];
> 	}

Well, if you do that inside of the loop, it probably becomes
non-vectorizable.  So in the epilogue you need something like:
	abc = magic_abc[(count - 1) & (__builtin_GOMP.simd_vf (1) - 1)];
or so.
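
I.e. roughly (just a sketch; magic_abc and the builtin spellings follow
the snippets above, the arrays b/c and the loop body are made up):

	for (i = 0; i < count; i++)
	  {
	    /* every use of abc in the body goes through this lane's copy */
	    magic_abc[__builtin_GOMP.simd_lane (1)] = b[i] + 1;
	    c[i] = magic_abc[__builtin_GOMP.simd_lane (1)];
	  }
	/* epilogue, outside the vectorized loop: the last iteration ran in
	   lane (count - 1) & (VF - 1), so copy that lane's value back */
	abc = magic_abc[(count - 1) & (__builtin_GOMP.simd_vf (1) - 1)];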

> >#pragma omp declare simd
> >__attribute__((noinline, noclone)) void
> >bar (int &x, int &y)
> >{
> >   x += 4;
> >   y += 4;
> >}
> 
> Does bar() have anything to do with this example, or was this an oversight?

It was there just to make the variables addressable during gimplification;
they may well become non-addressable again afterwards.
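
For illustration, a made-up caller (not from the patch), where the int &
parameters are what force x and y to be addressable while the body is
gimplified:

	#pragma omp simd
	for (int i = 0; i < 1024; i++)
	  {
	    int x = c[i], y = d[i];
	    bar (x, y);	/* passed by reference, so x and y must be
			   addressable here */
	    e[i] = x + y;
	  }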

> >using the magic arrays and so is reduction.  While the vectorizer can
> >recognize some reductions, e.g. without -ffast-math it will not vectorize
> >any floating point ones because that means changing the order of
> >computations, while when they are mandated to be one copy per simd lane,
> >the order of computations is clear and thus can be vectorized.
> 
> Let me see if I understand (all things floating point confuse me).
> You're saying that the vectorizer, in its present state, will refuse
> to vectorize reductions with floats because it may change the order
> of computations, but that we should override that behavior for OMP
> simd loops?

No, I'm saying that in simd loops the order of computations is different
(and depends on the vectorization factor), because each SIMD lane is
supposed to have its own private variable and at the end everything is
reduced together.
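
E.g. in a (made-up) simd reduction like:

	float sum = 0.0f;
	#pragma omp simd reduction (+:sum)
	for (int i = 0; i < n; i++)
	  sum += a[i];

each lane accumulates into its own copy of sum (the magic array again) and
the VF partial results are added together after the loop, so the evaluation
order is fixed by the construct rather than by the original sequential
order, and -ffast-math isn't needed to vectorize it.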

> >   D.2717[D.2714].s = D.2702;
> >   D.2703 = b[i];
> >   a.0 = a;
> >   D.2705 = a.0 + x;
> >   D.2701 = D.2717[D.2714].s;
> 
> Is there some subtlety in which we have to dereference D.2717 twice
> here, or can we reuse D.2702?

Usually it is FRE/PRE that optimizes away at least the loads, and DSE the
stores, but I don't think FRE/PRE is run after vectorization.

	Jakub

