This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [gomp4] Some progress on #pragma omp simd


On Sat, Apr 27, 2013 at 12:30:28PM -0500, Aldy Hernandez wrote:
> >>"The syntax and semantics of the various simd-openmp-data-clauses
> >>are detailed in the OpenMP specification.
> >>(http://www.openmp.org/mp-documents/spec30.pdf, Section 2.9.3)."
> >>
> >>Balaji, can you verify which is correct?  For that matter, which
> >>are the official specs from which we should be basing this work?
> >
> >The privatization clause makes a variable private to the SIMD lane.
> >In general, I would follow the spec. If you have further questions,
> >please feel free to ask.
> 
> Ok, so the Cilk Plus 1.1 spec is incorrectly pointing to the OpenMP
> 3.0 spec, because the OpenMP 3.0 spec has the private clause being
> task/thread private.  Since the OpenMP 4.0rc2 explicitly says that
> the private clause is for the SIMD lane (as you've stated), can we
> assume that when the Cilk Plus 1.1 spec mentions OpenMP, it is
> talking about the OpenMP 4.0 spec?

One way we could implement the SIMD private/lastprivate/reduction
vars, and for Cilk+ also the firstprivate ones, might be:
- query the target for the maximum possible vectorization factor for the
  loop (and take the minimum of that and simdlen, if any); let's call it MAXVF

for say
struct S { S (); ~S (); int x; };
...
int a, b;
S s;
#pragma omp simd private (a, s) reduction (+:b)
for (int i = 0; i < N; i++)
  { foo (&a, &s); b += a; }
we'd then emit something like:
int a_[MAXVF], b_[MAXVF];
S s_[MAXVF];
for (tmp = 0; tmp < __builtin_omp.simd_vf (simd_uid); tmp++)
  {
    b_[tmp] = 0;
    S::S (&s_[tmp]);
  }
# loop simd_uid with safelen(MAXVF)
for (i = 0; i < N; i++)
  {
    tmp = __builtin_omp.simd_lane (simd_uid);
    foo (&a_[tmp], &s_[tmp]);
    b_[tmp] += a_[tmp];
  }
for (tmp = 0; tmp < __builtin_omp.simd_vf (simd_uid); tmp++)
  {
    S::~S (&s_[tmp]);
    b += b_[tmp];
  }

where simd_uid would be, say, an integer constant unique to the simd loop
(at least unique within the same function; inlining/LTO would perhaps
need to remap it).  The loop's simd_uid would be stored by the ompexp pass
into the loop structure.  Then the vectorizer (ideally vectorization would
be enabled for -fopenmp or -fcilkplus even when not explicitly requested,
though in that case only for the explicit simd loops, unless disabled
through -fno-tree-vectorize) would treat arrays indexed by
__builtin_omp.simd_lane (simd_uid) specially (the dot in the name just
makes it impossible for users to use it; alternatively such arrays could be
marked with some special hidden attribute), allow promoting them to plain
vector vars if not addressable, etc., and would record the chosen
vectorization factor in the loop structure.  __builtin_omp.simd_vf would
then expand to the vectorization factor and __builtin_omp.simd_lane to the
number of the lane.  If vectorization couldn't be performed on some loop,
__builtin_omp.simd_vf would simply be folded into 1 and
__builtin_omp.simd_lane into 0, say by some ompsimd pass run soon after
vectorization.

Thoughts on this?  Or do you see a better IL representation of this stuff
from the omp expansion till vectorization?  I mean, e.g. for floating point
or user-defined reductions it might be important in what order they are
performed (unless -ffast-math, for the former).

	Jakub

