This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [gomp4] Some progress on #pragma omp simd
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Aldy Hernandez <aldyh at redhat dot com>
- Cc: "Iyer, Balaji V" <balaji dot v dot iyer at intel dot com>, Richard Henderson <rth at redhat dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Sat, 27 Apr 2013 20:17:35 +0200
- Subject: Re: [gomp4] Some progress on #pragma omp simd
- References: <20130419132957 dot GE12880 at tucnak dot redhat dot com> <5175B40F dot 7040709 at redhat dot com> <20130423135455 dot GN12880 at tucnak dot redhat dot com> <BF230D13CA30DD48930C31D40993300032A37760 at FMSMSX101 dot amr dot corp dot intel dot com> <20130424060117 dot GV12880 at tucnak dot redhat dot com> <20130424062536 dot GW12880 at tucnak dot redhat dot com> <20130424064054 dot GX12880 at tucnak dot redhat dot com> <5178692F dot 2010902 at redhat dot com> <BF230D13CA30DD48930C31D40993300032A39A2B at FMSMSX101 dot amr dot corp dot intel dot com> <517C0B34 dot 3050804 at redhat dot com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Sat, Apr 27, 2013 at 12:30:28PM -0500, Aldy Hernandez wrote:
> >>"The syntax and semantics of the various simd-openmp-data-clauses
> >>are detailed in the OpenMP specification.
> >>(http://www.openmp.org/mp-documents/spec30.pdf, Section 2.9.3)."
> >>
> >>Balaji, can you verify which is correct? For that matter, which
> >>are the official specs from which we should be basing this work?
> >
> >Privatization clause makes a variable private for the simd lane. In
> >general, I would follow the spec. If you have further questions,
> >please feel free to ask.
>
> Ok, so the Cilk Plus 1.1 spec is incorrectly pointing to the OpenMP
> 3.0 spec, because the OpenMP 3.0 spec has the private clause being
> task/thread private. Since the OpenMP 4.0rc2 explicitly says that
> the private clause is for the SIMD lane (as you've stated), can we
> assume that when the Cilk Plus 1.1 spec mentions OpenMP, it is
> talking about the OpenMP 4.0 spec?
One way we could implement the SIMD private/lastprivate/reduction
vars (and, for Cilk+, also the firstprivate ones) might be:
- query the target for the maximum possible vectorization factor for the
loop (and take the minimum of that and simdlen, if any); let's call it MAXVF
For, say,
struct S { S (); ~S (); int x; };
...
int a, b;
S s;
#pragma omp simd private (a, s) reduction (+:b)
for (int i = 0; i < N; i++)
{ foo (&a, &s); b += a; }
we'd then emit something like:
int a_[MAXVF], b_[MAXVF];
S s_[MAXVF];
for (tmp = 0; tmp < __builtin_omp.simd_vf (simd_uid); tmp++)
{
b_[tmp] = 0;
S::S (&s_[tmp]);
}
# loop simd_uid with safelen(MAXVF)
for (i = 0; i < N; i++)
{
tmp = __builtin_omp.simd_lane (simd_uid);
foo (&a_[tmp], &s_[tmp]);
b_[tmp] += a_[tmp];
}
for (tmp = 0; tmp < __builtin_omp.simd_vf (simd_uid); tmp++)
{
S::~S (&s_[tmp]);
b += b_[tmp];
}
where simd_uid would be some, say, integer constant unique to the simd loop
(at least unique within the same function; inlining/LTO might need to remap
it). The simd_uid would be stored by the ompexp pass into the loop
structure. The vectorizer would then treat arrays indexed by
__builtin_omp.simd_lane (simd_uid) specially (the dot in the name is just
to make it impossible for users to write it; alternatively such arrays
could be marked with some special hidden attribute or something): it would
allow promoting them to plain vector vars if not addressable, etc., and
would record the chosen vectorization factor in the loop structure.
__builtin_omp.simd_vf would then expand to that vectorization factor and
__builtin_omp.simd_lane to the number of the lane. (Ideally, for -fopenmp
or -fcilk+ we'd enable vectorization even when it isn't otherwise on, as
long as it isn't explicitly disabled through -fno-tree-vectorize, though in
that case only for the explicit simd loops.) If vectorization couldn't be
performed on some loop, __builtin_omp.simd_vf would just be folded into 1
and __builtin_omp.simd_lane into 0, say by some ompsimd pass run soon after
the vectorization.
Thoughts on this? Or do you see a better IL representation of this stuff
from the omp expansion till vectorization? I mean, e.g. for floating point
or user defined reductions it might be important in what order they are
performed (unless -ffast-math, for the former).
Jakub