[Bug c++/80859] Performance Problems with OpenMP 4.5 support

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri May 26 15:38:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859

--- Comment #25 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
In the GCC implementation of offloading to PTX, all HW threads in a warp (i.e.
32 of them) are a single OpenMP thread, and one needs to use a simd region
(effectively SIMT) to get useful work done by all the threads of a warp
rather than just one.
Right now GCC doesn't do auto-SIMTization.  It does do auto-vectorization on
the host or XeonPhi accelerator etc., but only with -O3 or -O2
-ftree-vectorize, whereas with simd constructs you get vectorization for those
regions even with just -O2 -fopenmp.  So the simd construct is important to get
the right performance.  OpenMP threads within a team are the warps (groups of
HW threads) within a PTX CTA, and different teams are the CTAs in a CTA grid
(in PTX terms).

