[OpenACC] Performance issues on simple example program
Christopher Guckes
chris@guckes-webstyle.com
Tue Jun 21 17:27:00 GMT 2016
I can compile and run the PI example from
http://scelementary.com/2015/04/25/openacc-in-gcc.html with the current
GCC 6.1.0. However, the GPU version of the code is much slower than the
CPU version, and I can't figure out why. I didn't have this problem with
GCC 5.3.0.
The code looks as follows:
#include <stdio.h>
#include <stdlib.h>

#define N 200000000

int main(void) {
    double pi = 0.0f;
    long long i;

    #pragma acc data copyout(pi)
    {
        #pragma acc parallel loop reduction (+:pi) present (pi)
        for (i=0; i<N; i++) {
            double t= (double)((i+0.5)/N);
            pi +=4.0/(1.0+t*t);
        }
    }
    printf("pi=%11.10f\n",pi/N);
    return 0;
}
The GPU version takes about four times as long as the CPU version of the
code. I used the NVIDIA Visual Profiler to make sure that a copy
operation wasn't what tanked the runtime: copying was measured at 0.1%,
while the kernel itself runs for about six seconds on a GTX 970. The
profiler reports an occupancy of 1.6% and names the grid size as the
limiting factor. I'm quite new to GPU code, so I'm not sure what to do
about that. The original sample code used a vector length of 1024; the
default seems to be 32 in GCC 6.1.0. When I try to set the vector length
to 1024 manually, the compiler warns that it will ignore the setting.
What else can I try to get this to run faster?
Thanks in advance
Chris