This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: OpenACC support in 4.9


Jakub Jelinek wrote:
[Fallback generation of CPU code]
If one uses the OpenMP 4.0 accelerator pragmas, then that is the required
behavior, if the code is for whatever reason not possible to run on the
accelerator, it should be executed on host [...]
(I haven't checked, but is this a compile time or run-time requirement?)

Otherwise, the OpenMP runtime as well as the pragmas have a way to choose which accelerator you want to run something on, as device id (integer), so the OpenMP runtime library should maintain the list of supported accelerators (say if you have two Intel MIC cards, and two AMD GPGPU devices), and probably we'll need a compiler switch to say for which kinds of accelerators we want to generate code for, plus the runtime could have dlopened plugins for each of the accelerator kinds.
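
To make the runtime layout described above concrete, here is a minimal sketch in C of a device table that maps integer device ids to accelerator kinds and falls back to the host for unknown ids. Every name in it (accel_kind, accel_plugin, register_plugin, resolve_device) is hypothetical and does not come from libgomp. The function pointers only stand in for entry points that a real runtime would presumably resolve from dlopened plugins.

```c
/* Hypothetical accelerator kinds; none of these names come from libgomp. */
enum accel_kind { ACCEL_HOST, ACCEL_INTEL_MIC, ACCEL_AMD_GPU };

/* Entry points one per-kind plugin would export. A real runtime would
   resolve them with dlopen()/dlsym(); here they are plain pointers. */
struct accel_plugin {
    enum accel_kind kind;
    const char *name;
    int (*num_devices)(void);
};

/* One slot per physical device, in device-id order. */
struct accel_device {
    const struct accel_plugin *plugin;
    int index_in_plugin;
};

#define MAX_DEVICES 16
static struct accel_device device_table[MAX_DEVICES];
static int device_count;

/* Append all devices reported by one plugin; returns how many were added. */
int register_plugin(const struct accel_plugin *p)
{
    int added = 0;
    int n = p->num_devices();
    for (int i = 0; i < n && device_count < MAX_DEVICES; i++) {
        device_table[device_count].plugin = p;
        device_table[device_count].index_in_plugin = i;
        device_count++;
        added++;
    }
    return added;
}

/* Map a device id to its kind; ids outside the table fall back to the host. */
enum accel_kind resolve_device(int device_id)
{
    if (device_id < 0 || device_id >= device_count)
        return ACCEL_HOST;
    return device_table[device_id].plugin->kind;
}

/* Stand-in plugins for the example of two MIC cards and two GPGPU devices. */
static int two_devices(void) { return 2; }
static const struct accel_plugin mic_plugin = { ACCEL_INTEL_MIC, "mic", two_devices };
static const struct accel_plugin gpu_plugin = { ACCEL_AMD_GPU, "amdgpu", two_devices };

/* Register both stand-in plugins; returns the total number of devices added. */
int register_example_plugins(void)
{
    return register_plugin(&mic_plugin) + register_plugin(&gpu_plugin);
}
```

After register_example_plugins(), ids 0 and 1 resolve to the MIC kind, ids 2 and 3 to the GPU kind, and any other id to the host.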

At least two OpenACC implementations I know of fail hard when the GPU is not available (because it does not exist or because the /dev/... node lacks the right permissions). And three of the implementations I know of fail at compile time with an error message if an expression within a device section cannot be compiled for the device (e.g. a call to a non-device, non-inlinable function).

While CPU fallback is convenient, it would be nice to know whether some code actually uses the accelerator, both at compile time and at run time. Otherwise, one thinks that the GPU is used without realizing that it isn't, because, e.g., the device permissions are wrong or one forgot to declare a certain function as a target function.
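
For the run-time half of this, OpenMP 4.0 provides omp_is_initial_device(), which returns nonzero when the calling code executes on the host. The following is only a sketch of how a program could use it to notice a silent fallback. The helper names ran_on_host and report_offload are made up for illustration, and when the file is built without OpenMP 4.0 support the helper simply reports the host.

```c
#include <stdio.h>

#if defined(_OPENMP) && _OPENMP >= 201307
#include <omp.h>
#endif

/* Returns 1 if the target region ran on the host, 0 if it ran on a device.
   (Hypothetical helper name, not part of any OpenMP runtime.) */
int ran_on_host(void)
{
    int on_host = 1;  /* stays 1 when built without OpenMP 4.0 support */
#if defined(_OPENMP) && _OPENMP >= 201307
    /* Copy the flag back from wherever the region actually executed. */
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host;
}

/* Print a diagnostic so that a silent host fallback becomes visible. */
void report_offload(void)
{
    if (ran_on_host())
        fprintf(stderr, "warning: target region executed on the host\n");
    else
        fprintf(stderr, "target region executed on the accelerator\n");
}
```

This only covers the run-time question. Whether a given region could be compiled for the device at all would still need a compile-time diagnostic.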

Besides a flag that tells the compiler which accelerator to generate code for, additional flags should also be handled, e.g. for different versions of the accelerator. For instance, one accelerator model of a series might support double-precision variables while another model of the same series might not. I assume that falling back to the CPU won't work if the accelerator doesn't support a certain feature, and that one will get an error in this case.


Is there actually a need to handle multiple accelerators simultaneously? My impression is that both OpenACC and OpenMP 4 assume that only one kind of accelerator is available besides the host. If I missed some fine print, or if something else requires multiple different accelerators, it will get more complicated, especially for those code sections where the user didn't explicitly specify which one should be used.


Finally, one should think about debugging. It is not really clear (to me) how best to handle this, but it is not that easy, as the compiler generates quite a lot of additional code (e.g. for copying the data around) and as printf debugging doesn't work on GPUs. I wonder whether there should be an optional library like libgomp_debug which adds additional sanity checks (e.g. related to copying data to/from the GPU) and which allows printing diagnostic output when one sets an environment variable.
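
As a rough illustration of what such a debug library might do, the sketch below wraps a copy in sanity checks and prints diagnostics only when an environment variable is set. Everything here is an assumption: the variable name ACCEL_DEBUG and the function checked_copy are invented, and memcpy merely models the host-to-device transfer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name; not an existing libgomp setting. */
#define DEBUG_ENV_VAR "ACCEL_DEBUG"

/* Diagnostics are on when the variable is set to anything but "" or "0". */
static int debug_enabled(void)
{
    const char *v = getenv(DEBUG_ENV_VAR);
    return v != NULL && v[0] != '\0' && strcmp(v, "0") != 0;
}

/* Checked stand-in for a host-to-device copy; memcpy models the transfer.
   Returns 0 on success, -1 if a sanity check fails. */
int checked_copy(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (dst == NULL || src == NULL) {
        if (debug_enabled())
            fprintf(stderr, "accel-debug: null buffer in copy\n");
        return -1;
    }
    if (n > dst_size) {
        if (debug_enabled())
            fprintf(stderr, "accel-debug: copy of %zu bytes exceeds "
                            "destination of %zu bytes\n", n, dst_size);
        return -1;
    }
    if (debug_enabled())
        fprintf(stderr, "accel-debug: copying %zu bytes\n", n);
    memcpy(dst, src, n);
    return 0;
}
```

The checks always run in this sketch and only the output is gated by the environment variable. A real library could just as well gate both.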

Tobias

