This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: OpenACC support in 4.9
- From: Tobias Burnus <burnus at net-b dot de>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Richard Biener <richard dot guenther at gmail dot com>, Torvald Riegel <triegel at redhat dot com>, Jeff Law <law at redhat dot com>, Evgeny Gavrin <e dot gavrin at samsung dot com>, gcc at gcc dot gnu dot org, GarbuzovViacheslav <v dot garbuzov at samsung dot com>, dtemirbulatov at gmail dot com
- Date: Fri, 10 May 2013 12:06:15 +0200
- Subject: Re: OpenACC support in 4.9
- References: <51879F4E dot 10402 at samsung dot com> <5187B30F dot 1050709 at net-b dot de> <5187C958 dot 9020606 at redhat dot com> <CAFiYyc3bnnFL=k8w-ZqJnL3UtQrFjdNmrdNmiA7mCiuGVtK_aQ at mail dot gmail dot com> <5188C310 dot 5050305 at net-b dot de> <CAFiYyc24FM9Z9meh2DF94bv3VV0gaUgP-nc3CWAZXotBb4ZA0w at mail dot gmail dot com> <CAFiYyc0mattq5enLL+DQxwviob=5qpOjOA5JQWZoE9CDFJQVzg at mail dot gmail dot com> <1368044713 dot 7774 dot 1408 dot camel at triegel dot csb> <CAFiYyc20fgf2hfaqDQzEbwAea0-+=LJL4PQX7X9ndWOwVWor7Q at mail dot gmail dot com> <20130510091511 dot GM1377 at tucnak dot redhat dot com>
Jakub Jelinek wrote:
[Fallback generation of CPU code]
> If one uses the OpenMP 4.0 accelerator pragmas, then that is the required
> behavior, if the code is for whatever reason not possible to run on the
> accelerator, it should be executed on host [...]
(I haven't checked, but is this a compile time or run-time requirement?)
> Otherwise, the OpenMP runtime as well as the pragmas have a way to choose
> which accelerator you want to run something on, as device id (integer), so
> the OpenMP runtime library should maintain the list of supported
> accelerators (say if you have two Intel MIC cards, and two AMD GPGPU
> devices), and probably we'll need a compiler switch to say for which kinds
> of accelerators we want to generate code for, plus the runtime could have
> dlopened plugins for each of the accelerator kinds.
At least two OpenACC implementations I know fail hard when the GPU is
not available (it does not exist, or /dev/... does not have the right
permissions). And three of them fail at compile time with an error
message if an expression within a device section cannot be compiled for
the device (e.g. a call to a non-device, non-inlinable function).
While it is convenient to have a CPU fallback, it would be nice to know
whether some code actually uses the accelerator - both at compile time
and at run time. Otherwise, one thinks that the GPU is used without
realizing that it isn't, e.g. because the device permissions are wrong
or because one forgot to declare a certain function as a target function.
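For the run-time half of that question, a minimal sketch, assuming an OpenMP 4.0 compiler: the standard routine omp_is_initial_device() can be called inside a target region to report whether the region fell back to the host. The helper name region_ran_on_host is made up for illustration, and without OpenMP support the code simply reports "host".

```c
#include <stdio.h>
#if defined(_OPENMP) && _OPENMP >= 201307
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the target region below executed on
 * the host, 0 if it executed on an accelerator. */
static int region_ran_on_host(void)
{
    int on_host = 1;  /* stays 1 when built without OpenMP 4.0 support */
#if defined(_OPENMP) && _OPENMP >= 201307
    /* map(from:) copies the result back from the device, if there is one. */
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host;
}

/* Print a warning so that a silent CPU fallback does not go unnoticed. */
static void warn_if_fallback(void)
{
    if (region_ran_on_host())
        fprintf(stderr, "warning: target region ran on the host\n");
}
```

Calling warn_if_fallback() once at program start would make a wrong device permission visible immediately. It does not cover the compile-time case, which would need a compiler diagnostic.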
Besides a flag which tells the compiler for which accelerator the
code should be generated, additional flags are needed, e.g.
for different versions of the accelerator. For instance, one accelerator
model of a series might support double-precision variables while
another might not. I assume that falling back to the CPU when the
accelerator doesn't support a certain feature won't work and that one
will get an error in this case.
Is there actually the need to handle multiple accelerators
simultaneously? My impression is that both OpenACC and OpenMP 4 assume
that there is only one kind of accelerator available besides the host.
If I missed some fine print or something else requires that there are
multiple different accelerators, it will get more complicated -
especially for those code sections where the user didn't explicitly
specify which one should be used.
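To make the complication concrete, here is a rough sketch of how a runtime might pick a device from a mixed list when the user gives no device id. Everything in it (the struct, the enum and pick_device) is hypothetical and not part of libgomp or any specification; the "first available device" default is only one possible policy.

```c
#include <stddef.h>

/* Hypothetical accelerator kinds, following the MIC/GPGPU example. */
enum accel_kind { ACCEL_MIC, ACCEL_GPU };

/* Hypothetical entry in the runtime's device list. */
struct accel_device {
    int id;                /* device id as used in a device clause */
    enum accel_kind kind;
    int available;         /* 0 if missing or not accessible */
};

/* Pick a device. requested_id < 0 means the user did not specify one,
 * in which case the first available device is used. Returns NULL when
 * nothing suitable is available, i.e. the caller falls back to the host. */
static const struct accel_device *
pick_device(const struct accel_device *devs, size_t n, int requested_id)
{
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].available)
            continue;
        if (requested_id < 0 || devs[i].id == requested_id)
            return &devs[i];
    }
    return NULL;
}
```

With two kinds in the list, the unspecified case silently depends on list order, which is exactly where the compiler would also have to have generated code for the kind that gets chosen.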
Finally, one should think about debugging. It is not really clear (to
me) how to handle this best, but as the compiler generates quite a lot of
additional code (e.g. for copying the data around) and as printf
debugging doesn't work on GPUs, it is not that easy. I wonder whether
there should be an optional library like libgomp_debug which adds
additional sanity checks (e.g. related to copying data to/from the GPU)
and which prints diagnostic output when one sets an
environment variable.
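A minimal sketch of what such a debug layer could look like, purely as an assumption: the environment variable name ACCEL_DEBUG and the function checked_copy are invented here, and memcpy stands in for the real host-to-device transfer. In this sketch the check always runs, and only the diagnostic output depends on the variable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name; not an existing libgomp setting. */
#define DEBUG_ENV_VAR "ACCEL_DEBUG"

/* Diagnostics are on when the variable is set to anything but "" or "0". */
static int debug_enabled(void)
{
    const char *v = getenv(DEBUG_ENV_VAR);
    return v != NULL && v[0] != '\0' && strcmp(v, "0") != 0;
}

/* Stand-in for a host-to-device copy with a sanity check.
 * Returns 0 on success and -1 if the arguments are invalid. */
static int checked_copy(void *dst, const void *src, size_t nbytes)
{
    if (dst == NULL || src == NULL) {
        if (debug_enabled())
            fprintf(stderr, "accel-debug: null pointer in copy of %zu bytes\n",
                    nbytes);
        return -1;
    }
    if (debug_enabled())
        fprintf(stderr, "accel-debug: copying %zu bytes\n", nbytes);
    memcpy(dst, src, nbytes);  /* a real runtime would transfer to the device */
    return 0;
}
```

Running a program with ACCEL_DEBUG=1 in the environment would then log every transfer to stderr, while the default run stays silent.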
Tobias