This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] Offloading Support in libgomp


> Roughly.  We have 3 directives here,
> #pragma omp target
> #pragma omp target data
> #pragma omp target update
> and all of them have various clauses, some that are allowed at most once
> (e.g. the device clause, if clause) and others that can be used many times
> (the data movement clauses).
> The question is if we want to emit multiple calls for the single directive,
> say one for each data movement clause (where for each one we need address,
> length, direction and some way how to propagate the transformed address
> to the accelerator code), or if we build an array of the data movement
> structures and just pass that down to a single routine.  Because of the
> device clause which should be probably passed just as an integer with -1
> meaning the default, perhaps single routine might be better.
Sure, I used '#pragma omp target' just as a simple example.  The
question about '#pragma omp target data' is still open.  As far as I
understand, all three pragmas could require data marshalling, but
not necessarily: an 'omp target' located inside an 'omp target data'
that already specifies all the variables the 'omp target' needs won't
need any data marshalling - right?  This data movement could be done, as
you noted, by a single call or by a set of calls (one for each clause).
While a single call seems appealing, separate calls could be better
when, e.g., we have a 'target update' for only a subset of the
variables described in 'target data'.  The one-call approach has
difficulty specifying which subset of the data we want to update.
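
To make the single-call variant concrete, here is a rough sketch of the
"array of data movement structures" idea.  All names in it
(sketch_map_entry, sketch_target_update, the SKETCH_MAP_* codes) are
hypothetical and not an existing libgomp interface, and the routine only
tallies the bytes that would move instead of transferring anything.  It
assumes the device is passed as an integer with -1 meaning the default.

```c
#include <stddef.h>

/* Hypothetical direction codes for a data movement clause.  */
enum sketch_map_kind { SKETCH_MAP_TO, SKETCH_MAP_FROM, SKETCH_MAP_TOFROM };

/* One descriptor per data movement clause: address, length, direction.  */
struct sketch_map_entry
{
  void *host_addr;
  size_t length;
  enum sketch_map_kind kind;
};

/* Hypothetical single entry point; DEVICE is -1 for the default device.
   It only counts the bytes that would move in each direction; a real
   implementation would perform the transfers here.  */
static void
sketch_target_update (int device, const struct sketch_map_entry *maps,
                      size_t nmaps, size_t *bytes_to, size_t *bytes_from)
{
  (void) device;  /* Unused in this sketch.  */
  *bytes_to = 0;
  *bytes_from = 0;
  for (size_t i = 0; i < nmaps; i++)
    {
      if (maps[i].kind != SKETCH_MAP_FROM)
        *bytes_to += maps[i].length;
      if (maps[i].kind != SKETCH_MAP_TO)
        *bytes_from += maps[i].length;
    }
}
```

With this shape, the array handed to the routine would have to contain
only the entries being updated, so the compiler would need to build a
separate array for each 'target update'.  Whether that is acceptable is
exactly the open question.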

> I'd prefer GOMP_target instead of GOMP_offload for the function name, to
> make it clearly related to the corresponding directive.
That makes sense - I just used the first name that came to my mind here.

> > GOMP_offload is a call to libgomp, which will be implemented somehow like this:
> >   void GOMP_offload (void (*fn)(void*), void *data, const char *fname)
> >   {
> >     if (gomp_offload_available ())
> 
> This really isn't just check whether accelerator is available, we need to
> query all accelerators in the system (and cache that somehow in the
> library), assign device numbers to individual devices (say, you could have
> two Intel MIC cards, one AMD HSAIL capable GPGPU and 4 Nvidia PTX capable
> GPGPUs or similar), ensure that already assigned device numbers aren't
> reused when discovering new ones and then just check what device user
> requested (if not available, fall back to host), next check see if we
> have corresponding code for that accelerator (again, fallback to host
> otherwise), optionally compile the code if not compiled yet (HSAIL/PTX code
> only) then finally do the name lookup and spawn it.
The multi-target option raises another bunch of questions:)  Could you
please check whether my vision of how GCC would handle multiple offload
targets is correct?  Here it is:
We have GCC with a set of plugins for compiling code for each available
offloading target.  These plugins work similarly to lto-plugin, i.e.
they consume gimple as input, but produce code for the specific
target.  Libgomp also has a similar set of plugins with the HW-specific
implementations of functions for running code remotely, transferring
data, getting device status etc.
For example, for Intel MIC, AMD HSAIL and Nvidia PTX we'll have host-GCC
with three plugins and host-libgomp, also with three plugins.
Invoking GCC with options like '-mmic', '-mhsail', '-mptx' triggers
use of the corresponding plugins in the GCC driver.  As a result, after
compilation we'd have four binaries: one for the host and three for the
possible targets.
Now, the libgomp part.  The host binary contains calls to libgomp.so,
which is target-independent (i.e. it's host-specific).  It should be
able to call plugins for all three targets, so in functions like
gomp_offload_available it would probably iterate through all available
plugins, asking for device status and code availability.  This
iteration would include dlopen'ing the corresponding plugin, calling a
function from it and moving on to the next plugin.
Is this correct?
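
To illustrate the iteration I have in mind, here is a minimal sketch
using the POSIX dlopen/dlsym/dlclose calls.  The plugin file names and
the exported symbol "sketch_get_num_devices" are assumptions made up for
this example, not a settled interface, and a real libgomp would
presumably cache the handles and the discovered devices rather than
close each plugin right away.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical plugin file names; the real names are not settled.  */
static const char *const sketch_plugin_names[] = {
  "libgomp-plugin-mic.so",
  "libgomp-plugin-hsail.so",
  "libgomp-plugin-ptx.so",
};

/* Hypothetical function each plugin would export to report how many
   devices of its kind are present.  */
typedef int (*sketch_num_devices_fn) (void);

/* Probe every plugin in NAMES and return the total number of devices
   reported.  Plugins that fail to load or lack the symbol are skipped,
   so a result of 0 means "fall back to the host".  */
static int
sketch_count_offload_devices (const char *const *names, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    {
      void *handle = dlopen (names[i], RTLD_LAZY);
      if (handle == NULL)
        continue;  /* Plugin not installed: move on to the next one.  */
      sketch_num_devices_fn fn
        = (sketch_num_devices_fn) dlsym (handle, "sketch_get_num_devices");
      if (fn != NULL)
        {
          int count = fn ();
          if (count > 0)
            total += count;
        }
      dlclose (handle);
    }
  return total;
}
```

A caller would pass sketch_plugin_names with its element count; on
older glibc versions this needs linking with -ldl.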

---
Thanks, Michael

On 22 Aug 16:28, Jakub Jelinek wrote:
> On Thu, Aug 22, 2013 at 06:08:10PM +0400, Michael V. Zolotukhin wrote:
> > We're working on design for offloading support in GCC (part of OpenMP4), and I
> > have a question regarding libgomp part.
> > 
> > Suppose we expand '#pragma omp target' like we expand '#pragma omp parallel',
> > i.e. the compiler expands the following code:
> >   #pragma omp target
> >   {
> >     body;
> >   }
> > to this:
> >   void subfunction (void *data)
> >   {
> >     use data;
> >     body;
> >   }
> > 
> >   setup data;
> >   function_name = "subfunction";
> >   GOMP_offload (subfunction, &data, function_name);
> 
> Roughly.  We have 3 directives here,
> #pragma omp target
> #pragma omp target data
> #pragma omp target update
> and all of them have various clauses, some that are allowed at most once
> (e.g. the device clause, if clause) and others that can be used many times
> (the data movement clauses).
> I'd prefer GOMP_target instead of GOMP_offload for the function name, to
> make it clearly related to the corresponding directive.
> The question is if we want to emit multiple calls for the single directive,
> say one for each data movement clause (where for each one we need address,
> length, direction and some way how to propagate the transformed address
> to the accelerator code), or if we build an array of the data movement
> structures and just pass that down to a single routine.  Because of the
> device clause which should be probably passed just as an integer with -1
> meaning the default, perhaps single routine might be better.
> 
> > GOMP_offload is a call to libgomp, which will be implemented somehow like this:
> >   void GOMP_offload (void (*fn)(void*), void *data, const char *fname)
> >   {
> >     if (gomp_offload_available ())
> 
> This really isn't just check whether accelerator is available, we need to
> query all accelerators in the system (and cache that somehow in the
> library), assign device numbers to individual devices (say, you could have
> two Intel MIC cards, one AMD HSAIL capable GPGPU and 4 Nvidia PTX capable
> GPGPUs or similar), ensure that already assigned device numbers aren't
> reused when discovering new ones and then just check what device user
> requested (if not available, fall back to host), next check see if we
> have corresponding code for that accelerator (again, fallback to host
> otherwise), optionally compile the code if not compiled yet (HSAIL/PTX code
> only) then finally do the name lookup and spawn it.
> Stuff specific to the HW should be in libgomp plugins IMHO, so we have one
> dlopenable module for each of the 3 variants, where one fn in the plugin
> would be about checking what HW is available, one about trying to run the
> code etc.
> 
> 	Jakub

