This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Offloading Support in libgomp


> > 1.2.  Linking
> > 
> > When all source files are compiled, a linker is invoked.  The linker is passed
> > a special option to invoke openmp-plugin.  The plugin is responsible for
> > producing target-side executables - for each target it calls the corresponding
> > target compiler and linker.
> 
> I thought the idea was to just use the LTO plugin for that, teach it to
> handle the .gnu.target_lto* sections specially.
Well, I think we don't have a decision here yet - we'll try one of the options
and see if it goes well.

> > The target-side GCC is invoked to load Gimple IR from .gnu.target_lto sections
> > of the FAT-object and compile it to target-side objects which later will be
> > used by target-side linker.
> > 
> > The host-side linker needs libgomp along side with standard libraries like
> > libc/libm to successfully resolve symbols, generated by the host compiler.  The
> > target-side linker needs CRT.O, containing main-routine for target-side
> > executable and target-specific versions of standard libraries.
> 
> I'd say main shouldn't be linked into the shared libraries (well, for MIC)
> that you put into the binaries resp. shared libraries, but into special
> section inside of libgomp.so from which the runtime would upload it into the
> binary.  Because, if you have main in every shared library, what do you do
> if you have multiple shared libraries with offloading code in it?
> What the plugin will need to do is for each of the shared libraries in the
> link, extract from the special sections the embedded target shared libraries
> (or executables) into temporary files and pass that to the target linker, so
> that if you have say
> #pragma omp declare target
> extern int foo;
> extern void bar (void);
> #pragma omp declare target
> in one shared library and the definitions thereof in a different one, you
> can link that together.
That's true.  However, for now I do not consider offloading code in shared
objects, though it is surely an important task for future and should be taken
into account now.

> > 1.3.  Execution
> > 
> > Host-side executable contains calls to libgomp library, which interfaces all
> > interactions with target-devices.
> > On loading, the executable calls GOMP_target_init from libgomp.so, which will
> 
> No.  The first call to omp_get_num_devices, GOMP_target, GOMP_target_data
> or GOMP_target_update (using pthread_once) scans for the available target
> devices, there is no GOMP_target_init, 
That's true.  I wrote this when I wasn't aware of our current approach.

> ... and the first GOMP_target,
> GOMP_target_data or GOMP_target_update from a particular shared library or
> binary (all of them will have __OPENMP_TARGET__ weak hidden symbol as one of
> the arguments) offloads the embedded shared library into target (resp.
> compiles HSAIL/PTX and uploads or whatever).
What is that __OPENMP_TARGET__ argument?  Is it an address of section with
target code or something like that?  I am not sure I am completely clear with
this part.  (Please also find my other questions/comments below).

> > load the target executables onto target-devices and start them.  Since this
> > moment, the devices are ready to execute requested code and interact with the
> > main host-process.
> > 
> > When a host-side program calls libgomp functions related to the offloading,
> > libgomp decides, whether it's profitable to offload, and which device to choose
> > for that.  In order to do that, libgomp calls available plugins and checks
> > which devices are ready to execute offloaded code.  Available plugins should be
> > located in a specified folder and should implement a certain interface.
> > 
> > Another important function of libgomp is host-target memory mapping and keeping
> > information about mapped regions and their types.
> 
> The only "type" info needed is the copy_from flag, all the rest happens on
> mapping memory to the device, so don't need to be tracked afterwards.
Yes, that's what was meant here.

> > TBD: probably, it's better to 'hard-code' available plugin during build of
> > libgomp (e.g., at configure step).
> 
> That is certainly doable.
Yes, it is doable, but do we want to do this or scanning some folders for
suitable plugins is sufficient for us?

> 
> > 2.  LINKER PLUGINS INFRASTRUCTURE
> > 
> > 2.1.  Overview
> > 
> > When -flto or -fopenmp option is given to the GCC driver, linker plugin
> > invocation is triggered.  The plugin claims the input files containing
> > .gnu.lto* or .gnu.target_lto* sections for further processing and creates
> > resolutions file.
> > After this preliminary work, LTO-wrapper is called.  It is responsible for
> > sequential calls of GCC.
> > 
> > The first call is needed to run WPA, which performs usual LTO partitioning as
> > well as partitioning of OpenMP-target sections.  WPA reads bytecode of:
> >   1) all functions and variables with "omp declare target" attribute;
> >   2) the outlined bodies of #pragma omp target turned into '*.ompfn' functions;
> 
> 1) and 2) is basically the same, because omp expansion adds "omp declare
> target" attribute to the outlined bodies too.
> 
> I don't see why you want a WPA phase, at least when not also -flto.
> IMNSHO you want to compile each .gnu.target_lto* set of input sections
> individually, using one target compiler driver invocation, that will
> generate object files and you just link them together.
Yes, you are right here.  WPA would only be invoked when '-flto' is given and it
will work just as usual.  The target compilers would be called by LTO-wrapper
independently on WPA.

> 
> > 3.  OPENMP PRAGMA TARGET HANDLING IN MIDDLE-END
> > 
> > Middle end work is done in two omp passes.  Specifically, omp-lower pass:
> >   * Creates outlined function with no body
> >   * Adds #pragma omp return in the end of the region
> >   * Creates empty struct args_data
> >   * For each var referenced in clauses  (e.g. int i):
> >     -  Adds entry to data_arr, data_sizes and data_kind arrays describing this
> >        variable, its size and mapping type
> >     -  Adds assignment before call to outlined function : args_data.i = &i
> >     -  Replace uses of i with uses of args_data->i inside the region
> > 
> > Then, omp-expand pass:
> >   * Moves the region to the outlined function
> >   * Adds a call to libGOMP to maybe offload this function:
> >     GOMP_target (condition /* evaluated expression from IF clause */,
> > 		 device_no /* a number from DEVICE clause */,
> > 		 foo, .foo.,
> > 		 data_arr, data_size, data_kinds, 1 /* size of arrays */);
> 
> Except for the #pragma omp declare target arrays creation all this is
> implemented, the GOMP_target arguments have different order and some aren't
> present, just the current fnname argument will be in fact __OPENMP_TARGET__,
> i.e. either NULL, or address of a section which contains some info on what
> offloading targets are supported in the current binary resp. shared library,
> where to find them, where to find their mapping etc.
Yep, many of this is already implemented, and some things even look differently
now, but this document was designed as a documentation for what and how is done,
to make life of future contributors easier.

> > GOMP_target routine takes additional arguments:
> >   * Address of the host version of outlined function.  It is used when runtime
> > decides to perform host fallback instead of offloading to an accelerator.
> >   * Name of the target version of outlined function.  This is used when runtime
> > decides to offload.  It cannot directly call a function on a target device, so
> > it calls the corresponding plugin and gives it a function name to invoke.
> 
> See above and my earlier mail why a name is a bad idea.  You will look up
> the { fnaddr, fnaddr + 1 } address range in the target mapping structure
> instead.
So, fnaddr is the host function address, right?  Then we are looking for it in
the splay tree and find the corresponding address on the target side, correct?
What do we map for the functions?
Also, AFAIK COI needs a name passed to it to run offloaded function, so we might
want to keep the name anyway.

> > GOMP_target, GOMP_target_data, GOMP_target_data_end, GOMP_target_update routines
> > performs maintaining of a global structure describing current mapping, which
> > will be covered in the next section, and an actual data marshalling:
> >   * GOMP_target copies regions with kind TO or TOFROM to device before
> > offloading and copies regions with kind FROM or TOFROM from the device when the
> > offloading is finished.  In case of host fallback no copying is performed.
> >   * GOMP_target_data copies regions with kind TO or TOFROM to the device.
> >   * GOMP_target_data_end copies regions with kind FROM or TOFROM from the
> > device.
> >   * GOMP_target_data_update copies all regions according to their types to and
> > from the device.
> 
> All this is implemented, just it will need to be changed to use device hooks
> to perform the actual target allocation/deallocation/copyto/copyfrom
> operations.
> > 
> > 4.2.  Maintaining info about mapped regions
> > 
> > Every libGOMP routine dealing with mapped memory regions is responsible for an
> > accurate maintaining of a global data structure describing this mapping.  This
> > data structure is a binary search tree containing structures 
> > struct
> >   {
> >     void *host_address;
> >     void *target_address;
> >     size_t region_size;
> >     enum {TO, TOFROM, FROM, ALLOC} region_type;
> >   }
> > with host addresses used as a key.
> > 
> > The data structure allows to check whether a given host address is mapped, or
> > not.  In order to do that, on every request it needs to find out whether the
> > requested interval is covered with already mapped ones and check if all of them
> > have a corresponding type.
> 
> This is again already implemented, just the splay tree and lock will need
> moving from a global variable into the device descriptor.
Yep, but again, this document was intended to describe decisions we've chosen
for implementation of OpenMP4 offloading support.

In general, what do you think, is it worth maintaining such document (and
probably later upload it as a wiki page) or we don't need it and it's better
just be dropped?

Michael
> 
> 	Jakub


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]