This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [gomp4] Accelerator constructs omp lowering and expansion

On Wed, Sep 04, 2013 at 08:54:47PM +0200, Jakub Jelinek wrote:
> Any issues you can spot in the patch?  If not, I'll commit it tomorrow, so
> that I can continue poking in these areas without too much pending stuff.

Committed now.

> Known issues that don't affect host fallback, but will be a problem
> for offloading (Richard or others, any thoughts about that?):
> 1) right now we pass a host fn address and a function name string
>    to GOMP_target.  For host execution we of course only need the fn
>    address, for offloading I guess libgomp will need to dlinfo
>    that fn address to find out what shared library (or binary) contains
>    it and somehow look for the special data section in it.
>    But, right now the *._omp_fn.N functions are always static, so there
>    is nothing to look up by name for Intel MIC, and if we wanted to make
>    the symbol global, we'd need to give it a shlib resp. binary unique
>    name; but if the containing function is not global, how can we do that?
>    Append get_file_function_name to the name?
> 2) much bigger problem seems to be global #pragma omp declare target
>    variables.  Those are supposed to be mapped from the beginning,
>    if they are just copied into the target LTO subset streaming, they will
>    be emitted normally as data variables.  But, unfortunately the
>    runtime must be aware of those mappings, because you can do stuff like:
> #pragma omp declare target
> int v = 1;
> #pragma omp end declare target
> void bar (int *p)
> {
>   #pragma omp target map(to:p[:1])
>   *p++;
> }
> void foo () { bar (&v); }
>    where the runtime should assign target's p copy the value of target's
>    v variable.  Or even for say #pragma omp target update.
>    So, on the host side, we need to prepare the triplets of host var
>    address, var size and during linking somehow supply it info on how
>    to create the target address, and have some function (one per
>    shared library resp. binary) that would locate the data section
>    within the shared library/binary for the requested accelerator
>    and with another argument call some libgomp function to initialize
>    the device data environment for the given shared library/binary
>    and accelerator.  Perhaps we could pass the address of such (.hidden)
>    function as yet another argument to GOMP_target call and
>    --as-needed link it from some *.a library?

Perhaps both of the above could be solved by (though, dunno about the
feasibility of doing it in the linker plugin):

1) GCC passes
extern char __OPENMP_TARGET__[] __attribute__((visibility ("hidden"), weak));
as the 3rd argument of GOMP_target call, rather than the name of the
function (and NULL on targets that don't support weak symbols or named
sections or visibility).  NULL will mean always fall back to host execution,
no support for offloading.
2) GCC emits struct { void *addr; uintptr_t size; } pairs into
   section for any non-extern vars declared in the TU with "omp declare target"
   attribute and for any *.omp_fn* functions passed to GOMP_target calls
   (with size 0).  For TREE_PUBLIC vars, perhaps we should emit !TREE_PUBLIC
   aliases and store those into the section instead.
3) during .gnu.target_lto* streaming, either those .gnu.openmp_target
   additions would be streamed too as is, or modified (only the addresses,
   leaving out sizes, which don't make sense)
4) when compiling target code and linking it, it would be important not to
   reorder the individual .gnu.openmp_target section additions in any way,
   so there is a 1:1 correspondence between the host array and target array
5) the lto linker plugin, when special handling the .gnu.target_lto*
   sections would also create the __OPENMP_TARGET__ symbol at the start
   of some magic section, fill in the section header at that point
   (likely some version number, number of supported targets, and for each
   of them address of the data blob in memory (or filename if it can't be
   in memory?), blob size, and pointer to the array of target addresses
   (unrelocated or something), plus the host .gnu.openmp_target content
   as array)
Then GOMP_target, if it sees NULL, would just do host fallback execution,
if it sees non-NULL in that argument and the device doesn't have that
loaded yet, it would read the info there to see if the target is supported,
find out how to load it (COI dlopen from memory, compile and ??? for
HSAIL/PTX), and initialize the mapping data structure from the host address,
size pairs in that array plus target array (after relocation?).
Then, for the function address, GOMP_target would just look up the function
address passed to it in the target mapping data structure.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]