Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

Bernd Schmidt bernds@codesourcery.com
Fri Feb 28 16:23:00 GMT 2014


On 02/28/2014 05:09 PM, Ilya Verbin wrote:
> 2014-02-20 22:27 GMT+04:00 Bernd Schmidt <bernds@codesourcery.com>:
>>   * Functions and variables now go into different tables, otherwise
>>     intermixing between them could be a problem that causes tables to
>>     go out of sync between host and target (imagine one big table being
>>     generated by ptx lto1/mkoffload, and multiple small table fragments
>>     being linked together on the host side).
>
> If you need 2 different tables for funcs and vars, we can also use
> them. But I still don't understand how that will help keep the host
> and target tables in sync.

I don't think it will help that much; I still expect this entire scheme 
to fail on nvptx. I'll try to construct an example at some point.

One other benefit of the split tables is that we don't have to write a 
useless size of 1 for each function entry.
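
To make that concrete, here is a minimal sketch of what the split 
host-side tables could look like (the symbol and struct names are 
illustrative, not what the patch actually emits):

  #include <stddef.h>

  /* Function entries are bare addresses; no dummy size needed.  */
  extern void *__offload_func_table[];

  /* Variable entries carry an address/size pair so the runtime can
     map the object on the device.  */
  struct offload_var { void *addr; size_t size; };
  extern struct offload_var __offload_var_table[];

With functions and variables separated, each table stays internally 
homogeneous even when fragments from several objects are linked 
together on the host side.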

>>   * I've put the begin/end fragments for the host tables into crtstuff,
>>     which seems like the standard way of doing things.
>
> Our plan was that the host-side descriptor __OPENMP_TARGET__ would
> contain (in addition to the func/var table) pointers to the images
> for all enabled accelerators (e.g. omp_image_nvptx_start and
> omp_image_intelmic_start), which is why we generated it in the
> lto-wrapper.

The concept of an "image" is likely to vary somewhat between 
accelerators. For ptx it's just a string, so it can't really be 
generated the same way as for your target, where you can manipulate 
ELF images. So I think it is better to have a call to a gomp 
registration function for every offload target. That should also give 
you the ordering between shared libraries you said you wanted.
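
As a sketch of what I have in mind, assuming a hypothetical libgomp 
entry point (the real name and signature are up for discussion):

  /* Emitted by mkoffload for each offload target.  The third argument
     is opaque to libgomp; for ptx it would simply point at the PTX
     string, for MIC at an ELF image.  */
  extern void GOMP_offload_register (const void *host_descriptor,
                                     int target_type,
                                     const void *target_data);

  extern const char __OPENMP_TARGET__[];
  static const char ptx_image[] = "// ... ptx assembly ...";

  static __attribute__ ((constructor)) void
  register_ptx (void)
  {
    GOMP_offload_register (__OPENMP_TARGET__, 1 /* ptx */, ptx_image);
  }

Since each shared library runs its own constructor, registration 
happens in load order, which should give the ordering between 
libraries mentioned above.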

>>   * Is there a reason to call a register function for the host tables?
>>     The way I've set it up, we register a target function/variable table
>>     while also passing a pointer to the __OPENMP_TARGET__ symbol which
>>     holds information about the host side tables.
>
> In our case we can't register the target table with a call to
> libgomp; it can be obtained only from the accelerator. Therefore we
> propose a target-independent approach: during device initialization
> libgomp calls 2 functions from the plugin (or this can be implemented
> by a single function):
> 1. devicep->device_load_image_func, which will load the target image
> (its pointer will be taken from the host descriptor);
> 2. devicep->device_get_table_func, which in our case connects to the
> device and receives its table. In your case it will return
> func_mappings and var_mappings. Will that work for you?

Probably. I think the constructor call to the gomp registration 
function would pass an opaque pointer to whatever data the target 
wants, so it can arrange its image/table data in whatever way it likes.
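
Something like the following is how I'd picture the plugin side, 
purely as a sketch (all names here are guesses about your code, which 
I haven't seen):

  #include <stddef.h>

  /* One host<->target address mapping; size only matters for vars.  */
  struct mapping { void *host_addr; void *tgt_addr; size_t size; };

  /* Load the target image onto the device.  For MIC the image pointer
     comes from the host descriptor; for ptx it would be the string.  */
  typedef void (*device_load_image_fn) (int device, const void *image);

  /* Obtain the device-side tables.  On MIC this connects to the
     device and fetches them; on ptx it would just return the
     func_mappings and var_mappings built when the image was loaded.
     Returns nonzero on success.  */
  typedef int (*device_get_table_fn) (int device,
                                      struct mapping **func_mappings,
                                      struct mapping **var_mappings);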

It would help to see the code you have on the libgomp side; I don't 
believe that has been posted yet?

> Unfortunately I don't fully understand this configure magic... When a
> user specifies 2 or 3 accelerators during configuration with
> --enable-accelerators, will several different accel-gccs be built?

No; the idea is that --enable-accelerator= is likely specific to ptx, 
where we really just want to build a gcc and no target libraries, so 
building it alongside the host in an accel-gcc subdirectory is ideal.

For your use case, I'd imagine the offload compiler would be built 
relatively normally as a full build with 
"--enable-as-accelerator-for=x86_64-linux", which would install it into 
locations where the host will eventually be able to find it. Then the 
host compiler would be built with another new configure option (as yet 
unimplemented in my patch set) "--enable-offload-targets=mic,..." which 
would tell the host compiler about the pre-built offload target 
compilers. On the ptx side, "--enable-accelerator=ptx" would then also 
add ptx to the list of --enable-offload-targets.
Naming of all these configure options can be discussed; I have no real 
preference for any of them.
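
For illustration, the two builds would then be configured roughly like 
this (paths elided; option names as in the current patch set, still 
open to renaming):

  # Offload compiler: a normal full build, installed where the host
  # compiler can later find it.
  .../configure --enable-as-accelerator-for=x86_64-linux

  # Host compiler: told about the pre-built offload compilers.
  .../configure --enable-offload-targets=mic,...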


Bernd


