[RFC] Offloading Support in libgomp

Jakub Jelinek jakub@redhat.com
Tue Sep 10 15:15:00 GMT 2013

On Tue, Sep 10, 2013 at 07:01:26PM +0400, Michael V. Zolotukhin wrote:
> I continued playing with plugins for libgomp, and I have several questions
> regarding that:
> 1) Would it be ok, at least for the beginning, if we'd look for plugins in a
> folder, specified by some environment variable?  A plugin would be considered
> as suitable, if it's named "*.so" and if dlsym finds a certain set of functions
> in it (e.g. "device_available", "offload_function" - names are subject to
> change of course).

Trying to dlopen random libraries is bad, so when libgomp dlopens something,
it had better be a plugin and not something else.
I'd suggest that the name should match libgomp-plugin-*.so.1 or a
similar wildcard.
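
To illustrate the naming check, here is a minimal sketch that filters
candidate file names with POSIX fnmatch before anything is handed to dlopen.
The helper name plugin_name_ok and the macro PLUGIN_PATTERN are made up for
this example, and the exact pattern is only the wildcard suggested above, not
a settled convention.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pattern, taken from the wildcard suggested in the mail.  */
#define PLUGIN_PATTERN "libgomp-plugin-*.so.1"

/* Return true if FILE_NAME (a bare file name without a directory part)
   looks like a libgomp plugin and is therefore worth passing to dlopen.  */
static bool
plugin_name_ok (const char *file_name)
{
  assert (file_name != NULL);
  return fnmatch (PLUGIN_PATTERN, file_name, 0) == 0;
}
```

A directory scan would call this on each entry and skip everything that does
not match, so unrelated shared libraries are never loaded.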

> 2) We need to perform all libgomp initialization once at the first entry to
> libgomp.  Should we add corresponding checks to all GOMP_* routines or should
> the compiler add calls to GOMP_init (which also needs to be introduced) by
> itself before all other calls to libgomp?

Why?  If this is the plugin stuff, then IMNSHO it should be initialized only
on the first call to GOMP_target{,_data,_update} or omp_get_num_devices.
Just use pthread_once so that it is initialized exactly once.
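
A rough sketch of that lazy initialization follows.  Everything except
pthread_once itself is an assumption made for the example: target_init,
ensure_target_init, sketch_get_num_devices and the counters are hypothetical
names, and the real plugin discovery is left out.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t target_init_once = PTHREAD_ONCE_INIT;
static int target_init_calls;	/* Demonstration only: counts init runs.  */
static int num_devices;		/* Filled in by target_init.  */

/* One-time setup.  Plugin discovery would go here; this sketch finds
   no devices.  */
static void
target_init (void)
{
  target_init_calls++;
  num_devices = 0;
}

/* Hypothetical helper to be called on entry to the target-related
   routines; pthread_once guarantees target_init runs exactly once.  */
static void
ensure_target_init (void)
{
  int rc = pthread_once (&target_init_once, target_init);
  assert (rc == 0);
  (void) rc;
}

/* Stand-in for omp_get_num_devices, renamed to avoid clashing with
   the real function.  */
int
sketch_get_num_devices (void)
{
  ensure_target_init ();
  return num_devices;
}
```

Routines that never touch offloading do not call ensure_target_init, so
programs without target constructs pay nothing for the plugin machinery.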

> 3) Also, would it be ok if we store libgomp status (already initialized or not)
> in some static variable?  I haven't seen such examples in the existing code
> base, so I'm not sure it is a good way to go.


> 4) We'll need to store some information about available devices:
>   - a search tree with data about mapping

For the search tree, I was going to actually implement it myself, but got
interrupted this week with work on UDRs again.  I wanted to write, just
temporarily, a dummy device that would execute on the host, but remap all
memory to something allocated elsewhere in the same address space by malloc.
Sure, #pragma omp declare target vars wouldn't work that way, but otherwise
it could work fine.  Each device with a flag set saying that it doesn't
share an address space between host and device (I believe HSAIL might have
a shared address space, host fallback of course has one, the rest do not?)
would have its own splay tree plus some host mutex to guard accesses to
the tree.
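
As a sketch of that dummy device: the code below keeps a per-device mapping
from host ranges to malloc'ed copies, guarded by a mutex.  All names
(struct device, dummy_map, dummy_translate, ...) are invented for the
example, a plain unbalanced binary search tree stands in for the splay tree,
and mapped ranges are assumed to be non-empty and non-overlapping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One mapped host range [host_start, host_end) and its "device" copy.  */
struct map_node
{
  uintptr_t host_start, host_end;
  void *dev_addr;
  struct map_node *left, *right;
};

/* Hypothetical per-device descriptor.  */
struct device
{
  bool shared_address_space;	/* True: no remapping needed.  */
  struct map_node *map_root;	/* Plain BST here, not a splay tree.  */
  pthread_mutex_t map_lock;	/* Guards map_root.  */
};

/* Find the node whose range contains ADDR.  Caller holds map_lock.  */
static struct map_node *
map_lookup (struct map_node *n, uintptr_t addr)
{
  while (n != NULL)
    {
      if (addr < n->host_start)
	n = n->left;
      else if (addr >= n->host_end)
	n = n->right;
      else
	return n;
    }
  return NULL;
}

/* Map SIZE bytes at HOST: copy them into malloc'ed memory and record
   the range.  Returns the device address, or NULL on failure.  */
static void *
dummy_map (struct device *dev, void *host, size_t size)
{
  assert (size > 0);
  if (dev->shared_address_space)
    return host;

  struct map_node *node = malloc (sizeof *node);
  void *copy = malloc (size);
  if (node == NULL || copy == NULL)
    {
      free (node);
      free (copy);
      return NULL;
    }
  memcpy (copy, host, size);
  node->host_start = (uintptr_t) host;
  node->host_end = node->host_start + size;
  node->dev_addr = copy;
  node->left = node->right = NULL;

  pthread_mutex_lock (&dev->map_lock);
  struct map_node **p = &dev->map_root;
  while (*p != NULL)
    p = node->host_start < (*p)->host_start ? &(*p)->left : &(*p)->right;
  *p = node;
  pthread_mutex_unlock (&dev->map_lock);
  return copy;
}

/* Translate a host pointer into the matching device pointer, or NULL
   if the address is not mapped.  */
static void *
dummy_translate (struct device *dev, void *host)
{
  if (dev->shared_address_space)
    return host;
  pthread_mutex_lock (&dev->map_lock);
  struct map_node *n = map_lookup (dev->map_root, (uintptr_t) host);
  void *res = NULL;
  if (n != NULL)
    res = (char *) n->dev_addr + ((uintptr_t) host - n->host_start);
  pthread_mutex_unlock (&dev->map_lock);
  return res;
}
```

Because the copies live in the host address space, the dummy device can run
the target region on the host while still exercising the mapping logic.
Unmapping and copy-back are left out of the sketch.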

>   - corresponding plugin handler
>   - handlers for functions from the corresponding plugin
>   - maybe some other info

> I guess that's a bad idea to store all this data in some static-sized global
> variables, and it's better to dynamically allocate memory for that.  But it
> implies that we need to care about deallocation, which should be called at some
> moment on the program end.  Shouldn't we introduce something like
> GOMP_deinitialize and insert calls to it during the compilation?

We don't need to care about deallocation if it is not per-host-thread
stuff, but per-device stuff.  If we wanted, we could add some magic function
for valgrind that could be called (like e.g. glibc has), but it is
definitely not very important and we don't do it right now for parallels.

> 5) We mentioned something similar to a tree data structure for storing info
> about mapping.  Do I understand correctly that currently there is no such
> data structure at all and we need to design and implement it from scratch?

See above.
