This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[WIP] OpenMP 4 NVPTX support


Hi!

Attached is a minimal patch to get at least a trivial OpenMP 4.0 testcase
offloading to NVPTX (the first patch).  The second patch is WIP, just first
few needed changes to make libgomp to build for NVPTX (several weeks of work
at least).

The following seems to work and the output suggests that it was offloaded to
a non-SHM arch:

int
main ()
{
  int v = 0;
  int *w = 0;
  int x = 0;
#pragma omp target
  {
    v = 6;
    w = &v;
    x = 1; // omp_is_initial_device ();
  }
  __builtin_printf ("%d %p %p %d\n", v, &v, w, x);
  return 0;
}

but already tiny bit more complicated testcase:

extern void *malloc (__SIZE_TYPE__);
extern void free (void *);

int
main ()
{
  int v = 0;
  int *w = 0;
  int x = 0;
#pragma omp target
  {
    v = 6;
    w = &v;
    char *p = malloc (64);
    x = 1; // omp_is_initial_device ();
    free (p);
  }
  __builtin_printf ("%d %p %p %d\n", v, &v, w, x);
  return 0;
}

suggests that while it is nice that when building nvptx accel compiler
we build libgcc.a, libc.a, libm.a, libgfortran.a (and in the future hopefully libgomp.a),
nothing attempts to link those in :(.

Is the plan to link those in at mkoffload time (haven't seen any attempt
of mkoffload to invoke the nvptx-none-ld linker though), or link those in
somehow at link_ptx time in the plugin?
In either case, it isn't clear to me how things will work (if at all) in the
case where multiple shared libraries (or executable and at least one shared
library) have their own offloading bits, and if you try to e.g. call an
offloaded function defined in the shared library from an offloaded kernel in
the executable, because if any library needs some global singleton case, if
it is linked multiple times, no idea what the PTX JIT will do.

Once that is resolved, another thing will be to figure out how to
efficiently implement the TLS libgomp needs for its ICVs and other state
- right now it uses either __thread, or pthread_getspecific, neither of
these is usable of course.  I've been thinking about an array of those
structures in .shared memory indexed by %tid.x, but I guess that runs into
the issue that the array would need to be declared fixed size and there is a
very small size limitation on .shared memory size.
So perhaps a file scope .shared pointer to global memory, where whomever
launches an OpenMP 4.0 kernel (either the libgomp-plugin-nvptx.so.1 doing
GOMP_run, or later on dynamic parallelism from GOMP_target in the nvptx
libgomp.a) allocates the memory and some wrapper sets the .shared variable
to that allocated memory, then calls the kernel?

	Jakub

Attachment: U1
Description: Text document

Attachment: U2
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]