[gomp4] various OpenACC/PTX built-ins and a reduction tweak

Cesar Philippidis cesar@codesourcery.com
Wed Sep 17 00:33:00 GMT 2014


The patch introduces the following OpenACC/PTX-specific built-ins:

  * GOACC_ntid
  * GOACC_tid
  * GOACC_nctaid
  * GOACC_ctaid
  * acc_on_device
  * GOACC_get_thread_num
  * GOACC_get_num_threads

Of these functions, only acc_on_device is part of the OpenACC spec.
The other functions are helpers for omp-low.c. In particular, I'm using
GOACC_get_thread_num and GOACC_get_num_threads to determine the number
of accelerator threads available to the reduction clause. Currently,
GOACC_get_num_threads returns num_gangs * vector_length, but that value
is subject to change later on. It's probably premature to include the
PTX built-ins right now, but I'd like to keep the middle end of our
internal OpenACC branch in sync with gomp-4_0-branch.
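
As a rough illustration of the thread-count arithmetic above, here is a
host-only C sketch, not code from the patch. The sketch_* functions are
hypothetical stand-ins: sketch_num_threads mirrors the current
num_gangs * vector_length value, and the loop index tid plays the role
that GOACC_get_thread_num would play on the accelerator. The strided
work split is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for GOACC_get_num_threads: currently
   num_gangs * vector_length, per the description above.  */
static int
sketch_num_threads (int num_gangs, int vector_length)
{
  return num_gangs * vector_length;
}

/* Simulate every accelerator thread in turn.  Thread TID sums the
   elements TID, TID + NTHREADS, ... and stores its result in
   PARTIAL[TID], so PARTIAL needs NTHREADS slots.  */
static void
sketch_partial_sums (const int *data, size_t n, int *partial, int nthreads)
{
  for (int tid = 0; tid < nthreads; tid++)
    {
      int acc = 0;
      for (size_t i = (size_t) tid; i < n; i += (size_t) nthreads)
        acc += data[i];
      partial[tid] = acc;
    }
}

/* Fold the array of partial results into the final reduction value.  */
static int
sketch_combine (const int *partial, int nthreads)
{
  int total = 0;
  for (int tid = 0; tid < nthreads; tid++)
    total += partial[tid];
  return total;
}
```

With num_gangs = 2 and vector_length = 4, the partial array needs eight
slots, one per simulated thread.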

This patch also allows OpenACC reductions to process the array holding
partial reductions on the accelerator, instead of copying that array
back to the host. Currently, this only happens when num_gangs = 1. For
PTX targets, we're going to need to use another kernel to process the
array of partial results because PTX lacks inter-CTA synchronization
(we're currently mapping gangs to CTAs). That's why I was working on the
routine clause recently.
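
To make the two-kernel idea concrete, here is a host-only C sketch under
stated assumptions; it is not what the patch implements. The demo_*
names are hypothetical, each gang is assumed to reduce one contiguous
slice, and the boundary between the two functions stands in for the
kernel boundary that would provide the synchronization PTX lacks
between CTAs.

```c
#include <stddef.h>

/* Where the array of partial results gets combined.  */
enum demo_site { DEMO_SAME_KERNEL, DEMO_SECOND_KERNEL };

/* With a single gang there is nothing to synchronize across CTAs, so
   the combine can stay in the same kernel.  Otherwise a second kernel
   is assumed, as proposed above for PTX targets.  */
static enum demo_site
demo_pick_site (int num_gangs)
{
  return num_gangs == 1 ? DEMO_SAME_KERNEL : DEMO_SECOND_KERNEL;
}

/* Simulated first kernel: gang G sums its contiguous slice of DATA
   and writes the result to PARTIAL[G].  */
static void
demo_gang_kernel (const int *data, size_t n, int num_gangs, int *partial)
{
  size_t chunk = (n + (size_t) num_gangs - 1) / (size_t) num_gangs;
  for (int g = 0; g < num_gangs; g++)
    {
      size_t begin = (size_t) g * chunk;
      size_t end = begin + chunk < n ? begin + chunk : n;
      int acc = 0;
      for (size_t i = begin; i < end; i++)
        acc += data[i];
      partial[g] = acc;
    }
}

/* Simulated second kernel: runs only after every gang has finished
   and folds the partial results into one value.  */
static int
demo_finish_kernel (const int *partial, int num_gangs)
{
  int total = 0;
  for (int g = 0; g < num_gangs; g++)
    total += partial[g];
  return total;
}
```

Calling demo_gang_kernel and then demo_finish_kernel gives the same
result as a single sequential sum; the split only matters for where the
synchronization point falls.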

Is this OK for gomp-4_0-branch?

Thanks,
Cesar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: builtins-gomp4.diff
Type: text/x-patch
Size: 36188 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20140917/5181decf/attachment.bin>

