This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][RFC][OpenMP] Forbid target* pragmas in target regions


Hi!

On Tue, 10 Feb 2015 11:20:24 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Feb 10, 2015 at 11:16:22AM +0100, Martin Jambor wrote:
> > On Mon, Jan 12, 2015 at 12:22:44AM +0300, Ilya Verbin wrote:
> > > Currently if a target* pragma appears within a target region, GCC successfully
> > > compiles such code (with a warning).  But the binary fails at run-time, since it
> > > tries to call GOMP_target* functions on target.
> > > 
> > > The spec says: "If a target, target update, or target data construct appears
> > > within a target region then the behavior is unspecified."
> > > 
> > > I see 2 options to make the behavior more user-friendly:
> > > 1. To return an error at compile-time.
> > > 2. To check at run-time in libgomp whether GOMP_target* is called on target, and
> > > perform target-fallback if so.
> > > 
> > 
> > What actually happens when an accelerator calls a libgomp function?
> > Is a target libgomp port invoked?  If so, it should easily know it
> > runs on a target even without a run-time check, I suppose.  Or do you
> > somehow bring that call back to the host?
> 
> The spec says that it is undefined behavior to invoke
> #pragma omp target {,data,update} from within #pragma omp target region.

We're not currently implementing that, but let me mention that OpenACC
describes a concept of nested parallelism:

OpenACC 2.0a, 1.2 Execution Model:

    [...]
    On some devices, the accelerator may also create and launch parallel kernels, allowing for
    nested parallelism. In that case, the OpenACC directives may be executed by a host thread or
    an accelerator thread. [...]

OpenACC 2.0a, 2.6 Data Environment:

    [...] When a
    nested OpenACC construct is executed on the device, the default target device for that
    construct is the same device on which the encountering accelerator thread is executing. In
    that case, the target device shares memory with the encountering thread.

For PTX, this would use CUDA's Dynamic Parallelism,
<http://devblogs.nvidia.com/parallelforall/introduction-cuda-dynamic-parallelism/>,
for example.

> For intelmic, the offloading shared libraries are normally linked against
> -lgomp and thus can call any functions from there.
> For nvptx, libgomp still needs to be ported to that target.
> So, what we can do is e.g. ignore the nested #pragma omp target* regions
> inside of #pragma omp target, or turn them into __builtin_trap ().


Regards,
 Thomas


