12.2 nvptx

On the hardware side, there is the hierarchy (fine to coarse):

All OpenMP and OpenACC levels are used, i.e.

The used sizes are

Additional information can be obtained by setting the environment variable GOMP_DEBUG=1 (very verbose; grep for kernel.*launch to find the launch parameters).
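The filtering suggested above can be sketched as a small shell helper that runs a program with GOMP_DEBUG=1 and keeps only the launch lines. The helper name show_launches is hypothetical, and merging stderr into the pipe is an assumption about where the debug output is written.

```shell
# Hypothetical helper: run a command with GOMP_DEBUG=1 set for that
# command only, and keep the lines that match kernel.*launch.
show_launches() {
    # 2>&1 merges stderr into the pipe (assumption: the debug output
    # may be written to stderr rather than stdout).
    GOMP_DEBUG=1 "$@" 2>&1 | grep 'kernel.*launch'
}
```

A call such as show_launches ./a.out (with ./a.out standing in for any offloading program) would then print only the launch-related lines.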

GCC generates generic PTX ISA code, which CUDA compiles just in time. CUDA caches the JIT-compiled code in the user’s directory (see the CUDA documentation; the cache can be tuned with the environment variables CUDA_CACHE_{DISABLE,MAXSIZE,PATH}).
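As an illustration of the cache tuning mentioned above, the following sketch sets all three variables before running an offloading program. The cache location and the size are arbitrary example values, not recommendations; the CUDA documentation describes the accepted ranges and defaults.

```shell
# Keep the JIT cache enabled (setting this to 1 would disable caching).
export CUDA_CACHE_DISABLE=0

# Example location for the cache directory (hypothetical path).
export CUDA_CACHE_PATH="${TMPDIR:-/tmp}/cuda-jit-cache"

# Example upper bound for the cache size in bytes (here 1 GiB).
export CUDA_CACHE_MAXSIZE=1073741824
```

Setting CUDA_CACHE_DISABLE=1 instead forces JIT compilation on every run, which can help when measuring the JIT cost itself.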

Note: While PTX ISA is generic, the -mptx= and -march= command-line options still affect the generated PTX ISA code and, thus, the required CUDA version and hardware.
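A minimal sketch of passing these options to the offload compiler follows. It assumes a GCC version that supports -foffload-options= and was built with nvptx offloading; sm_70 and 6.3 are illustrative values only, and demo.c is a hypothetical source file.

```shell
# Assumption: GCC with nvptx offloading support and -foffload-options=.
# sm_70 and 6.3 are example values; choose ones that match the
# installed CUDA version and the GPU.
OFFLOAD_FLAGS="-foffload-options=nvptx-none=-march=sm_70 -foffload-options=nvptx-none=-mptx=6.3"

# Print the resulting compile command instead of running it
# (demo.c is a hypothetical source file).
echo gcc -fopenmp $OFFLOAD_FLAGS demo.c -o demo
```

Choosing a higher -march= or -mptx= value raises the minimum GPU generation or CUDA version needed to run the resulting binary.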

Implementation remarks: