GCC can be configured to target nvptx (Nvidia PTX).

This is primarily used for code offloading with OpenACC and OpenMP directives.

PTX Documentation

Build Dependencies

Both nvptx offloading and stand-alone GCC --target=nvptx-none toolchains require nvptx-tools and newlib. See the target-specific installation notes at https://gcc.gnu.org/install/specific.html#nvptx-x-none for more details.

GCC nvptx-none Target Testing

A stand-alone GCC --target=nvptx-none configuration is useful for gaining confidence in the code generation of the GCC nvptx back end.

After building, run:

$ export NVPTX_NONE_DEJAGNU=[...]/build-nvptx-tools/dejagnu.exp
$ export NVPTX_NONE_RUN=[...]/build-nvptx-tools/nvptx-none-run
$ make -k check-gcc RUNTESTFLAGS="--target_board=nvptx-none-run --all dg.exp='*initpri*' ecos.exp='*initpri*'" DEJAGNU="$NVPTX_NONE_DEJAGNU"

Setting NVPTX_NONE_DEJAGNU (passed on as DEJAGNU) only serves to let DejaGnu find the nvptx-none board file for --target_board=nvptx-none-run. NVPTX_NONE_RUN points to the launcher. Otherwise, use the standard GCC test suite idioms.
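Since a mistyped path in either variable only shows up once the test run is under way, it can help to check both before starting. The following is a minimal sketch; the helper name check_nvptx_none_env is hypothetical and not part of GCC or nvptx-tools. It assumes only that NVPTX_NONE_DEJAGNU should name a readable file and NVPTX_NONE_RUN an executable.

```shell
# Hypothetical helper (not part of GCC or nvptx-tools): fail early if the
# two environment variables used above do not point at usable files.
check_nvptx_none_env() {
    if [ ! -r "${NVPTX_NONE_DEJAGNU:-}" ]; then
        echo "NVPTX_NONE_DEJAGNU does not name a readable file" >&2
        return 1
    fi
    if [ ! -x "${NVPTX_NONE_RUN:-}" ]; then
        echo "NVPTX_NONE_RUN does not name an executable file" >&2
        return 1
    fi
    return 0
}
```

For example, running check_nvptx_none_env && make -k check-gcc ... would start the tests only if both paths look sane.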

You may use make -jN to parallelize testing. To stabilize test results (avoiding "twinkling" test cases, hangs, and spurious FAILs), all execution testing (that is, all nvptx-none-run invocations) is serialized internally; see NVPTX_NONE_EXECUTION_LOCK_FILE. (This means that the PTX -> SASS JIT compilation isn't done in parallel either, which could be improved upon.)
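To illustrate the general idea of serializing through a lock file, here is a generic sketch using flock(1) from util-linux. It is not the actual mechanism behind NVPTX_NONE_EXECUTION_LOCK_FILE, which this page does not describe; the function name run_serialized is hypothetical.

```shell
# Generic illustration only; NOT how nvptx-tools implements its locking.
# Usage: run_serialized LOCK_FILE COMMAND [ARGS...]
# Runs COMMAND while holding an exclusive lock on LOCK_FILE, so that
# concurrent callers using the same lock file execute one after another.
run_serialized() {
    lock_file=$1
    shift
    (
        flock 9      # block until the exclusive lock on fd 9 is ours
        "$@"         # run the command; the lock is released on exit
    ) 9> "$lock_file"
}
```

With such a wrapper, everything outside the locked command (for example, compilation under make -jN) still runs in parallel; only the wrapped command is serialized.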

Running the whole GCC test suite (such as: make -k -j13 check RUNTESTFLAGS="--target_board=nvptx-none-run" DEJAGNU="$NVPTX_NONE_DEJAGNU") takes several hours.

How It Works

The DejaGnu board file for nvptx-none unconditionally specifies -mmainkernel to link in crt0.o, a tiny bit of startup code. The nvptx-none-run tool then loads the PTX code to the GPU, and launches a single-threaded kernel. (This is in contrast to offloading use, where the nvptx libgomp plugin loads PTX code to the GPU, and launches "multi-dimensional" parallelized kernels.)

Interpreting Test Results

In some regards, the nvptx target is quite different from GCC's usual set of targets, so the test results should be interpreted "differently", too.

The GCC nvptx back end doesn't support, for example, setjmp/longjmp, alloca, computed goto, non-local goto, or C++ exception handling, because these would be difficult to implement within the constraints set by PTX itself.

GCC's C test suite has been mostly cleaned up, but generally, don't expect clean test results -- the value of such testing is in before/after comparison (diff the *.sum files before/after code changes).
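The before/after comparison can be sketched as follows. This is a minimal example, not an official GCC tool; the function name sum_changes is hypothetical. It keeps only the DejaGnu result lines from each *.sum file and sorts them, so that the different test ordering of parallel runs doesn't show up as noise.

```shell
# Hypothetical helper: print the test results that differ between two
# DejaGnu .sum files.  Usage: sum_changes BEFORE.sum AFTER.sum
sum_changes() {
    sc_before=$(mktemp)
    sc_after=$(mktemp)
    sc_pat='^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): '
    # Keep only result lines, in a stable order.
    grep -E "$sc_pat" "$1" | LC_ALL=C sort > "$sc_before"
    grep -E "$sc_pat" "$2" | LC_ALL=C sort > "$sc_after"
    # '<' marks results only in BEFORE, '>' marks results only in AFTER.
    diff "$sc_before" "$sc_after" | grep '^[<>]' || true
    rm -f "$sc_before" "$sc_after"
}
```

A test that changed from FAIL to PASS then appears as a "< FAIL: ..." line paired with a "> PASS: ..." line. The GCC source tree also ships contrib/compare_tests for this purpose.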

To verify that the PTX code generated by GCC's nvptx back end is syntactically correct, nvptx-none-as by default invokes the CUDA Toolkit ptxas tool, if available. ptxas "chokes" on a few test cases of GCC's test suite: it crashes, or doesn't terminate in a reasonable amount of time. Several of these cases have been reported to Nvidia, and ought to be fixed in later versions of the CUDA Toolkit.

The nvptx target uses a "crippled" variant of the DWARF debug information (DWARF2_LINENO_DEBUGGING_INFO). The DWARF test cases have generally not been adjusted to it, so don't expect them to return sensible results.

It's not easy (and not required for offloading use) to support all of the standard C library, so a number of test cases across the whole GCC test suite will, for example, FAIL to link due to missing symbols.

The C++ support library (libstdc++) is not built at all, so all test cases depending on it will fail.

All this could be improved, but the focus has been on those aspects that are useful for parallelized code offloading; file I/O, for example, is not one of them.

None: nvptx (last edited 2024-09-23 13:38:57 by tschwinge)