GCC can be configured to target nvptx (Nvidia PTX).
Both nvptx offloading and standalone toolchains require nvptx-tools and newlib 3.0 revision cd31fbb2aea25f94d7ecedc9db16dfc87ab0c316 or later. nvptx-tools does not have any official point releases; however, it is generally stable, and end users are encouraged to use the git master branch.
At present, the nvptx newlib port only supports EL/IX level 1. This implies that various file I/O functions beyond printf are not supported. Furthermore, due to PTX restrictions, the supported I/O functions are not re-entrant. In practice that is largely irrelevant: when nvptx devices operate as offloading accelerators, the host processor is responsible for I/O, and standalone compilers are restricted to generating code for single-threaded environments. Newlib's default configuration parameters should generally work out of the box, but newlib must also be configured with --enable-newlib-io-long-long. When the defaults are not used, do not enable newlib-register-fini or EL/IX levels 2-4, because those may prevent newlib from building or cause unexpected failures at runtime.
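The configuration constraints above can be expressed as a small pre-flight check. This is a minimal Python sketch, not part of newlib or GCC; it assumes newlib's configure script spells the relevant options --enable-newlib-io-long-long, --enable-newlib-register-fini and --enable-newlib-elix-level=N, and the function name check_newlib_options is hypothetical.

```python
from __future__ import annotations

# Hypothetical sketch: check a list of newlib configure options against the
# nvptx constraints described above. The option spellings are assumed to
# match newlib's configure script.
REQUIRED = "--enable-newlib-io-long-long"
REGISTER_FINI = "--enable-newlib-register-fini"
ELIX_LEVEL = "--enable-newlib-elix-level="


def check_newlib_options(options: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if REQUIRED not in options:
        problems.append(f"missing {REQUIRED}")
    for opt in options:
        if opt in (REGISTER_FINI, REGISTER_FINI + "=yes"):
            problems.append(f"do not use {opt}")
        elif opt.startswith(ELIX_LEVEL):
            level = opt[len(ELIX_LEVEL):]
            # Only EL/IX level 1 is supported by the nvptx newlib port.
            if level in ("2", "3", "4"):
                problems.append(f"EL/IX level {level} is not supported")
    return problems


if __name__ == "__main__":
    opts = ["--target=nvptx-none", "--enable-newlib-io-long-long"]
    print(check_newlib_options(opts))  # []
```

The check only looks at the option strings; it does not run configure, so it cannot catch problems introduced by other settings.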
See the target-specific installation notes at https://gcc.gnu.org/install/specific.html#nvptx-x-none for more details.
GCC nvptx-none Target testing
A stand-alone GCC --target=nvptx-none configuration is useful for gaining confidence in the code generated by GCC's nvptx back end.
tschwinge has scripts at https://github.com/tschwinge/gcc-playground/tree/nvptx-light/master, occasionally runs them, and fixes or reports any major breakage. Be sure to run TEST-gcc with -j1 only; otherwise there will be too many "twinkling" (flaky) test results. Expect it to take a long time: 17 hours on one particular system.
How It Works
The DejaGnu board file for nvptx-none unconditionally specifies -mmainkernel, which links in crt0.o, a tiny bit of startup code. The nvptx-none-run tool then loads the PTX code onto the GPU and launches a single-threaded kernel. (This is in contrast to offloading use, where the nvptx libgomp plugin loads the PTX code onto the GPU and launches "multi-dimensional" parallelized kernels.)
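To reproduce by hand what the board file does for a single test case, the steps look roughly like the following Python sketch. It assumes the installed driver is called nvptx-none-gcc and that nvptx-none-run (from nvptx-tools) accepts the linked executable as its argument; build_commands and compile_and_run are hypothetical helpers, and actually running them requires the toolchain and an Nvidia GPU.

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical sketch: the two steps the DejaGnu board file automates for one
# test case. Assumes a driver named "nvptx-none-gcc" and that nvptx-none-run
# takes the linked executable as its first argument.


def build_commands(source: str, output: str = "a.out",
                   cc: str = "nvptx-none-gcc") -> tuple[list[str], list[str]]:
    """Return (compile command, run command) as argument lists."""
    # -mmainkernel links in crt0.o, the tiny startup code.
    compile_cmd = [cc, "-mmainkernel", source, "-o", output]
    # nvptx-none-run loads the PTX onto the GPU and launches a
    # single-threaded kernel.
    run_cmd = ["nvptx-none-run", output]
    return compile_cmd, run_cmd


def compile_and_run(source: str, output: str = "a.out") -> int:
    """Compile and run one test; needs the toolchain and an Nvidia GPU."""
    compile_cmd, run_cmd = build_commands(source, output)
    subprocess.run(compile_cmd, check=True)
    return subprocess.run(run_cmd).returncode


if __name__ == "__main__":
    # Only print the commands; nothing is invoked here.
    for cmd in build_commands("hello.c", "hello"):
        print(shlex.join(cmd))
```

Run as a script, this prints `nvptx-none-gcc -mmainkernel hello.c -o hello` followed by `nvptx-none-run hello`. Keeping command construction separate from execution makes the commands easy to inspect before anything touches the GPU.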
Interpreting Test Results
In some regards, the nvptx target is quite different from GCC's usual set of targets, so the test results should be interpreted "differently", too.
Because they would be difficult to implement given the constraints imposed by PTX itself, the GCC nvptx back end does not support, for example, setjmp/longjmp, exceptions (?), alloca, computed goto, or non-local goto.
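When triaging failures, it can help to check whether a test uses one of these constructs. The following Python sketch is a crude, hypothetical heuristic (the pattern table and find_unsupported are illustrative, not part of GCC's test suite). Regular expressions cannot parse C, so it will both miss cases and report false positives, and it does not attempt to detect exceptions or non-local goto.

```python
from __future__ import annotations

import re

# Hypothetical triage heuristic: flag C sources that use constructs the GCC
# nvptx back end does not support. Regexes do not parse C, so this both misses
# cases and reports false positives (e.g. matches inside comments).
UNSUPPORTED_PATTERNS = {
    "setjmp/longjmp": re.compile(r"\b(?:__builtin_)?(?:setjmp|longjmp)\s*\("),
    "alloca": re.compile(r"\b(?:__builtin_)?alloca\s*\("),
    "computed goto": re.compile(r"\bgoto\s*\*"),
}


def find_unsupported(source: str) -> list[str]:
    """Return the names of suspicious constructs found in a C source string."""
    return [name for name, pattern in UNSUPPORTED_PATTERNS.items()
            if pattern.search(source)]


if __name__ == "__main__":
    snippet = "void f(void) { char *p = __builtin_alloca(16); p[0] = 0; }"
    print(find_unsupported(snippet))  # ['alloca']
```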
GCC's C test suite has mostly been cleaned up, but in general, don't expect clean test results: the value of such testing lies in before/after comparison.
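A before/after comparison can be as simple as diffing the per-test statuses in two DejaGnu .sum files (for example, gcc.sum from a baseline build and from a patched build). The following Python sketch assumes the usual "STATUS: test name" result-line format; parse_sum and compare are hypothetical helpers, and a test name that occurs more than once in a file is collapsed to its last result.

```python
from __future__ import annotations

import re
import sys

# Result lines in a DejaGnu .sum file look like
# "FAIL: gcc.dg/foo.c (test for excess errors)".
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): (.*)$")


def parse_sum(lines) -> dict[str, str]:
    """Map each test name to its status (the last occurrence wins)."""
    results = {}
    for line in lines:
        m = RESULT_RE.match(line.rstrip("\n"))
        if m:
            results[m.group(2)] = m.group(1)
    return results


def compare(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Report tests whose status changed, appeared, or disappeared."""
    changes = []
    for name in sorted(before.keys() | after.keys()):
        old = before.get(name, "(absent)")
        new = after.get(name, "(absent)")
        if old != new:
            changes.append(f"{old} -> {new}: {name}")
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        for change in compare(parse_sum(f_before), parse_sum(f_after)):
            print(change)
```

Invoked with two .sum paths, it prints one line per changed test. Given the "twinkling" results mentioned above, a single status flip is not necessarily a real regression; re-running the affected test is a sensible next step.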
By default, the ptxas tool is invoked to verify that the assembly code generated by GCC's nvptx back end is correct, but it "chokes" on some test cases in GCC's test suite: for some of them it crashes, or it doesn't terminate in a reasonable amount of time. Several of these issues have been reported to Nvidia and ought to be fixed in later versions of the CUDA Toolkit.
The nvptx target uses a "crippled" variant of DWARF debug information (DWARF2_LINENO_DEBUGGING_INFO). The DWARF test cases generally have not been adjusted for it, so don't expect them to return sensible results.
Supporting all of the standard C library is not easy (and not required for offloading use), so a good number of test cases will run into missing functionality there.
For the same reason, only a "minimal" variant of the Fortran support library (libgfortran) is built, so a good number of Fortran test cases will run into missing functionality there.
The C++ support library (libstdc++) is not built at all, so all test cases depending on it will fail.
All this could be improved, but the focus has been on the aspects that are useful for offloading parallelized code; file I/O, for example, is not one of them.