GCC can be configured to target nvptx (Nvidia PTX).
GCC nvptx-none Target Testing
A stand-alone GCC --target=nvptx-none configuration is useful for gaining confidence in the code generated by GCC's nvptx back end.
See the target-specific installation notes: https://gcc.gnu.org/install/specific.html#nvptx-x-none.
tschwinge maintains scripts for this (https://github.com/tschwinge/gcc-playground/tree/nvptx-light/master), runs them occasionally, and fixes or reports any major breakage. Be sure to run TEST-gcc with -j1 only; otherwise there will be too many "twinkling" test results, that is, tests whose result changes from one run to the next. A full run takes a long time: 17 hours on one particular system.
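To check whether a result is twinkling, one approach is to run the same configuration more than once and look for tests whose status differs between runs. The following is a minimal sketch, not part of tschwinge's scripts: it assumes the DejaGnu .sum files from each run have already been read into strings, and the names parse_sum and twinkling are hypothetical.

```python
import re
from collections import defaultdict

# DejaGnu result lines look like "PASS: gcc.dg/foo.c (test for excess errors)".
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): (.*)$"
)


def parse_sum(text):
    """Map each test name in a .sum file's text to its result status.

    Simplification: if the same name appears more than once, the last
    occurrence wins.
    """
    results = {}
    for line in text.splitlines():
        m = RESULT_RE.match(line)
        if m:
            status, name = m.groups()
            results[name] = status
    return results


def twinkling(runs):
    """Return {test name: set of statuses} for tests whose status differs across runs.

    Tests that are missing from some runs are only compared across the runs
    in which they do appear.
    """
    seen = defaultdict(set)
    for text in runs:
        for name, status in parse_sum(text).items():
            seen[name].add(status)
    return {name: statuses for name, statuses in seen.items() if len(statuses) > 1}
```

For example, `twinkling([run1, run2])` with a test that is FAIL in one run and PASS in the other reports that test with both statuses.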
How It Works
The DejaGnu board file for nvptx-none unconditionally specifies -mmainkernel, which links in crt0.o, a tiny piece of startup code. The nvptx-none-run tool then loads the PTX code onto the GPU and launches a single-threaded kernel. (This contrasts with offloading use, where the nvptx libgomp plugin loads the PTX code onto the GPU and launches "multi-dimensional", parallelized kernels.)
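The same flow can be reproduced by hand outside DejaGnu. This sketch assumes a cross compiler installed as nvptx-none-gcc and the nvptx-tools launcher installed as nvptx-none-run (adjust both to your installation); the helper names are hypothetical, and the real board file may pass further options.

```python
import subprocess


def nvptx_commands(source, exe="a.out", target="nvptx-none"):
    """Return (compile, run) command lines for one test program.

    Assumes the driver is installed as "<target>-gcc" and the launcher as
    "<target>-run"; both names depend on the installation.
    """
    # -mmainkernel links in crt0.o, the startup code, as the board file does.
    compile_cmd = [f"{target}-gcc", "-mmainkernel", source, "-o", exe]
    # The launcher loads the PTX code onto the GPU and runs it as a
    # single-threaded kernel.
    run_cmd = [f"{target}-run", exe]
    return compile_cmd, run_cmd


def build_and_run(source, exe="a.out"):
    """Compile and link the program, then run it; returns the launcher's exit status."""
    compile_cmd, run_cmd = nvptx_commands(source, exe)
    subprocess.run(compile_cmd, check=True)  # raises CalledProcessError on failure
    return subprocess.run(run_cmd).returncode
```

Keeping command construction separate from execution makes it easy to print or log the exact commands before running them on a machine with a GPU.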
Interpreting Test Results
In some regards, the nvptx target is quite different from GCC's usual targets, so its test results should also be interpreted "differently".
Because of constraints imposed by PTX itself, these features would be difficult to implement, so the GCC nvptx back end does not support, for example, setjmp/longjmp, exceptions (?), alloca, computed goto, or non-local goto.
GCC's C test suite has mostly been cleaned up, but in general, don't expect clean test results: the value of such testing lies in before/after comparison.
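A before/after comparison can be sketched as follows, assuming two DejaGnu .sum files (for example gcc.sum from a baseline build and from a patched build) have been read into strings. Treating FAIL, XPASS, and UNRESOLVED as "bad" is a choice made here, and compare is a hypothetical helper; GCC's contrib/compare_tests script covers the same ground more thoroughly.

```python
import re

RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): (.*)$"
)
# Statuses treated as problems in this sketch.
BAD = {"FAIL", "XPASS", "UNRESOLVED"}


def parse_sum(text):
    """Map test name -> status; if a name repeats, the last occurrence wins."""
    results = {}
    for line in text.splitlines():
        m = RESULT_RE.match(line)
        if m:
            results[m.group(2)] = m.group(1)
    return results


def compare(before_text, after_text):
    """Return (regressions, progressions, disappeared) as sorted name lists."""
    before, after = parse_sum(before_text), parse_sum(after_text)
    # Bad after the change but not bad before (this includes new tests).
    regressions = sorted(
        name for name, status in after.items()
        if status in BAD and before.get(name) not in BAD
    )
    # Bad before the change and present but no longer bad after it.
    progressions = sorted(
        name for name, status in before.items()
        if status in BAD and name in after and after[name] not in BAD
    )
    # Present before but missing after, e.g. because the test no longer ran.
    disappeared = sorted(set(before) - set(after))
    return regressions, progressions, disappeared
```

Comparing status changes rather than absolute failure counts is what makes the results usable despite the many expected failures on this target.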
To verify that the assembly code generated by GCC's nvptx back end is correct, the ptxas tool is invoked by default, and it "chokes" on some test cases of GCC's test suite: for some, it crashes; for others, it doesn't terminate in a reasonable amount of time. Several of these cases have been reported to Nvidia and ought to be fixed in later versions of the CUDA Toolkit.
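When narrowing down such a case by hand, for example before reporting it, it helps to tell a crash apart from a hang. This minimal sketch runs a command with a time limit and classifies the outcome; the 60-second default and the name classify_run are assumptions, and the actual ptxas command line to pass in depends on the case being reproduced.

```python
import subprocess


def classify_run(cmd, timeout_s=60):
    """Run cmd and classify the outcome as "ok", "error", "crash", or "timeout"."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a negative return code means the process was killed by a signal.
    if proc.returncode < 0:
        return "crash"
    # A normal nonzero exit, e.g. the tool rejecting its input.
    return "error"
```

The signal-based crash check is POSIX-specific; on other platforms a crash will typically show up as "error".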
The nvptx target uses a "crippled" variant of DWARF debug information (DWARF2_LINENO_DEBUGGING_INFO), to which the DWARF test cases generally have not been adjusted, so don't expect sensible results from them.
Supporting all of the standard C library is not easy (and not required for offloading use), so a good number of test cases will run into missing functionality there.
For the same reason, only a "minimal" variant of the Fortran support library (libgfortran) is built, so a good number of Fortran test cases will run into missing functionality there.
The C++ support library (libstdc++) is not built at all, so all test cases depending on it will fail.
All of this could be improved, but the focus has been on the aspects that are useful for offloading parallelized code, and file I/O, for example, is not one of them.