Created attachment 38758
Very simple OpenACC program
Hardware: Core 2 Quad + Nvidia GeForce GT 430
OS: Linux 4.4.0-24-generic x86_64
Software environment:
- gcc 6.1 (compiled from sources)
- libcudart 7.5
- nvptx-tools, master branch as of June 17 (compiled from sources)
The attached source program is compiled and linked with this command:
gcc t.c -fopenacc -foffload=nvptx-none -foffload="-O3" -O3 -o t -lgomp -Wl,-rpath=/usr/local/lib64
After typing this: export ACC_DEVICE_TYPE=
and then executing ./t, these messages appear:
libgomp: Link error log ptxas fatal : SM version specified by .target is higher than default SM version assumed
libgomp: cuLinkAddData (ptx_code) error: no kernel image is available for execution on the device
Moreover, ./t hangs.
This is expected, since my video card supports at most sm_20 PTX code, while gcc generates sm_30 instructions. The .target sm_30 directive is even hardcoded at gcc/config/nvptx/nvptx.c:3904: fputs ("\t.target\tsm_30\n", asm_out_file);
In my opinion, since only sm_30 PTX code is generated, int nvptx_get_num_devices (void) (libgomp/plugin/plugin-nvptx.c:680) should take that into account and should not count a video card that cannot run it.
As a result, the libgomp runtime would fall back to the host, as it does when cuInit(0) != CUDA_SUCCESS.
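To illustrate the kind of check suggested above, here is a minimal sketch in plain C. It is not the libgomp code, and every name in it (sm_version, parse_sm_target, device_can_run, count_usable_devices) is hypothetical. It assumes the compute capability of each device has already been obtained, for example with cuDeviceGetAttribute and CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR / CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, and it assumes that a device can run PTX for a given .target only if its compute capability is at least that target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical type: a compute capability, e.g. {2, 0} for sm_20. */
struct sm_version { int major; int minor; };

/* Parse a PTX target string such as "sm_30" into a version.
   Returns 1 on success, 0 if the string is not of the form sm_<digits>. */
static int parse_sm_target(const char *target, struct sm_version *out)
{
    int n = 0;
    char trailing;

    /* Exactly one conversion must succeed: the number, with nothing after it. */
    if (sscanf(target, "sm_%d%c", &n, &trailing) != 1 || n < 10)
        return 0;
    out->major = n / 10;
    out->minor = n % 10;
    return 1;
}

/* Assumption: a device can run the code only if its compute capability
   is greater than or equal to the PTX target. */
static int device_can_run(struct sm_version dev, struct sm_version target)
{
    return dev.major > target.major
        || (dev.major == target.major && dev.minor >= target.minor);
}

/* Count only the devices able to run code generated for `target`.
   A result of 0 would let the caller fall back to the host. */
static int count_usable_devices(const struct sm_version *devs, size_t n,
                                struct sm_version target)
{
    int usable = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (device_can_run(devs[i], target))
            usable++;
    return usable;
}
```

With a single sm_20 device and a target of sm_30, count_usable_devices returns 0, which is the case where I would expect the runtime to behave as if no device were present.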