Bug 71646 - Incompatibility between PTX code and GPU hardware
Summary: Incompatibility between PTX code and GPU hardware
Alias: None
Product: gcc
Classification: Unclassified
Component: libgomp
Version: 6.1.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2016-06-24 13:23 UTC by Didier G
Modified: 2016-06-24 15:58 UTC
CC List: 3 users

See Also:
Known to work:
Known to fail:
Last reconfirmed:

Very simple OpenACC program (304 bytes, text/x-csrc)
2016-06-24 13:23 UTC, Didier G

Description Didier G 2016-06-24 13:23:54 UTC
Created attachment 38758
Very simple OpenACC program

Hardware: Core 2 Quad + NVIDIA GeForce GT 430

OS: Linux 4.4.0-24-generic x86_64

Software environment:
              - gcc 6.1 (compiled from sources)
              - nvidia-toolkit-7.5
              - libcudart 7.5
              - libcuda1-361
              - nvptx-tools, master branch as of June 17 (compiled from sources)

The attached source program is compiled and linked with this command:

gcc t.c -fopenacc -foffload=nvptx-none -foffload="-O3" -O3 -o t -lgomp -Wl,-rpath=/usr/local/lib64 

After typing this: export ACC_DEVICE_TYPE=

and then executing ./t, these messages appear:

libgomp: Link error log ptxas fatal   : SM version specified by .target is higher than default SM version assumed

libgomp: cuLinkAddData (ptx_code) error: no kernel image is available for execution on the device

Moreover, ./t hangs.

This is expected: my video card supports at most sm_20 PTX code, whereas gcc generates sm_30 instructions, and .target sm_30 is even hardcoded at gcc/config/nvptx/nvptx.c:3904: fputs ("\t.target\tsm_30\n", asm_out_file);

From my point of view, since only sm_30 PTX code is generated, int nvptx_get_num_devices (void) (libgomp/plugin/plugin-nvptx.c:680) should be aware of that and should not count such a video card.
As a result, the libgomp runtime would fall back to the host, as it does when cuInit(0) != CUDA_SUCCESS.
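
To illustrate the filtering I have in mind, here is a minimal sketch, not the actual libgomp code. The names count_usable_devices and MIN_SM_MAJOR are hypothetical. In the plugin, the compute capability of each device would presumably be queried from the CUDA driver (for example with cuDeviceGetAttribute and CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR); here the values are passed in as an array so the logic can be read and tested without a GPU.

```c
/* Hypothetical constant: major compute capability required by the
   hardcoded ".target sm_30" in the generated PTX.  */
#define MIN_SM_MAJOR 3

/* Count only the devices able to run the generated PTX.
   sm_major[i] is the major compute capability of device i
   (assumed to have been obtained from the CUDA driver beforehand).
   Devices below min_major are skipped instead of being counted.  */
static int
count_usable_devices (const int *sm_major, int n, int min_major)
{
  int usable = 0;

  for (int i = 0; i < n; i++)
    if (sm_major[i] >= min_major)
      usable++;

  return usable;
}
```

With a single GT 430 (major capability 2), count_usable_devices returns 0 for MIN_SM_MAJOR, so the runtime would see no usable device and could fall back to the host instead of failing at link time.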