71646 – incompatibility between PTX code and GPU hardware due to sm_20 devices not being supported


Status:	UNCONFIRMED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	libgomp
Version:	6.1.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2016-06-24 13:23 UTC by Didier G
Modified:	2024-04-15 00:44 UTC
CC List:	3 users

See Also:
Host:
Target:	nvptx-none
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
Very simple OpenACC program (304 bytes, text/x-csrc) 2016-06-24 13:23 UTC, Didier G

Description Didier G 2016-06-24 13:23:54 UTC

Created attachment 38758
Very simple OpenACC program

Hardware : Core 2 Quad + Nvidia Geforce GT 430

OS : Linux 4.4.0-24-generic x86_64

Library environment:
              - gcc 6.1 (compiled from sources)
              - nvidia-toolkit-7.5
              - libcudart 7.5
              - libcuda1-361
              - nvptx-tools, master branch as of June 17th (compiled from sources)

The attached source program is compiled and linked with this command:

gcc t.c -fopenacc -foffload=nvptx-none -foffload="-O3" -O3 -o t -lgomp -Wl,-rpath=/usr/local/lib64 

After typing this: export ACC_DEVICE_TYPE=

and then executing ./t, these messages appear:

libgomp: Link error log ptxas fatal   : SM version specified by .target is higher than default SM version assumed


libgomp: cuLinkAddData (ptx_code) error: no kernel image is available for execution on the device

Moreover, ./t hangs.

This is expected: my video card supports at most sm_20 PTX code, while gcc generates sm_30 instructions. The .target sm_30 directive is even hardcoded at gcc/config/nvptx/nvptx.c:3904 : fputs ("\t.target\tsm_30\n", asm_out_file);

In my view, since only sm_30 PTX code is generated, int nvptx_get_num_devices (void) (libgomp/plugin/plugin-nvptx.c:680) should be aware of that and should not count such a video card.
As a result, the gomp runtime would fall back to the host, as it does when cuInit(0) != CUDA_SUCCESS.
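
The filtering proposed above could look roughly like the sketch below. It is not the libgomp implementation. The names device_cc, device_is_usable and count_usable_devices are hypothetical, and the device list is passed in as plain data so the logic can be read on its own. In the real plugin the major/minor values would presumably be obtained from the CUDA driver API (cuDeviceGetAttribute with CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR), which is an assumption about how it would be wired in.

```c
#include <stddef.h>

/* Hypothetical record of one device's compute capability.  In the plugin
   these numbers would presumably come from cuDeviceGetAttribute; here
   they are plain data so the check can be shown in isolation.  */
struct device_cc
{
  int major;
  int minor;
};

/* Minimum compute capability, chosen to match the hardcoded
   ".target sm_30" emitted by the nvptx back end.  */
enum { MIN_CC_MAJOR = 3, MIN_CC_MINOR = 0 };

/* Return 1 if a device can run sm_30 PTX code, 0 otherwise.  */
static int
device_is_usable (struct device_cc cc)
{
  if (cc.major != MIN_CC_MAJOR)
    return cc.major > MIN_CC_MAJOR;
  return cc.minor >= MIN_CC_MINOR;
}

/* Count only the devices that meet the minimum, so that an sm_20 card
   contributes nothing to the total.  */
static int
count_usable_devices (const struct device_cc *devs, size_t n)
{
  int usable = 0;
  for (size_t i = 0; i < n; i++)
    if (device_is_usable (devs[i]))
      usable++;
  return usable;
}
```

With a single sm_20 device in the list, count_usable_devices returns 0, which is the case in which the runtime would be expected to fall back to the host.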

Comment 1 Andrew Pinski 2024-04-15 00:44:09 UTC

sm_30 is definitely the minimum target for offloading from GCC to nvptx.

What I don't know is whether `nvidia geforce gt 430` is still supported in CUDA. Maybe someone who knows the offloading support for Nvidia GPUs should comment here.