[Bug target/100678] New: [OpenACC/nvptx] 'libgomp.oacc-c-c++-common/private-atomic-1.c' FAILs (differently) in certain configurations
tschwinge at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed May 19 12:45:20 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100678
Bug ID: 100678
Summary: [OpenACC/nvptx] 'libgomp.oacc-c-c++-common/private-atomic-1.c' FAILs (differently) in certain configurations
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: openacc
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tschwinge at gcc dot gnu.org
CC: jules at gcc dot gnu.org, vries at gcc dot gnu.org
Target Milestone: ---
Target: nvptx
For OpenACC/nvptx offloading, the testcase
'libgomp.oacc-c-c++-common/private-atomic-1.c', which I've just pushed as commit
r12-908-g1467100fc72562a59f70cdd4e05f6c810d1fadcc "Add
'libgomp.oacc-c-c++-common/private-atomic-1.c' [PR83812]", is expected to
fail with "operation not supported on global/shared address space" (see
PR83812). However, I have now found that on an x86_64 GNU/Linux system with an
Nvidia TITAN V GPU and CUDA Driver 455.23.05, it *doesn't* fail in that way: the
device kernel execution completes normally, but it returns a wrong reduction
result instead: zero.
At this point, it's unclear (a) whether the PR83812 restriction is indeed
supposed to be lifted for certain modern GPU hardware/SM levels/CUDA Driver
releases, and (b) if so, what is going wrong instead, such that we don't compute
the expected reduction result.
Assuming that (a) has been done in good faith, I can see how (b) might happen:
if the 'v' variable were in fact *not* thread-private (but instead
device-global, I suppose), then all threads would concurrently and atomically
increment that one device-global variable, and so the '(v == -222 + 121)'
expression would never be true?
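The hypothesis in (b) can be illustrated with a small host-side model. This is *not* the actual testcase: it assumes, hypothetically, that each thread sets 'v' to -222, performs 121 atomic increments, and then counts itself if '(v == -222 + 121)' holds, with barriers separating those three phases. POSIX threads stand in for GPU threads, and the names 'count_expected', 'worker' and 'struct model_state' are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define INIT_VALUE (-222)  /* assumed initial value, taken from the check */
#define N_INCREMENTS 121   /* assumed increments per thread, from the check */
#define MAX_THREADS 64

/* Hypothetical model state; not taken from the real testcase. */
struct model_state {
    pthread_barrier_t barrier;
    atomic_int shared_v;  /* the single copy of 'v' if it is device-global */
    atomic_int count;     /* threads that observed the expected value */
    int use_shared;       /* 1: 'v' is global; 0: 'v' is thread-private */
};

static void *worker(void *arg)
{
    struct model_state *s = arg;
    atomic_int private_v;  /* this thread's own copy of 'v' */
    atomic_init(&private_v, 0);
    atomic_int *v = s->use_shared ? &s->shared_v : &private_v;

    atomic_store(v, INIT_VALUE);
    pthread_barrier_wait(&s->barrier);  /* all threads have initialized 'v' */

    for (int i = 0; i < N_INCREMENTS; ++i)
        atomic_fetch_add(v, 1);
    pthread_barrier_wait(&s->barrier);  /* all increments have completed */

    if (atomic_load(v) == INIT_VALUE + N_INCREMENTS)
        atomic_fetch_add(&s->count, 1);
    return NULL;
}

/* Returns how many of n_threads see v == -222 + 121 at the end. */
int count_expected(int n_threads, int use_shared)
{
    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    struct model_state s;
    pthread_t tids[MAX_THREADS];

    pthread_barrier_init(&s.barrier, NULL, (unsigned) n_threads);
    atomic_init(&s.shared_v, 0);
    atomic_init(&s.count, 0);
    s.use_shared = use_shared;

    for (int t = 0; t < n_threads; ++t)
        if (pthread_create(&tids[t], NULL, worker, &s) != 0)
            abort();  /* a missing thread would deadlock the barrier */
    for (int t = 0; t < n_threads; ++t)
        pthread_join(tids[t], NULL);

    pthread_barrier_destroy(&s.barrier);
    return atomic_load(&s.count);
}
```

With a thread-private 'v', 'count_expected(4, 0)' yields 4. With a shared 'v', all four threads add to the same location, so it ends at -222 + 4 * 121 = 262, no thread sees -101, and 'count_expected(4, 1)' yields 0, matching the zero result observed above.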