Bug 115740 - gcc-14.1.1: __glibcxx_assert_fail const-evaluation breaks clang/hip device code
Summary: gcc-14.1.1: __glibcxx_assert_fail const-evaluation breaks clang/hip device code
Status: REOPENED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 14.1.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-01 22:56 UTC by Lockal
Modified: 2025-06-11 23:39 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-07-01 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Lockal 2024-07-01 22:56:28 UTC
After introducing of call to std::__glibcxx_assert_fail() in _GLIBCXX_HAVE_IS_CONSTANT_EVALUATED ifdef block, clang fails to compile some code (found on pytorch)

gcc version 14.1.1 20240622 (Gentoo Hardened 14.1.1_p20240622 p2)

Breaking change: https://github.com/gcc-mirror/gcc/commit/1395c573c523762957bde8c2a08832c5f4350815

-------------------------------------------------------------------

Error message:

In file included from <built-in>:1:
In file included from /usr/lib/llvm/18/bin/../../../../lib/clang/18/include/__clang_hip_runtime_wrapper.h:145:
In file included from /usr/lib/llvm/18/bin/../../../../lib/clang/18/include/cuda_wrappers/algorithm:55:
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14/algorithm:61:
/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14/bits/stl_algo.h:3625:7: error: reference to __host__ function '__glibcxx_assert_fail' in
      __host__ __device__ function
 3625 |       __glibcxx_assert(!(__hi < __lo));
      |       ^
/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14/x86_64-pc-linux-gnu/bits/c++config.h:612:7: note: expanded from macro '__glibcxx_assert'
  612 |         std::__glibcxx_assert_fail();                                   \
      |              ^
/var/tmp/portage/sci-libs/caffe2-2.3.1/work/pytorch-2.3.1/aten/src/ATen/native/hip/IndexKernel.hip:254:21: note: called by 'operator()'
  254 |       qvalue = std::clamp(qvalue, qmin, qmax);
      |                     ^
/var/tmp/portage/sci-libs/caffe2-2.3.1/work/pytorch-2.3.1/aten/src/ATen/native/hip/IndexKernel.hip:101:5: note: called by 'operator()'
  101 |     f(out_data, in_data, offset);
      |     ^
/var/tmp/portage/sci-libs/caffe2-2.3.1/work/pytorch-2.3.1/aten/src/ATen/native/hip/IndexKernel.hip:36:7: note: called by
      'index_elementwise_kernel<128, 4, (lambda at /var/tmp/portage/sci-libs/caffe2-2.3.1/work/pytorch-2.3.1/aten/src/ATen/native/hip/IndexKernel.hip:85:62)>'
   36 |       f(idx);
      |       ^
/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14/x86_64-pc-linux-gnu/bits/c++config.h:605:3: note: '__glibcxx_assert_fail' declared here
  605 |   __glibcxx_assert_fail()
      |   ^

-------------------------------------------------------------------

How it can be fixed: by adding constexpr:

  __attribute__((__always_inline__,__visibility__("default")))
  _GLIBCXX_CONSTEXPR inline void
  __glibcxx_assert_fail()
  { }

-------------------------------------------------------------------

Why it exists in the first place: I don't know. c++config.h calls empty function for some reason (maybe for setting debugger breakpoints?).
Comment 1 Drea Pinski 2024-07-01 23:04:52 UTC
The function already has __always_inline__ on it.
This looks more like a cuda issue rather than even a clang issue.

Have you reported it to NVIDIA yet?
Comment 2 Drea Pinski 2024-07-01 23:06:19 UTC
Oh yes:
https://github.com/llvm/llvm-project/issues/95183
Comment 3 Drea Pinski 2024-07-01 23:09:04 UTC
Note the change is very specific says it should NOT be constexpr even:
 But when that macro is not defined, we use a new inline (but not constexpr)
overload of __glibcxx_assert_fail to cause compilation to fail.


So yes this is a cuda issue.
Comment 4 Drea Pinski 2024-07-01 23:12:45 UTC
std::clamp has undefined behavior if lo is greater than hi and that is why the assert is there.
It has to be rejected at compile time for the undefined behavior with respect to constexpr. Adding constexpr would NOT reject the undefined behavior which is wrong for C++.
Comment 5 Drea Pinski 2024-07-01 23:18:01 UTC
Oh it was worked around in complex before too (PR 100676).
Comment 6 Lockal 2024-07-01 23:35:41 UTC
Oops, did not expect that someone already reported this to llvm/llvm-project.

This was noticed as an issue of ROCm, not CUDA. Did not try CUDA, probably won't work with gcc-14 due to multiple reasons (starting with version check in preprocessor).

But I think it has nothing to do with LLVM:

1) I did not enable -D_GLIBCXX_ASSERTIONS - I don't want see calls to __glibcxx_assert_fail (device code can't comprehend this)

2) Specifically clang dislikes calls to host code from device code
Comment 7 Sam James 2024-07-01 23:36:41 UTC
> 1) I did not enable -D_GLIBCXX_ASSERTIONS - I don't want see calls to __glibcxx_assert_fail (device code can't comprehend this)

Note that for hardened in Gentoo, we enable it by default - see https://gitweb.gentoo.org/proj/gcc-patches.git/tree/14.1.0/gentoo/15_all_DEF_GENTOO_GLIBCXX_ASSERTIONS.patch.
Comment 8 Sam James 2024-07-01 23:37:32 UTC
Ah, but you're using Clang. But for Clang, we do the same sort of thing: https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-devel/clang-common/clang-common-18.1.8.ebuild#n246
Comment 9 Lockal 2024-07-01 23:44:34 UTC
Sam James, that's a separate issue - https://bugs.gentoo.org/935314

I modified my /etc/clang/gentoo-hardened.cfg to contain:

-Xarch_host -D_GLIBCXX_ASSERTIONS

and it works fine, except it does not work and fails few lines below in #else block - https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/c%2B%2Bconfig#L610-L625

And it can't be fixed in gentoo-hardened.cfg with this #else
Comment 10 Drea Pinski 2024-07-01 23:45:59 UTC
(In reply to Lockal from comment #6)
> 2) Specifically clang dislikes calls to host code from device code

THERE IS NO device or host code in libstdc++. That in itself a hack on how cuda/ROCm are hacked on. It just happens that you are using std::clamp inside device code which might not be a valid thing anyways as it is not marked as either host or device ...

This whole thing feels like a hack on how this works from the front-end rather than doing stuff in the middle-end and outlining it there.


SEE https://github.com/llvm/llvm-project/issues/49727 and PR 100676 previous issues in this abuse of C++ code.
Comment 11 Drea Pinski 2024-07-01 23:53:00 UTC
Oh this has already been fixed/worked around in pytorch .
Comment 12 Jonathan Wakely 2025-06-11 08:54:34 UTC
Apparently this is still a problem (pytorch just stopped using std::clamp apparently) but I'm unable to reproduce any errors: https://cuda.godbolt.org/z/EM4aqqczx

In the absence of a reproducer nothing is going to change.
Comment 13 Jonathan Wakely 2025-06-11 09:08:49 UTC
The problem happens for functions marked __device__:

#include <algorithm>
#include <complex>
__device__ int f(int i) {
    return std::clamp(1, 2, 10);
}


https://cuda.godbolt.org/z/aKn4Ef485
Comment 14 Jonathan Wakely 2025-06-11 21:20:51 UTC
I wonder if this would work:

--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -610,7 +610,9 @@ namespace std
 # define _GLIBCXX_EXTERN_TEMPLATE -1
 #endif
 
+#if !(__has_attribute(__device__) && defined(__device__))
 #undef _GLIBCXX_VERBOSE_ASSERT
+#endif
 
 // Assert.
 #ifdef _GLIBCXX_VERBOSE_ASSERT
@@ -645,17 +647,10 @@ namespace std
 // _GLIBCXX_ASSERTIONS is not defined, so assertions checks are only enabled
 // during constant evaluation. This ensures we diagnose undefined behaviour
 // in constant expressions.
-namespace std
-{
-  __attribute__((__always_inline__,__visibility__("default")))
-  inline void
-  __glibcxx_assert_fail()
-  { }
-}
 # define __glibcxx_assert(cond)                                                \
   do {                                                                 \
     if (std::__is_constant_evaluated() && !bool(cond))                 \
-      std::__glibcxx_assert_fail();                                    \
+      _GLIBCXX_ASSERT_FAIL(cond);                                      \
   } while (false)
 #else
 // _GLIBCXX_ASSERTIONS is not defined and __is_constant_evaluated() doesn't


This would disable the verbose assert when __device__ is defined, and so always use __builtin_abort() for assertions (at runtime and at compile-time in constexpr).

Replace __device__ with some other suitable macro(s) like (__HIPCC_ || __CUDACC__) or something.
Comment 15 Jonathan Wakely 2025-06-11 21:48:10 UTC
https://llvm.org/docs/CompileCudaWithLLVM.html#detecting-clang-vs-nvcc-from-code suggests that __CUDA_ARCH__ is the right macro (thanks to zhao)
Comment 16 Zhao Wei Liew 2025-06-11 22:57:37 UTC
Upon further thought, the required macro should be

#if defined(__clang__) && defined(__CUDA__) && defined(__HIP__)

without any the guard on device mode or host mode. This is because we want __builtin_abort to be used whenever we're compiling CUDA/HIP code with clang, regardless of whether it's compiling in in device mode or host mode.
Comment 17 Zhao Wei Liew 2025-06-11 23:39:02 UTC
Oops, I meant 

#if defined(__clang__) && (defined(__CUDA__) || defined(__HIP__))


The __HIP__ macro comes from https://clang.llvm.org/docs/HIPSupport.html#predefined-macros