As documented on cppreference[1], hardware_destructive_interference_size provide a portable way to access the L1 data cache line size. And Raymond mentions hardware_destructive_interference_size tells you (basically) the size of a cache line in his blog[2]. But this value is 32 for riscv64, which seems wrong to me because IIRC many riscv64 cpus have cacheline of size 64. This value is defined to be __GCC_DESTRUCTIVE_SIZE. And Clang takes this value from gcc when implementing their __GCC_DESTRUCTIVE_SIZE[3]. In general I am not familiar with the gcc code base and didn't find the definition of this constant for riscv64 target. Jonathan told me that 32 is the default value[4] so it is a bug. [1]: https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size [2]: https://devblogs.microsoft.com/oldnewthing/20230424-00/?p=108085 [3]: https://github.com/llvm/llvm-project/pull/89446#issuecomment-2070649367 [4]: https://gcc.gnu.org/pipermail/libstdc++/2024-September/059490.html
I think the root cause is: gcc/params.opt:Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization If the target doesn't override that, then it defaults to 32, and then param_destruct_interfere_size also defaults to 32.
See/Read https://inbox.sourceware.org/gcc-patches/20210910131625.159525-1-jason@redhat.com/ also.
Any use of std::hardware_destructive_interference_size and std::hardware_constructive_interference_size for ABI purposes in a header file is very much discouraged and GCC will (or should be) warn about them . So the value is not wrong just needs improvement.
(In reply to Levi Zim from comment #0) > But this value is 32 for riscv64, which seems wrong to me because IIRC > many riscv64 cpus have cacheline of size 64. If this is only true for "man riscv64 cpus" and not all, then maybe 32 is an appropriate default. If you know you're using hardware with 64-byte cachelines, then you can compile your code with: --param l1-cache-line-size=64 or leave the cache line size unchanged, and use: --param destructive-interference-size=64 But see -Winterference-size in the GCC manual.
(In reply to Andrew Pinski from comment #3) > Any use of std::hardware_destructive_interference_size and > std::hardware_constructive_interference_size for ABI purposes in a header > file is very much discouraged and GCC will (or should be) warn about them . > > So the value is not wrong just needs improvement. I don't think so. I think it should be set to the largest possible value of L1 cache line size. If not, then how does it fulfill the requirement "Minimum offset between two objects to avoid false sharing."? As documented in https://github.com/riscv/riscv-profiles/releases/download/v1.0/profiles.pdf For RVA20*64 profiles, this value should be set to 128, taken from page 12, Za128rs. For RVA22*64 profiles, this value should be set to 64, taken from page 17, Za64rs.
> I don't think so. I think it should be set to the largest possible value of > L1 cache line size. If not, then how does it fulfill the requirement > "Minimum offset between two objects to avoid false sharing."? cppreference is not the standard. The standard never says hardware_destructive_interference_size means this. Instead it says: This number is the minimum recommended offset between two concurrently-accessed objects to avoid additional performance degradation due to contention introduced by the implementation. It shall be at least alignof(max_align_t). To me "recommended" means the value is not needed to be strictly matching the hardware property. And the original paper https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html clearly says: Destructive interference size: a number that’s suitable as an offset between two objects to likely avoid false-sharing due to different runtime access patterns from different threads. Note the word "likely" directly contradicts from your suggestion using the maximum value. If we have 100 CPUs with 32B, 250 CPUs with 64B, and 10 CPUs with 128B, by "likely" we should use 64B, but the maximum is 128B.
(In reply to Xi Ruoyao from comment #6) > cppreference is not the standard. The standard never says > hardware_destructive_interference_size means this. > > Instead it says: > > This number is the minimum recommended offset between two > concurrently-accessed objects to avoid additional performance degradation > due to contention introduced by the implementation. It shall be at least > alignof(max_align_t). > > To me "recommended" means the value is not needed to be strictly matching > the hardware property. And the original paper > https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html > clearly says: > > Destructive interference size: a number that’s suitable as an offset between > two objects to likely avoid false-sharing due to different runtime access > patterns from different threads. > > Note the word "likely" directly contradicts from your suggestion using the > maximum value. If we have 100 CPUs with 32B, 250 CPUs with 64B, and 10 CPUs > with 128B, by "likely" we should use 64B, but the maximum is 128B. Thanks for your excellent explanation! Then most of the riscv64 CPUs I know have a cache line size of 64B. Would gcc update the value to 64B?
That is up to the riscv maintainers.
So the question in my mind, how important is this? On modern kernels & toolchains it's possible to query the cboz extension & its block size which effectively gives you the size of a cache line. But not every design has the cboz extension, so we'd have to have a fallback if we went the route of dynamic discovery. Or we could just update the constant. 64b cache lines are likely going to be the sweet spot for a while. I wouldn't lose sleep just setting it to 64bit and getting on with our lives.
(In reply to Jeffrey A. Law from comment #9) > So the question in my mind, how important is this? On modern kernels & > toolchains it's possible to query the cboz extension & its block size which > effectively gives you the size of a cache line. But not every design has > the cboz extension, so we'd have to have a fallback if we went the route of > dynamic discovery. Usually std::hardware_destructive_interference_size is used at compile time in alignas. If this doesn't match L1 cacheline size, probably it will only make performance given it is used correctly. > Or we could just update the constant. 64b cache lines are likely going to > be the sweet spot for a while. I wouldn't lose sleep just setting it to > 64bit and getting on with our lives. Yes. I think that's the solution.
So I was about to submit a patch to fix this, but after reviewing Jason's comments & cfe-dev, I'm a bit concerned about making this change. Essentially the concern is this value has ABI implications; so I'm quite hesitant to change it. I think Jason's position is that while it has ABI implications, that anyone using it in a way that exposes those ABI implications is using the feature in an unsafe way and that we shouldn't even try to present a stable & correct value as it inherently has to change based on underlying u-architectures. I can buy that argument. But before I make such a change for RISC-V I'd like to hear from Jon and Jason on that point.
(In reply to Jeffrey A. Law from comment #11) > I think Jason's position is that while it has ABI implications, that anyone > using it in a way that exposes those ABI implications is using the feature > in an unsafe way and that we shouldn't even try to present a stable & > correct value as it inherently has to change based on underlying > u-architectures. That is indeed my position. To reinforce that I added -Winterference-size (on by default) to warn about any use in a header or module interface unit; see the documentation for that warning.