Bug 116662 - The value of __GCC_DESTRUCTIVE_SIZE for riscv64 could be improved
Summary: The value of __GCC_DESTRUCTIVE_SIZE for riscv64 could be improved
Status: WAITING
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 15.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-09-10 12:24 UTC by Levi Zim
Modified: 2025-02-21 18:23 UTC (History)
6 users (show)

See Also:
Host:
Target: riscv64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2025-02-21 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Levi Zim 2024-09-10 12:24:18 UTC
As documented on cppreference[1], hardware_destructive_interference_size
provide a portable way to access the L1 data cache line size. And
Raymond mentions hardware_destructive_interference_size tells you
(basically) the size of a cache line in his blog[2].

But this value is 32 for riscv64, which seems wrong to me because IIRC
many riscv64 cpus have cacheline of size 64.


This value is defined to be __GCC_DESTRUCTIVE_SIZE. And Clang takes this
value from gcc when implementing their __GCC_DESTRUCTIVE_SIZE[3]. In
general I am not familiar with the gcc code base and didn't find the
definition of this constant for riscv64 target. 

Jonathan told me that 32 is the default value[4] so it is a bug.


[1]:
https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size
[2]: https://devblogs.microsoft.com/oldnewthing/20230424-00/?p=108085
[3]: https://github.com/llvm/llvm-project/pull/89446#issuecomment-2070649367
[4]: https://gcc.gnu.org/pipermail/libstdc++/2024-September/059490.html
Comment 1 Jonathan Wakely 2024-09-10 12:28:12 UTC
I think the root cause is:

gcc/params.opt:Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization


If the target doesn't override that, then it defaults to 32, and then param_destruct_interfere_size also defaults to 32.
Comment 3 Andrew Pinski 2024-09-10 13:00:13 UTC
Any use of std::hardware_destructive_interference_size and std::hardware_constructive_interference_size for ABI purposes in a header file is very much discouraged and GCC will (or should be) warn about them .

So the value is not wrong just needs improvement.
Comment 4 Jonathan Wakely 2024-09-10 13:03:00 UTC
(In reply to Levi Zim from comment #0)
> But this value is 32 for riscv64, which seems wrong to me because IIRC
> many riscv64 cpus have cacheline of size 64.

If this is only true for "man riscv64 cpus" and not all, then maybe 32 is an appropriate default. If you know you're using hardware with 64-byte cachelines, then you can compile your code with:

 --param l1-cache-line-size=64

or leave the cache line size unchanged, and use:

  --param destructive-interference-size=64

But see -Winterference-size in the GCC manual.
Comment 5 Levi Zim 2024-09-10 13:29:49 UTC
(In reply to Andrew Pinski from comment #3)
> Any use of std::hardware_destructive_interference_size and
> std::hardware_constructive_interference_size for ABI purposes in a header
> file is very much discouraged and GCC will (or should be) warn about them .
> 
> So the value is not wrong just needs improvement.

I don't think so. I think it should be set to the largest possible value of L1 cache line size. If not, then how does it fulfill the requirement "Minimum offset between two objects to avoid false sharing."?

As documented in https://github.com/riscv/riscv-profiles/releases/download/v1.0/profiles.pdf

For RVA20*64 profiles, this value should be set to 128, taken from page 12, Za128rs.
For RVA22*64 profiles, this value should be set to  64, taken from page 17, Za64rs.
Comment 6 Xi Ruoyao 2024-09-10 15:05:05 UTC
> I don't think so. I think it should be set to the largest possible value of
> L1 cache line size. If not, then how does it fulfill the requirement
> "Minimum offset between two objects to avoid false sharing."?

cppreference is not the standard.  The standard never says hardware_destructive_interference_size means this.

Instead it says:

This number is the minimum recommended offset between two concurrently-accessed objects to avoid additional performance degradation due to contention introduced by the implementation. It shall be at least alignof(max_align_t).

To me "recommended" means the value is not needed to be strictly matching the hardware property.  And the original paper https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html clearly says:

Destructive interference size: a number that’s suitable as an offset between two objects to likely avoid false-sharing due to different runtime access patterns from different threads.

Note the word "likely" directly contradicts from your suggestion using the maximum value.  If we have 100 CPUs with 32B, 250 CPUs with 64B, and 10 CPUs with 128B, by "likely" we should use 64B, but the maximum is 128B.
Comment 7 Levi Zim 2024-09-10 15:32:51 UTC
(In reply to Xi Ruoyao from comment #6)
> cppreference is not the standard.  The standard never says
> hardware_destructive_interference_size means this.
> 
> Instead it says:
> 
> This number is the minimum recommended offset between two
> concurrently-accessed objects to avoid additional performance degradation
> due to contention introduced by the implementation. It shall be at least
> alignof(max_align_t).
> 
> To me "recommended" means the value is not needed to be strictly matching
> the hardware property.  And the original paper
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html
> clearly says:
> 
> Destructive interference size: a number that’s suitable as an offset between
> two objects to likely avoid false-sharing due to different runtime access
> patterns from different threads.
> 
> Note the word "likely" directly contradicts from your suggestion using the
> maximum value.  If we have 100 CPUs with 32B, 250 CPUs with 64B, and 10 CPUs
> with 128B, by "likely" we should use 64B, but the maximum is 128B.

Thanks for your excellent explanation! Then most of the riscv64 CPUs I know have a cache line size of 64B. Would gcc update the value to 64B?
Comment 8 Jakub Jelinek 2024-09-10 15:39:55 UTC
That is up to the riscv maintainers.
Comment 9 Jeffrey A. Law 2024-09-11 20:42:31 UTC
So the question in my mind, how important is this?  On modern kernels & toolchains it's possible to query the cboz extension & its block size which effectively gives you the size of a cache line.  But not every design has the cboz extension, so we'd have to have a fallback if we went the route of dynamic discovery.

Or we could just update the constant.  64b cache lines are likely going to be the sweet spot for a while.  I wouldn't lose sleep just setting it to 64bit and getting on with our lives.
Comment 10 Levi Zim 2024-09-12 10:08:50 UTC
(In reply to Jeffrey A. Law from comment #9)
> So the question in my mind, how important is this?  On modern kernels &
> toolchains it's possible to query the cboz extension & its block size which
> effectively gives you the size of a cache line.  But not every design has
> the cboz extension, so we'd have to have a fallback if we went the route of
> dynamic discovery.

Usually std::hardware_destructive_interference_size is used at compile time in alignas.
If this doesn't match L1 cacheline size, probably it will only make performance given it is used correctly.

> Or we could just update the constant.  64b cache lines are likely going to
> be the sweet spot for a while.  I wouldn't lose sleep just setting it to
> 64bit and getting on with our lives.

Yes. I think that's the solution.
Comment 11 Jeffrey A. Law 2024-12-30 19:43:38 UTC
So I was about to submit a patch to fix this, but after reviewing Jason's comments  & cfe-dev, I'm a bit concerned about making this change.

Essentially the concern is this value has ABI implications; so I'm quite hesitant to change it.

I think Jason's position is that while it has ABI implications, that anyone using it in a way that exposes those ABI implications is using the feature in an unsafe way and that we shouldn't even try to present a stable & correct value as it inherently has to change based on underlying u-architectures.

I can buy that argument.  But before I make such a change for RISC-V I'd like to hear from Jon and Jason on that point.
Comment 12 Jason Merrill 2025-02-21 18:23:22 UTC
(In reply to Jeffrey A. Law from comment #11)
> I think Jason's position is that while it has ABI implications, that anyone
> using it in a way that exposes those ABI implications is using the feature
> in an unsafe way and that we shouldn't even try to present a stable &
> correct value as it inherently has to change based on underlying
> u-architectures.

That is indeed my position.  To reinforce that I added -Winterference-size (on by default) to warn about any use in a header or module interface unit; see the documentation for that warning.