Bug 113807 - [performance] bitset::set not using memset opportunity
Summary: [performance] bitset::set not using memset opportunity
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 14.0
Importance: P3 normal
Target Milestone: 14.2
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2024-02-07 12:34 UTC by rhalbersma
Modified: 2024-05-07 07:44 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-02-08 00:00:00


Description rhalbersma 2024-02-07 12:34:06 UTC
When is_constant_evaluated() is false, bitset::reset delegates to __builtin_memset(_M_w, 0, _Nw * sizeof(_WordT)); otherwise it uses a loop. In contrast, bitset::set unconditionally uses a raw loop.

Couldn't bitset::set similarly use __builtin_memset(_M_w, 0xFF, _Nw * sizeof(_WordT)) when is_constant_evaluated() is false?
Comment 1 Jonathan Wakely 2024-02-08 12:55:15 UTC
Yup, I don't see why not.
Comment 2 GCC Commits 2024-02-15 11:44:29 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:e7ae13a858f36031b8fd3aa07362752ff2b19b2e

commit r14-9002-ge7ae13a858f36031b8fd3aa07362752ff2b19b2e
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Feb 8 15:46:08 2024 +0000

    libstdc++: Use memset to optimize std::bitset::set() [PR113807]
    
    As pointed out in the PR we already do this for reset().
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/113807
            * include/std/bitset (bitset::set()): Use memset instead of a
            loop over the individual words.
Comment 3 rhalbersma 2024-02-15 11:53:23 UTC
Nice that this is changed now. I noticed a similar optimization could be done for bitset::operator== (more accurately: the helper _M_is_equal) where there is an opportunity to use memcmp, with a similar dance for consteval contexts. MSVC STL also does this.
Comment 4 Jonathan Wakely 2024-02-15 12:40:44 UTC
I'm surprised the compiler can't optimize _M_do_set() and operator== already, but it looks like it doesn't recognize the trivial loops.
Comment 5 Jakub Jelinek 2024-02-15 12:45:51 UTC
For
unsigned int a[1024];

void
foo (void)
{
  for (int i = 0; i < 1024; ++i)
    a[i] = 0;
}

void
bar (void)
{
  for (int i = 0; i < 1024; ++i)
    a[i] = -1;
}
it certainly can.
Comment 6 Richard Biener 2024-02-15 12:51:16 UTC
It can also recognize an effective memset of a non-constant value, but that value
has to be uniform across all of its bytes, thus 'char'.
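The condition in comment 6 can be made concrete: a loop storing one value into every element is equivalent to memset only if that value repeats a single byte throughout. The sketch below (hypothetical helper is_byte_uniform, 32-bit elements assumed) checks this for a known constant; the 0 and -1 stores from comment 5 pass, while 1 does not. For a value only known at run time the compiler cannot prove byte uniformity, which is why the element type then has to be a single byte:

```cpp
#include <cassert>
#include <cstdint>

// True if every byte of v is identical, i.e. if filling an array of
// uint32_t with v could instead be done by memset(p, v & 0xFF, n * 4).
constexpr bool is_byte_uniform(std::uint32_t v) noexcept {
    const std::uint32_t low = v & 0xFFu;
    // Replicate the low byte into all four byte positions and compare.
    return v == low * 0x01010101u;
}

static_assert(is_byte_uniform(0u));           // foo() in comment 5
static_assert(is_byte_uniform(0xFFFFFFFFu));  // bar(): -1 stored as unsigned
static_assert(is_byte_uniform(0x2A2A2A2Au));
static_assert(!is_byte_uniform(1u));          // bytes 01 00 00 00 differ
```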
Comment 7 Jonathan Wakely 2024-02-15 13:10:30 UTC
Hmm, yes, the code for bitset<N>::set() is actually similar to what we get for foo() in comment 5. The new version with memset does produce different (vectorized?) code though.

For operator== the current code is quite branchy, and looks better with memcmp to me (but I don't really know what I'm talking about).
Comment 8 rhalbersma 2024-02-15 13:36:12 UTC
For bitset::operator==, I wonder why (at least in C++20 and later mode) it is not defaulted?

For bitset::set and bitset::operator==, I also wonder why the manual-loop vs memset/memcmp logic for constant evaluation is not delegated to a call of std::fill_n or std::equal, respectively? Then std::bitset would be better protected against future changes in the trade-offs between manual loops, unrolled loops and library calls, no?
Comment 9 Jonathan Wakely 2024-02-15 13:43:39 UTC
(In reply to rhalbersma from comment #8)
> For bitset::operator==, I wonder why (at least in C++20 and later mode) it is
> not defaulted?

Because nobody bothered to change working code.

> For bitset::set and bitset::operator==, I also wonder why the manual loop vs
> memset/memcmp consteval logic is not delegated to a call of std::fill_n or
> std::equal, respectively?

Those aren't constexpr in C++14, but bitset is. If we delegated to those algos we'd still need a constexpr-in-C++14 manual loop.
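A sketch of the constraint described above, using a hypothetical fill_all_ones helper: std::fill_n only becomes constexpr in C++20 (signalled by the __cpp_lib_constexpr_algorithms feature-test macro), so code that must also be constexpr in C++14 keeps a hand-written loop as the fallback:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not libstdc++ code): set every word to all-ones.
template <std::size_t Nw>
constexpr void fill_all_ones(unsigned long (&w)[Nw]) noexcept {
#if defined(__cpp_lib_constexpr_algorithms)
    // C++20 and later: std::fill_n is usable in constant expressions.
    std::fill_n(w, Nw, ~0UL);
#else
    // C++14/17: std::fill_n is not constexpr, so loop by hand.
    for (std::size_t i = 0; i < Nw; ++i)
        w[i] = ~0UL;
#endif
}

constexpr bool check() {
    unsigned long w[4] = {};
    fill_all_ones(w);
    for (std::size_t i = 0; i < 4; ++i)
        if (w[i] != ~0UL)
            return false;
    return true;
}
static_assert(check(), "fill must work in constant expressions");
```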
Comment 10 Richard Biener 2024-05-07 07:44:57 UTC
GCC 14.1 is being released, retargeting bugs to GCC 14.2.