Bug 110807

Summary: [13 Regression] Copy list initialisation of a vector<bool> raises a warning with -O2
Product: gcc Reporter: Tom Anderson <twic>
Component: libstdc++Assignee: Jonathan Wakely <redi>
Status: RESOLVED FIXED    
Severity: normal CC: aoliva, gabravier, sjames, webrown.cpp
Priority: P3 Keywords: diagnostic, missed-optimization
Version: 13.1.0   
Target Milestone: 13.3   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110137
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110863
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110859
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114758
Host: Target:
Build: Known to work: 12.1.0, 12.3.0
Known to fail: 13.1.0, 14.0 Last reconfirmed: 2023-07-26 00:00:00
Bug Depends on:    
Bug Blocks: 88443    
Attachments: the source coded needed to reproduce the problem

Description Tom Anderson 2023-07-25 18:26:57 UTC
Created attachment 55631 [details]
the source coded needed to reproduce the problem

If you have a class with a vector<bool> member:

struct Foo {
    std::vector<bool> byCallSpread;
};

And try to initialise it with copy list initialisation:

    Foo() { byCallSpread = {true, false}; }

Then that works fine with the default optimisation level, but gets a warning at -O2 saying:

/usr/local/include/c++/13.1.0/bits/stl_algobase.h:437:30: warning: 'void* __builtin_memmove(void*, const void*, long unsigned int)' writing between 9 and 9223372036854775807 bytes into a region of size 8 overflows the destination [-Wstringop-overflow=]
  437 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(i believe this is the relevant error - i am not very good at reading error messages, so apologies if not)

The warning goes away if the initialisation uses an explicit constructor:

    Foo() { byCallSpread = std::vector<bool>({true, false}); }

I did not get this warning with GCC 7.2.0. According to Compiler Explorer, it does not occur with GCC 12.3.

Here is a transcript of a self-contained session using the GCC 13.1.0 official Docker image (official for Docker, perhaps not for GCC!) which demonstrates the problem:

$ sudo docker run -it gcc:13.1.0
root@4694030a8bea:/# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-linux-gnu/13.1.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: /usr/src/gcc/configure --build=x86_64-linux-gnu --disable-multilib --enable-languages=c,c++,fortran,go
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (GCC) 
root@4694030a8bea:/# cat >foo.cpp
#include <vector>

struct Foo {
    std::vector<bool> byCallSpread;

    Foo() { byCallSpread = {true, false}; }
};

Foo f;

int main(int argc, char** argv) {}
root@4694030a8bea:/# g++ foo.cpp 
root@4694030a8bea:/# g++ -O2 foo.cpp 
In file included from /usr/local/include/c++/13.1.0/vector:62,
                 from foo.cpp:1:
In static member function 'static _Up* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(_Tp*, _Tp*, _Up*) [with _Tp = long unsigned int; _Up = long unsigned int; bool _IsMove = false]',
    inlined from '_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = long unsigned int*; _OI = long unsigned int*]' at /usr/local/include/c++/13.1.0/bits/stl_algobase.h:506:30,
    inlined from '_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = long unsigned int*; _OI = long unsigned int*]' at /usr/local/include/c++/13.1.0/bits/stl_algobase.h:533:42,
    inlined from '_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove = false; _II = long unsigned int*; _OI = long unsigned int*]' at /usr/local/include/c++/13.1.0/bits/stl_algobase.h:540:31,
    inlined from '_OI std::copy(_II, _II, _OI) [with _II = long unsigned int*; _OI = long unsigned int*]' at /usr/local/include/c++/13.1.0/bits/stl_algobase.h:633:7,
    inlined from 'std::vector<bool, _Alloc>::iterator std::vector<bool, _Alloc>::_M_copy_aligned(const_iterator, const_iterator, iterator) [with _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/stl_bvector.h:1303:28,
    inlined from 'void std::vector<bool, _Alloc>::_M_insert_range(iterator, _ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const bool*; _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/vector.tcc:915:33,
    inlined from 'std::vector<bool, _Alloc>::iterator std::vector<bool, _Alloc>::insert(const_iterator, _InputIterator, _InputIterator) [with _InputIterator = const bool*; <template-parameter-2-2> = void; _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/stl_bvector.h:1180:19,
    inlined from 'void std::vector<bool, _Alloc>::_M_assign_aux(_ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const bool*; _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/stl_bvector.h:1440:14,
    inlined from 'void std::vector<bool, _Alloc>::assign(_InputIterator, _InputIterator) [with _InputIterator = const bool*; <template-parameter-2-2> = void; _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/stl_bvector.h:935:17,
    inlined from 'std::vector<bool, _Alloc>& std::vector<bool, _Alloc>::operator=(std::initializer_list<bool>) [with _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/stl_bvector.h:915:14,
    inlined from 'Foo::Foo()' at foo.cpp:6:40,
    inlined from 'void __static_initialization_and_destruction_0()' at foo.cpp:9:5,
    inlined from '(static initializers for foo.cpp)' at foo.cpp:11:34:
/usr/local/include/c++/13.1.0/bits/stl_algobase.h:437:30: warning: 'void* __builtin_memmove(void*, const void*, long unsigned int)' writing between 9 and 9223372036854775807 bytes into a region of size 8 overflows the destination [-Wstringop-overflow=]
  437 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/include/c++/13.1.0/x86_64-linux-gnu/bits/c++allocator.h:33,
                 from /usr/local/include/c++/13.1.0/bits/allocator.h:46,
                 from /usr/local/include/c++/13.1.0/vector:63:
In member function '_Tp* std::__new_allocator<_Tp>::allocate(size_type, const void*) [with _Tp = long unsigned int]',
    inlined from 'static _Tp* std::allocator_traits<std::allocator<_Tp1> >::allocate(allocator_type&, size_type) [with _Tp = long unsigned int]' at /usr/local/include/c++/13.1.0/bits/alloc_traits.h:482:28,
    inlined from 'std::_Bvector_base<_Alloc>::_Bit_pointer std::_Bvector_base<_Alloc>::_M_allocate(std::size_t) [with _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/stl_bvector.h:643:48,
    inlined from 'void std::vector<bool, _Alloc>::_M_insert_range(iterator, _ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const bool*; _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/vector.tcc:913:39,
    inlined from 'std::vector<bool, _Alloc>::iterator std::vector<bool, _Alloc>::insert(const_iterator, _InputIterator, _InputIterator) [with _InputIterator = const bool*; <template-parameter-2-2> = void; _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/stl_bvector.h:1180:19,
    inlined from 'void std::vector<bool, _Alloc>::_M_assign_aux(_ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const bool*; _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/stl_bvector.h:1440:14,
    inlined from 'void std::vector<bool, _Alloc>::assign(_InputIterator, _InputIterator) [with _InputIterator = const bool*; <template-parameter-2-2> = void; _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/stl_bvector.h:935:17,
    inlined from 'std::vector<bool, _Alloc>& std::vector<bool, _Alloc>::operator=(std::initializer_list<bool>) [with _Alloc = std::allocator<bool>]' at /usr/local/include/c++/13.1.0/bits/stl_bvector.h:915:14,
    inlined from 'Foo::Foo()' at foo.cpp:6:40,
    inlined from 'void __static_initialization_and_destruction_0()' at foo.cpp:9:5,
    inlined from '(static initializers for foo.cpp)' at foo.cpp:11:34:
/usr/local/include/c++/13.1.0/bits/new_allocator.h:147:55: note: destination object of size 8 allocated by 'operator new'
  147 |         return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
      |                                                       ^

I have attached the code from this session to this issue.
Comment 1 Andrew Pinski 2023-07-26 06:30:30 UTC
Slightly reduced:
#include <vector>

std::vector<bool> byCallSpread;
void f()
{
  byCallSpread = {true};
}
Comment 2 Richard Biener 2023-07-26 07:43:42 UTC
Confirmed.  We're diagnosing the sequence

<bb 2> [local count: 1073741824]:
D.26106[0] = 1;
_7 = byCallSpread.D.26008._M_impl.D.25481._M_start.D.16802._M_p;
_8 = MEM[(const struct _Bit_iterator &)&byCallSpread + 16].D.16802._M_offset;
_9 = MEM[(const struct _Bit_iterator &)&byCallSpread + 16].D.16802._M_p;
_10 = _9 - _7;
_11 = _10 * 8;
_12 = (long int) _8;
_13 = _11 + _12;
_14 = (long unsigned int) _13;
if (_14 > 1)
  goto <bb 3>; [50.00%]
else
  goto <bb 4>; [50.00%]

<bb 4> [local count: 536870913]:
if (_13 == 1)
  goto <bb 5>; [89.00%]
else
  goto <bb 9>; [11.00%]

<bb 9> [local count: 375809640]:
__result ={v} {CLOBBER(eol)};
_118 = MEM[(const struct _Bvector_impl *)&byCallSpread].D.25481._M_end_of_storage;
if (_7 != _118)
  goto <bb 10>; [50.00%]
else
  goto <bb 14>; [50.00%]

<bb 14> [local count: 187904820]:
_316 = operator new (8);
_138 = byCallSpread.D.26008._M_impl.D.25481._M_start.D.16802._M_p;
_339 = _9 - _138;
_775 = (long unsigned int) _339;
if (_339 > 8)
  goto <bb 15>; [90.00%]
else
  goto <bb 16>; [10.00%]

<bb 15> [local count: 169114338]:
__builtin_memmove (_316, _138, _775);

the memmove call in BB 15 is diagnosed, it goes to storage of size 8
and obviously the guarding test just checked we will write more than 8 bytes
here.

I'm just guessing this is a missed optimization (this code must be
unreachable), but possibly again the standard library implementation
might make it impossible for the compiler to prove that given it isn't
able to second-guess constraints in the standard library.
Disclaimer: I didn't try to decipher the shitload of code we generate for
this simple function ;)
Comment 3 Jonathan Wakely 2023-07-26 12:04:02 UTC
I think this is the same warning that causes libstdc++ testsuite failures when testing with --target_board=unix/-D_GLIBCXX_DEBUG

FAIL: 23_containers/vector/bool/swap.cc (test for excess errors)

I thought I'd already reported that, but I can't find it.

The library allocates space for N elements then copies n elements:

      vector(const vector& __x)
      : _Base(_Bit_alloc_traits::_S_select_on_copy(__x._M_get_Bit_allocator()))
      {
	_M_initialize(__x.size());
	_M_copy_aligned(__x.begin(), __x.end(), begin());
      }

We probably have the usual problem that GCC thinks allocating memory might alter __x because it's a global, and so in theory (but never in practice) the program could replace operator new with something insane that modifies the global.

I tried changing the constructor to:

	const_iterator __xbegin = __x.begin(), __xend = __x.end();
	_M_initialize(__x.size());
	_M_copy_aligned(__xbegin, __xend, begin());

That fixes the library test FAILs, but not the case in this bug report.
Comment 4 Jonathan Wakely 2023-07-26 13:34:27 UTC
A similar change in vector<bool>::_M_insert_range fixes this one though.
Comment 5 GCC Commits 2023-07-26 16:02:59 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:7931a1de9ec87b996d51d3d60786f5c81f63919f

commit r14-2797-g7931a1de9ec87b996d51d3d60786f5c81f63919f
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Jul 26 14:09:24 2023 +0100

    libstdc++: Avoid bogus overflow warnings in std::vector<bool> [PR110807]
    
    GCC thinks the allocation can alter the object being copied if it's
    globally reachable, so doesn't realize that [x.begin(), x.end()) after
    the allocation is the same as x.size() before it.
    
    This causes a testsuite failure when testing with -D_GLIBCXX_DEBUG:
    FAIL: 23_containers/vector/bool/swap.cc (test for excess errors)
    A fix is to move the calls to x.begin() and x.end() to before the
    allocation.
    
    A similar problem exists in vector<bool>::_M_insert_range where *this is
    globally reachable, as reported in PR libstdc++/110807. That can also be
    fixed by moving calls to begin() and end() before the allocation.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/110807
            * include/bits/stl_bvector.h (vector(const vector&)): Access
            iterators before allocating.
            * include/bits/vector.tcc (vector<bool>::_M_insert_range):
            Likewise.
            * testsuite/23_containers/vector/bool/110807.cc: New test.
Comment 6 Jonathan Wakely 2023-07-26 16:04:26 UTC
Fixed on trunk so far.
Comment 7 GCC Commits 2023-07-26 18:11:07 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:f30e62b0ee05befd20863466d1fb55a34d15c228

commit r14-2802-gf30e62b0ee05befd20863466d1fb55a34d15c228
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Jul 26 19:08:39 2023 +0100

    libstdc++: Require C++11 for 23_containers/vector/bool/110807.cc [PR110807]
    
    This new test uses uniform initialization syntax, so requires C++11 or
    later.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/110807
            * testsuite/23_containers/vector/bool/110807.cc: Require c++11.
Comment 8 Jonathan Wakely 2023-07-28 10:13:15 UTC
The new test fails with -m32 -D_GLIBCXX_USE_CXX11_ABI=0

I give up.
Comment 9 GCC Commits 2023-07-28 17:32:21 UTC
The releases/gcc-13 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:962cd3e2c475e8197fd0cfa7330a8f352b4ff5b2

commit r13-7625-g962cd3e2c475e8197fd0cfa7330a8f352b4ff5b2
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Jul 26 14:09:24 2023 +0100

    libstdc++: Avoid bogus overflow warnings in std::vector<bool> [PR110807]
    
    GCC thinks the allocation can alter the object being copied if it's
    globally reachable, so doesn't realize that [x.begin(), x.end()) after
    the allocation is the same as x.size() before it.
    
    This causes a testsuite failure when testing with -D_GLIBCXX_DEBUG:
    FAIL: 23_containers/vector/bool/swap.cc (test for excess errors)
    A fix is to move the calls to x.begin() and x.end() to before the
    allocation.
    
    A similar problem exists in vector<bool>::_M_insert_range where *this is
    globally reachable, as reported in PR libstdc++/110807. That can also be
    fixed by moving calls to begin() and end() before the allocation.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/110807
            * include/bits/stl_bvector.h (vector(const vector&)): Access
            iterators before allocating.
            * include/bits/vector.tcc (vector<bool>::_M_insert_range):
            Likewise.
            * testsuite/23_containers/vector/bool/110807.cc: New test.
    
    (cherry picked from commit 7931a1de9ec87b996d51d3d60786f5c81f63919f)
Comment 10 Jonathan Wakely 2023-07-28 19:00:28 UTC
Fixed for 13.3
Comment 11 Alexandre Oliva 2023-11-06 12:20:05 UTC
> The new test fails with -m32

I've looked a bit into why.  The memmove is optimized out by vrp (or, if that's disabled, by dom) on lp64, because it's guarded by two conditions: _10 > sizeof(long), and !(_14 > 1), where _10 is a signed long (ptrdiff_t) computed as the difference between the _M_p of _M_finish and _M_start in the preexisting vector, and _14 = (unsigned long)(_10*8 + _8), where _8 is the vector's finish offset.  in order for the _14 condition to hold, _14 must be 0ul..1ul.

Since _10 is long, _8 promotes to long in lp64, the addition is performed as a signed long, and then converted to unsigned long.  _8 is loaded from memory as an unsigned int, and nothing is known about it, so its promoted operand is 0l..0xffffffffl.  In order for _14 to be <= 1ul, _10 * 8 must be in the range -0xffffffffl..1l, and therefore _10 must be <= 0x1fffffffl..0l, which enables folding of the _10 condition as the entire range is <= sizeof(long).

In the lp32 case, _10 is int, so _10*8 promotes to unsigned int for the addition, whose result is then NOPped to unsigned long.  _8 is also loaded from memory as unsigned int, but because unsigned addition wraps around and _8 covers the full range, nothing can be inferred about the range of _10*8, and thus _10's range is only limited by overflow-avoidance in the signed multiplication: -0x1fffffffl..0x1ffffffl.  Therefore, the _10 compare cannot be folded, and the memmove call remains.

I think the missed optimization and the overall problem stems from the fact that optimizers don't know the actual range of _M_offset.  Ensuring it's visibly normalized at uses in which out-of-range _M_offsets might sneak in might be enough to avoid the warning and enable further optimizations.
Comment 12 Jonathan Wakely 2023-11-06 12:38:40 UTC
I don't think _M_offset can ever be out of range, it's always set by the library code.

Doesn't the warning come from this line, which doesn't use _M_offset anyway?

	_Bit_type* __q = std::copy(__first._M_p, __last._M_p, __result._M_p);

So I'm not sure what we can do in the library to state the invariants in a way the optimizer can understand them.
Comment 13 GCC Commits 2023-11-09 03:30:12 UTC
The master branch has been updated by Alexandre Oliva <aoliva@gcc.gnu.org>:

https://gcc.gnu.org/g:e39b3e02c27bd771a07e385f9672ecf1a45ced77

commit r14-5260-ge39b3e02c27bd771a07e385f9672ecf1a45ced77
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Thu Nov 9 00:01:37 2023 -0300

    libstdc++: optimize bit iterators assuming normalization [PR110807]
    
    The representation of bit iterators, using a pointer into an array of
    words, and an unsigned bit offset into that word, makes for some
    optimization challenges: because the compiler doesn't know that the
    offset is always in a certain narrow range, beginning at zero and
    ending before the word bitwidth, when a function loads an offset that
    it hasn't normalized itself, it may fail to derive certain reasonable
    conclusions, even to the point of retaining useless calls that elicit
    incorrect warnings.
    
    Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
    list to a global bit vector variable.  Based on the compile-time
    constant length of the list, we decide in _M_insert_range whether to
    use the existing storage or to allocate new storage for the vector.
    After allocation, we decide in _M_copy_aligned how to copy any
    preexisting portions of the vector to the newly-allocated storage.
    When copying two or more words, we use __builtin_memmove.
    
    However, because we compute the available room using bit offsets
    without range information, even comparing them with constants, we fail
    to infer ranges for the preexisting vector depending on word size, and
    may thus retain the memmove call despite knowing we've only allocated
    one word.
    
    Other parts of the compiler then detect the mismatch between the
    constant allocation size and the much larger range that could
    theoretically be copied into the newly-allocated storage if we could
    reach the call.
    
    Ensuring the compiler is aware of the constraints on the offset range
    enables it to do a much better job at optimizing.  Using attribute
    assume (_M_offset <= ...) didn't work, because gimple lowered that to
    something that vrp could only use to ensure 'this' was non-NULL.
    Exposing _M_offset as an automatic variable/gimple register outside
    the unevaluated assume operand enabled the optimizer to do its job.
    
    Rather than placing such load-then-assume constructs all over, I
    introduced an always-inline member function in bit iterators that does
    the job of conveying to the compiler the information that the
    assumption is supposed to hold, and various calls throughout functions
    pertaining to bit iterators that might not otherwise know that the
    offsets have to be in range, so that the compiler no longer needs to
    make conservative assumptions that prevent optimizations.
    
    With the explicit assumptions, the compiler can correlate the test for
    available storage in the vector with the test for how much storage
    might need to be copied, and determine that, if we're not asking for
    enough room for two or more words, we can omit entirely the code to
    copy two or more words, without any runtime overhead whatsoever: no
    traces remain of the undefined behavior or of the tests that inform
    the compiler about the assumptions that must hold.
    
    
    for  libstdc++-v3/ChangeLog
    
            PR libstdc++/110807
            * include/bits/stl_bvector.h (_Bit_iterator_base): Add
            _M_assume_normalized member function.  Call it in _M_bump_up,
            _M_bump_down, _M_incr, operator==, operator<=>, operator<, and
            operator-.
            (_Bit_iterator): Also call it in operator*.
            (_Bit_const_iterator): Likewise.
Comment 14 GCC Commits 2024-02-09 08:46:50 UTC
The releases/gcc-13 branch has been updated by Torbjorn Svensson <azoff@gcc.gnu.org>:

https://gcc.gnu.org/g:5d684a5f7f82b1425cac5eaa11dccc3b51a62d44

commit r13-8309-g5d684a5f7f82b1425cac5eaa11dccc3b51a62d44
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Thu Nov 9 00:01:37 2023 -0300

    libstdc++: optimize bit iterators assuming normalization [PR110807]
    
    The representation of bit iterators, using a pointer into an array of
    words, and an unsigned bit offset into that word, makes for some
    optimization challenges: because the compiler doesn't know that the
    offset is always in a certain narrow range, beginning at zero and
    ending before the word bitwidth, when a function loads an offset that
    it hasn't normalized itself, it may fail to derive certain reasonable
    conclusions, even to the point of retaining useless calls that elicit
    incorrect warnings.
    
    Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
    list to a global bit vector variable.  Based on the compile-time
    constant length of the list, we decide in _M_insert_range whether to
    use the existing storage or to allocate new storage for the vector.
    After allocation, we decide in _M_copy_aligned how to copy any
    preexisting portions of the vector to the newly-allocated storage.
    When copying two or more words, we use __builtin_memmove.
    
    However, because we compute the available room using bit offsets
    without range information, even comparing them with constants, we fail
    to infer ranges for the preexisting vector depending on word size, and
    may thus retain the memmove call despite knowing we've only allocated
    one word.
    
    Other parts of the compiler then detect the mismatch between the
    constant allocation size and the much larger range that could
    theoretically be copied into the newly-allocated storage if we could
    reach the call.
    
    Ensuring the compiler is aware of the constraints on the offset range
    enables it to do a much better job at optimizing.  Using attribute
    assume (_M_offset <= ...) didn't work, because gimple lowered that to
    something that vrp could only use to ensure 'this' was non-NULL.
    Exposing _M_offset as an automatic variable/gimple register outside
    the unevaluated assume operand enabled the optimizer to do its job.
    
    Rather than placing such load-then-assume constructs all over, I
    introduced an always-inline member function in bit iterators that does
    the job of conveying to the compiler the information that the
    assumption is supposed to hold, and various calls throughout functions
    pertaining to bit iterators that might not otherwise know that the
    offsets have to be in range, so that the compiler no longer needs to
    make conservative assumptions that prevent optimizations.
    
    With the explicit assumptions, the compiler can correlate the test for
    available storage in the vector with the test for how much storage
    might need to be copied, and determine that, if we're not asking for
    enough room for two or more words, we can omit entirely the code to
    copy two or more words, without any runtime overhead whatsoever: no
    traces remain of the undefined behavior or of the tests that inform
    the compiler about the assumptions that must hold.
    
    for  libstdc++-v3/ChangeLog
    
            PR libstdc++/110807
            * include/bits/stl_bvector.h (_Bit_iterator_base): Add
            _M_assume_normalized member function.  Call it in _M_bump_up,
            _M_bump_down, _M_incr, operator==, operator<=>, operator<, and
            operator-.
            (_Bit_iterator): Also call it in operator*.
            (_Bit_const_iterator): Likewise.
    
    (cherry picked from commit e39b3e02c27bd771a07e385f9672ecf1a45ced77)