Bug 105329 - [12/13/14 Regression] Bogus restrict warning when assigning 1-char string literal to std::string since r12-3347-g8af8abfbbace49e6
Summary: [12/13/14 Regression] Bogus restrict warning when assigning 1-char string lit...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.0
: P2 normal
Target Milestone: 12.4
Assignee: Not yet assigned to anyone
URL:
Keywords: diagnostic, missed-optimization
: 104336 107069 (view as bug list)
Depends on:
Blocks: Wrestrict
  Show dependency treegraph
 
Reported: 2022-04-21 07:41 UTC by Boris Kolpackov
Modified: 2023-06-28 21:41 UTC (History)
28 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-04-21 00:00:00


Attachments
gcc13-pr105329.patch (2.06 KB, patch)
2022-07-26 12:05 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Boris Kolpackov 2022-04-21 07:41:57 UTC
When compiling my codebase (build2) with GCC 12 (12.0.1 20220421) the output is littered with new bogus (AFAICS) -Wrestrict warnings that seems to be triggered by any attempt to assign or append a 1-char string literal to std::string:

$ cat <<EOF >bogus-restrict.cxx
#include <string>

void f (std::string& s)
{
  s = "5";
}
EOF

$ g++ -Wall -Wextra -O3 -c -std=c++23 bogus-restrict.cxx
In file included from /home/boris/work/build2/tests/modules/gcc2/gcc-install/include/c++/12.0.1/string:40,
                 from bogus-restrict.cxx:1:
In static member function ‘static constexpr std::char_traits<char>::char_type* std::char_traits<char>::copy(char_type*, const char_type*, std::size_t)’,
    inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /home/boris/work/build2/tests/modules/gcc2/gcc-install/include/c++/12.0.1/bits/basic_string.h:423:21,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /home/boris/work/build2/tests/modules/gcc2/gcc-install/include/c++/12.0.1/bits/basic_string.tcc:532:22,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::assign(const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /home/boris/work/build2/tests/modules/gcc2/gcc-install/include/c++/12.0.1/bits/basic_string.h:1647:19,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::operator=(const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /home/boris/work/build2/tests/modules/gcc2/gcc-install/include/c++/12.0.1/bits/basic_string.h:815:28,
    inlined from ‘void f(std::string&)’ at bogus-restrict.cxx:5:7:
/home/boris/work/build2/tests/modules/gcc2/gcc-install/include/c++/12.0.1/bits/char_traits.h:431:56: warning: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets -4611686018427387902 and [-4611686018427387903, 4611686018427387904] may overlap up to 9223372036854775813 bytes at offset -3 [-Wrestrict]
  431 |         return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
      |
Comment 1 Richard Biener 2022-04-21 11:15:51 UTC
It warns about

__builtin_memcpy (_140, _29, _28);

in

<bb 15> [local count: 14831835]:
__nleft_49 = (const size_type) _48;
__builtin_memcpy (_22, "5", __nleft_49);
_28 = 1 - __nleft_49;
_29 = _22 + 1;
_140 = _22 + __nleft_49;
__builtin_memcpy (_140, _29, _28);
pretmp_145 = MEM[(const struct basic_string *)s_2(D)]._M_dataplus._M_p;
goto <bb 19>; [100.00%]

the block is guarded with

if (_22 >= &MEM <const char[2]> [(void *)"5" + 1B])
  goto <bb 10>; [50.00%]
else
  goto <bb 11>; [50.00%]

if (_22 <= "5")
  goto <bb 12>; [50.00%]
else
  goto <bb 13>; [50.00%]

<bb 13> [local count: 22472477]:
_48 = _22 - "5";
if (_48 == 1)
  goto <bb 14>; [34.00%]
else
  goto <bb 15>; [66.00%]

so we're taking the path where "5" is not one character in size.  We should
have somehow simplified this.  It looks like a missed jump-threading,
not sure if we were able to catch this with the forward threader or the
old VRP.
Comment 2 Andrew Macleod 2022-04-21 20:50:37 UTC
The problem goes away if we use --param=evrp-mode-legacy.  Ranger picks up a few more things and we eliminate a couple of latter comparisons which in turn affect threading, etc etc. and we present the restrict pass with something different  that causes issues.  

I'm in the middle of analyzing it and will post the results when I'm done.
Comment 3 Andrew Macleod 2022-04-21 22:51:08 UTC
Just to bookmark where the analysis is since im out for a few days, in the restrict pass, with ranger tracing on, the code sequence is:

 
  <bb 15> [local count: 14831835]:
  __nleft_49 = (const size_type) _48;
  __builtin_memcpy (_22, "5", __nleft_49);

and the ranger trace from wrestrict shows:
             TRUE : (5628) range_of_expr (_48) long int [-INF, 0][2, +INF]
           TRUE : (5617) range_of_stmt (__nleft_49) const size_type [2, 9223372036854775807]

So we know _48 is non-zero, and when I further delve into things, ranger is calculating __nleft_49 correct as  [0, 0][2, +INF], but when it merges this with the current known global range, THAT has been set somewhere as [2, 9223372036854775807] 

Im trying to find where the global value is first set. when I put a breakpoint in the set and get routines, the very first thing that triggers is:

#0  gimple_range_global (name=0x7fffefd0be10) at /opt/notnfs/amacleod/master/gcc/gcc/value-query.cc:419
#1  0x0000000002e73534 in ranger_cache::get_global_range (this=0x41797b8, r=..., name=0x7fffefd0be10) at /opt/notnfs/amacleod/master/gcc/gcc/gimple-range-cache.cc:925
#2  0x0000000002e73580 in ranger_cache::get_global_range (this=0x41797b8, r=..., name=0x7fffefd0be10, current_p=@0x7fffffff9c37: false) at /opt/notnfs/amacleod/master/gcc/gcc/gimple-range-cache.cc:939
#3  0x0000000002e6f731 in gimple_ranger::range_of_stmt (this=0x4179790, r=..., s=0x7fffef265000, name=0x7fffefd0be10) at /opt/notnfs/amacleod/master/gcc/gcc/gimple-range.cc:307
#4  0x0000000002e6edc2 in gimple_ranger::range_on_entry (this=0x4179790, r=..., bb=0x7fffeecc4340, name=0x7fffefd0be10) at /opt/notnfs/amacleod/master/gcc/gcc/gimple-range.cc:151
#5  0x0000000002e6ec40 in gimple_ranger::range_of_expr (this=0x4179790, r=..., expr=0x7fffefd0be10, stmt=0x7fffeeca8f78) at /opt/notnfs/amacleod/master/gcc/gcc/gimple-range.cc:128
#6  0x0000000001b4cd0d in get_range (val=0x7fffefd0be10, stmt=0x7fffeeca8f78, minmax=0x7fffffffbe10, rvals=0x4179790) at /opt/notnfs/amacleod/master/gcc/gcc/tree-ssa-strlen.cc:219
#7  0x00000000016f8fae in get_offset_range (x=0x7fffefd0be10, stmt=0x7fffeeca8f78, r=0x7fffffffc0b0, rvals=0x4179790) at /opt/notnfs/amacleod/master/gcc/gcc/pointer-query.cc:92
#8  0x0000000001702c2f in handle_ssa_name (ptr=0x7fffef8d20d8, addr=false, ostype=0, pref=0x7fffffffc5d0, snlim=..., qry=0x4167468) at /opt/notnfs/amacleod/master/gcc/gcc/pointer-query.cc:2157
#9  0x000000000170375b in compute_objsize_r (ptr=0x7fffef8d20d8, stmt=0x7fffeeca8f78, addr=false, ostype=0, pref=0x7fffffffc5d0, snlim=..., qry=0x4167468)
    at /opt/notnfs/amacleod/master/gcc/gcc/pointer-query.cc:2321
#10 0x0000000001703907 in compute_objsize (ptr=0x7fffef8d20d8, stmt=0x7fffeeca8f78, ostype=0, pref=0x7fffffffc5d0, ptr_qry=0x4167468) at /opt/notnfs/amacleod/master/gcc/gcc/pointer-query.cc:2355
#11 0x00000000016ff4e4 in pointer_query::get_ref (this=0x4167468, ptr=0x7fffef8d20d8, stmt=0x7fffeeca8f78, pref=0x7fffffffc5d0, ostype=0) at /opt/notnfs/amacleod/master/gcc/gcc/pointer-query.cc:1505
#12 0x00000000013b99c3 in (anonymous namespace)::pass_waccess::check_dangling_stores (this=0x4167410, bb=0x7fffeecc4340, stores=..., bbs=...)
    at /opt/notnfs/amacleod/master/gcc/gcc/gimple-ssa-warn-access.cc:4528

with
p vr.dump(stderr)
const size_type [1, 9223372036854775807]

I also have a breakpoint in set_range_info for this name which hasn't been triggered.   So either the set routine have been bypassed or perhaps inlining is setting this global value?

Im still trying to figure out who and where has decided that __nleft_49 is [2, 0x7FFFFFFFFFFFFFFF] instead of [2, 0xFFFFFFFFFFFFFFFFFFFF]
Comment 4 Aldy Hernandez 2022-04-22 10:31:55 UTC
(In reply to Andrew Macleod from comment #3)

> p vr.dump(stderr)
> const size_type [1, 9223372036854775807]
> 
> I also have a breakpoint in set_range_info for this name which hasn't been
> triggered.   So either the set routine have been bypassed or perhaps
> inlining is setting this global value?
> 
> Im still trying to figure out who and where has decided that __nleft_49 is
> [2, 0x7FFFFFFFFFFFFFFF] instead of [2, 0xFFFFFFFFFFFFFFFFFFFF]

Global ranges can also be set via duplicate_ssa_name_range_info, which is used by the inliner.

It looks like [1, 9223372036854775807] was originally calculated in evrp for __nleft_64:

=========== BB 45 ============
Imports: _27  __s_53(D)  
Exports: _27  _34  __s_53(D)  __nleft_64  
         _34 : _27(I)  __s_53(D)(I)  
         __nleft_64 : _27(I)  _34  __s_53(D)(I)  
_27	char * [1B, -2B]
__s_53(D)	const char * [0B, -3B]
Relational : (__s_53(D) < _26)
Relational : (_27 > __s_53(D))
    <bb 45> :
    _34 = _27 - __s_53(D);
    __nleft_64 = (const size_type) _34;
    if (_34 == 1)
      goto <bb 46>; [34.00%]
    else
      goto <bb 47>; [66.00%]

_34 : long int [1, +INF]
__nleft_64 : long unsigned int [1, 9223372036854775807]
45->46  (T) _27 : 	char * [1B, -2B]
45->46  (T) _34 : 	long int [1, 1]
45->46  (T) __s_53(D) : 	const char * [0B, -3B]
45->46  (T) __nleft_64 : 	long unsigned int [1, 9223372036854775807]
45->47  (F) _27 : 	char * [1B, -2B]
45->47  (F) _34 : 	long int [2, +INF]
45->47  (F) __s_53(D) : 	const char * [0B, -3B]
45->47  (F) __nleft_64 : 	long unsigned int [1, 9223372036854775807]

This was later renamed to __nleft_40, and then copied by the IPA inliner to __nleft_47 and ultimately to __nleft_49.
Comment 5 Andrew Macleod 2022-04-25 17:50:46 UTC
Before inlining, the general code see:
  if (_27 <= __s_53(D))
    goto <bb 36>; [INV]
  else
    goto <bb 40>; [INV]

<bb40>
  _34 = _27 - __s_53(D);
  __nleft_64 = (const size_type) _34

THe branch now registers the relation that __s_53 < _27, and this allows us to now  determine that _34 and _nleft_64 is [1, 0x7FFFFFFF] 


THen we inline this code, and by the time we get to the restrict code, we have the following IL:
 <bb 9> [local count: 89889908]:
  if (_22 >= &MEM <const char[2]> [(void *)"5" + 1B])
    goto <bb 10>; [50.00%]
  else
    goto <bb 11>; [50.00%]

<..>

 <bb 11> [local count: 44944954]:
  if (_22 <= "5")
    goto <bb 12>; [50.00%]
  else
    goto <bb 13>; [50.00%]

<...>

<bb 13> [local count: 22472477]:
  _48 = _22 - "5";
  if (_48 == 1)
    goto <bb 14>; [34.00%]
  else
    goto <bb 15>; [66.00%]

<bb 14> [local count: 7640642]:
  MEM[(char_type &)_22] = 53;
  pretmp_64 = MEM[(const struct basic_string *)s_2(D)]._M_dataplus._M_p;
  goto <bb 19>;

<bb15>
  __nleft_49 = (const size_type) _48;
  __builtin_memcpy (_22, "5", __nleft_49);
  _28 = 1 - __nleft_49;
  _29 = _22 + 1;
  _140 = _22 + __nleft_49;
  __builtin_memcpy (_140, _29, _28);


Our problem seems rooted in the calculation of _28. 

Ranger has figured out via feeding calculations based on the comparisons that nleft_49 cannot be 0 or 1 either, and therefore must be [2, 0x7FFFFFFF]

But it means that 
  _28 = 1 - __nleft_49;
 becomes a calculated range of  [9223372036854775810, +INF]  
And that makes the warning code very unhappy when it is used as the number of bytes in the second memcpy.   Previously, we hadn't made some of these conclusions, so we were looking at VARYING, so the warning code presumably ignored it. 

This code is actually dead, but its not being figured out.

the pointer tracking does not recognize that if this branch is NOT taken:
  if (_22 >= &MEM <const char[2]> [(void *)"5" + 1B])

then we follow the following sequence:

  <bb 11> [local count: 44944954]:
  if (_22 <= "5")
    goto <bb 12>; [50.00%]
  else
    goto <bb 13>; [50.00%]

  <bb 12> [local count: 22472477]:
  _42 = "5" - _22;
  _43 = (long unsigned int) _42;
  __poff_45 = _43 + 1;
  _46 = _22 + __poff_45;
  _47 = MEM[(const char_type &)_46];
  MEM[(char_type &)_22] = _47;
  pretmp_83 = MEM[(const struct basic_string *)s_2(D)]._M_dataplus._M_p;
  goto <bb 19>; [100.00%]

  <bb 13> [local count: 22472477]:
  _48 = _22 - "5";
  if (_48 == 1)

I believe the very first if can never take the edge 11->13... and thus the rest of that code should go.

for the general case, The tracking of _22 needs to recognize that when we get to bb12 that the expression "5" - 22 and in bb13 _22 - "5" have very strict ranges.  in fact, they are [0,0] in think case I think.  

That is certainly beyond rangers current capabilities. I thought that  the reference tracking used by the warning system tried to do that. Regardless, when we introduce prange (pointer ranges) next cycle perhaps there is something we can do to track the base an offsets based on the conditions.
Comment 6 Mattias Ellert 2022-04-25 22:25:49 UTC
This is a duplicate of bug #104336.
Comment 7 Martin Liška 2022-04-26 15:04:29 UTC
Started with r12-3347-g8af8abfbbace49e6.
Comment 8 Richard Biener 2022-05-02 12:44:06 UTC
Btw, requires -std=c++20 but -O2 is enough, -O3 not needed.

To quote again:

if (_22 >= &MEM <const char[2]> [(void *)"5" + 1B])
  goto <bb 10>; [50.00%]
else
  goto <bb 11>; [50.00%]

<bb 11> [local count: 44944954]:
if (_22 <= "5")
  goto <bb 12>; [50.00%]
else
  goto <bb 13>; [50.00%]

<bb 13> [local count: 22472477]:
_48 = _22 - "5";
if (_48 == 1)
  goto <bb 14>; [34.00%]
else
  goto <bb 15>; [66.00%]

the "simple" thing we fail to thread is

  if (_22 >= _1 + 1)
    ;
  else
    {
       if (_22 <= _1)
          ; // this must be true, _22 < _1 is true even
    }

in fact we fail to canonicalize _22 >= &MEM <const char[2]> [(void *)"5" + 1B]
to _22 > "5".  fold_comparison via maybe_canonicalize_comparison does such
thing but not for pointers or &MEM.

The following (too special) simplifies the above to

  <bb 9> [local count: 89889908]:
  if (_22 > &MEM <const char[2]> [(void *)"5"])
    goto <bb 10>; [50.00%]
  else
    goto <bb 11>; [50.00%]

  <bb 11> [local count: 44944954]:
  if (_22 <= "5")
    goto <bb 12>; [50.00%]
  else
    goto <bb 13>; [50.00%]

but we still won't simplify the second compare.

diff --git a/gcc/match.pd b/gcc/match.pd
index 6d691d302b3..cb16694a150 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5397,6 +5397,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        (if (known_eq (off, 0))
         { constant_boolean_node (cmp == EQ_EXPR, type); }))))))))
 
+(for cmp (ge le)
+     cmpp (gt lt)
+ (simplify
+  (cmp:c @0 addr@1)
+  (with
+    { tree m = TREE_OPERAND (@1, 0); }
+    (if (TREE_CODE (m) == MEM_REF
+        && integer_onep (TREE_OPERAND (m, 1)))
+     (cmpp @0 { build1 (ADDR_EXPR, TREE_TYPE (@1),
+                       build2 (MEM_REF, TREE_TYPE (m), TREE_OPERAND (m, 0),
+                               build_zero_cst (TREE_TYPE (TREE_OPERAND (m, 1))))); })))))
+
 /* Equality compare simplifications from fold_binary  */
 (for cmp (eq ne)
 

The ifcombine eventually arrives in

Breakpoint 5, tree_ssa_ifcombine_bb_1 (inner_cond_bb=<basic_block 0x7ffff463bf08 (11)>, outer_cond_bb=<basic_block 0x7ffff4740af8 (9)>, then_bb=<basic_block 0x7ffff463bf70 (12)>, else_bb=<basic_block 0x7ffff47728f0 (13)>, phi_pred_bb=<basic_block 0x7ffff463bf08 (11)>) at /home/rguenther/src/gcc-12-branch/gcc/tree-ssa-ifcombine.cc:646
646       if (phi_pred_bb != else_bb
<bb 9> [local count: 89889908]:
if (_22 > &MEM <const char[2]> [(void *)"5"])
  goto <bb 10>; [50.00%]
else
  goto <bb 11>; [50.00%]

$7 = void
<bb 11> [local count: 44944954]:
if (_22 <= "5")
  goto <bb 12>; [50.00%]
else
  goto <bb 13>; [50.00%]

$8 = void

but the CFG doesn't resemble any of the forms it handles and it does not
try to catch the case where the inner condition would simplify (basically
thread it without any code duplication).  So it doesn't really fit
ifcombine.

The old VRP pass sees

  <bb 12> [local count: 44944954]:
  _112 = ASSERT_EXPR <_22, _22 <= &MEM <const char[2]> [(void *)"5"]>;
  if (_112 <= "5")
    goto <bb 13>; [50.00%]
  else
    goto <bb 14>; [50.00%]

but concludes

xtract_range_from_stmt visiting:
_112 = ASSERT_EXPR <_22, _22 <= &MEM <const char[2]> [(void *)"5"]>;
Found new range for _112: char * VARYING

extract_range_from_stmt visiting:
if (_112 <= "5")

Visiting conditional with predicate: if (_112 <= "5")

With known ranges
        _112: char * VARYING

Predicate evaluates to: DON'T KNOW

it might reason that _22 <= &MEM <const char[2]> [(void *)"5"] means
_22 == &"5" and track the range of _22 as constant.

That said, we relied on jump threading for these kind of simplifications
but we are not good at this particular case.
Comment 9 Richard Biener 2022-05-02 12:45:11 UTC
*** Bug 104336 has been marked as a duplicate of this bug. ***
Comment 10 Richard Biener 2022-05-02 12:46:00 UTC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104336#c1 contains a workaround for libstdc++
Comment 11 Richard Biener 2022-05-02 12:57:39 UTC
(In reply to Richard Biener from comment #10)
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104336#c1 contains a workaround
> for libstdc++

Maybe

      _GLIBCXX20_CONSTEXPR
      basic_string&
      assign(const _CharT* __s)
      {
        __glibcxx_requires_string(__s);
        return _M_replace(size_type(0), this->size(), __s,
                          traits_type::length(__s));
      }

isn't the most efficient way to assign a string (constant) looking at
the complexity of _M_replace and its overlap checks (that we all expect
to be optimized away but which are not).  In fact the _M_disjunct check
isn't optimized because we cannot tell if the std::string storage
overlaps the constant pool ...
Comment 12 Richard Biener 2022-05-02 13:22:27 UTC
(In reply to Richard Biener from comment #11)
> (In reply to Richard Biener from comment #10)
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104336#c1 contains a workaround
> > for libstdc++
> 
> Maybe
> 
>       _GLIBCXX20_CONSTEXPR
>       basic_string&
>       assign(const _CharT* __s)
>       {
>         __glibcxx_requires_string(__s);
>         return _M_replace(size_type(0), this->size(), __s,
>                           traits_type::length(__s));
>       }
> 
> isn't the most efficient way to assign a string (constant) looking at
> the complexity of _M_replace and its overlap checks (that we all expect
> to be optimized away but which are not).  In fact the _M_disjunct check
> isn't optimized because we cannot tell if the std::string storage
> overlaps the constant pool ...

Maybe

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index c3fbc53953c..e83f3e8afa4 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -407,7 +407,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
       bool
       _M_disjunct(const _CharT* __s) const _GLIBCXX_NOEXCEPT
       {
-       return (less<const _CharT*>()(__s, _M_data())
+       return (__builtin_constant_p (__s)
+               || less<const _CharT*>()(__s, _M_data())
                || less<const _CharT*>()(_M_data() + this->size(), __s));
       }
 
but oddly we folds the __builtin_constant_p to false early
but we treat __builtin_constant_p ("foo") as true.  That's inconsistent in
fold_builtin_constant_p:

  /* If this expression has side effects, show we don't know it to be a
     constant.  Likewise if it's a pointer or aggregate type since in
     those case we only want literals, since those are only optimized
     when generating RTL, not later.
     And finally, if we are compiling an initializer, not code, we
     need to return a definite result now; there's not going to be any
     more optimization done.  */
  if (TREE_SIDE_EFFECTS (arg)
      || AGGREGATE_TYPE_P (TREE_TYPE (arg))
      || POINTER_TYPE_P (TREE_TYPE (arg))
      || cfun == 0
      || folding_initializer
      || force_folding_builtin_constant_p)
    return integer_zero_node;

fixing that (removing POINTER_TYPE_P) makes the workaround work (and
the generated code simplifies greatly).
Comment 13 Jakub Jelinek 2022-05-02 13:33:21 UTC
I think this is quite related to the PR98465 discussions.
The warning is on dead code, for basic_string at least when using reasonable allocators the _M_dataplus._M_p member will always point either into the
_M_local_buf embedded buffer, or to something that has been allocated by the allocator.  In particular, it should never point into .rodata objects, or when using a reasonable allocator to global or automatic vars (with the exception of the embedded buffer).
There is the _M_disjunct method that returns whether the std::string could overlap with the second argument, if we'd say through some magic attribute on the   _M_dataplus._M_p were able to tell the optimizers perhaps through a builtin that there is no overlap, we would optimize away the dead code and be fine (not just for the questionable warnings, but more importantly for code generation as well).
Another approach to deal with this exact case might be some builtin that would find out if the argument must be in read-only memory ("5" in this case) and use that as an additional check in _M_disjunct, say that then the arg must be read-only, it must be disjunct because the string is necessarily writable.
As has been discussed, yet another possibility would be to handle the rare case (not disjunct) in out of line (ideally libstdc++.so.6 contained (at least for the usual instantiations)) code, which would also make the rare case smaller and such warnings not really visible to the optimizers.
Comment 14 Jakub Jelinek 2022-05-02 13:58:50 UTC
Here is an untested patch for the cold part of the function:
--- libstdc++-v3/include/bits/basic_string.h.jj	2022-01-21 22:48:42.220261654 +0100
+++ libstdc++-v3/include/bits/basic_string.h	2022-05-02 15:51:42.381923962 +0200
@@ -2504,6 +2504,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
       _M_replace_aux(size_type __pos1, size_type __n1, size_type __n2,
 		     _CharT __c);
 
+      __attribute__((__noinline__, __noclone__, __cold__)) void
+      _M_replace_cold(pointer __p, size_type __len1, const _CharT* __s,
+		      const size_type __len2, const size_type __how_much);
+
       _GLIBCXX20_CONSTEXPR
       basic_string&
       _M_replace(size_type __pos, size_type __len1, const _CharT* __s,
--- libstdc++-v3/include/bits/basic_string.tcc.jj	2022-01-11 22:31:41.482757256 +0100
+++ libstdc++-v3/include/bits/basic_string.tcc	2022-05-02 15:51:38.630975348 +0200
@@ -471,6 +471,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     }
 
   template<typename _CharT, typename _Traits, typename _Alloc>
+    __attribute__((__noinline__, __noclone__, __cold__)) void
+    basic_string<_CharT, _Traits, _Alloc>::
+    _M_replace_cold(pointer __p, size_type __len1, const _CharT* __s,
+		    const size_type __len2, const size_type __how_much)
+    {
+      // Work in-place.
+      if (__len2 && __len2 <= __len1)
+	this->_S_move(__p, __s, __len2);
+      if (__how_much && __len1 != __len2)
+	this->_S_move(__p + __len2, __p + __len1, __how_much);
+      if (__len2 > __len1)
+	{
+	  if (__s + __len2 <= __p + __len1)
+	    this->_S_move(__p, __s, __len2);
+	  else if (__s >= __p + __len1)
+	    {
+	      // Hint to middle end that __p and __s overlap
+	      // (PR 98465).
+	      const size_type __poff = (__s - __p) + (__len2 - __len1);
+	      this->_S_copy(__p, __p + __poff, __len2);
+	    }
+	  else
+	    {
+	      const size_type __nleft = (__p + __len1) - __s;
+	      this->_S_move(__p, __s, __nleft);
+	      this->_S_copy(__p + __nleft, __p + __len2, __len2 - __nleft);
+	    }
+	}
+    }
+
+  template<typename _CharT, typename _Traits, typename _Alloc>
     _GLIBCXX20_CONSTEXPR
     basic_string<_CharT, _Traits, _Alloc>&
     basic_string<_CharT, _Traits, _Alloc>::
@@ -500,7 +531,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    }
 	  else
 #endif
-	  if (_M_disjunct(__s))
+	  if (__builtin_expect(_M_disjunct(__s),true))
 	    {
 	      if (__how_much && __len1 != __len2)
 		this->_S_move(__p + __len2, __p + __len1, __how_much);
@@ -508,32 +539,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		this->_S_copy(__p, __s, __len2);
 	    }
 	  else
-	    {
-	      // Work in-place.
-	      if (__len2 && __len2 <= __len1)
-		this->_S_move(__p, __s, __len2);
-	      if (__how_much && __len1 != __len2)
-		this->_S_move(__p + __len2, __p + __len1, __how_much);
-	      if (__len2 > __len1)
-		{
-		  if (__s + __len2 <= __p + __len1)
-		    this->_S_move(__p, __s, __len2);
-		  else if (__s >= __p + __len1)
-		    {
-		      // Hint to middle end that __p and __s overlap
-		      // (PR 98465).
-		      const size_type __poff = (__s - __p) + (__len2 - __len1);
-		      this->_S_copy(__p, __p + __poff, __len2);
-		    }
-		  else
-		    {
-		      const size_type __nleft = (__p + __len1) - __s;
-		      this->_S_move(__p, __s, __nleft);
-		      this->_S_copy(__p + __nleft, __p + __len2,
-				    __len2 - __nleft);
-		    }
-		}
-	    }
+	    _M_replace_cold(__p, __len1, __s, __len2, __how_much);
 	}
       else
 	this->_M_mutate(__pos, __len1, __s, __len2);
Note, on the #c0 testcase alone it actually results in larger generated code (unless we arrange for it to be exported from libstdc++, for which it is quite late for 12.1 right now), but say on:
#include <string>

void f (std::string& s)
{
  s = "5";
}

void g (std::string& s)
{
  s = "56";
}

void h (std::string& s)
{
  s = "6";
}
it is already smaller (note, obviously it doesn't need to be in the same TU).
Comment 15 Jakub Jelinek 2022-05-02 16:04:43 UTC
Note, while the patch makes the warning go away, it exports
_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE15_M_replace_coldEPcmPKcmm@@GLIBCXX_3.4.21
_ZNSt7__cxx1112basic_stringIwSt11char_traitsIwESaIwEE15_M_replace_coldEPwmPKwmm@@GLIBCXX_3.4.21
which is certainly undesirable.  Either we'd need to make sure it isn't exported, or export @@GLIBCXX_3.4.30 instead and somehow arrange for out of line copy being used from libstdc++ rather than emitted into every library or binary that uses it.
Comment 16 Jakub Jelinek 2022-05-02 19:43:39 UTC
(In reply to Richard Biener from comment #11)
> Maybe
> 
>       _GLIBCXX20_CONSTEXPR
>       basic_string&
>       assign(const _CharT* __s)
>       {
>         __glibcxx_requires_string(__s);
>         return _M_replace(size_type(0), this->size(), __s,
>                           traits_type::length(__s));
>       }
> 
> isn't the most efficient way to assign a string (constant) looking at

For string constant sure, but unless we know that __s points to a string literal, we need to do those overlap checks I'm afraid.
Or is something like:
void g (std::string& s)
{
  s = s.c_str() + 16;
}
invalid if s is before known to contain a string longer than 16 chars?
Comment 17 Jonathan Wakely 2022-05-02 22:08:05 UTC
That's valid
Comment 18 Mattias Ellert 2022-05-06 03:59:52 UTC
If the additional symbols are undesirable, using the workaround proposed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104336#c1 is an alternative. This silences the warning without adding additional symbols.

This proposed change is similar to the change introduced a few lines further up in the same file in the following commit:

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e14ea108faa6eba6a60a45ff0ca3099ce6ae45c2

This change was introduced to fix a similar false positive warning in GCC 11.
Comment 19 Jakub Jelinek 2022-05-06 08:33:20 UTC
GCC 12.1 is being released, retargeting bugs to GCC 12.2.
Comment 20 Jakub Jelinek 2022-07-26 12:05:15 UTC
Created attachment 53359 [details]
gcc13-pr105329.patch

Untested workaround.
Comment 21 Richard Biener 2022-08-19 08:26:15 UTC
GCC 12.2 is being released, retargeting bugs to GCC 12.3.
Comment 22 CVS Commits 2022-09-12 09:37:13 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:723ef5a937dbab5e7a35761fd7f0ff0c76849340

commit r13-2618-g723ef5a937dbab5e7a35761fd7f0ff0c76849340
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Mon Sep 12 11:31:11 2022 +0200

    libstdc++: Outline the overlapping case of string _M_replace into a separate function [PR105329]
    
    The following patch is partially a workaround for bogus warnings
    when the compiler isn't able to fold _M_disjunct call into constant
    false, but also an optimization attempt - assuming _M_disjunct (__s)
    is rare, the patch should shrink code size for the common case and
    use library or for non-standard instantiations an out of line
    function to handle the rare case.
    
    2022-09-12  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/105329
            * acinclude.m4 (libtool_VERSION): Change to 6:31:0.
            * config/abi/pre/gnu.ver (GLIBCXX_3.4.21): Don't export
            std::basic_string methods with name length of 15.
            (GLIBCXX_3.4.31): Export std::basic_string::_M_replace_cold.
            * testsuite/util/testsuite_abi.cc (check_version): Handle
            GLIBCXX_3.4.31.
            * include/bits/basic_string.h (std::basic_string::_M_replace_cold):
            Declare.
            * include/bits/basic_string.tcc (std::basic_string::_M_replace_cold):
            Define and export even for C++20.
            (std::basic_string::_M_replace): Use __builtin_expect, outline
            the overlapping case to _M_replace_cold.
            * configure: Regenerated.
Comment 23 Martin Sebor 2022-09-29 15:19:59 UTC
*** Bug 107069 has been marked as a duplicate of this bug. ***
Comment 24 Dan Stahlke 2022-09-30 16:55:02 UTC
Here is another test case, broken on 12.2 with "-std=c++20 -Wrestrict -O3" but working on trunk and working on 12.2 with -O2 rather than -O3.

#include <string>
#include <tuple> // std::ignore

int main()
{
    std::string s = "b";
    std::ignore = "a" + std::move(s);
}
Comment 25 Dan Stahlke 2022-09-30 17:16:29 UTC
The test case I just posted appears in the bug 105651 discussion.  So maybe or maybe not related to the present discussion.
Comment 26 yan12125 2023-02-21 18:30:51 UTC
Thanks for the workaround in r13-2618-g723ef5a937dbab5e7a35761fd7f0ff0c76849340, I can confirm it fixes breakage with std::string. Could it be backported to the gcc-12 branch?
Comment 27 Jonathan Wakely 2023-02-21 19:28:26 UTC
No, the new function requires exporting a new symbol from the shared library, which is not possible for the stable release branch.
Comment 28 yan12125 2023-02-22 17:28:35 UTC
Thanks, so that commit changes ABI - objects built by patched GCC will not link to unpatched libstdc++. I will stick to -Wno-restrict for now.
Comment 29 Jonathan Wakely 2023-02-22 17:52:22 UTC
It adds a new symbol to the library, which is not usually considered an ABI change, because it's backwards compatible. Compiling with a new GCC and linking to an old libstdc++ is never supported anyway.
Comment 30 Gustaw Smolarczyk 2023-02-24 14:10:10 UTC
I have encountered a similar bug, reproducer:


#include <string>

std::string foo()
{
    return "_" + std::to_string(0);
}
Comment 31 Richard Biener 2023-05-08 12:24:20 UTC
GCC 12.3 is being released, retargeting bugs to GCC 12.4.