Bug 103534 - [12 regression] Spurious -Wstringop-overflow warning with std::string concatencation
Summary: [12 regression] Spurious -Wstringop-overflow warning with std::string concate...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 12.0
: P1 normal
Target Milestone: 12.0
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, diagnostic, missed-optimization
: 103332 (view as bug list)
Depends on:
Blocks: Wstringop-overflow
  Show dependency treegraph
 
Reported: 2021-12-02 19:47 UTC by scott snyder
Modified: 2023-11-08 13:55 UTC (History)
10 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-02 00:00:00


Attachments
libstdc++ fix (805 bytes, patch)
2021-12-10 16:30 UTC, Jason Merrill
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description scott snyder 2021-12-02 19:47:55 UTC
hi -

With a recent checkout of gcc12 (20211201) on a x86_64-pc-linux-gnu host,
compiling the following source with -O --std=c++20 gives a bogus
-Wstringop-overflow warning:

-- x.cc ---------------------------------------------------
#include <string>

std::string foo()
{
  return std::string("1234567890123456")  + std::string("");
}
-----------------------------------------------------------

$ g++ -c -O --std=c++20 x.cc
In file included from /usr/local/gcc/include/c++/12.0.0/string:40,
                 from x.cc:1:
In static member function ‘static constexpr std::char_traits<char>::char_type* std::char_traits<char>::copy(std::char_traits<char>::char_type*, const std::char_traits<char>::char_type*, std::size_t)’,
    inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/local/gcc/include/c++/12.0.0/bits/basic_string.h:423:21,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_append(const _CharT*, std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/local/gcc/include/c++/12.0.0/bits/basic_string.tcc:417:19,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::append(const std::__cxx11::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/local/gcc/include/c++/12.0.0/bits/basic_string.h:1385:25,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator> std::operator+(std::__cxx11::basic_string<_CharT, _Traits, _Allocator>&&, std::__cxx11::basic_string<_CharT, _Traits, _Allocator>&&) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/local/gcc/include/c++/12.0.0/bits/basic_string.h:3530:23,
    inlined from ‘std::string foo()’ at x.cc:5:59:
/usr/local/gcc/include/c++/12.0.0/bits/char_traits.h:426:56: warning: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ specified bound between 18446744073709551600 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
  426 |         return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
      |                                        ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
$ 



Perhaps interestingly, the warning goes away if the first string literal
is less than 16 characters long.
Comment 1 Martin Sebor 2021-12-02 21:03:21 UTC
The warning is for the memcpy() call in BB 12 with _73 being in the reported excessive range:

<bb 12> [local count: 172868773]:
_209 = D.41961._M_dataplus._M_p;
_103 = _69 + 16;
__builtin_memcpy (_103, _209, _73);   <<< 27->12  (T) _73 : 	long unsigned int [18446744073709551600, +INF]

The range comes from BB 27:

=========== BB 27 ============
Imports: _73  
Exports: _73  
_73	long unsigned int [0, 0][18446744073709551600, +INF]
__size_74	const long unsigned int [0, 16]
    <bb 27> [local count: 523844769]:
    if (_73 != 0)
      goto <bb 12>; [33.00%]
    else
      goto <bb 13>; [67.00%]

27->12  (T) _73 : 	long unsigned int [18446744073709551600, +INF]
27->12  (T) __size_74 : 	const long unsigned int [0, 15]
27->13  (F) _73 : 	long unsigned int [0, 0]
27->13  (F) __size_74 : 	const long unsigned int [16, 16]


The magic 16 most likely has to do with the small internal std::string buffer that's 16 bytes big.
Comment 2 Andrew Pinski 2021-12-02 21:53:01 UTC
Note there is one missing optimization:
  D.42086._M_string_length = 0;
  MEM[(char_type &)&D.42086 + 16] = 0;
  MEM[(struct basic_string *)&D.41925] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)&D.41925] ={v} {CLOBBER};
  MEM[(struct _Alloc_hider *)&D.41925]._M_p = &D.41925.D.33308._M_local_buf;
  _69 = operator new (17);

  <bb 3> [local count: 1003797485]:
  D.41925._M_dataplus._M_p = _69;
  D.41925.D.33308._M_allocated_capacity = 16;
  __builtin_memcpy (_69, "1234567890123456", 16);
  D.41925._M_string_length = 16;
  MEM[(char_type &)_69 + 16] = 0;
  _75 = D.42086._M_string_length;
  __size_76 = _75 + 16;
  pretmp_56 = D.42086._M_dataplus._M_p;
  if (__size_76 > 16)
    goto <bb 4>; [67.00%]
  else
    goto <bb 12>; [33.00%]


This might be related to that GCC does not handle the operator new better.
Obvious if that missed optimization is fixed, then the warning will be gone.

I also noticed that with -std=c++17 (or before), the constructor for std::string is not inlined fully for both std::string objects which avoids the false positive.

This is another case where it is a "maybe" exceeds maximum object size and we need to come up with a solution for it.
Comment 3 Jonathan Wakely 2021-12-02 23:02:01 UTC
(In reply to Andrew Pinski from comment #2)
> I also noticed that with -std=c++17 (or before), the constructor for
> std::string is not inlined fully for both std::string objects which avoids
> the false positive.

That's because of the explicit instantiation declaration (i.e. extern template) for the std::string instantiation, which is only declared for C++17 and earlier.
Comment 4 Martin Sebor 2021-12-02 23:19:46 UTC
Yes, the warning does disappear when malloc() and free() are used instead of operator new and delete.  foo() also ends up much better optimized, even at -O1:

__attribute__((abi_tag ("cxx11")))
struct string foo ()
{
  struct string & _7(D);
  char * _69;

  <bb 2> [local count: 1073741824]:
  _69 = __builtin_malloc (17);
  __builtin_memcpy (_69, "1234567890123456", 16);
  MEM[(char_type &)_69 + 16] = 0;
  MEM[(struct basic_string *)_7(D)]._M_dataplus._M_p = _69;
  MEM[(struct basic_string *)_7(D)].D.33183._M_allocated_capacity = 16;
  MEM[(struct basic_string *)_7(D)]._M_string_length = 16;
  return _7(D);

}

The -O1 dump in comment #1 doesn't look right, I may have messed something up.  The same issue happens at -O2 where the dump is as follows:

=========== BB 3 ============
Imports: n_5(D)  
Exports: _1  n_5(D)  
         _1 : n_5(D)(I)  
n_5(D)	int [-INF, -1][1, +INF]
    <bb 3> [local count: 536870913]:
    _1 = (sizetype) n_5(D);
    if (_1 == 1)
      goto <bb 4>; [51.12%]
    else
      goto <bb 5>; [48.88%]

_1 : sizetype [1, 2147483647][18446744071562067968, +INF]
3->4  (T) _1 : 	sizetype [1, 1]
3->4  (T) n_5(D) : 	int [1, 1]
3->5  (F) _1 : 	sizetype [2, 2147483647][18446744071562067968, +INF]
3->5  (F) n_5(D) : 	int [-INF, -1][2, +INF]

=========== BB 4 ============
    <bb 4> [local count: 274448412]:
    MEM[(char *)&b] = 0;
    goto <bb 6>; [100.00%]


=========== BB 5 ============
    <bb 5> [local count: 262422500]:
    __builtin_memcpy (&b, &a, _1);
Comment 5 Jason Merrill 2021-12-10 16:29:57 UTC
The dataflow analysis seems to be:

We set the length of one string to 0, and the other string to 16.  Then we store a char to the string buffer, which the compiler thinks could possibly have clobbered the  length we previously set to 0, so we reload it.  And we add the two together.  So now we have a combined length about which we think we know nothing

We should really somehow tell the compiler that stores to the string char buffer can't alias other non-char objects.  And maybe in general we could do branch prediction based on assuming that char stores don't clobber values we knew before?  But let's put that missed-optimization issue in a separate PR.

So, let's focus away from that problem by making the second string unknown:

#include <string>
std::string foo(std::string x)
{
  return std::string("1234567890123456") + x;
}

I get the same surprising warning with this testcase.

Now, we have an unknown total length.  We compare this length to the size of the local buffer, which partitions the range at 16.  On the path where the sum of the lengths is <=16, we conclude that the length of string A must either be 0 or a number so large that adding 16 to it causes it to wrap around to [0,16] (because integer overflow in unsigned arithmetic is defined).  Which branch prediction thinks is just as likely as 0.

So then along that branch we try to append this impossibly large hypothetical string to this string we do know the length of, and we get this warning.

So, the warning seems to be that if we were to call _M_append with a ridiculously large __n argument, we would get undefined behavior.  In other words, if x happened to be the longest possible string.  It seems that we check for unreasonable length arguments in the char* append functions, but not in the string append function.  Changing them to do that check silences the warning.  I'll attach a patch in a moment.
Comment 6 Jason Merrill 2021-12-10 16:30:37 UTC
Created attachment 51970 [details]
libstdc++ fix
Comment 7 Jason Merrill 2021-12-10 16:37:04 UTC
So it seems that this warning did find a real issue with the library, but one that was hard to connect to the actual wording of the message (and didn't affect the original testcase).  In particular, "specified bound" doesn't make sense when we're considering values deduced from VRP.  Is there any way to print the VRP "backtrace" as part of a warning?
Comment 8 Martin Sebor 2021-12-10 16:53:38 UTC
We discussed before (e.g., in PR 93971) the idea of annotating std::string with some attribute telling the optimizer the internal pointer doesn't alias with anything except for the this->_M_local_buf or the result of operator new().  Richi seemed open to it but was concerned about correctness.

I'm experimenting with two things to improve the context of all these warnings.  One is to print the folded condition under which the warning triggers.  This would include the conditions involved in all the ranges used to make decisions.  I posted the result for this warning here (I'm not sure it's complete, it's just a POC):

  https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586524.html

Another is to print the path through the code (the CFG actually) leading up to the warning site, like the static analyzer does.  This will show each GIMPLE_COND after optimization, so it won't correspond exactly to the original source.  For libstdc++ code it will show libstdc++ internal conditions that may not be of much use to users.  I'll have to see how bad it is to decide if it's a viable approach.
Comment 9 Jakub Jelinek 2021-12-10 17:04:03 UTC
Also discussed in PR98465.
Comment 10 GCC Commits 2021-12-11 04:58:25 UTC
The master branch has been updated by Jason Merrill <jason@gcc.gnu.org>:

https://gcc.gnu.org/g:2e8067041d1d69da02bd7578f58abc11eb35a04b

commit r12-5906-g2e8067041d1d69da02bd7578f58abc11eb35a04b
Author: Jason Merrill <jason@redhat.com>
Date:   Fri Dec 10 11:21:50 2021 -0500

    libstdc++: check length in string append [PR103534]
    
    In the testcase for 103534 we get a warning about append leading to memcpy
    of a very large number of bytes overflowing the buffer.  This turns out to
    be because we weren't calling _M_check_length for string append.  Rather
    than do that directly, let's go through the public pointer append that calls
    it.
    
            PR c++/103534
    
    libstdc++-v3/ChangeLog:
    
            * include/bits/basic_string.h (append (basic_string)): Call pointer
            append instead of _M_append directly.
    
    gcc/testsuite/ChangeLog:
    
            * g++.dg/warn/Wstringop-overflow-8.C: New test.
Comment 11 Jason Merrill 2021-12-13 16:11:25 UTC
Fixed.
Comment 12 Jason Merrill 2021-12-13 16:11:52 UTC
*** Bug 103332 has been marked as a duplicate of this bug. ***