hi - With a recent checkout of gcc12 (20211201) on a x86_64-pc-linux-gnu host, compiling the following source with -O --std=c++20 gives a bogus -Wstringop-overflow warning: -- x.cc --------------------------------------------------- #include <string> std::string foo() { return std::string("1234567890123456") + std::string(""); } ----------------------------------------------------------- $ g++ -c -O --std=c++20 x.cc In file included from /usr/local/gcc/include/c++/12.0.0/string:40, from x.cc:1: In static member function ‘static constexpr std::char_traits<char>::char_type* std::char_traits<char>::copy(std::char_traits<char>::char_type*, const std::char_traits<char>::char_type*, std::size_t)’, inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/local/gcc/include/c++/12.0.0/bits/basic_string.h:423:21, inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_append(const _CharT*, std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/local/gcc/include/c++/12.0.0/bits/basic_string.tcc:417:19, inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::append(const std::__cxx11::basic_string<_CharT, _Traits, _Alloc>&) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/local/gcc/include/c++/12.0.0/bits/basic_string.h:1385:25, inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator> std::operator+(std::__cxx11::basic_string<_CharT, _Traits, _Allocator>&&, std::__cxx11::basic_string<_CharT, _Traits, _Allocator>&&) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/local/gcc/include/c++/12.0.0/bits/basic_string.h:3530:23, inlined from ‘std::string foo()’ at x.cc:5:59: /usr/local/gcc/include/c++/12.0.0/bits/char_traits.h:426:56: warning: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ specified bound between 18446744073709551600 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=] 426 | return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n)); | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~ $ Perhaps interestingly, the warning goes away if the first string literal is less than 16 characters long.
The warning is for the memcpy() call in BB 12 with _73 being in the reported excessive range: <bb 12> [local count: 172868773]: _209 = D.41961._M_dataplus._M_p; _103 = _69 + 16; __builtin_memcpy (_103, _209, _73); <<< 27->12 (T) _73 : long unsigned int [18446744073709551600, +INF] The range comes from BB 27: =========== BB 27 ============ Imports: _73 Exports: _73 _73 long unsigned int [0, 0][18446744073709551600, +INF] __size_74 const long unsigned int [0, 16] <bb 27> [local count: 523844769]: if (_73 != 0) goto <bb 12>; [33.00%] else goto <bb 13>; [67.00%] 27->12 (T) _73 : long unsigned int [18446744073709551600, +INF] 27->12 (T) __size_74 : const long unsigned int [0, 15] 27->13 (F) _73 : long unsigned int [0, 0] 27->13 (F) __size_74 : const long unsigned int [16, 16] The magic 16 most likely has to do with the small internal std::string buffer that's 16 bytes big.
Note there is one missing optimization: D.42086._M_string_length = 0; MEM[(char_type &)&D.42086 + 16] = 0; MEM[(struct basic_string *)&D.41925] ={v} {CLOBBER}; MEM[(struct _Alloc_hider *)&D.41925] ={v} {CLOBBER}; MEM[(struct _Alloc_hider *)&D.41925]._M_p = &D.41925.D.33308._M_local_buf; _69 = operator new (17); <bb 3> [local count: 1003797485]: D.41925._M_dataplus._M_p = _69; D.41925.D.33308._M_allocated_capacity = 16; __builtin_memcpy (_69, "1234567890123456", 16); D.41925._M_string_length = 16; MEM[(char_type &)_69 + 16] = 0; _75 = D.42086._M_string_length; __size_76 = _75 + 16; pretmp_56 = D.42086._M_dataplus._M_p; if (__size_76 > 16) goto <bb 4>; [67.00%] else goto <bb 12>; [33.00%] This might be related to that GCC does not handle the operator new better. Obvious if that missed optimization is fixed, then the warning will be gone. I also noticed that with -std=c++17 (or before), the constructor for std::string is not inlined fully for both std::string objects which avoids the false positive. This is another case where it is a "maybe" exceeds maximum object size and we need to come up with a solution for it.
(In reply to Andrew Pinski from comment #2) > I also noticed that with -std=c++17 (or before), the constructor for > std::string is not inlined fully for both std::string objects which avoids > the false positive. That's because of the explicit instantiation declaration (i.e. extern template) for the std::string instantiation, which is only declared for C++17 and earlier.
Yes, the warning does disappear when malloc() and free() are used instead of operator new and delete. foo() also ends up much better optimized, even at -O1: __attribute__((abi_tag ("cxx11"))) struct string foo () { struct string & _7(D); char * _69; <bb 2> [local count: 1073741824]: _69 = __builtin_malloc (17); __builtin_memcpy (_69, "1234567890123456", 16); MEM[(char_type &)_69 + 16] = 0; MEM[(struct basic_string *)_7(D)]._M_dataplus._M_p = _69; MEM[(struct basic_string *)_7(D)].D.33183._M_allocated_capacity = 16; MEM[(struct basic_string *)_7(D)]._M_string_length = 16; return _7(D); } The -O1 dump in comment #1 doesn't look right, I may have messed something up. The same issue happens at -O2 where the dump is as follows: =========== BB 3 ============ Imports: n_5(D) Exports: _1 n_5(D) _1 : n_5(D)(I) n_5(D) int [-INF, -1][1, +INF] <bb 3> [local count: 536870913]: _1 = (sizetype) n_5(D); if (_1 == 1) goto <bb 4>; [51.12%] else goto <bb 5>; [48.88%] _1 : sizetype [1, 2147483647][18446744071562067968, +INF] 3->4 (T) _1 : sizetype [1, 1] 3->4 (T) n_5(D) : int [1, 1] 3->5 (F) _1 : sizetype [2, 2147483647][18446744071562067968, +INF] 3->5 (F) n_5(D) : int [-INF, -1][2, +INF] =========== BB 4 ============ <bb 4> [local count: 274448412]: MEM[(char *)&b] = 0; goto <bb 6>; [100.00%] =========== BB 5 ============ <bb 5> [local count: 262422500]: __builtin_memcpy (&b, &a, _1);
The dataflow analysis seems to be: We set the length of one string to 0, and the other string to 16. Then we store a char to the string buffer, which the compiler thinks could possibly have clobbered the length we previously set to 0, so we reload it. And we add the two together. So now we have a combined length about which we think we know nothing We should really somehow tell the compiler that stores to the string char buffer can't alias other non-char objects. And maybe in general we could do branch prediction based on assuming that char stores don't clobber values we knew before? But let's put that missed-optimization issue in a separate PR. So, let's focus away from that problem by making the second string unknown: #include <string> std::string foo(std::string x) { return std::string("1234567890123456") + x; } I get the same surprising warning with this testcase. Now, we have an unknown total length. We compare this length to the size of the local buffer, which partitions the range at 16. On the path where the sum of the lengths is <=16, we conclude that the length of string A must either be 0 or a number so large that adding 16 to it causes it to wrap around to [0,16] (because integer overflow in unsigned arithmetic is defined). Which branch prediction thinks is just as likely as 0. So then along that branch we try to append this impossibly large hypothetical string to this string we do know the length of, and we get this warning. So, the warning seems to be that if we were to call _M_append with a ridiculously large __n argument, we would get undefined behavior. In other words, if x happened to be the longest possible string. It seems that we check for unreasonable length arguments in the char* append functions, but not in the string append function. Changing them to do that check silences the warning. I'll attach a patch in a moment.
Created attachment 51970 [details] libstdc++ fix
So it seems that this warning did find a real issue with the library, but one that was hard to connect to the actual wording of the message (and didn't affect the original testcase). In particular, "specified bound" doesn't make sense when we're considering values deduced from VRP. Is there any way to print the VRP "backtrace" as part of a warning?
We discussed before (e.g., in PR 93971) the idea of annotating std::string with some attribute telling the optimizer the internal pointer doesn't alias with anything except for the this->_M_local_buf or the result of operator new(). Richi seemed open to it but was concerned about correctness. I'm experimenting with two things to improve the context of all these warnings. One is to print the folded condition under which the warning triggers. This would include the conditions involved in all the ranges used to make decisions. I posted the result for this warning here (I'm not sure it's complete, it's just a POC): https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586524.html Another is to print the path through the code (the CFG actually) leading up to the warning site, like the static analyzer does. This will show each GIMPLE_COND after optimization, so it won't correspond exactly to the original source. For libstdc++ code it will show libstdc++ internal conditions that may not be of much use to users. I'll have to see how bad it is to decide if it's a viable approach.
Also discussed in PR98465.
The master branch has been updated by Jason Merrill <jason@gcc.gnu.org>: https://gcc.gnu.org/g:2e8067041d1d69da02bd7578f58abc11eb35a04b commit r12-5906-g2e8067041d1d69da02bd7578f58abc11eb35a04b Author: Jason Merrill <jason@redhat.com> Date: Fri Dec 10 11:21:50 2021 -0500 libstdc++: check length in string append [PR103534] In the testcase for 103534 we get a warning about append leading to memcpy of a very large number of bytes overflowing the buffer. This turns out to be because we weren't calling _M_check_length for string append. Rather than do that directly, let's go through the public pointer append that calls it. PR c++/103534 libstdc++-v3/ChangeLog: * include/bits/basic_string.h (append (basic_string)): Call pointer append instead of _M_append directly. gcc/testsuite/ChangeLog: * g++.dg/warn/Wstringop-overflow-8.C: New test.
Fixed.
*** Bug 103332 has been marked as a duplicate of this bug. ***