Bug 100366 - [11/12/13 Regression] spurious warning - std::vector::clear followed by std::vector::insert(vec.end(), ...) with -O2
Summary: [11/12/13 Regression] spurious warning - std::vector::clear followed by std::...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 11.1.0
: P2 normal
Target Milestone: 11.4
Assignee: Not yet assigned to anyone
URL:
Keywords: diagnostic, missed-optimization
Depends on:
Blocks: Wstringop-overflow
  Show dependency treegraph
 
Reported: 2021-05-01 02:36 UTC by Sam Varshavchik
Modified: 2022-11-26 15:53 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work: 10.3.0
Known to fail: 11.2.0, 12.0
Last reconfirmed: 2021-12-02 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sam Varshavchik 2021-05-01 02:36:53 UTC
I'm seeing this with both gcc 11.1 on: https://godbolt.org/z/1Wv9jj3r1

and gcc (GCC) 11.0.1 20210324 (Red Hat 11.0.1-0), below:

Compiling the following with -O2 -Wall -Werror:

#include <vector>
#include <memory>

static char UTC[4];

void func(std::vector<char> &vec)
{
	vec.clear();
	vec.insert(vec.end(), UTC, UTC+4);
}

This generates a massive complaint that boils down to:

/usr/include/c++/11/bits/stl_algobase.h:431:30: error: ‘void* __builtin_memcpy(v
oid*, const void*, long unsigned int)’ writing 1 or more bytes into a region of 
size 0 overflows the destination [-Werror=stringop-overflow=]
  431 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/11/x86_64-redhat-linux/bits/c++allocator.

...skipping 1 line
                 from /usr/include/c++/11/bits/allocator.h:46,
                 from /usr/include/c++/11/vector:64,
                 from t.C:1:

Removing vec.clear() makes this go away.

Adding vec.reserve(4) after vec.clear() makes this go away.
Comment 1 Marc Glisse 2021-05-02 11:27:22 UTC
Assuming the warning happens during the strlen pass, we are still missing a lot of optimizations at that point

  if (_6 != _7)
    goto <bb 4>; [70.00%]
  else
    goto <bb 3>; [30.00%]

  <bb 3> [local count: 322122544]:
  _158 = _7 - _6;

once VRP2 (2 passes after strlen) replaces _158 with 0 and propagates it, maybe the code becomes nice enough to avoid confusing this fragile warning (I didn't check).

Before FRE3, we have

  _6 = vec_2(D)->D.33506._M_impl.D.32819._M_start;
  _7 = vec_2(D)->D.33506._M_impl.D.32819._M_finish;
  if (_6 != _7)
    goto <bb 3>; [70.00%]
  else
    goto <bb 4>; [30.00%]

  <bb 4> [local count: 1073741824]:
  _5 = MEM[(char * const &)vec_2(D) + 8];
  MEM[(struct __normal_iterator *)&D.33862] ={v} {CLOBBER};
  MEM[(struct __normal_iterator *)&D.33862]._M_current = _5;
  __position = D.33862;
  _12 = MEM[(const char * const &)vec_2(D)];
  _13 = MEM[(const char * const &)&__position];
  _14 = _13 - _12;

and after FRE3

  <bb 4> [local count: 1073741824]:
  _5 = MEM[(char * const &)vec_2(D) + 8];
  MEM[(struct __normal_iterator *)&D.33862] ={v} {CLOBBER};
  MEM[(struct __normal_iterator *)&D.33862]._M_current = _5;
  __position = D.33862;
  _14 = _5 - _6;

Only PRE manages to notice that _5 is the same as _7, which is already late. And it then takes until VRP2 to realize that _7 - _6 must be 0 in the else branch of _6 != _7.

* I am not sure why FRE manages to optimize _12 and not _5, that seems like the first thing to check (maybe the +8 means it is obviously "partial")
* I don't know if some other pass than VRP could learn that b-a is 0 if not a!=b.
Comment 2 Martin Sebor 2021-05-03 17:51:19 UTC
The warning is issued during expansion of the memcpy call. It seems fishy that although it mentions __builtin_memcpy it points to __builtin_memmove:

error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ writing 1 or more bytes into a region of size 0 overflows the destination [-Werror=stringop-overflow=]
  431 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The IL looks like the warning is justified:

void func (struct vector & vec)
{
  const ptrdiff_t _Num;
  char * _6;
  char * _7;
  char * prephitmp_16;
  char * _21;
  long int _23;
  long unsigned int _24;
  sizetype prephitmp_31;
  char * _36;
  char * _38;
  char * prephitmp_42;
  char * _50;
  long unsigned int _Num.10_52;
  char * _54;
  char * _61;
  char * pretmp_67;
  long int _69;
  char * pretmp_70;
  long int _71;
  unsigned int pretmp_72;
  unsigned int _89;
  char * _90;
  char * _98;
  char * _101;
  long unsigned int _112;
  char * _116;
  long unsigned int _144;
  long int _146;
  char * _147;
  char * _155;
  sizetype _157;
  char * _158;

  <bb 2> [local count: 1073741824]:
  _6 = vec_2(D)->D.33449._M_impl.D.32762._M_start;
  _7 = vec_2(D)->D.33449._M_impl.D.32762._M_finish;
  if (_6 != _7)
    goto <bb 4>; [70.00%]
  else
    goto <bb 3>; [30.00%]

  <bb 3> [local count: 322122544]:
  _21 = vec_2(D)->D.33449._M_impl.D.32762._M_end_of_storage;
  _23 = _21 - _7;
  _24 = (long unsigned int) _23;
  if (_24 > 3)
    goto <bb 5>; [67.00%]
  else
    goto <bb 13>; [33.00%]

  <bb 4> [local count: 751619281]:
  vec_2(D)->D.33449._M_impl.D.32762._M_finish = _6;
  _147 = vec_2(D)->D.33449._M_impl.D.32762._M_end_of_storage;
  _146 = _147 - _6;
  _144 = (long unsigned int) _146;
  if (_144 > 3)
    goto <bb 5>; [67.00%]
  else
    goto <bb 13>; [33.00%]

  <bb 5> [local count: 237404317]:
  # prephitmp_16 = PHI <_7(3), _6(4)>
  _89 = MEM <unsigned int> [(char * {ref-all})&UTC];
  MEM <unsigned int> [(char * {ref-all})prephitmp_16] = _89;
  _36 = vec_2(D)->D.33449._M_impl.D.32762._M_finish;
  _38 = _36 + 4;
  vec_2(D)->D.33449._M_impl.D.32762._M_finish = _38;
  goto <bb 12>; [100.00%]

  <bb 6> [local count: 108584968]:
  __builtin_memmove (_90, pretmp_67, _157);
  MEM <unsigned int> [(char * {ref-all})_155] = pretmp_72;
  _116 = vec_2(D)->D.33449._M_impl.D.32762._M_finish;
  _Num_117 = _116 - prephitmp_42;
  if (_Num_117 != 0)
    goto <bb 8>; [33.00%]                             >>> _Num_117 != 0
  else
    goto <bb 10>; [67.00%]

  <bb 7> [local count: 220460391]:
  MEM <unsigned int> [(char * {ref-all})_155] = pretmp_72;
  _50 = vec_2(D)->D.33449._M_impl.D.32762._M_finish;
  _Num_51 = _50 - prephitmp_42;
  if (_Num_51 != 0)
    goto <bb 8>; [33.00%]                             >>> _Num_51 != 0
  else
    goto <bb 9>; [67.00%]

  <bb 8> [local count: 108584968]:
  # _Num_119 = PHI <_Num_51(7), _Num_117(6)>          <<< _Num_119 != 0
  _Num.10_52 = (long unsigned int) _Num_119;
  __builtin_memcpy (_61, prephitmp_42, _Num.10_52);   <<< -Wstringop-overflow

  <bb 9> [local count: 256293430]:
  # prephitmp_31 = PHI <0(7), _Num.10_52(8)>
  _54 = _61 + prephitmp_31;
  if (pretmp_67 != 0B)
    goto <bb 10>; [40.26%]
  else
    goto <bb 11>; [59.74%]

  <bb 10> [local count: 175940553]:
  # _98 = PHI <_54(9), _61(6)>
  _71 = pretmp_70 - pretmp_67;
  _112 = (long unsigned int) _71;
  operator delete (pretmp_67, _112);

  <bb 11> [local count: 329045359]:
  # _101 = PHI <_54(9), _98(10)>
  vec_2(D)->D.33449._M_impl.D.32762._M_start = _90;
  vec_2(D)->D.33449._M_impl.D.32762._M_finish = _101;
  vec_2(D)->D.33449._M_impl.D.32762._M_end_of_storage = _158;

  <bb 12> [local count: 1048452384]:
  return;

  <bb 13> [local count: 230225493]:
  # prephitmp_42 = PHI <_6(4), _7(3)>
  _90 = operator new (4);                             <<< _90 is 4 bytes
  pretmp_67 = vec_2(D)->D.33449._M_impl.D.32762._M_start;
  _69 = prephitmp_42 - pretmp_67;
  _157 = (sizetype) _69;
  _158 = _90 + 4;
  pretmp_70 = vec_2(D)->D.33449._M_impl.D.32762._M_end_of_storage;
  _155 = _90 + _157;                                  <<< _155 is in [_90, _90 + 4]
  _61 = _155 + 4;                                     <<< _61 = _90 + 4
  pretmp_72 = MEM <unsigned int> [(char * {ref-all})&UTC];
  if (_69 != 0)
    goto <bb 6>; [58.89%]
  else
    goto <bb 7>; [41.11%]

}


In the translation unit the warning is triggered by a call to the __copy_m() function below:

  template<bool _IsMove>
    struct __copy_move<_IsMove, true, random_access_iterator_tag>
    {
      template<typename _Tp>

 static _Tp*
 __copy_m(const _Tp* __first, const _Tp* __last, _Tp* __result)
 {

   using __assignable = conditional<_IsMove,
        is_move_assignable<_Tp>,
        is_copy_assignable<_Tp>>;

   static_assert( __assignable::type::value, "type is not assignable" );

   const ptrdiff_t _Num = __last - __first;
   if (_Num)
     __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
   return __result + _Num;
 }
    };


The test for _Num is what GCC uses to determine that the number of bytes to copy is nonzero.
Comment 3 Sam Varshavchik 2021-05-03 21:02:44 UTC
If the warning is justified then something else isn't adding up.

I double-checked (with cppreference.com) something that I was pretty sure of: and an insert() at the end() iterator is valid. The insert()ed range is valid. If the C++ code is UB, or even potentially UB, I don't see it.
Comment 4 Richard Biener 2021-05-04 07:39:35 UTC
The issue is that FRE sees

  <bb 2> [local count: 1073741824]:
  _6 = vec_2(D)->D.33436._M_impl.D.32749._M_start;
  _7 = vec_2(D)->D.33436._M_impl.D.32749._M_finish;
  if (_6 != _7)
    goto <bb 3>; [70.00%]
  else
    goto <bb 4>; [30.00%]

  <bb 3> [local count: 751619281]:
  vec_2(D)->D.33436._M_impl.D.32749._M_finish = _6;

  <bb 4> [local count: 1073741824]:
  _5 = MEM[(char * const &)vec_2(D) + 8];

so clearly _5 is _not_ equal to _7 but on the path 2 -> 3 -> 4 it is equal to _6.
Only PRE transforms the load in _5 to a select of both (a PHI node):

  <bb 2> [local count: 1073741824]:
  _6 = vec_2(D)->D.33436._M_impl.D.32749._M_start;
  _7 = vec_2(D)->D.33436._M_impl.D.32749._M_finish;
  if (_6 != _7)
    goto <bb 4>; [70.00%]
  else
    goto <bb 3>; [30.00%]

  <bb 3> [local count: 322122544]:
  _86 = _7 - _6;
  _125 = (sizetype) _86;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 751619281]:
  vec_2(D)->D.33436._M_impl.D.32749._M_finish = _6;

  <bb 5> [local count: 1073741824]:
  # prephitmp_141 = PHI <_7(3), _6(4)>
  # prephitmp_83 = PHI <_86(3), 0(4)>
  # prephitmp_65 = PHI <_125(3), 0(4)>
  _21 = vec_2(D)->D.33436._M_impl.D.32749._M_end_of_storage;

PRE also removes some partial redundancies here (the other two PHIs).
Comment 5 Marc Glisse 2021-05-05 10:55:38 UTC
(In reply to Martin Sebor from comment #2)
> The IL looks like the warning is justified:

The memcpy call is dead code, we just fail to notice it.

>   <bb 13> [local count: 230225493]:
>   # prephitmp_42 = PHI <_6(4), _7(3)>

This is always _6, because in bb 3 we have _6 == _7.

>   pretmp_67 = vec_2(D)->D.33449._M_impl.D.32762._M_start;
>   _69 = prephitmp_42 - pretmp_67;

Always 0.

>   <bb 7> [local count: 220460391]:
>   MEM <unsigned int> [(char * {ref-all})_155] = pretmp_72;
>   _50 = vec_2(D)->D.33449._M_impl.D.32762._M_finish;
>   _Num_51 = _50 - prephitmp_42;

Always 0, in bb 4 we copy _M_start in _M_finish if they are not already equal.

(sorry for the wrong FRE comment earlier)

Note that if I replace operator new/delete with malloc/free

inline void* operator new(std::size_t n){return __builtin_malloc(n);}
inline void operator delete(void*p)noexcept{__builtin_free(p);}
inline void operator delete(void*p,std::size_t)noexcept{__builtin_free(p);}

we optimize quite a bit more and the warning disappears.
Comment 6 Marc Glisse 2021-05-05 11:01:58 UTC
So, apart from the small missed PHI optimization, this is probably the common issue that since operator new is replacable, we can't really assume that it does not clobber anything, and that hurts optimizations :-(
Not sure if there would be any convenient workaround for this specific case.
Comment 7 Marc Glisse 2021-05-05 11:15:48 UTC
It seems to help if we save the values before the allocation in vector.tcc, although I cannot promise it won't pessimize something else... And that's just a workaround, not a solution.

@@ -766,13 +766,16 @@
 	      {
 		const size_type __len =
 		  _M_check_len(__n, "vector::_M_range_insert");
+		pointer __old_start(this->_M_impl._M_start);
+		pointer __old_finish(this->_M_impl._M_finish);
+		pointer __old_end_of_storage(this->_M_impl._M_end_of_storage);
 		pointer __new_start(this->_M_allocate(__len));
 		pointer __new_finish(__new_start);
 		__try
 		  {
 		    __new_finish
 		      = std::__uninitialized_move_if_noexcept_a
-		      (this->_M_impl._M_start, __position.base(),
+		      (__old_start, __position.base(),
 		       __new_start, _M_get_Tp_allocator());
 		    __new_finish
 		      = std::__uninitialized_copy_a(__first, __last,
@@ -780,7 +783,7 @@
 						    _M_get_Tp_allocator());
 		    __new_finish
 		      = std::__uninitialized_move_if_noexcept_a
-		      (__position.base(), this->_M_impl._M_finish,
+		      (__position.base(), __old_finish,
 		       __new_finish, _M_get_Tp_allocator());
 		  }
 		__catch(...)
@@ -790,12 +793,12 @@
 		    _M_deallocate(__new_start, __len);
 		    __throw_exception_again;
 		  }
-		std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
+		std::_Destroy(__old_start, __old_finish,
 			      _M_get_Tp_allocator());
 		_GLIBCXX_ASAN_ANNOTATE_REINIT;
-		_M_deallocate(this->_M_impl._M_start,
-			      this->_M_impl._M_end_of_storage
-			      - this->_M_impl._M_start);
+		_M_deallocate(__old_start,
+			      __old_end_of_storage
+			      - __old_start);
 		this->_M_impl._M_start = __new_start;
 		this->_M_impl._M_finish = __new_finish;
 		this->_M_impl._M_end_of_storage = __new_start + __len;
Comment 8 Sam Varshavchik 2021-05-05 12:18:01 UTC
If the warning is spurious, then changing vector.tcc won't, of course, keep it from popping up elsewhere when the same sequence of events get triggered.

Here, it drew my attention to the missed micro-optimization of using reserve(), which had the benefit of avoiding the warning and allowing me to keep the -Werror option.
Comment 9 Martin Sebor 2021-05-12 23:44:27 UTC
Just to clarify: when I said the warning is justified I meant that there is nothing in the IL to keep the warning from triggering.  I.e., it works as designed.  The problem may be solved by optimizing the code better, which might be possible by enhancing the optimizers or by changing or annotating the libstdc++ code to enable existing optimizations.  In the latter case it will of course only help libstdc++ and not other code like it.  A third possibility is to make the warning itself smarter than the optimizer and figure out the code is unreachable without its help.
Comment 10 Martin Sebor 2021-12-02 21:14:44 UTC
Unchanged in GCC 12.
Comment 11 Roy Stogner 2021-12-06 16:45:16 UTC
This can even be triggered via std::vector::operator=, at least in gcc 11.2.

My own code is slightly more involved than this, but I can boil it down to:

  std::vector<int> src(2,1), dest(2);
  dest = src;

and that's enough to trigger this warning in a -O2 -Wall -Werror build.

Oddly, if I write dest(2,0) rather than dest(2), I don't see any warning.  That's a fine workaround for my own non-performance-critical code path, but I wonder if it also indicates that the underlying optimization is no longer being enabled here for code that would benefit.
Comment 12 Richard Biener 2022-04-21 07:49:23 UTC
GCC 11.3 is being released, retargeting bugs to GCC 11.4.
Comment 13 康桓瑋 2022-07-05 10:07:26 UTC
*** Bug 106199 has been marked as a duplicate of this bug. ***
Comment 14 Kyle Schwarz 2022-11-26 15:53:12 UTC
I also ran into this with:
> auto data = std::make_shared<std::vector<uint8_t>>();
> data->insert(data->end(), {0xAA, 0xBB, 0xCC});

I do not hit it with:
> std::vector<uint8_t> data;
> data.insert(data.end(), {0xAA, 0xBB, 0xCC});

Bisecting shows it was introduced with 81d6cdd335ffc60c216a020d5c99306f659377a2. In gcc/gimple-ssa-warn-access.cc pass_waccess::check() there's now a call to the new pass_waccess::check_builtin() which throws this warning (BUILT_IN_MEMMOVE, check_memop_access()). The old behavior had these checks scattered throughout builtins.c but now with it being part of pass_waccess::check() it is called for each bb as part of pass_waccess::execute().