Bug 105769 - [11/12/13/14/15 Regression] program segmentation fault with -ftree-vectorize and nested lambdas
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 11.2.1
Importance: P2 normal
Target Milestone: 11.5
Assignee: Not yet assigned to anyone
URL:
Keywords: needs-bisection, wrong-code
Depends on:
Blocks:
 
Reported: 2022-05-29 21:24 UTC by Cezary Śliwa
Modified: 2024-04-26 10:46 UTC
CC List: 6 users

See Also:
Host:
Target:
Build:
Known to work: 10.3.0
Known to fail: 11.3.0, 12.1.0
Last reconfirmed: 2022-05-30 00:00:00


Attachments
sample program demonstrating undefined behavior (665 bytes, text/plain)
2022-05-29 21:24 UTC, Cezary Śliwa

Description Cezary Śliwa 2022-05-29 21:24:46 UTC
Created attachment 53050
sample program demonstrating undefined behavior

The attached program triggers undefined behavior with g++ (the resulting binary segfaults in the manager code of a functor wrapper) when compiled as follows (minimal flags to trigger):

g++ -flto -O1 -ftree-vectorize tst.cc

This may be related to memory alignment of data (I see a crash with long double, but not with double). Another thing to check is the capture of the functor est on line 57 of the program (adding an ampersand eliminates the issue). Even changing the data on lines 81-82 (for example to 0) affects the outcome.

Platform: amd64, RHEL or Gentoo Linux.
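
For readers without the attachment at hand, the two capture forms mentioned above can be sketched roughly as follows. This is only an illustration: wrap_by_value and wrap_by_reference are hypothetical names, the element type is simplified to double, and the by-reference variant takes its parameter by reference (unlike the attached program) so that the sketch stays well-defined as long as the caller keeps est alive.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

using map_t = std::function<std::size_t(std::size_t)>;
using est_t = std::function<double(map_t)>;

// Capture by value: the returned closure owns its own copy of est.
// This mirrors the shape of the capture in the report.
std::function<double(map_t)> wrap_by_value(const est_t est) {
  assert(est && "est must hold a callable");
  return [est](map_t map) { return est(map); };
}

// Capture by reference ("adding an ampersand"): no copy is made, so the
// caller must keep est alive for as long as the closure is used.
std::function<double(map_t)> wrap_by_reference(const est_t& est) {
  assert(est && "est must hold a callable");
  return [&est](map_t map) { return est(map); };
}
```

Both forms compute the same result in valid code; the by-value form adds one more std::function copy (and one more temporary) for the optimizers to deal with.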
Comment 1 Martin Liška 2022-05-30 10:14:59 UTC
Started with param change in r11-4438-g686c1b70c70a8df4.
Comment 2 Richard Biener 2022-06-02 08:10:46 UTC
It segfaults doing an indirect call

#0  0x0000000000000001 in ?? ()
#1  0x0000000000400c9d in std::_Function_base::~_Function_base (
    this=<optimized out>, this=<optimized out>)
    at /home/space/rguenther/install/gcc-11.3/include/c++/11.3.0/bits/std_function.h:244
#2  0x00000000004011f1 in std::function<unsigned long (unsigned long)>::~function() (this=<optimized out>, this=<optimized out>)
    at /home/space/rguenther/install/gcc-11.3/include/c++/11.3.0/bits/std_function.h:334
#3  print_cov_ratio<ab> () at /tmp/t.C:86
#4  main () at /tmp/t.C:122

with -fno-lifetime-dse it works fine.  I suspect that either GCC or the
source gets things wrong WRT object lifetime in the maze of lambdas.

It's interesting that with -fsanitize=undefined added we still vectorize
but exactly a single load/store:

t.C:65:3: optimized: basic block part vectorized using 16 byte vectors

and then it still crashes.

   0x0000000000401027 <+97>:    mov    %rbx,%rdi
   0x000000000040102a <+100>:   call   *%rbp
=> 0x000000000040102c <+102>:   add    $0x8,%rsp
(gdb) p $rbp
$1 = (void *) 0x1

More investigation is needed.
Comment 3 Richard Biener 2022-10-19 10:15:05 UTC
(In reply to Martin Liška from comment #1)
> Started with param change in r11-4438-g686c1b70c70a8df4.

Can you bisect with a very large param value?
Comment 4 Martin Liška 2022-10-19 11:01:32 UTC
Can't reproduce with huge param value with the revision before it was removed:

gcc-bisect.py 'g++ pr105769.ii -flto -O1 -ftree-vectorize --param slp-max-insns-in-bb=100000000 && ./a.out' -e 16ad9ae85bb5b9acf80f9d1cf2be5a989ef7ba49 -l -e 7-base -s 16ad9ae85bb5b9acf80f9d1cf2be5a989ef7ba49

Bisecting latest revisions
  16ad9ae85bb5b9ac(27 Oct 2020 09:56)(dmalcolm@redhat.com): [took: 0.56 s] result: OK
will do (ab)
1
1
  633c65dda889eb88(20 Apr 2017 09:44)(thomas.preudhomme@arm.com): [took: 0.55 s] result: OK
will do (ab)
1
1
  bisect finished: there is no change!
Comment 5 Jakub Jelinek 2023-01-16 20:30:00 UTC
At least when using g++ 12.1.1 (20220507), the crash is because the
stack slot holding return value from jacknife is clobbered on the
bias = est(map);
line.
I see in main (well, print_cov_ratio and std::function inlined into it):
   0x00000000004016d1 <main()+274>:	lea    0x50(%rsp),%rsi
   0x00000000004016d6 <main()+279>:	lea    0x30(%rsp),%rdi
=> 0x00000000004016db <main()+284>:	call   *0x48(%rsp)
   0x00000000004016df <main()+288>:	jmp    0x401742 <main()+387>
where 0x30(%rsp) seems to be the est argument to jacknife (32 byte est_t) and 0x50(%rsp) the return value from jacknife (32 byte est_t)
(gdb) p/x $rsp+0x50
$49 = 0x7fffffffdd20
where the indirect call calls:
#0  std::_Function_handler<void (std::function<unsigned long (unsigned long)>), jacknife<2ul, ab>(std::function<vec<2ul, ab> (std::function<unsigned long (unsigned long)>)>, vec<2ul, vec<2ul, ab> >&, vec<2ul, ab>&)::{lambda(std::function<unsigned long (unsigned long)>)#1}>::_M_invoke(std::_Any_data const&, std::function<unsigned long (unsigned long)>&&) (
    __functor=..., __args#0=...) at /usr/include/c++/12/bits/std_function.h:288
#1  0x00000000004016df in std::function<void (std::function<unsigned long (unsigned long)>)>::operator()(std::function<unsigned long (unsigned long)>) const (__args#0=..., 
    this=0x7fffffffdd00) at /usr/include/c++/12/bits/std_function.h:591
#2  print_cov_ratio<ab> () at /usr/src/gcc/obj/gcc/pr105769.C:85
#3  main () at /usr/src/gcc/obj/gcc/pr105769.C:121
But later in
#0  0x00000000004014f3 in jacknife<2ul, ab>(std::function<vec<2ul, ab> (std::function<unsigned long (unsigned long)>)>, vec<2ul, vec<2ul, ab> >&, vec<2ul, ab>&)::{lambda(std::function<unsigned long (unsigned long)>)#1}::operator()(std::function<unsigned long (unsigned long)>) const (map=..., __closure=0x4172c0) at /usr/src/gcc/obj/gcc/pr105769.C:59
#1  std::__invoke_impl<void, jacknife<2ul, ab>(std::function<vec<2ul, ab> (std::function<unsigned long (unsigned long)>)>, vec<2ul, vec<2ul, ab> >&, vec<2ul, ab>&)::{lambda(std::function<unsigned long (unsigned long)>)#1}&, std::function<unsigned long (unsigned long)> >(std::__invoke_other, jacknife<2ul, ab>(std::function<vec<2ul, ab> (std::function<unsigned long (unsigned long)>)>, vec<2ul, vec<2ul, ab> >&, vec<2ul, ab>&)::{lambda(std::function<unsigned long (unsigned long)>)#1}&, std::function<unsigned long (unsigned long)>&&) (__f=...)
    at /usr/include/c++/12/bits/invoke.h:61
#2  std::__invoke_r<void, jacknife<2ul, ab>(std::function<vec<2ul, ab> (std::function<unsigned long (unsigned long)>)>, vec<2ul, vec<2ul, ab> >&, vec<2ul, ab>&)::{lambda(std::function<unsigned long (unsigned long)>)#1}&, std::function<unsigned long (unsigned long)> >(jacknife<2ul, ab>(std::function<vec<2ul, ab> (std::function<unsigned long (unsigned long)>)>, vec<2ul, vec<2ul, ab> >&, vec<2ul, ab>&)::{lambda(std::function<unsigned long (unsigned long)>)#1}&, std::function<unsigned long (unsigned long)>&&) (__fn=...)
    at /usr/include/c++/12/bits/invoke.h:111
#3  std::_Function_handler<void (std::function<unsigned long (unsigned long)>), jacknife<2ul, ab>(std::function<vec<2ul, ab> (std::function<unsigned long (unsigned long)>)>, vec<2ul, vec<2ul, ab> >&, vec<2ul, ab>&)::{lambda(std::function<unsigned long (unsigned long)>)#1}>::_M_invoke(std::_Any_data const&, std::function<unsigned long (unsigned long)>&&) (
    __functor=..., __args#0=...) at /usr/include/c++/12/bits/std_function.h:290
#4  0x00000000004016df in std::function<void (std::function<unsigned long (unsigned long)>)>::operator()(std::function<unsigned long (unsigned long)>) const (__args#0=..., 
    this=0x7fffffffdd00) at /usr/include/c++/12/bits/std_function.h:591
#5  print_cov_ratio<ab> () at /usr/src/gcc/obj/gcc/pr105769.C:85
#6  main () at /usr/src/gcc/obj/gcc/pr105769.C:121
&bias is equal to the address of the jacknife return value:
$50 = (vec<2, ab> *) 0x7fffffffdd20

To make the dumps more readable, I've patched the testcase:
--- pr105769.C~	2023-01-16 19:05:01.000000000 +0100
+++ pr105769.C	2023-01-16 20:38:25.101524077 +0100
@@ -40,7 +40,7 @@ using sq_mat = mat<n, n, T>;
 using map_t = std::function<size_t(size_t)>;
 
 template<class T_v>
-using est_t = std::function<T_v(map_t map)>;
+using est_t = std::function<T_v(map_t map)>; template<class T_v> using est2_t = std::function<T_v(map_t map)>;
 
 map_t id_map() {
   return [](size_t j) -> size_t {
@@ -50,7 +50,7 @@ map_t id_map() {
 
 
 template<size_t n, class T>
-est_t<void> jacknife(const est_t<vec<n, T>> est,
+est2_t<void> jacknife(const est_t<vec<n, T>> est,
     sq_mat<n, T>& cov, vec<n, T>& bias) {
 
   return [est, &cov, &bias](
and with that in the *.gimple dump I see:
void print_cov_ratio<ab> ()
[pr105769.C:88:1] {
  struct est2_t D.85904;
  struct est_t D.85869;
  struct ._anon_95 D.85344;
  struct map_t D.85939;
  struct sq_mat cov_jn;
  struct vec bias;
  typedef struct ._anon_95 ._anon_95;

  try
    {
      [pr105769.C:73:16] vec<2, vec<2, ab> >::vec ([pr105769.C:73:16] &cov_jn);
      [pr105769.C:74:13] vec<2, ab>::vec ([pr105769.C:74:13] &bias);
      [pr105769.C:85:23] try
        {
          try
            {
              try
                {
                  [pr105769.C:85:23] std::function<vec<2, ab>(std::function<long unsigned int(long unsigned int)>)>::function<print_cov_ratio<ab>()::<lambda(map_t)> > ([pr105769.C:85
                  try
                    {
                      [pr105769.C:85:23] D.85904 = jacknife<2, ab> ([pr105769.C:85:23] &D.85869, [pr105769.C:85:10] &cov_jn, [pr105769.C:85:18] &bias); [return slot optimization]
                      try
                        {
                          try
                            {
                              [pr105769.C:85:23] D.85939 = id_map (); [return slot optimization]
                              try
                                {
                                  [pr105769.C:85:23] std::function<void(std::function<long unsigned int(long unsigned int)>)>::operator() ([pr105769.C:85:23] &D.85904, [pr105769.C:85
                                }
                              finally
                                {
                                  [pr105769.C:85:23] std::function<long unsigned int(long unsigned int)>::~function ([pr105769.C:85:23] &D.85939);
                                }
                            }
                          finally
                            {
                              [pr105769.C:85:23] D.85939 = {CLOBBER(eol)};
                            }
                        }
                      finally
                        {
                          [pr105769.C:85:23] std::function<void(std::function<long unsigned int(long unsigned int)>)>::~function ([pr105769.C:85:23] &D.85904);
                        }
                    }
                  finally
                    {
                      [pr105769.C:85:23] std::function<vec<2, ab>(std::function<long unsigned int(long unsigned int)>)>::~function ([pr105769.C:85:23] &D.85869);
                    }
                }
              finally
                {
                  [pr105769.C:77:7] D.85344 = {CLOBBER(eol)};
                }
            }
          finally
            {
              [pr105769.C:85:23] D.85869 = {CLOBBER(eol)};
            }
        }
      finally
        {
          [pr105769.C:85:23] D.85904 = {CLOBBER(eol)};
        }
    }
  finally
    {
      cov_jn = {CLOBBER(eol)};
      bias = {CLOBBER(eol)};
    }
}

D.85904 above is the return value (est2_t), D.85869 is the est argument, and the bias variable is actually constructed even before this; all three are 32 bytes in size.

So, to me this looks like incorrect stack slot reuse.
Comment 6 Jakub Jelinek 2023-01-16 21:10:05 UTC
The expand dump shows:
Partition 4: size 64 align 16
        cov_jn
Partition 0: size 48 align 16
        D.5642  bias    D.5613
Partition 1: size 32 align 16
        D.5615
Partition 2: size 32 align 16
        D.5614

where D.5615 is the est2_t return value slot, D.5614 is the est_t parameter slot,
D.5613 the map_t object, D.5642 the 48 byte "struct void", bias the 32 byte var,
so there clearly is some stack reuse, but not for the vars I actually see overlapping.
Anyway, -fstack-reuse=none doesn't seem to help (but the -da expand dump still prints the 3 vars there).
Comment 7 Jakub Jelinek 2023-01-16 21:20:43 UTC
Ah, the crash is actually when destructing the map_t temporary (D.5613) and because it shares the stack slot with bias, it isn't surprising.  So now to figure out why the stack sharing happens and why even -fstack-reuse=none doesn't help.
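
As a rough illustration of why sharing a slot between bias and the map_t temporary would end in a call through address 0x1 (the value of $rbp in comment 2), here is a small layout model. It is a sketch under assumptions: function_model only imitates the std::function layout visible in the dumps (16 bytes of functor storage followed by the manager and invoker pointers), ab_model and vec2_model imitate ab and vec<2, ab> from the test case, and an LP64 target such as amd64 is assumed. None of these names exist in libstdc++ or in the attachment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-ins for ab and vec<2, ab> from the test case (hypothetical names).
struct ab_model { unsigned long long a; unsigned short b; };
struct vec2_model { ab_model dat[2]; };

// Stand-in for the std::function layout seen in the dumps: 16 bytes of
// functor storage, then the manager and invoker pointers.
struct function_model {
  unsigned char functor[16];
  void* manager;
  void* invoker;
};

// Copies a bias-like value over a function-like object that occupies the
// same bytes, then reports what its manager pointer has become.
std::uintptr_t manager_after_overlay() {
  assert(sizeof(vec2_model) <= sizeof(function_model));
  vec2_model bias{};
  bias.dat[0].a = 1;  // the test case stores 1 into both elements
  bias.dat[1].a = 1;
  unsigned char slot[sizeof(function_model)] = {};
  std::memcpy(slot, &bias, sizeof bias);  // "bias = est(map)" into the shared slot
  function_model f;
  std::memcpy(&f, slot, sizeof f);        // view the same bytes as the function object
  return reinterpret_cast<std::uintptr_t>(f.manager);
}
```

On amd64 the second element's a field and the manager pointer both sit at offset 16, so manager_after_overlay() yields 1. That would be consistent with the destructor of the map_t temporary making an indirect call to 0x1.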
Comment 8 Jakub Jelinek 2023-01-17 11:52:54 UTC
Using last night's trunk, it is:
#include <iostream>
#include <iomanip>
#include <functional>

template<size_t n, class T>
struct vec {
  T dat[n];
  vec() {}
  explicit vec(const T& x) { for(size_t i = 0; i < n; i++) dat[i] = x; }
  T& operator [](size_t i) { return dat[i]; }
  const T& operator [](size_t i) const { return dat[i]; }
};

template<size_t m, size_t n, class T>
using mat = vec<m, vec<n, T>>;
template<size_t n, class T>
using sq_mat = mat<n, n, T>;
using map_t = std::function<size_t(size_t)>;
template<class T_v>
using est_t = std::function<T_v(map_t map)>; template<class T_v> using est2_t = std::function<T_v(map_t map)>;
map_t id_map() { return [](size_t j) -> size_t { return j; }; }

template<size_t n, class T>
est2_t<void> jacknife(const est_t<vec<n, T>> est, sq_mat<n, T>& cov, vec<n, T>& bias) {
  return [est, &cov, &bias](map_t map) -> void { bias = est(map); for(size_t i = 0; i < n; i++) std::cout << bias[i] << std::endl; };
}

template<class T>
void print_cov_ratio() {
  sq_mat<2, T> cov_jn;
  vec<2, T> bias;
  jacknife<2, T>([](map_t map) -> vec<2, T> { vec<2, T> retv; retv[0] = 1; retv[1] = 1; return retv; }, cov_jn, bias)(id_map());
}
struct ab {
  long long unsigned a;
  short unsigned b;
  double operator()() { return a; }
  ab& operator=(double rhs) { a = rhs; return *this; }
  friend std::ostream& operator<<(std::ostream&, const ab&);
};
std::ostream& operator<<(std::ostream& os, const ab& x) { os << x.a; return os; }

int main() {
  std::cout << "will do (ab)" << std::endl;
  print_cov_ratio<ab>();
  return 0;
}

Partition 4: size 64 align 16
        cov_jn
Partition 0: size 48 align 16
        D.5698  bias    D.5681
Partition 1: size 32 align 16
        D.5683
Partition 2: size 32 align 16
        D.5682
where
  struct struct void D.5698;
  struct est2_t D.5683;
  struct est_t D.5682;
  struct map_t D.5681;
  struct sq_mat cov_jn;
  struct vec bias;
That is, D.5682 is the est argument slot (const est_t<vec<n, T>> est), D.5683 is the return slot from jacknife (est2_t<void>), D.5698 is the lambda object which contains __est, __cov and __bias, cov_jn and bias are the automatic variables in print_cov_ratio, and D.5681 is the return value slot of id_map.
In the ltrans optimized dump, main is mostly straight-line code with some EH edges, except for one
_M_manager == nullptr check (if nullptr it throws bad function call, otherwise continues).
Now, those vars are referenced in:
  _12 = (long unsigned int) &bias;
  _15 = (long unsigned int) &cov_jn;
  _21 = {_15, _12};
...
  MEM[(struct vec *)&cov_jn] ={v} {CLOBBER};
  bias ={v} {CLOBBER};
  MEM[(struct function *)&D.5682] ={v} {CLOBBER};
  MEM <char[16]> [(struct _Function_base *)&D.5682] = {};
  MEM <vector(2) long unsigned int> [(bool (*<T72d>) (union _Any_data & {ref-all}, const union _Any_data & {ref-all}, _Manager_operation) *)&D.5682 + 16B] = _8;
  __ct_comp  (&D.5698.__est, &D.5682);
...
  MEM <vector(2) long unsigned int> [(void *)&D.5698 + 32B] = _21;
  MEM[(struct function *)&D.5683] ={v} {CLOBBER};
  MEM[(struct function *)&D.5683].D.5217 = {};
  MEM[(struct function *)&D.5683]._M_invoker = 0B;
...
  _14 = &MEM[(struct struct void *)_13].__est;
  __ct_comp  (_14, &D.5698.__est);
...
  MEM[(struct struct void * &)&D.5683] = _13;
  MEM <vector(2) long unsigned int> [(bool (*<T72d>) (union _Any_data & {ref-all}, const union _Any_data & {ref-all}, _Manager_operation) *)&D.5683 + 16B] = _7;
  __dt_base  (&MEM[(struct function *)&D.5698].D.5235);
  D.5698 ={v} {CLOBBER};
  D.5698 ={v} {CLOBBER(eol)};
  MEM <char[16]> [(struct _Function_base *)&D.5681] = {};
  MEM <vector(2) long unsigned int> [(bool (*<T72d>) (union _Any_data & {ref-all}, const union _Any_data & {ref-all}, _Manager_operation) *)&D.5681 + 16B] = _84;
  _18 = MEM[(const struct _Function_base *)&D.5683]._M_manager;
...
  _19 = D.5683._M_invoker;
  _19 (&D.5683.D.5217._M_functor, &D.5681);
...
  __dt_base  (&D.5681.D.5223);
  D.5681 ={v} {CLOBBER};
  D.5681 ={v} {CLOBBER(eol)};
  __dt_base  (&D.5683.D.5217);
  D.5683 ={v} {CLOBBER};
  __dt_base  (&D.5682.D.5235);
  D.5682 ={v} {CLOBBER};
  D.5682 ={v} {CLOBBER(eol)};
  D.5683 ={v} {CLOBBER(eol)};
  cov_jn ={v} {CLOBBER(eol)};
  bias ={v} {CLOBBER(eol)};
  return 0;

and some EH stuff.
Comment 9 Jakub Jelinek 2023-01-17 12:11:20 UTC
And here are just the statements that refer to those 3 variables that (incorrectly) share the stack slot, plus basic block boundaries.
grep 'bias\|D.5698\|D.5681\|:' /tmp/00
  struct struct void D.5698;
  struct map_t D.5681;
  struct vec bias;
  <bb 2> [local count: 1073741829]:
  _12 = (long unsigned int) &bias;
  bias ={v} {CLOBBER};
  __ct_comp  (&D.5698.__est, &D.5682);
  <bb 3> [local count: 1073741824]:
  MEM <vector(2) long unsigned int> [(void *)&D.5698 + 32B] = _21;
  <bb 4> [local count: 1073741824]:
  __ct_comp  (_14, &D.5698.__est);
  <bb 5> [local count: 1073741824]:
  vect__16.52_79 = MEM <vector(2) long unsigned int> [(void *)&D.5698 + 32B];
  __dt_base  (&MEM[(struct function *)&D.5698].D.5235);
  D.5698 ={v} {CLOBBER};
  D.5698 ={v} {CLOBBER(eol)};
  MEM <char[16]> [(struct _Function_base *)&D.5681] = {};
  MEM <vector(2) long unsigned int> [(bool (*<T72d>) (union _Any_data & {ref-all}, const union _Any_data & {ref-all}, _Manager_operation) *)&D.5681 + 16B] = _84;
  <bb 6> [count: 0]:
<L5>:
  <bb 7> [count: 0]:
<L4>:
  __dt_base  (&MEM[(struct function *)&D.5698].D.5235);
  D.5698 ={v} {CLOBBER};
  <bb 8> [local count: 429496]:
  <bb 9> [local count: 1073312328]:
  _19 (&D.5683.D.5217._M_functor, &D.5681);
  <bb 10> [local count: 1073312328]:
  __dt_base  (&D.5681.D.5223);
  D.5681 ={v} {CLOBBER};
  D.5681 ={v} {CLOBBER(eol)};
  bias ={v} {CLOBBER(eol)};
  <bb 11> [count: 0]:
<L0>:
  __dt_base  (&D.5681.D.5223);
  D.5681 ={v} {CLOBBER};
  D.5681 ={v} {CLOBBER(eol)};
  <bb 12> [count: 0]:
<L2>:
  D.5698 ={v} {CLOBBER};

Now, perhaps the sharing of the stack slot between D.5681 and D.5698 is fine; it seems
D.5698 is destructed before D.5681 is constructed:
  D.5698 ={v} {CLOBBER};
  D.5698 ={v} {CLOBBER(eol)};
  MEM <char[16]> [(struct _Function_base *)&D.5681] = {};
and D.5698 is later used just in EH block reachable only from earlier basic blocks
or just as
  D.5698 ={v} {CLOBBER};
in the last EH bb.  But the sharing of the stack slot in between bias and
D.5698 looks wrong.  What can be seen in the IL is:
  _12 = (long unsigned int) &bias;
which has been hoisted before the
  bias ={v} {CLOBBER};
statement by the slp1 pass.
From:
;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _3 = operator<< (&cout, "will do (ab)");
  endl (_3);
  MEM[(struct vec *)&cov_jn] ={v} {CLOBBER};
  bias ={v} {CLOBBER};
  MEM[(struct function *)&D.5682] ={v} {CLOBBER};
  MEM <char[16]> [(struct _Function_base *)&D.5682] = {};
  MEM[(struct function *)&D.5682]._M_invoker = _M_invoke;
  MEM[(struct function *)&D.5682].D.5235._M_manager = _M_manager;
  __ct_comp  (&D.5698.__est, &D.5682);
;;    succ:       5
;;                18

;;   basic block 5, loop depth 0
;;    pred:       2
  D.5698.__cov = &cov_jn;
  D.5698.__bias = &bias;
in dse4 to:
;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _12 = VIEW_CONVERT_EXPR<long unsigned int>(&bias);
  _15 = VIEW_CONVERT_EXPR<long unsigned int>(&cov_jn);
  _21 = {_15, _12};
  _9 = VIEW_CONVERT_EXPR<long unsigned int>(_M_invoke);
  _10 = VIEW_CONVERT_EXPR<long unsigned int>(_M_manager);
  _8 = {_10, _9};
  _3 = operator<< (&cout, "will do (ab)");
  endl (_3);
  MEM[(struct vec *)&cov_jn] ={v} {CLOBBER};
  bias ={v} {CLOBBER};
  MEM[(struct function *)&D.5682] ={v} {CLOBBER};
  MEM <char[16]> [(struct _Function_base *)&D.5682] = {};
  MEM <vector(2) long unsigned int> [(bool (*<T72d>) (union _Any_data & {ref-all}, const union _Any_data & {ref-all}, _Manager_operation) *)&D.5682 + 16B] = _8;
  __ct_comp  (&D.5698.__est, &D.5682);
;;    succ:       5
;;                18
  
;;   basic block 5, loop depth 0
;;    pred:       2
  MEM <vector(2) long unsigned int> [(void *)&D.5698 + 32B] = _21;
in slp1.

Is that what is incorrect? Should we never hoist the taking of an address before a clobber on that var?
Comment 10 Richard Biener 2023-01-17 12:27:53 UTC
(In reply to Jakub Jelinek from comment #9)
> Is that what is incorrect? And we should never hoist taking of addresses
> before a clobber on that var?

I think that's the usual pattern for the two other stack-slot sharing PRs we
have.  The liveness analysis makes wrong assumptions about CLOBBER and CLOBBER
isn't a barrier for address-takens (and we don't have birth CLOBBERs).

But why does -fstack-reuse=none not help?
Comment 11 Jakub Jelinek 2023-01-17 12:31:44 UTC
struct S { S *p; S *q; S () {} ~S (); };
void bar (S *);

void
foo ()
{
  S a, b;
  bar (nullptr);
  {
    S c;
    c.p = &a;
    c.q = &b;
    bar (&c);
  }
  bar (nullptr);
}

At -O2 this gets roughly the same stuff in the IL, with the addresses of a and b being taken before a and b are clobbered, then c being clobbered, initialized and eol clobbered, and only then a and b destructed and eol clobbered.  But for some reason no stack sharing happens in that case.
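
For anyone who wants to run rather than just compile the reduced test, here is an instrumented variant. It is a sketch, not part of the original test: the definitions of S::~S and bar, the bar_calls counter, the self-pointer sentinels in a and b, and the name foo_instrumented are all additions. With bar defined in the same translation unit the optimizers can inline it, so the resulting IL may differ from what is described above; keeping bar in a separate file would stay closer to the original.

```cpp
#include <cassert>

struct S {
  S* p;
  S* q;
  S() {}
  ~S();
};

// Hypothetical definitions; the reduced test only declares these.
S::~S() {}

int bar_calls = 0;

void bar(S* s) {
  ++bar_calls;
  // When an object is passed, it must point at two distinct objects.
  if (s) assert(s->p != s->q);
}

// Same shape as foo() in the reduced test, plus sentinels in a and b.
// Returns true if a and b still hold their sentinels after c is gone,
// i.e. c's storage did not overlap theirs.
bool foo_instrumented() {
  S a, b;
  a.p = a.q = &a;
  b.p = b.q = &b;
  bar(nullptr);
  {
    S c;
    c.p = &a;
    c.q = &b;
    bar(&c);
  }
  bar(nullptr);
  return a.p == &a && a.q == &a && b.p == &b && b.q == &b;
}
```

A correct compilation returns true from foo_instrumented(); a false result would indicate that c was placed on top of a or b.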
Comment 12 Jakub Jelinek 2023-01-17 12:41:21 UTC
(In reply to Richard Biener from comment #10)
> I think that's the usual pattern for the two other stack-slot sharing PRs we
> have.  The liveness analysis makes wrong assumptions about CLOBBER and
> CLOBBER
> isn't a barrier for address-takens (and we don't have birth CLOBBERs).
> 
> But why does -fstack-reuse=none not help?

Because -fstack-reuse= controls behavior of the gimplifier/inliner (what kind of CLOBBERs are emitted), not whether we reuse stack slots during expansion or not.
And the CLOBBERs that matter here aren't coming from the -fstack-reuse= controlled
ones, but from C++ lifetime DSE.

The -flifetime-dse=1 option works as a workaround in this case.
Comment 13 rguenther@suse.de 2023-01-17 14:08:31 UTC
On Tue, 17 Jan 2023, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105769
> 
> --- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #10)
> > I think that's the usual pattern for the two other stack-slot sharing PRs we
> > have.  The liveness analysis makes wrong assumptions about CLOBBER and
> > CLOBBER
> > isn't a barrier for address-takens (and we don't have birth CLOBBERs).
> > 
> > But why does -fstack-reuse=none not help?
> 
> Because -fstack-reuse= controls behavior of the gimplifier/inliner (what kind
> of CLOBBERs are emitted), not whether we reuse stack slots during expansion or
> not.
> And the CLOBBERs that matter here aren't coming from the -fstack-reuse=
> controlled
> ones, but from C++ lifetime DSE.

Ah - we possibly want to gate the stack-sharing code with flag_stack_reuse
then?  (OTOH with inlining across TUs with different -fstack-reuse
setting things are murky - both with testing the flag and without)
Comment 14 Jakub Jelinek 2023-01-17 14:45:29 UTC
Dunno, bet we really want to introduce CLOBBER(bol) and only consider bol and eol clobbers for the stack reuse (or e.g. the tree-ssa-live.cc *live_vars* handling).
Wonder what amount of work it would be to add that, I guess the main thing will be what to DCE etc.: if we have CLOBBER(bol) followed by normal CLOBBER with no aliasing stores in between, bet we must keep the former; if we have CLOBBER(bol) followed by CLOBBER(eol) with no aliasing stores in between, we could perhaps remove both as a pair, etc.
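
To make the difference between the two liveness rules concrete, here is a toy model. It is not GCC code and glosses over control flow entirely: a function is reduced to a straight-line trace of events, MentionToAnyClobber is a simplified rendering of the behavior described in comment 10 (a variable becomes live when mentioned and dead at any clobber), and BolToEol is the idea from this comment (live from a birth clobber to the eol clobber). All type and function names are invented for the illustration, and the CLOBBER(bol) events in the trace are hypothetical, since no such clobber kind exists in the dumps above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Mention, Clobber, ClobberBol, ClobberEol };

struct Event {
  Kind kind;
  std::string var;
};

enum class Policy { MentionToAnyClobber, BolToEol };

using Conflicts = std::set<std::pair<std::string, std::string>>;

// Returns the pairs of variables that are live at the same time and
// therefore must not share a stack slot under the given policy.
Conflicts conflicts(const std::vector<Event>& trace, Policy policy) {
  std::set<std::string> live;
  Conflicts out;
  for (const Event& e : trace) {
    assert(!e.var.empty());
    bool starts = false, ends = false;
    if (policy == Policy::MentionToAnyClobber) {
      starts = e.kind == Kind::Mention;
      ends = e.kind == Kind::Clobber || e.kind == Kind::ClobberEol;
    } else {
      starts = e.kind == Kind::ClobberBol;
      ends = e.kind == Kind::ClobberEol;
    }
    if (ends) live.erase(e.var);
    if (starts && live.insert(e.var).second) {
      // Newly live: record a conflict with everything already live.
      for (const std::string& other : live)
        if (other != e.var)
          out.insert(other < e.var ? std::make_pair(other, e.var)
                                   : std::make_pair(e.var, other));
    }
  }
  return out;
}

// Abridged main path from comment 9, with hypothetical bol events added.
std::vector<Event> pr105769_trace() {
  return {
    {Kind::ClobberBol, "bias"},
    {Kind::Mention, "bias"},       // _12 = &bias, hoisted by slp1
    {Kind::Clobber, "bias"},       // bias ={v} {CLOBBER}
    {Kind::ClobberBol, "D.5698"},
    {Kind::Mention, "D.5698"},     // lambda object built, holds &bias
    {Kind::Clobber, "D.5698"},
    {Kind::ClobberEol, "D.5698"},
    {Kind::ClobberBol, "D.5681"},
    {Kind::Mention, "D.5681"},     // map_t temporary
    {Kind::Clobber, "D.5681"},
    {Kind::ClobberEol, "D.5681"},
    {Kind::ClobberEol, "bias"},
  };
}
```

Under MentionToAnyClobber the trace yields no conflicts, which matches all three variables landing in Partition 0. Under BolToEol, bias conflicts with both D.5698 and D.5681, while those two remain free to share a slot with each other.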
Comment 15 rguenther@suse.de 2023-01-17 15:14:56 UTC
On Tue, 17 Jan 2023, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105769
> 
> --- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Dunno, bet we really want to introduce CLOBBER(bol) and only consider bol and
> eol clobbers for the stack reuse (or e.g. the tree-ssa-live.cc *live_vars*
> handling).
> Wonder what amount of work it would be to add that, I guess main thing will be
> what to DCE etc., if we have CLOBBER(bol) followed by normal CLOBBER with no
> aliasing stores in between, bet we must keep the former, if we have CLOBBER(bol
> followed by CLOBBER(eol) with no aliasing stores in between, we could perhaps
> remove both as pair, etc.

See the RFC patches I posted last year ([PATCH 1/4][RFC] middle-end/90348 
- add explicit birth), also see how the handling wasn't entirely correct
but I also never got to finish that ...
Comment 16 Jakub Jelinek 2023-05-29 10:07:06 UTC
GCC 11.4 is being released, retargeting bugs to GCC 11.5.