Bug 90348 - [11 Regression] Partition of char arrays is incorrect in some cases
Summary: [11 Regression] Partition of char arrays is incorrect in some cases
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 8.3.0
: P2 normal
Target Milestone: 11.5
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on: 111843
Blocks: 61203
  Show dependency treegraph
 
Reported: 2019-05-04 18:57 UTC by Pieter Wuille
Modified: 2024-06-20 13:24 UTC (History)
9 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 7.3.0, 8.3.0, 9.0
Last reconfirmed: 2019-05-05 00:00:00


Attachments
Source file to reproduce the problem (4.66 KB, text/plain)
2019-05-04 18:57 UTC, Pieter Wuille
Details
Un-preprocessed test case C file (302 bytes, text/x-csrc)
2019-05-04 19:04 UTC, Pieter Wuille
Details
gcc10-pr90348.patch (2.18 KB, patch)
2019-05-07 11:11 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Pieter Wuille 2019-05-04 18:57:12 UTC
Created attachment 46289 [details]
Source file to reproduce the problem

In attachment is a small source file that seems to be miscompiled with "-O1 -finline-small-functions".

Tested versions/environments where the issue appears:
* GCC 7.3, Ubuntu 18.04.2, x86_64, "gcc -O1 -finline-small-functions test.i -o test"
* GCC 8.3, Ubuntu 18.04.2, x86_64, "gcc -O1 -finline-small-functions test.i -o test"
* GCC 8.3, Ubuntu 18.04.2, x86_64, "gcc -m32 -O1 -finline-small-functions test.i -o test"
* GCC 9.0, Fedora 30, i686, "gcc -O2 test.i -o test"

In a loop, a 4-byte (or larger) char array "in" is created, and then increasingly long prefixes initialized to zero. A small inlinable function "set_one_on_stack" is invoked during the loop that should have no effect (it sets a local variable "buf" to one in a roundabout way), but apparently the "buf" variable is given the same stack location as the caller's "in" variable, overwriting the latter. When compiled incorrectly, an unexpected assertion occurs.

The test file is a reduced version of an issue observed on some platforms in the Bitcoin Core unit tests. See https://github.com/bitcoin/bitcoin/issues/14580 for more details.
Comment 1 Pieter Wuille 2019-05-04 19:04:22 UTC
Created attachment 46290 [details]
Un-preprocessed test case C file

Adding the original .c file as well to demonstrate the problem on different platforms (the .i file is for GCC 8.3, x86_64, -m64).
Comment 2 mcccs 2019-05-05 10:46:43 UTC
/*Below's the compiler testsuite code with no headers.

Here's the assembly difference (because the difference is very small. The one on the left works (-O1 only) while the one on the right has `-O1 -finline-small-functions`): https://www.diffchecker.com/hAYLeoPV

I've also tested it on macOS 10.14 with GCC 8, it failed too.

In addition, on Aarch64 - Raspberry/Raspbian using GCC 6 and GCC 9: GCC 9 also failed on ARM, but GCC 6 didn't cause assertion fail, thus it's clearly a 7/8/9/10 regression.*/

/* { dg-do run } */
/* { dg-options "-O2" } */

void __attribute__ ((noinline)) set_one (unsigned char* ptr)
{
    *ptr = 1;
}

int __attribute__ ((noinline)) check_nonzero (unsigned char const* in, unsigned int len)
{
    for (unsigned int i = 0; i < len; ++i) {
        if (in[i] != 0) return 1;
    }
    return 0;
}

void set_one_on_stack () {
    unsigned char buf[1];
    set_one (buf);
}

int main () {
    for (int i = 0; i < 5; ++i) {
        unsigned char in[4];
        for (int j = 0; j < i; ++j) {
            in[j] = 0;
            set_one_on_stack ();
        }
        if (check_nonzero (in, i)) {
            __builtin_abort ();
        }
    }
}


//~ MCCCS  (DesWurstes)
Comment 3 Andrew Pinski 2019-05-05 20:25:16 UTC
Partition 0: size 4 align 8
        in      buf

That is wrong, it should have been two seperate ones.
Confirmed with:
gcc version 9.0.1 20190310 (experimental) [master revision 449a19898aa:0239598e3c8:8fe074cf790f632b22e59c24f102e528407bb04e] (GCC)

On aarch64-linux-gnu, I don't see any changes to the expand sources after that point.

I could not reproduce it with Ubuntu's 7.3.0 but can with a non modified version of GCC 7.3.0 on mips64-linux-gnu:
Partition 0: size 4 align 8
        in      buf
Comment 4 Richard Biener 2019-05-06 08:37:13 UTC
Sth wrong with liveness analysis.  There's obvious liveness of IN from BB2:

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _30 = (unsigned long) &in;
  ivtmp.30_29 = _30 + 1;
  goto <bb 5>; [100.00%]
;;    succ:       5

to

;;   basic block 7, loop depth 1
;;    pred:       5
  in ={v} {CLOBBER};
  i_10 = i_6 + 1;
  if (i_10 != 5)
    goto <bb 3>; [80.00%]
  else
    goto <bb 8>; [20.00%]
;;    succ:       3
;;                8

but maybe that being at different loop depth somehow confuses the algorithm
in fact having it there seems odd to me but the address-taking in BB2 is
the result of IVOPTs hoisting.  The CLOBBER doesn't effect hoisting
the address but RTL expansion liveness compute splits 'in' into multiple
logical instances at the CLOBBER which _does_ make the addresses effectively
different ...
Comment 5 Jakub Jelinek 2019-05-06 15:39:16 UTC
I think before ivopts this looks fine, we have:
  <bb 3> [local count: 858993460]:
  # j_20 = PHI <0(10), j_13(11)>
  in[j_20] = 0;
  set_one (&buf);
  buf ={v} {CLOBBER};
  j_13 = j_20 + 1;
  if (i_6 > j_20)
    goto <bb 11>; [80.00%]
  else
    goto <bb 4>; [20.00%]
in the inner loop and
  <bb 7> [local count: 214748365]:
  in ={v} {CLOBBER};
  i_10 = i_6 + 1;
  ivtmp_22 = ivtmp_4 - 1;
  if (ivtmp_22 != 0)
    goto <bb 8>; [80.00%]
  else
    goto <bb 9>; [20.00%]
near the end of the outer loop.  The problem is that ivopts turns the in[j_20]
access into
  <bb 2> [local count: 42959980]:
  _30 = (unsigned int) &in;
  ivtmp.27_29 = _30 + 1;
  goto <bb 5>; [100.00%]
  
  <bb 10> [local count: 171798691]:
  ivtmp.21_16 = (unsigned int) &in;
  _23 = (unsigned int) &in;
    
  <bb 3> [local count: 858993460]:
  # ivtmp.21_21 = PHI <ivtmp.21_16(10), ivtmp.21_3(11)>
  _18 = (void *) ivtmp.21_21;
  MEM[base: _18, offset: 0B] = 0;
  set_one (&buf);
  buf ={v} {CLOBBER};
which is still fine, for the life analysis we still see in as live in bb 10 and therefore in bb3 too.
But later on dom3 changes
   <bb 3> [local count: 171798691]:
-  ivtmp.21_16 = (unsigned int) &in;
-  _23 = (unsigned int) &in;
+  ivtmp.21_16 = _30;
+  _23 = _30;
and now &in is live clearly only in bb2 before the outer loop for the life analysis purposes.  Now, if we kill that
      if (gimple_clobber_p (stmt))
        {
          tree lhs = gimple_assign_lhs (stmt);
          size_t *v;
          /* Nested function lowering might introduce LHSs
             that are COMPONENT_REFs.  */
          if (!VAR_P (lhs))
            continue;
          if (DECL_RTL_IF_SET (lhs) == pc_rtx
              && (v = decl_to_stack_part->get (lhs)))
            bitmap_clear_bit (work, *v);
        }
you've mentioned, I'm afraid we lose all the stack slot sharing possibilities, or at least most of them.
So I'm afraid we need to do something smarter.
Either we need to track during the life analysis algorithm what variables SSA_NAMEs can point to (even for non-pointers, as in this testcase ivopts uses an integral value (unsigned int) &in and then later casts to integers), or could we use the alias analysis points to information for that?

For this testcase, it would be enough to walk for referenced pointer SSA_NAMEs through their points to variables and mark those variables as also seen in the IL.
Comment 6 Jakub Jelinek 2019-05-06 15:50:58 UTC
We'd probably need to change decl_to_stack_part from hash_map<tree, size_t> to hash_map<unsigned, size_t> that would just map DECL_UIDs which have DECL_RTL_IF_SET equal to pc_rtx to the partition numbers, rather than from decl, or have some other mapping from DECL_UIDs that appear in the points to info vars back to the decls.
Comment 7 Michael Matz 2019-05-06 16:28:24 UTC
No, this is not a problem in the stack slot sharing algorithm, but rather in
the input.  As presented to expand, and only showing the important parts,
and reordering some BBs to make the flow more obvious:

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _30 = (unsigned long) &in;
  ivtmp.27_29 = _30 + 1;
  goto <bb 5>; [100.00%]

So, 'in' becomes "live" here, it's address in _30 and _29.  Fallthrough to bb5,
which also uses in, but otherwise is uninteresting, except that it can reach
only BBs 6 and 7:

;;   basic block 5, loop depth 1
  ...
  _2 = check_zero (&in, _31);
  if (_2 != 0)
    goto <bb 7>; [99.96%]
  else
    goto <bb 6>; [0.04%]

bb6 is a no-return block, hence uninteresting.  bb7 _is_ interesting in that
it clobbers in:

;;   basic block 7, loop depth 1
;;    pred:       5
  in ={v} {CLOBBER};
  if (i_11 != 5)
    goto <bb 8>; [83.33%]
  else
    goto <bb 9>; [16.67%]

Note that the semantics of the clobber is not only that the former contents
are destroyed, but rather that the very storage associated with the clobbered
entity is gone.  So, from now on, any pointers into 'in', and memory accesses
into 'in' are invalid.  Nevertheless the flow from bb7 goes to bb 8 and 9,
the latter being the return block, so:

;;   basic block 8, loop depth 1
;;    pred:       7
  if (i_11 > 0)
    goto <bb 3>; [100.00%]
  else
    goto <bb 4>; [0.00%]

and here we finally have a path into bb3, which is the other interesting one:

;;   basic block 3, loop depth 2
  # ivtmp.20_6 = PHI <_30(8), ivtmp.20_18(3)>

.... BOOM! .... Here _30 is used, and ...

  _4 = (void *) ivtmp.20_6;
  MEM[base: _4, offset: 0B] = 0;

... even written into ...  That's invalid.  _30 is associated with an
object that is clobbered and gone ...

  set_one (&buf);
  buf ={v} {CLOBBER};
  ivtmp.20_18 = ivtmp.20_6 + 1;

... and as the MEM[] write can't have possibly been into 'in' (as that is invalid, as 'in' didn't exist at the MEM access), it's okay and sensible to
allocate 'in' and 'buf' into the same memory.

It seems to be a common misconception of what the clobber is really about.
I designed it to mean what I wrote above, the storage associated with the
clobbered entities is gone after the clobber (not merely it's former contents!). 

But ivopts or dom extends the lifetime of 'in' (by moving an address-taken
earlier) and hence the lifetime of its storage, but without doing anything about the clobber (and hence not _really_ extending the lifetime).  That doesn't work.
It's basically a mirrored transformation of the usual take-address-of-local
and access it out of it's declared scope, just that here the _start_ of the
supposed lifetime is moved out of the declared scope, not the end.
Comment 8 Richard Biener 2019-05-07 09:26:30 UTC
(In reply to Michael Matz from comment #7)
> No, this is not a problem in the stack slot sharing algorithm, but rather in
> the input.  As presented to expand, and only showing the important parts,
> and reordering some BBs to make the flow more obvious:
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _30 = (unsigned long) &in;
>   ivtmp.27_29 = _30 + 1;
>   goto <bb 5>; [100.00%]
> 
> So, 'in' becomes "live" here, it's address in _30 and _29.  Fallthrough to
> bb5,
> which also uses in, but otherwise is uninteresting, except that it can reach
> only BBs 6 and 7:
> 
> ;;   basic block 5, loop depth 1
>   ...
>   _2 = check_zero (&in, _31);
>   if (_2 != 0)
>     goto <bb 7>; [99.96%]
>   else
>     goto <bb 6>; [0.04%]
> 
> bb6 is a no-return block, hence uninteresting.  bb7 _is_ interesting in that
> it clobbers in:
> 
> ;;   basic block 7, loop depth 1
> ;;    pred:       5
>   in ={v} {CLOBBER};
>   if (i_11 != 5)
>     goto <bb 8>; [83.33%]
>   else
>     goto <bb 9>; [16.67%]
> 
> Note that the semantics of the clobber is not only that the former contents
> are destroyed, but rather that the very storage associated with the clobbered
> entity is gone.  So, from now on, any pointers into 'in', and memory accesses
> into 'in' are invalid.  Nevertheless the flow from bb7 goes to bb 8 and 9,
> the latter being the return block, so:
> 
> ;;   basic block 8, loop depth 1
> ;;    pred:       7
>   if (i_11 > 0)
>     goto <bb 3>; [100.00%]
>   else
>     goto <bb 4>; [0.00%]
> 
> and here we finally have a path into bb3, which is the other interesting one:
> 
> ;;   basic block 3, loop depth 2
>   # ivtmp.20_6 = PHI <_30(8), ivtmp.20_18(3)>
> 
> .... BOOM! .... Here _30 is used, and ...
> 
>   _4 = (void *) ivtmp.20_6;
>   MEM[base: _4, offset: 0B] = 0;
> 
> ... even written into ...  That's invalid.  _30 is associated with an
> object that is clobbered and gone ...
> 
>   set_one (&buf);
>   buf ={v} {CLOBBER};
>   ivtmp.20_18 = ivtmp.20_6 + 1;
> 
> ... and as the MEM[] write can't have possibly been into 'in' (as that is
> invalid, as 'in' didn't exist at the MEM access), it's okay and sensible to
> allocate 'in' and 'buf' into the same memory.
> 
> It seems to be a common misconception of what the clobber is really about.
> I designed it to mean what I wrote above, the storage associated with the
> clobbered entities is gone after the clobber (not merely it's former
> contents!). 
> 
> But ivopts or dom extends the lifetime of 'in' (by moving an address-taken
> earlier) and hence the lifetime of its storage, but without doing anything
> about the clobber (and hence not _really_ extending the lifetime).  That
> doesn't work.
> It's basically a mirrored transformation of the usual take-address-of-local
> and access it out of it's declared scope, just that here the _start_ of the
> supposed lifetime is moved out of the declared scope, not the end.

While I spotted this "issue" as well I respectfully disagree about
the semantics.  Not because they are not sound but because of
the implementation challenge resolving around address-takens being
values with no data dependence on the clobbers.

I thought the liveness algorithm would be able to handle this situation.
If not we need to prune clobbers that became "invalid" somehow.
Not sure how - the address value never "dies", so we'd need to compute
liveness of the things the address escapes to - SSA names are easy,
calls - well, we have to give up.  This possiby defeats the whole idea
of doing stack sharing analysis (I remember kernel/fortran testcases
passing the stack slots by reference to a function).

Now if we would want to try forcing the original semantics we need a
verifier verifying the IL state against bogus transforms - but then we
would have a way to prune the invalid ones.
Comment 9 Jakub Jelinek 2019-05-07 11:11:03 UTC
Created attachment 46312 [details]
gcc10-pr90348.patch

Untested patch that implements what was written in #c5.  I agree that without further changes to the IL, determining if one can hoist addresses of local variables or not is going to be hard, would require computing the variable life info in each pass that would do something similar.  On the other side, admittedly such hoisting results in worse code generation because if the address is hoisted earlier than where it used to be live before, then there will be more stack conflicts.
Comment 10 Richard Biener 2019-05-07 12:45:39 UTC
(In reply to Jakub Jelinek from comment #9)
> Created attachment 46312 [details]
> gcc10-pr90348.patch
> 
> Untested patch that implements what was written in #c5.  I agree that
> without further changes to the IL, determining if one can hoist addresses of
> local variables or not is going to be hard, would require computing the
> variable life info in each pass that would do something similar.  On the
> other side, admittedly such hoisting results in worse code generation
> because if the address is hoisted earlier than where it used to be live
> before, then there will be more stack conflicts.

Ick.  You need to handle pt->anything and pt->escaped (walk the escaped
solution) as well.  And !SSA_NAME_PTR_INFO (op) is of course the same
as pt->anything.

I see it quickly degrading given the last PTA compute is far far away...
Comment 11 Jakub Jelinek 2019-05-07 13:15:48 UTC
(In reply to Richard Biener from comment #10)
> (In reply to Jakub Jelinek from comment #9)
> > Created attachment 46312 [details]
> > gcc10-pr90348.patch
> > 
> > Untested patch that implements what was written in #c5.  I agree that
> > without further changes to the IL, determining if one can hoist addresses of
> > local variables or not is going to be hard, would require computing the
> > variable life info in each pass that would do something similar.  On the
> > other side, admittedly such hoisting results in worse code generation
> > because if the address is hoisted earlier than where it used to be live
> > before, then there will be more stack conflicts.
> 
> Ick.  You need to handle pt->anything and pt->escaped (walk the escaped
> solution) as well.  And !SSA_NAME_PTR_INFO (op) is of course the same
> as pt->anything.

Do I?
I thought I don't.  The thing is, I believe the partitioning only cares about where (originally in the source) the automatic variables are referenced.
Now, if I have say:
__attribute__((noipa)) type *foo (type *x) { return x; }
or similar, in order for a variable to be live before the clobber, it needs to escape in the area where the variable is live, we don't care what we do with whatever it returned, unless we actually hoist such escaping calls before that region.  If we can say for:
  for (...)
    {
      unsigned char v[10];
      unsigned char *p = foo (v);
      *p = 1;
      unsigned char w[10];
      bar (w);
    }
hoist the p = foo (v); call before the loop, then indeed we are in big trouble.
But if we can't, the attached patch was just meant as a shorthand for trying to do a dataflow propagation of the can a pointer SSA_NAME point to variable xyz, if we don't consider escaping (to other functions and to global state).
What I'm trying to "undo" is just the hoisting of the address loads, not hoisting of those address loads plus function calls in which it escapes or where that address is saved to memory.

If I have to consider pt->anything and pt->escaped, then it will be as useless for the variable conflicts as is removing the important clearing of the bitmap bit on clobber stmt, we won't share stack slots pretty much at all.
Comment 12 Michael Matz 2019-05-07 13:22:51 UTC
(In reply to Jakub Jelinek from comment #11)
> before that region.  If we can say for:
>   for (...)
>     {
>       unsigned char v[10];
>       unsigned char *p = foo (v);
>       *p = 1;
>       unsigned char w[10];
>       bar (w);
>     }
> hoist the p = foo (v); call before the loop, then indeed we are in big
> trouble.

This is effectively what the testcase is doing (just that 'foo' is no call, but a normal address expression), so yes, we can do that, and yes we are in big
trouble :-/

> If I have to consider pt->anything and pt->escaped, then it will be as
> useless for the variable conflicts as is removing the important clearing of
> the bitmap bit on clobber stmt, we won't share stack slots pretty much at
> all.

Yeah; if we don't want to patch the specific situation for this testcase
(which might be okayish, we haven't seen this problem very often over the
last years), but want to really fix it we might have to take more involved
means like doing stack slot sharing before gimplification and rewriting
the IL to reflect this.  Or give up on sharing (which isn't a good idea).
Gnah.
Comment 13 Richard Biener 2019-05-07 13:36:12 UTC
(In reply to Jakub Jelinek from comment #11)
> (In reply to Richard Biener from comment #10)
> > (In reply to Jakub Jelinek from comment #9)
> > > Created attachment 46312 [details]
> > > gcc10-pr90348.patch
> > > 
> > > Untested patch that implements what was written in #c5.  I agree that
> > > without further changes to the IL, determining if one can hoist addresses of
> > > local variables or not is going to be hard, would require computing the
> > > variable life info in each pass that would do something similar.  On the
> > > other side, admittedly such hoisting results in worse code generation
> > > because if the address is hoisted earlier than where it used to be live
> > > before, then there will be more stack conflicts.
> > 
> > Ick.  You need to handle pt->anything and pt->escaped (walk the escaped
> > solution) as well.  And !SSA_NAME_PTR_INFO (op) is of course the same
> > as pt->anything.
> 
> Do I?
> I thought I don't.  The thing is, I believe the partitioning only cares
> about where (originally in the source) the automatic variables are
> referenced.
> Now, if I have say:
> __attribute__((noipa)) type *foo (type *x) { return x; }
> or similar, in order for a variable to be live before the clobber, it needs
> to escape in the area where the variable is live, we don't care what we do
> with whatever it returned, unless we actually hoist such escaping calls
> before that region.  If we can say for:
>   for (...)
>     {
>       unsigned char v[10];
>       unsigned char *p = foo (v);
>       *p = 1;
>       unsigned char w[10];
>       bar (w);
>     }
> hoist the p = foo (v); call before the loop, then indeed we are in big
> trouble.
> But if we can't, the attached patch was just meant as a shorthand for trying
> to do a dataflow propagation of the can a pointer SSA_NAME point to variable
> xyz, if we don't consider escaping (to other functions and to global state).
> What I'm trying to "undo" is just the hoisting of the address loads, not
> hoisting of those address loads plus function calls in which it escapes or
> where that address is saved to memory.
> 
> If I have to consider pt->anything and pt->escaped, then it will be as
> useless for the variable conflicts as is removing the important clearing of
> the bitmap bit on clobber stmt, we won't share stack slots pretty much at
> all.

The point about !SSA_NAME_PTR_INFO is that passes might clear it, replace
such SSA name with a new one, forgetting to copy it, etc.  The point
about anything is that points-to analysis might just have given up
(consider the provenance thing where we might end up with anything for
going through some uintptr arithmetic), the point about escaped is that
this is just a placeholder for a set of variables and points-to tends
to shrink sets by removing bits also in escaped.  Also
global = ptr; ptr2 = global; will have escaped in the sets but not
necessarily the bit of the original decl.

So my comments are just about correctly interpreting points-to data.
Comment 14 Jakub Jelinek 2019-05-07 13:39:20 UTC
(In reply to Michael Matz from comment #12)
> (In reply to Jakub Jelinek from comment #11)
> > before that region.  If we can say for:
> >   for (...)
> >     {
> >       unsigned char v[10];
> >       unsigned char *p = foo (v);
> >       *p = 1;
> >       unsigned char w[10];
> >       bar (w);
> >     }
> > hoist the p = foo (v); call before the loop, then indeed we are in big
> > trouble.
> 
> This is effectively what the testcase is doing (just that 'foo' is no call,
> but a normal address expression), so yes, we can do that, and yes we are in
> big
> trouble :-/

Well, that p = foo (v); or store of memory is something that will have (unless const call, indeed) a vuse or even a vdef and the alias oracle should usually consider it to be conflicting with the clobber stmt, so I'd hope it isn't hoisted in that case, while for the pure address arithmetics there is nothing that can easily stop the hoisting.

Now, if there isn't really an easy way out of this and given how rarely this has been reported (I think we have this PR and PR61203, don't remember anything else), the options can be do nothing, or do something simple that isn't that expensive, will fix this testcase and while not perfect solution will still allow variable sharing in the general case almost as often as before, or change the IL representation so that such hoisting or sinking of addresses is not possible.
Comment 15 Richard Biener 2019-05-07 13:50:42 UTC
Educating people about -fstack-reuse is also a possibility, thus leave the issue to workarounds like that, experimenting with full rewrites that are obviously not back-portable.
Comment 16 Richard Biener 2020-01-23 09:42:17 UTC
Even before IVOPTs we have an access to in[] "after" the clobber.  I think the
fact that there are multiple instances of in is not well represented and
for the testcase at hand jump-threading essentially exposes two instances
of in[] with the second being live across a backedge (DOM rotates the loop),
crossing liveness with the instance live on loop entry.
I believe in this situation even simple CSE can wreck our liveness analysis
if you'd consider

  addr1 = &in;
  in ={v} CLOBBER;
  addr2 = &in;
...

where the address might be easily exposed in a testcase and the CSE triggering
after some unrolling.

Ideas to do stack-slot sharing very early and expose it in the IL will
have quite an impact throughout the compilation and would not allow
sharing of stack slots only exposed after inlining for example (to be
reliable the stack slot sharing would need to happen as early as
gimple lowering where we remove BINDs and make all variables function-scope).
Comment 17 Alexander Monakov 2020-01-23 16:13:14 UTC
I think part of the problem is trying to make "deaths" explicit via CLOBBERs without making "births" also explicit in the IR. Doing both would have allowed a lifetime verifier that checks that births dominate all references to a variable and likewise deaths/clobbers postdominate all references, which would presumably catch this early and make the IR more rigorous overall.
Comment 18 Michael Matz 2020-01-23 16:52:03 UTC
(In reply to Alexander Monakov from comment #17)
> I think part of the problem is trying to make "deaths" explicit via CLOBBERs
> without making "births" also explicit in the IR.

Yes, that's basically the crux (or rather, that our births (first mention of a
decl name) don't conflict with address-taking or access), and without changing
this we're bound to heap hacks onto hacks to make it all work.  So, ideally we'd
represent the births of local decls as well (i.e. "allocate" them), represent all accesses indirectly via pointers, and "free" them at deaths.  Optionally, very late in the pipeline, we can rewrite alloc/free pairs whose pointers don't
escape into direct accesses to decls again (not sure if that's worthwhile at
all; the 'free/alloc' statements would be virtual anyway, and we would handle
only decls that must lie in memory this way).

With that we'd have to amend some optimizations nevertheless, as those alloc/free pairs would hinder optimizations (no moving of instructions accessing 
the locals over them), so they would have to be moved sometimes in order to allow them (if deemed profitable).
Comment 19 Alexander Monakov 2020-01-23 17:00:02 UTC
(In reply to Michael Matz from comment #18)
> represent all accesses indirectly via pointers

Would that be necessary in presence of a verifier that ensures that all references are dominated by births?
Comment 20 rguenther@suse.de 2020-01-23 17:36:23 UTC
On January 23, 2020 6:00:02 PM GMT+01:00, "amonakov at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348
>
>--- Comment #19 from Alexander Monakov <amonakov at gcc dot gnu.org>
>---
>(In reply to Michael Matz from comment #18)
>> represent all accesses indirectly via pointers
>
>Would that be necessary in presence of a verifier that ensures that all
>references are dominated by births?

No, but it would automatically ensure this via the dependence.
Comment 21 Jakub Jelinek 2020-03-04 09:52:42 UTC
GCC 8.4.0 has been released, adjusting target milestone.
Comment 22 Jakub Jelinek 2021-05-14 09:51:39 UTC
GCC 8 branch is being closed.
Comment 23 Richard Biener 2021-06-01 08:14:03 UTC
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Comment 24 GCC Commits 2022-02-04 07:17:12 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:551aa75778a4c5165d9533cd447c8fc822f583e1

commit r12-7044-g551aa75778a4c5165d9533cd447c8fc822f583e1
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Feb 2 14:24:39 2022 +0100

    Add CLOBBER_EOL to mark storage end-of-life clobbers
    
    This adds a flag to CONSTRUCTOR nodes indicating that for
    clobbers this marks the end-of-life of storage as opposed to
    just ending the lifetime of the object that occupied it.
    The dangling pointer diagnostics uses CLOBBERs but is confused
    by those emitted by the C++ frontend for example which emits
    them for the second purpose at the start of CTORs.  The issue
    is also appearant for aarch64 in PR104092.
    
    Distinguishing the two cases is also necessary for the PR90348 fix.
    
    Since I'm going to add another flag I added an enum clobber_flags
    and a defaulted argument to build_clobber plus a convenient way to
    query the enum from the CTOR tree and specify it for gimple_clobber_p.
    Since 'CLOBBER' is already taken and I needed a name for the unspecified
    clobber we have now I used 'CLOBBER_UNDEF'.
    
    2022-02-03  Richard Biener  <rguenther@suse.de>
    
            PR middle-end/90348
            PR middle-end/104092
    gcc/
            * tree-core.h (clobber_kind): New enum.
            (tree_base::u::bits::address_space): Document use in CONSTRUCTORs.
            * tree.h (CLOBBER_KIND): Add.
            (build_clobber): Add clobber kind argument, defaulted to
            CLOBBER_UNDEF.
            * tree.cc (build_clobber): Likewise.
            * gimple.h (gimple_clobber_p): New overload with specified kind.
            * tree-streamer-in.cc (streamer_read_tree_bitfields): Stream
            CLOBBER_KIND.
            * tree-streamer-out.cc (streamer_write_tree_bitfields):
            Likewise.
            * tree-pretty-print.cc (dump_generic_node): Mark EOL CLOBBERs.
            * gimplify.cc (gimplify_bind_expr): Build storage end-of-life clobbers
            with CLOBBER_EOL.
            (gimplify_target_expr): Likewise.
            * tree-inline.cc (expand_call_inline): Likewise.
            * tree-ssa-ccp.cc (insert_clobber_before_stack_restore): Likewise.
            * gimple-ssa-warn-access.cc (pass_waccess::check_stmt): Only treat
            CLOBBER_EOL clobbers as ending lifetime of storage.
    
    gcc/lto/
            * lto-common.cc (compare_tree_sccs_1): Compare CLOBBER_KIND.
    
    gcc/testsuite/
            * gcc.dg/pr87052.c: Adjust.
Comment 25 Richard Biener 2022-05-27 09:40:46 UTC
GCC 9 branch is being closed
Comment 26 Jakub Jelinek 2022-06-28 10:37:20 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 27 Richard Biener 2023-05-09 11:06:45 UTC
The testcase in comment#2 still reproduces on the GCC 11 branch but no longer with GCC 12+ (on x86_64-linux, -O2).
Comment 28 Richard Biener 2023-07-07 10:35:22 UTC
GCC 10 branch is being closed.
Comment 29 maic 2023-07-19 11:07:10 UTC
> GCC 11 branch but no longer with GCC 12+

So is this fixed and can be closed? (Also, the title could be edited then to remove gcc 12-14)
Comment 30 rguenther@suse.de 2023-07-19 11:37:42 UTC
On Wed, 19 Jul 2023, gnu.ojxq8 at dralias dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90348
> 
> maic <gnu.ojxq8 at dralias dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |gnu.ojxq8 at dralias dot com
> 
> --- Comment #29 from maic <gnu.ojxq8 at dralias dot com> ---
> > GCC 11 branch but no longer with GCC 12+
> 
> So is this fixed and can be closed? (Also, the title could be edited then to
> remove gcc 12-14)

No it's not fixed, just the testcase doesn't expose the issue anymore.
Comment 31 GCC Commits 2024-01-16 10:51:20 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:1251d3957de04dc9b023a23c09400217e13deadb

commit r14-7274-g1251d3957de04dc9b023a23c09400217e13deadb
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 16 11:49:34 2024 +0100

    cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372]
    
    The following patch adds a quick workaround to bugs in VAR_DECL
    partitioning.
    The problem is that there is no dependency between ADDR_EXPRs of local
    decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs
    (including ivopts integral variants thereof), which can break
    add_scope_conflicts discovery of what variables are actually used
    in certain region.
    E.g. we can have
      ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
    ...
      uses of ivtmp.40_3
    ...
      bitint.6 ={v} {CLOBBER(eos)};
    ...
      ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
    ...
      uses of ivtmp.28_43
    before VN (such as dom3), which the add_scope_conflicts code identifies as 2
    independent uses of bitint.6 variable (which is correct), but then VN
    determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3
    even in the second region; at that point add_scope_conflict thinks the
    bitint.6 variable is not used in that region anymore.
    
    The following patch does a simple single def-stmt check for such ADDR_EXPRs
    (rather than say trying to do a full propagation of what SSA_NAMEs can
    contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs.
    
    In addition to this patch I've used the attached one to gather statistics
    on the total size of all variable partitions in a function and seems besides
    the new testcases nothing is really affected compared to no patch (I've
    actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so
    it looks the same except that it never triggers).  The comparison wasn't
    perfect because I've only gathered BITS_PER_WORD, main_input_filename (did
    some replacement of build directories and /tmp/ccXXXXXX names of LTO to make
    it more similar between the two bootstraps/regtests), current_function_name
    and the total size of all variable partitions if any, because I didn't
    record e.g. the optimization options and so e.g. torture tests which iterate
    over options could have different partition sizes even in one compiler when
    BITS_PER_WORD, main_input_filename and current_function_name are all equal.
    So had to write an awk script to check if the first triple in the second
    build appeared in the first one and the quadruple in the second build
    appeared in the first one too, otherwise print result and that only
    triggered in the new tests.
    Also, the cc1plus binary according to objdump -dr is identical between the
    two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots.
    
    2024-01-16  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/113372
            PR middle-end/90348
            PR middle-end/110115
            PR middle-end/111422
            * cfgexpand.cc (add_scope_conflicts_2): New function.
            (add_scope_conflicts_1): Use it.
    
            * gcc.dg/torture/bitint-49.c: New test.
            * gcc.c-torture/execute/pr90348.c: New test.
            * gcc.c-torture/execute/pr110115.c: New test.
            * gcc.c-torture/execute/pr111422.c: New test.
Comment 32 GCC Commits 2024-03-02 00:38:16 UTC
The releases/gcc-13 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:432708c306838fe1444da0df7d629a60468c0c73

commit r13-8383-g432708c306838fe1444da0df7d629a60468c0c73
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 16 11:49:34 2024 +0100

    cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372]
    
    The following patch adds a quick workaround to bugs in VAR_DECL
    partitioning.
    The problem is that there is no dependency between ADDR_EXPRs of local
    decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs
    (including ivopts integral variants thereof), which can break
    add_scope_conflicts discovery of what variables are actually used
    in certain region.
    E.g. we can have
      ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
    ...
      uses of ivtmp.40_3
    ...
      bitint.6 ={v} {CLOBBER(eos)};
    ...
      ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
    ...
      uses of ivtmp.28_43
    before VN (such as dom3), which the add_scope_conflicts code identifies as 2
    independent uses of bitint.6 variable (which is correct), but then VN
    determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3
    even in the second region; at that point add_scope_conflict thinks the
    bitint.6 variable is not used in that region anymore.
    
    The following patch does a simple single def-stmt check for such ADDR_EXPRs
    (rather than say trying to do a full propagation of what SSA_NAMEs can
    contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs.
    
    In addition to this patch I've used the attached one to gather statistics
    on the total size of all variable partitions in a function and seems besides
    the new testcases nothing is really affected compared to no patch (I've
    actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so
    it looks the same except that it never triggers).  The comparison wasn't
    perfect because I've only gathered BITS_PER_WORD, main_input_filename (did
    some replacement of build directories and /tmp/ccXXXXXX names of LTO to make
    it more similar between the two bootstraps/regtests), current_function_name
    and the total size of all variable partitions if any, because I didn't
    record e.g. the optimization options and so e.g. torture tests which iterate
    over options could have different partition sizes even in one compiler when
    BITS_PER_WORD, main_input_filename and current_function_name are all equal.
    So had to write an awk script to check if the first triple in the second
    build appeared in the first one and the quadruple in the second build
    appeared in the first one too, otherwise print result and that only
    triggered in the new tests.
    Also, the cc1plus binary according to objdump -dr is identical between the
    two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots.
    
    2024-01-16  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/113372
            PR middle-end/90348
            PR middle-end/110115
            PR middle-end/111422
            * cfgexpand.cc (add_scope_conflicts_2): New function.
            (add_scope_conflicts_1): Use it.
    
            * gcc.c-torture/execute/pr90348.c: New test.
            * gcc.c-torture/execute/pr110115.c: New test.
            * gcc.c-torture/execute/pr111422.c: New test.
    
    (cherry picked from commit 1251d3957de04dc9b023a23c09400217e13deadb)
Comment 33 Jakub Jelinek 2024-03-04 12:11:15 UTC
Worked around for GCC 13.3+ and 14+.
Comment 34 GCC Commits 2024-06-11 10:36:39 UTC
The releases/gcc-12 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:170c2bba7cb85b3ac9380a7d5a1c6d82b3c6aa63

commit r12-10506-g170c2bba7cb85b3ac9380a7d5a1c6d82b3c6aa63
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 16 11:49:34 2024 +0100

    cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372]
    
    The following patch adds a quick workaround to bugs in VAR_DECL
    partitioning.
    The problem is that there is no dependency between ADDR_EXPRs of local
    decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs
    (including ivopts integral variants thereof), which can break
    add_scope_conflicts discovery of what variables are actually used
    in certain region.
    E.g. we can have
      ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
    ...
      uses of ivtmp.40_3
    ...
      bitint.6 ={v} {CLOBBER(eos)};
    ...
      ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
    ...
      uses of ivtmp.28_43
    before VN (such as dom3), which the add_scope_conflicts code identifies as 2
    independent uses of bitint.6 variable (which is correct), but then VN
    determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3
    even in the second region; at that point add_scope_conflict thinks the
    bitint.6 variable is not used in that region anymore.
    
    The following patch does a simple single def-stmt check for such ADDR_EXPRs
    (rather than say trying to do a full propagation of what SSA_NAMEs can
    contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs.
    
    In addition to this patch I've used the attached one to gather statistics
    on the total size of all variable partitions in a function and seems besides
    the new testcases nothing is really affected compared to no patch (I've
    actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so
    it looks the same except that it never triggers).  The comparison wasn't
    perfect because I've only gathered BITS_PER_WORD, main_input_filename (did
    some replacement of build directories and /tmp/ccXXXXXX names of LTO to make
    it more similar between the two bootstraps/regtests), current_function_name
    and the total size of all variable partitions if any, because I didn't
    record e.g. the optimization options and so e.g. torture tests which iterate
    over options could have different partition sizes even in one compiler when
    BITS_PER_WORD, main_input_filename and current_function_name are all equal.
    So had to write an awk script to check if the first triple in the second
    build appeared in the first one and the quadruple in the second build
    appeared in the first one too, otherwise print result and that only
    triggered in the new tests.
    Also, the cc1plus binary according to objdump -dr is identical between the
    two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots.
    
    2024-01-16  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/113372
            PR middle-end/90348
            PR middle-end/110115
            PR middle-end/111422
            * cfgexpand.cc (add_scope_conflicts_2): New function.
            (add_scope_conflicts_1): Use it.
    
            * gcc.c-torture/execute/pr90348.c: New test.
            * gcc.c-torture/execute/pr110115.c: New test.
            * gcc.c-torture/execute/pr111422.c: New test.
    
    (cherry picked from commit 1251d3957de04dc9b023a23c09400217e13deadb)
Comment 35 Jakub Jelinek 2024-06-11 10:49:01 UTC
Should be worked around for 12.4+ too.
Comment 36 GCC Commits 2024-06-20 13:21:53 UTC
The releases/gcc-11 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:c845244dd7afae95689b8dd02a62c18932441583

commit r11-11490-gc845244dd7afae95689b8dd02a62c18932441583
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 16 11:49:34 2024 +0100

    cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372]
    
    The following patch adds a quick workaround to bugs in VAR_DECL
    partitioning.
    The problem is that there is no dependency between ADDR_EXPRs of local
    decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs
    (including ivopts integral variants thereof), which can break
    add_scope_conflicts discovery of what variables are actually used
    in certain region.
    E.g. we can have
      ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
    ...
      uses of ivtmp.40_3
    ...
      bitint.6 ={v} {CLOBBER(eos)};
    ...
      ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
    ...
      uses of ivtmp.28_43
    before VN (such as dom3), which the add_scope_conflicts code identifies as 2
    independent uses of bitint.6 variable (which is correct), but then VN
    determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3
    even in the second region; at that point add_scope_conflict thinks the
    bitint.6 variable is not used in that region anymore.
    
    The following patch does a simple single def-stmt check for such ADDR_EXPRs
    (rather than say trying to do a full propagation of what SSA_NAMEs can
    contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs.
    
    In addition to this patch I've used the attached one to gather statistics
    on the total size of all variable partitions in a function and seems besides
    the new testcases nothing is really affected compared to no patch (I've
    actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so
    it looks the same except that it never triggers).  The comparison wasn't
    perfect because I've only gathered BITS_PER_WORD, main_input_filename (did
    some replacement of build directories and /tmp/ccXXXXXX names of LTO to make
    it more similar between the two bootstraps/regtests), current_function_name
    and the total size of all variable partitions if any, because I didn't
    record e.g. the optimization options and so e.g. torture tests which iterate
    over options could have different partition sizes even in one compiler when
    BITS_PER_WORD, main_input_filename and current_function_name are all equal.
    So had to write an awk script to check if the first triple in the second
    build appeared in the first one and the quadruple in the second build
    appeared in the first one too, otherwise print result and that only
    triggered in the new tests.
    Also, the cc1plus binary according to objdump -dr is identical between the
    two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots.
    
    2024-01-16  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/113372
            PR middle-end/90348
            PR middle-end/110115
            PR middle-end/111422
            * cfgexpand.c (add_scope_conflicts_2): New function.
            (add_scope_conflicts_1): Use it.
    
            * gcc.c-torture/execute/pr90348.c: New test.
            * gcc.c-torture/execute/pr110115.c: New test.
            * gcc.c-torture/execute/pr111422.c: New test.
    
    (cherry picked from commit 1251d3957de04dc9b023a23c09400217e13deadb)
Comment 37 Jakub Jelinek 2024-06-20 13:24:32 UTC
Worked around for 11.5 as well.