Bug 50346

Summary: Function call foils VRP/jump-threading of redundant predicate on struct member
Product: gcc Reporter: Ryan Johnson <scovich>
Component: tree-optimization Assignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: enhancement CC: dimhen, rguenth, steven
Priority: P3 Keywords: alias, missed-optimization
Version: 4.6.1   
Target Milestone: ---   
Host: x86_64-unknown-linux-gnu Target:
Build: Known to work:
Known to fail: 4.5.2, 4.7.0 Last reconfirmed: 2021-08-10 00:00:00
Bug Depends on:    
Bug Blocks: 19794, 23384    

Description Ryan Johnson 2011-09-10 03:29:49 UTC
When compiling the following code with options `-O3 -DBUG':

// === bug.cpp =======
struct foo {
    bool b;
    foo() : b(false) { }
    void baz();
};

bool bar();
void baz();

void test() {
    foo f;
    bool b = false;
    if (bar()) b = f.b = true;
#ifndef BUG
    if (f.b != b) __builtin_unreachable();
#endif
    if (f.b) f.baz();
}
// === end ==========

gcc fails to eliminate the second (redundant) if statement:

_Z4testv:
.LFB3:
        subq    $24, %rsp
        movb    $0, 15(%rsp)	<=== assign f.b = 0
        call    _Z3barv		<=== cannot access f.b
        testb   %al, %al
        je      .L2
        movb    $1, 15(%rsp)
.L3:
        leaq    15(%rsp), %rdi
        call    _ZN3foo3bazEv
        addq    $24, %rsp
        ret
.L2:
        cmpb    $0, 15(%rsp)	<=== always compares equal
        jne     .L3
        addq    $24, %rsp
        ret

Compiling with `-O3 -UBUG' gives the expected results:

_Z4testv:
.LFB3:
        subq    $24, %rsp
        movb    $0, 15(%rsp)
        call    _Z3barv
        testb   %al, %al
        je      .L1
        leaq    15(%rsp), %rdi
        movb    $1, 15(%rsp)
        call    _ZN3foo3bazEv
.L1:
        addq    $24, %rsp
        ret

This sort of scenario comes up a lot with RAII-related code, particularly when some code paths clean up the object manually before the destructor runs (obviating the need for the destructor to do it again).

While it should be possible to give hints using __builtin_unreachable(), it's not always easy to tell where to put it, and it may need to be placed multiple times to be effective.
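
To illustrate the RAII scenario described above, here is a minimal sketch. The names `guard`, `release_resource`, `work`, and `run` are hypothetical and do not come from any real code base. One path cleans up manually, after which the flag test in the destructor is redundant. Whether GCC folds that test depends on whether the guard's address has escaped to an opaque call, as it does in the reduced test case.

```cpp
#include <cassert>

// Hypothetical counter standing in for a real resource.
static int g_releases = 0;
static void release_resource() { ++g_releases; }

// Hypothetical RAII guard: the destructor cleans up unless some code path
// has already done so by calling release() manually.
struct guard {
    bool armed;
    guard() : armed(true) { }
    void release() {
        if (armed) { release_resource(); armed = false; }
    }
    ~guard() { if (armed) release_resource(); }  // redundant test after release()
};

// Stand-in for a call whose body the compiler cannot see in the real scenario.
bool work(bool ok) { return ok; }

// Returns how many times the resource was released during one scope.
int run(bool ok) {
    int before = g_releases;
    {
        guard g;
        if (work(ok)) {
            g.release();   // manual cleanup on this path
        }
        // ~guard() runs here and tests g.armed a second time.
    }
    assert(g_releases - before == 1);  // exactly one cleanup on either path
    return g_releases - before;
}
```

Both `run(true)` and `run(false)` release the resource exactly once; only the place where the release happens differs.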
Comment 1 Steven Bosscher 2011-09-10 13:35:50 UTC
Confirmed. Here is the .143t.optimized dump for trunk r178747:

void test() ()
{
  struct foo f;
  bool D.2119;
  bool retval.0;

<bb 2>:
  # .MEM_13 = VDEF <.MEM_8(D)>
  f.b = 0;
  # .MEM_10 = VDEF <.MEM_13>
  retval.0_2 = bar ();
  if (retval.0_2 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  # .MEM_11 = VDEF <.MEM_10>
  f.b = 1;
  goto <bb 5>;

<bb 4>:
  # VUSE <.MEM_10>
  D.2119_5 = f.b;
  if (D.2119_5 != 0)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
Invalid sum of incoming frequencies 5000, should be 3898
  # .MEM_16 = PHI <.MEM_10(4), .MEM_11(3)>
  # .MEM_12 = VDEF <.MEM_16>
  foo::baz (&f);

<bb 6>:
Invalid sum of incoming frequencies 8898, should be 10000
  # .MEM_7 = PHI <.MEM_10(4), .MEM_12(5)>
  # VUSE <.MEM_7>
  return;

}

Note how the call to bar() has a VDEF on .MEM_13, the memory state produced by the f.b = 0 store: the call is assumed to clobber f.b.

Alias related => Richi in CC.
Comment 2 Paolo Carlini 2011-10-11 23:50:46 UTC
So, is this a C++ front-end issue? tree-optimization?
Comment 3 Richard Biener 2011-10-12 10:10:15 UTC
Well, it's a tree optimization issue.  It's simple - the local aggregate f
escapes the function via the member function call to baz:

<bb 5>:
  foo::baz (&f);

and as our points-to analysis is not flow-sensitive for memory/calls this
causes f to be clobbered by the call to bar:

<bb 2>:
  f.b = 0;
  # USE = nonlocal null { f }
  # CLB = nonlocal null { f }
  retval.0_2 = bar ();

as neither the bodies of baz nor bar are visible there is nothing we can do
here (short of re-doing points-to analysis flow-sensitive for memory).

The load of f.b is partially redundant, so you see jump-threading at work later,
optimizing the path that follows the f.b = true assignment.
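
The following sketch shows why an escaped object must be treated as clobbered by an opaque call, and why the same call is harmless before the escape. The bodies of `foo::baz` and `bar`, the global `g_stash`, and the helper `probe` are hypothetical; in the report neither body is visible to test().

```cpp
#include <cassert>

struct foo {
    bool b;
    foo() : b(false) { }
    void baz();
};

// Hypothetical definitions, chosen to make the escape observable.
static foo* g_stash = nullptr;          // where baz() leaks `this`
void foo::baz() { g_stash = this; }     // f escapes here
bool bar() {
    if (g_stash) g_stash->b = true;     // legal write to an escaped object
    return false;
}

// Value of f.b observed after each of the two calls to bar().
struct observed { bool before_escape; bool after_escape; };

observed probe() {
    foo f;
    bar();                              // f has not escaped: bar() cannot reach f.b
    bool first = f.b;
    f.baz();                            // the address of f escapes
    assert(g_stash == &f);              // sanity check: baz() really leaked it
    bar();                              // may now modify f.b
    bool second = f.b;
    g_stash = nullptr;                  // do not leave a dangling pointer behind
    return { first, second };
}
```

A flow-insensitive analysis only records that f escapes somewhere in the function, so it has to assume the first call to `bar()` behaves like the second one.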
Comment 4 Ryan Johnson 2011-10-12 12:40:25 UTC
(In reply to comment #3)
> Well, it's a tree optimization issue.  It's simple - the local aggregate f
> escapes the function via the member function call to baz:
> 
> <bb 5>:
>   foo::baz (&f);
> 
> and as our points-to analysis is not flow-sensitive for memory/calls this
> causes f to be clobbered by the call to bar

Is flow-sensitive analysis within single functions prohibitively expensive? All the papers I can find talk about whole-program analysis, where it's very expensive in both time and space; the best I could find (CGO'11 best paper) gets it down to 20-30ms and 2-3MB per kLoC for up to ~300kLoC. 

>
> as neither the bodies of baz nor bar are visible there is nothing we can do

Would knowing the body of bar() help if the latter cannot be inlined?
Comment 5 rguenther@suse.de 2011-10-12 12:44:15 UTC
On Wed, 12 Oct 2011, scovich at gmail dot com wrote:

> Is flow-sensitive analysis within single functions prohibitively expensive? All
> the papers I can find talk about whole-program analysis, where it's very
> expensive in both time and space; the best I could find (CGO'11 best paper)
> gets it down to 20-30ms and 2-3MB per kLoC for up to ~300kLoC. 

It would need a complete rewrite; it can't be integrated into the current
solver (which happens to be shared between IPA and non-IPA modes).

> > as neither the bodies of baz nor bar are visible there is nothing we can do
> 
> Would knowing the body of bar() help if the latter cannot be inlined?

Not at present, but it's possible to improve mod-ref analysis on an
IPA level then.

Richard.
Comment 6 Ryan Johnson 2012-03-07 13:31:19 UTC
(In reply to comment #5)
> It would need a complete rewrite; it can't be integrated into the current
> solver (which happens to be shared between IPA and non-IPA modes).
That makes sense...

Wild idea: would it be possible to annotate references as "escaped" or "not escaped yet" ? Anything global or passed into the function would be marked as escaped, while anything allocated locally would start out as not escaped; assigning to an escaped location or passing to a function would mark it as escaped if it wasn't already. The status could be determined in linear time using local information only (= scalable), and would benefit strongly as inlining (IPA or not) eliminates escape points.

Alternatively (or maybe it's really the same thing?), I could imagine an SSA "operation" which "moves" the non-escaped variable into an escaped one (which happens to live at the same address) just before it escapes? That might give the same effect with no changes to the current flow-insensitive algorithm, as long as the optimizer knew how to adjust things to account for inlining.
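
A rough sketch of the "escaped / not escaped yet" idea follows, run on a toy CFG modelled on the dump in comment 1. Everything here is hypothetical (`kind`, `stmt`, `block`, `escaped_on_entry`, `call_may_clobber`, `test_cfg`) and none of it mirrors GCC's real IR or solver. It only tracks addresses passed to calls, and it uses a simple fixpoint loop rather than a strictly linear pass.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy statement kinds for illustration only.
enum class kind { store, load, opaque_call, pass_address };

struct stmt {
    kind k;
    std::string var;   // local involved; empty for opaque_call
};

struct block {
    std::vector<stmt> stmts;
    std::vector<std::size_t> succs;
};

// For each block, the set of locals that may have escaped on entry.
// Forward "may" dataflow: union at joins, iterate until nothing changes.
std::vector<std::set<std::string>> escaped_on_entry(const std::vector<block>& cfg) {
    std::vector<std::set<std::string>> in(cfg.size());
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < cfg.size(); ++i) {
            std::set<std::string> out = in[i];
            for (const stmt& s : cfg[i].stmts)
                if (s.k == kind::pass_address) out.insert(s.var);
            for (std::size_t succ : cfg[i].succs)
                for (const std::string& v : out)
                    if (in[succ].insert(v).second) changed = true;
        }
    }
    return in;
}

// True if the opaque call at cfg[b].stmts[idx] may clobber `var`.
bool call_may_clobber(const std::vector<block>& cfg, std::size_t b,
                      std::size_t idx, const std::string& var) {
    std::set<std::string> esc = escaped_on_entry(cfg)[b];
    for (std::size_t j = 0; j < idx; ++j)
        if (cfg[b].stmts[j].k == kind::pass_address) esc.insert(cfg[b].stmts[j].var);
    return esc.count(var) != 0;
}

// Blocks bb2..bb6 of the dump, renumbered from 0. The call foo::baz (&f) is
// modelled as "address of f passed" followed by an opaque call.
std::vector<block> test_cfg() {
    return {
        /* bb2 */ { { {kind::store, "f"}, {kind::opaque_call, ""} }, {1, 2} },
        /* bb3 */ { { {kind::store, "f"} }, {3} },
        /* bb4 */ { { {kind::load, "f"} }, {3, 4} },
        /* bb5 */ { { {kind::pass_address, "f"}, {kind::opaque_call, ""} }, {4} },
        /* bb6 */ { {}, {} },
    };
}
```

In this model the call in bb2 (bar) is found unable to clobber f, while the call in bb5 (baz) can. A real implementation would also have to handle addresses stored into escaped memory, returned pointers, and so on.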
Comment 7 rguenther@suse.de 2012-03-07 13:39:19 UTC
On Wed, 7 Mar 2012, scovich at gmail dot com wrote:

> Wild idea: would it be possible to annotate references as "escaped" or "not
> escaped yet" ? Anything global or passed into the function would be marked as
> escaped, while anything allocated locally would start out as not escaped;
> assigning to an escaped location or passing to a function would mark it as
> escaped if it wasn't already. The status could be determined in linear time
> using local information only (= scalable), and would benefit strongly as
> inlining (IPA or not) eliminates escape points.

Well, you can compute the clobber/use sets of individual function calls,
IPA PTA computes a simple mod-ref analysis this way.  You can also
annotate functions with whether they make arguments escape, or whether they
read from them or clobber them.

The plan is to do some simple analysis and propagate that up the
callgraph, similar to pure-const analysis.  The escape part could
be integrated there.
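
One annotation of this general kind that already exists at the source level is GCC's `pure` function attribute, which promises that a function has no effect on program state other than its return value. The sketch below applies it to a variant of the test case. The `int x` parameter, the body of `bar`, and the counter `g_baz_calls` are assumptions added so the example is self-contained. With such a declaration the call should no longer need to be treated as clobbering f.b, but this thread does not confirm that any given GCC version then removes the second test.

```cpp
#include <cassert>

struct foo {
    bool b;
    foo() : b(false) { }
    void baz();
};

static int g_baz_calls = 0;             // hypothetical, for observation only
void foo::baz() { ++g_baz_calls; }

// Assumption: bar() really has no side effects, so `pure` is a truthful promise.
// In real code the definition would live in another translation unit.
bool bar(int x) __attribute__((pure));
bool bar(int x) { return x > 0; }

// Returns how many times baz() was called for this input.
int test(int x) {
    int before = g_baz_calls;
    foo f;
    bool b = false;
    if (bar(x)) b = f.b = true;
    if (f.b) f.baz();                   // f.b equals b on both paths
    assert(b == f.b);
    return g_baz_calls - before;
}
```

The attribute is only safe when the promise is true; marking a function `pure` that writes to memory reachable by the caller would make the program's behaviour unpredictable.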

Richard.
Comment 8 Ryan Johnson 2012-03-07 14:28:29 UTC
(In reply to comment #7)
> Well, you can compute the clobber/use sets of individual function calls,
> IPA PTA computes a simple mod-ref analysis this way.  You can also
> annotate functions with whether they make arguments escape, or whether they
> read from them or clobber them.
> 
> The plan is to do some simple analysis and propagate that up the
> callgraph, similar to pure-const analysis.  The escape part could
> be integrated there.

That sounds really slick to have in general... but would it actually catch the test case above? What you describe seems to depend on test() having information about foo::baz() -- which it does not -- while analyzing the body of test() could at least identify the part of f's lifetime where it cannot possibly have escaped.

Or does the local analysis come "for free" once those IPA changes are in place?
Comment 9 rguenther@suse.de 2012-03-12 08:56:40 UTC
On Wed, 7 Mar 2012, scovich at gmail dot com wrote:

> That sounds really slick to have in general... but would it actually catch the
> test case above? What you describe seems to depend on test() having information
> about foo::baz() -- which it does not -- while analyzing the body of test()
> could at least identify the part of f's lifetime where it cannot possibly have
> escaped.
> 
> Or does the local analysis come "for free" once those IPA changes are in place?

No, the local analysis is what makes the IPA changes "free" ;)  Of course
the local analysis would need to be flow sensitive.

Richard.
Comment 10 Andrew Pinski 2021-08-11 04:31:03 UTC
This is basically PR 23384 really.