Bug 5738 - Missed code hoisting opportunity
Summary: Missed code hoisting opportunity
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 3.3
: P3 enhancement
Target Milestone: 7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 11820 (view as bug list)
Depends on: 15559
Blocks: 16996 23286
  Show dependency treegraph
 
Reported: 2002-02-20 13:56 UTC by Dan Nicolaescu
Modified: 2023-12-28 07:18 UTC (History)
7 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-03-06 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Nicolaescu 2002-02-20 13:56:01 UTC
GCSE not working

The following code: 


struct foo 
{
  unsigned short *p;
};


void
func (struct foo *s, unsigned int *coord, _Bool delta)
{
  unsigned short change;

  if (delta) {
    change = *((s)->p++);
    *coord += change;
  } else {
    change = *((s)->p++);
    *coord += change;
    *coord += *((s)->p++) << 8;
  }
}

generates when compiled with gcc from CVS on SPARC (with -O2)

func:
        !#PROLOGUE# 0
        !#PROLOGUE# 1
        andcc   %o2, 0xff, %g0
        mov     %o0, %o5
        be      .LL2
        mov     %o1, %o4
        ld      [%o0], %o0
        ld      [%o1], %o2
        lduh    [%o0], %o3
        sll     %o3, 16, %o1
        srl     %o1, 16, %o1
        add     %o2, %o1, %o2
        add     %o0, 2, %o0
        st      %o0, [%o5]
        b       .LL1
        st      %o2, [%o4]
.LL2:
        ld      [%o0], %o0
        lduh    [%o0], %o3
        ld      [%o1], %o2
        add     %o0, 2, %o0
        lduh    [%o0], %o1
        add     %o2, %o3, %o2
        sll     %o1, 8, %o1
        add     %o2, %o1, %o2
        add     %o0, 2, %o0
        st      %o2, [%o4]
        st      %o0, [%o5]
.LL1:
        retl
        nop

GCSE should be able to realize that
The sequence:
    change = *((s)->p++);
    *coord += change;

occurs on both branches of the conditional and move it before.

Release:
CVS

Environment:
sparc-sun-solaris2.8

How-To-Repeat:
Compile the code above with -O2 and look at the assembly,
one branch of the conditional should be empty.
Comment 1 Dan Nicolaescu 2002-02-20 13:56:01 UTC
Fix:
n/a
Comment 2 Richard Henderson 2002-04-03 02:25:09 UTC
State-Changed-From-To: open->closed
State-Changed-Why: That's not how partial redundancy elimination (PRE) works.
    The object with PRE is to minimize the number of evaluations
    of an expression *along a path*.  There is already one
    evaluation along each path, thus PRE considers things
    optimal.
    
    You want global value numbering or something, which we 
    don't implement.
Comment 3 Daniel Berlin 2002-04-03 07:58:55 UTC
From: Daniel Berlin <dan@dberlin.org>
To: rth@gcc.gnu.org, <dann@godzilla.ics.uci.edu>, <gcc-bugs@gcc.gnu.org>,
   <gcc-prs@gcc.gnu.org>, <nobody@gcc.gnu.org>, <gcc-gnats@gcc.gnu.org>
Cc:  
Subject: Re: optimization/5738: GCSE missed optimization
Date: Wed, 3 Apr 2002 07:58:55 -0500 (EST)

 On 3 Apr 2002 rth@gcc.gnu.org wrote:
 
 > Synopsis: GCSE missed optimization
 > 
 > State-Changed-From-To: open->closed
 > State-Changed-By: rth
 > State-Changed-When: Wed Apr  3 02:25:09 2002
 > State-Changed-Why:
 >     That's not how partial redundancy elimination (PRE) works.
 >     The object with PRE is to minimize the number of evaluations
 >     of an expression *along a path*. 
 
 No, the main object of PRE (besides performing GCSE) is to suppress 
 partial redundancies. 
 IE expressions that are available along one or more paths, but missing from some path.
 It does so by making it fully redundant, copying it to a block (or 
 blocks) such that it reaches all of the paths.  It then eliminates the 
 other copies.
 
 
 >  There is already one
 >     evaluation along each path, thus PRE considers things
 >     optimal.
 
 No it won't.
 The expression is not in the earliest place possible, and is fully 
 redundant.
 It *should* copy it to the predecessor, and eliminate the other two 
 copies.
 
 >     
 >     You want global value numbering or something, which we 
 >     don't implement.
 GVN wouldn't help here, actually.
 GVN doesn't insert new copies, it only eliminates values that are really 
 still available from some other block.
 
 
 Please don't close this PR, it's correct.
 --Dan
 

Comment 4 Daniel Berlin 2002-04-03 08:26:44 UTC
From: Daniel Berlin <dan@dberlin.org>
To: rth@gcc.gnu.org, <dann@godzilla.ics.uci.edu>, <gcc-bugs@gcc.gnu.org>,
   <gcc-prs@gcc.gnu.org>, <nobody@gcc.gnu.org>, <gcc-gnats@gcc.gnu.org>
Cc:  
Subject: Re: optimization/5738: GCSE missed optimization
Date: Wed, 3 Apr 2002 08:26:44 -0500 (EST)

 On Wed, 3 Apr 2002, Daniel Berlin wrote:
 
 > On 3 Apr 2002 rth@gcc.gnu.org wrote:
 > 
 > > Synopsis: GCSE missed optimization
 > > 
 > > State-Changed-From-To: open->closed
 > > State-Changed-By: rth
 > > State-Changed-When: Wed Apr  3 02:25:09 2002
 > > State-Changed-Why:
 > >     That's not how partial redundancy elimination (PRE) works.
 > >     The object with PRE is to minimize the number of evaluations
 > >     of an expression *along a path*. 
 > 
 > No, the main object of PRE (besides performing GCSE) is to suppress 
 > partial redundancies. 
 > IE expressions that are available along one or more paths, but missing from some path.
 > It does so by making it fully redundant, copying it to a block (or 
 > blocks) such that it reaches all of the paths.  It then eliminates the 
 > other copies.
 
 See, for instance http://www.cs.rice.edu/~keith/512/Lectures/LCM.pdf, 
 which explains this quite well in the first few pages.
  > 
 > 
 > >  There is already one
 > >     evaluation along each path, thus PRE considers things
 > >     optimal.
 > 
 > No it won't.
 > The expression is not in the earliest place possible, and is fully 
 > redundant.
 > It *should* copy it to the predecessor, and eliminate the other two 
 > copies.
 
 And for the record, I verified this by running the code through two other 
 compilers PRE passes.
 Both remove it.
 
 > 
 > >     
 > >     You want global value numbering or something, which we 
 > >     don't implement.
 > GVN wouldn't help here, actually.
 > GVN doesn't insert new copies, it only eliminates values that are really 
 > still available from some other block.
 
 I.E. if the code was
 
 a;
 if (b)
 {
 	a;
 	c;
 }
 else
 {
 	a;
 	d;
 }
 
 GVN would remove the a's inside the if block.
 
 Here, we have no value available yet.
 
 we have something like
 
 if (b)
 {
 	a;
 	c;
 }
 else
 {
 	a;
 	d;
 }
 
 which is a job for PRE.
 
 It should first notice a is locally available in both blocks.
 Global availability will tell a is available in all the blocks 
 afterwards.
 Transparency will say a is transparent everywhere.
 
 Computing antic will say a is anticipable everywhere (edgewise) 
 because of the transparency.
 
 Earliestness will say a could be placed anywhere as well.
 As will latest.
 
 So in this case,  it will eliminate both copies, and place a copy in the 
 successor of the if.
 
 if (b)
 {
 	c;
 }
 else
 {
 	d;
 }
 a;
 
 
 This is, of course, the placement when neither c, nor d, change or need 
 the value of a.
 
 It's still fully redundant in the example Dan gave, it just needs to be 
 placed earlier.
 
 > 
 > 
 > Please don't close this PR, it's correct.
 > --Dan
 > 
 > 
 

Comment 5 Dan Nicolaescu 2002-04-03 10:01:16 UTC
From: Dan Nicolaescu <dann@godzilla.ICS.UCI.EDU>
To: Daniel Berlin <dan@dberlin.org>
Cc: rth@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, 
    nobody@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Subject: Re: optimization/5738: GCSE missed optimization
Date: Wed, 03 Apr 2002 10:01:16 -0800

 Daniel Berlin <dan@dberlin.org> writes:
 
   > On 3 Apr 2002 rth@gcc.gnu.org wrote:
   > 
   > > Synopsis: GCSE missed optimization
   > > 
   > > State-Changed-From-To: open->closed
   > > State-Changed-By: rth
   > > State-Changed-When: Wed Apr  3 02:25:09 2002
   > > State-Changed-Why:
   > >     That's not how partial redundancy elimination (PRE) works.
   > >     The object with PRE is to minimize the number of evaluations
   > >     of an expression *along a path*. 
   > 
   > Please don't close this PR, it's correct.
 
 I want to add one more data point to this: the cfg-branch does a
 little better on this testcase, it finds some of the common code on
 the 2 branches of the if, but not all of it. 
 
 struct foo { unsigned short * p;};
 
 unsigned short
 foo (struct foo *s, unsigned long *coord, _Bool delta)
 {
   unsigned short change;
   if (delta) {
     change = *((s->p)++);
     coord +=  change;
     return change;
   } else {
     change = *((s->p)++);
     coord +=  change; 
     coord += *((s)->p++) <<  8;
     return change;
   }
 }
 
 the generated SPARC assembly for "foo" is: 
 
 CVS HEAD gcc -O2                   cfg-branch gcc -O2
 foo:                               foo:
         !#PROLOGUE# 0                      !#PROLOGUE# 0
         !#PROLOGUE# 1                      save    %sp, -112, %sp  
         andcc   %o2, 0xff, %g0             !#PROLOGUE# 1
         be      .LL2                       andcc   %i2, 0xff, %g0
         mov     %o0, %o3                   be      .LL2
         ld      [%o0], %o0                 mov     %i0, %i3
         lduh    [%o0], %o2                 ld      [%i0], %i0
         add     %o0, 2, %o0                b       .LL4
         sll     %o2, 16, %o1               add     %i0, 2, %i2
         st      %o0, [%o3]         .LL2:
         b       .LL1                       ld      [%i0], %i0
         srl     %o1, 16, %o0               add     %i0, 4, %i2
 .LL2:                              .LL4:
         ld      [%o0], %o0                 lduh    [%i0], %i1
         lduh    [%o0], %o2                 st      %i2, [%i3]
         add     %o0, 4, %o1                sll     %i1, 16, %i1
         st      %o1, [%o3]                 srl     %i1, 16, %i0
         mov     %o2, %o0                   ret
 .LL1:                                      restore
         retl                       
         nop                        
 
 
 the function "foo" above should be roughly simplified to:
 
 unsigned short
 bar (struct foo *s, unsigned long *coord, _Bool delta)
 {
   unsigned short change;
   change = *((s->p)++);
   coord +=  change;
   if (delta)
     ;
   else 
     coord += *((s)->p++) <<  8;
   return change;
 }
 
 bar:
         !#PROLOGUE# 0
         !#PROLOGUE# 1
         ld      [%o0], %o1
         mov     %o0, %o3
         lduh    [%o1], %o0
         andcc   %o2, 0xff, %g0
         add     %o1, 2, %o1
         sll     %o0, 16, %o0
         st      %o1, [%o3]
         srl     %o0, 16, %o0
         bne     .LL6
         add     %o1, 2, %o2
         st      %o2, [%o3]
 .LL6:
         retl
         nop
 
Comment 6 Richard Henderson 2002-04-03 10:55:56 UTC
State-Changed-From-To: closed->open
State-Changed-Why: .
Comment 7 Jeffrey A. Law 2002-04-04 09:11:30 UTC
From: law@redhat.com
To: Daniel Berlin <dan@dberlin.org>
Cc: rth@gcc.gnu.org, dann@godzilla.ics.uci.edu, gcc-bugs@gcc.gnu.org,
   gcc-prs@gcc.gnu.org, nobody@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Subject: Re: optimization/5738: GCSE missed optimization 
Date: Thu, 04 Apr 2002 09:11:30 -0700

  In message <Pine.LNX.4.44.0204030807080.6886-100000@dberlin.org>, Daniel 
 Berlin
  > > No, the main object of PRE (besides performing GCSE) is to suppress 
  > > partial redundancies. 
  > > IE expressions that are available along one or more paths, but missing fro
  > m some path.
  > > It does so by making it fully redundant, copying it to a block (or 
  > > blocks) such that it reaches all of the paths.  It then eliminates the 
  > > other copies.
  > 
  > See, for instance http://www.cs.rice.edu/~keith/512/Lectures/LCM.pdf, 
  > which explains this quite well in the first few pages.
 As does Morgan, Muchnick and the various papers in PLDI.  Our implementation
 is directly derived from Morgan.
 
  > we have something like
  > 
  > if (b)
  > {
  > 	a;
  > 	c;
  > }
  > else
  > {
  > 	a;
  > 	d;
  > }
  > 
  > which is a job for PRE.
 If you have something like this, then that's not a job for PRE/LCM as no
 path through the CFG has more than one evaluation of "a".
 
 What you're looking for is actually tail merging, which would drop "a"
 down the CFG.  If you wanted to move "a" up in the CFG you should look 
 at the code hoisting code, which we enable at -Os.
 
 Jeff
 
 
Comment 8 Wolfgang Bangerth 2003-03-15 04:46:36 UTC
State-Changed-From-To: open->analyzed
State-Changed-Why: Has already been discussed in great length
Comment 9 Andrew Pinski 2003-06-18 16:44:06 UTC
was reported against 3.3 when it was 3.2 in the cvs.
Comment 10 Andrew Pinski 2003-07-28 00:25:47 UTC
still happens on the mainline (20030727).
Comment 11 Andrew Pinski 2003-08-06 12:16:57 UTC
*** Bug 11820 has been marked as a duplicate of this bug. ***
Comment 12 Andrew Pinski 2003-10-21 01:19:36 UTC
On the mainline on powerpc-apple-darwin GCC for the given testcase GCC produces something 
like this
f()
{
if(a) goto b
ifacode
b:
ifnotacode
return
} 
Comment 13 Andrew Pinski 2004-01-04 07:08:00 UTC
I messed up in saying what it does on PPC:
That should read instead:
f()
{
if(a) goto b
ifacode
return
b:
ifnotacode
return
} 
which is not optimial.
Comment 14 Kazu Hirata 2004-01-04 07:22:47 UTC
On mainline i686-pc-linux-gnu, I get roughly equivalent code with -O2.
Interestingly, I see the commom part merged with -Os.

tree-ssa generates basically the same code.
Comment 15 Kazu Hirata 2004-01-04 07:30:47 UTC
gcse_main() in gcse.c says

      /* It does not make sense to run code hoisting unless we optimizing
	 for code size -- it rarely makes programs faster, and can make
	 them bigger if we did partial redundancy elimination (when optimizing
	 for space, we use a classic gcse algorithm instead of partial
	 redundancy algorithms).  */
      if (optimize_size)
Comment 16 Dale Johannesen 2004-11-04 23:31:13 UTC
It does say that, and I expect cases can be constructed where the comment is true, but it is not
completely right.  When you replace 2 copies of code with 1 copy you are generally making it
smaller.  Plus, there is a beneficial interaction with RTL invariant hoisting, as it exposes more
invariants when you do hoisting inside a loop.

Comment 17 Andrew Pinski 2021-08-21 21:55:33 UTC
Fixed in GCC 7 by r7-1935.

On the gimple level we get:
  <bb 3> [local count: 536870913]:
  _2 = pretmp_31 + 2;
  s_19(D)->p = _2;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 536870913]:
  _11 = pretmp_31 + 4;
  s_19(D)->p = _11;
  _12 = MEM[(short unsigned int *)pretmp_31 + 2B];
  _13 = (int) _12;
  _23 = _13 << 8;
  _14 = (unsigned int) _23;
  _15 = _14 + _28;

  <bb 5> [local count: 1073741824]:
  # _6 = PHI <_28(3), _15(4)>
  *coord_21(D) = _6;

The store for s_19(D)->p is not done but I think there might be already another bug about that.