GCSE not working The following code: struct foo { unsigned short *p; }; void func (struct foo *s, unsigned int *coord, _Bool delta) { unsigned short change; if (delta) { change = *((s)->p++); *coord += change; } else { change = *((s)->p++); *coord += change; *coord += *((s)->p++) << 8; } } generates when compiled with gcc from CVS on SPARC (with -O2) func: !#PROLOGUE# 0 !#PROLOGUE# 1 andcc %o2, 0xff, %g0 mov %o0, %o5 be .LL2 mov %o1, %o4 ld [%o0], %o0 ld [%o1], %o2 lduh [%o0], %o3 sll %o3, 16, %o1 srl %o1, 16, %o1 add %o2, %o1, %o2 add %o0, 2, %o0 st %o0, [%o5] b .LL1 st %o2, [%o4] .LL2: ld [%o0], %o0 lduh [%o0], %o3 ld [%o1], %o2 add %o0, 2, %o0 lduh [%o0], %o1 add %o2, %o3, %o2 sll %o1, 8, %o1 add %o2, %o1, %o2 add %o0, 2, %o0 st %o2, [%o4] st %o0, [%o5] .LL1: retl nop GCSE should be able to realize that The sequence: change = *((s)->p++); *coord += change; occurs on both branches of the conditional and move it before. Release: CVS Environment: sparc-sun-solaris2.8 How-To-Repeat: Compile the code above with -O2 and look at the assembly, one branch of the conditional should be empty.
Fix: n/a
State-Changed-From-To: open->closed State-Changed-Why: That's not how partial redundancy elimination (PRE) works. The object with PRE is to minimize the number of evaluations of an expression *along a path*. There is already one evaluation along each path, thus PRE considers things optimal. You want global value numbering or something, which we don't implement.
From: Daniel Berlin <dan@dberlin.org> To: rth@gcc.gnu.org, <dann@godzilla.ics.uci.edu>, <gcc-bugs@gcc.gnu.org>, <gcc-prs@gcc.gnu.org>, <nobody@gcc.gnu.org>, <gcc-gnats@gcc.gnu.org> Cc: Subject: Re: optimization/5738: GCSE missed optimization Date: Wed, 3 Apr 2002 07:58:55 -0500 (EST) On 3 Apr 2002 rth@gcc.gnu.org wrote: > Synopsis: GCSE missed optimization > > State-Changed-From-To: open->closed > State-Changed-By: rth > State-Changed-When: Wed Apr 3 02:25:09 2002 > State-Changed-Why: > That's not how partial redundancy elimination (PRE) works. > The object with PRE is to minimize the number of evaluations > of an expression *along a path*. No, the main object of PRE (besides performing GCSE) is to suppress partial redundancies. IE expressions that are available along one or more paths, but missing from some path. It does so by making it fully redundant, copying it to a block (or blocks) such that it reaches all of the paths. It then eliminates the other copies. > There is already one > evaluation along each path, thus PRE considers things > optimal. No it won't. The expression is not in the earliest place possible, and is fully redundant. It *should* copy it to the predecessor, and eliminate the other two copies. > > You want global value numbering or something, which we > don't implement. GVN wouldn't help here, actually. GVN doesn't insert new copies, it only eliminates values that are really still available from some other block. Please don't close this PR, it's correct. --Dan
From: Daniel Berlin <dan@dberlin.org> To: rth@gcc.gnu.org, <dann@godzilla.ics.uci.edu>, <gcc-bugs@gcc.gnu.org>, <gcc-prs@gcc.gnu.org>, <nobody@gcc.gnu.org>, <gcc-gnats@gcc.gnu.org> Cc: Subject: Re: optimization/5738: GCSE missed optimization Date: Wed, 3 Apr 2002 08:26:44 -0500 (EST) On Wed, 3 Apr 2002, Daniel Berlin wrote: > On 3 Apr 2002 rth@gcc.gnu.org wrote: > > > Synopsis: GCSE missed optimization > > > > State-Changed-From-To: open->closed > > State-Changed-By: rth > > State-Changed-When: Wed Apr 3 02:25:09 2002 > > State-Changed-Why: > > That's not how partial redundancy elimination (PRE) works. > > The object with PRE is to minimize the number of evaluations > > of an expression *along a path*. > > No, the main object of PRE (besides performing GCSE) is to suppress > partial redundancies. > IE expressions that are available along one or more paths, but missing from some path. > It does so by making it fully redundant, copying it to a block (or > blocks) such that it reaches all of the paths. It then eliminates the > other copies. See, for instance http://www.cs.rice.edu/~keith/512/Lectures/LCM.pdf, which explains this quite well in the first few pages. > > > > There is already one > > evaluation along each path, thus PRE considers things > > optimal. > > No it won't. > The expression is not in the earliest place possible, and is fully > redundant. > It *should* copy it to the predecessor, and eliminate the other two > copies. And for the record, I verified this by running the code through two other compilers PRE passes. Both remove it. > > > > > You want global value numbering or something, which we > > don't implement. > GVN wouldn't help here, actually. > GVN doesn't insert new copies, it only eliminates values that are really > still available from some other block. I.E. if the code was a; if (b) { a; c; } else { a; d; } GVN would remove the a's inside the if block. Here, we have no value available yet. we have something like if (b) { a; c; } else { a; d; } which is a job for PRE. It should first notice a is locally available in both blocks. Global availability will tell a is available in all the blocks afterwards. Transparency will say a is transparent everywhere. Computing antic will say a is anticipable everywhere (edgewise) because of the transparency. Earliestness will say a could be placed anywhere as well. As will latest. So in this case, it will eliminate both copies, and place a copy in the successor of the if. if (b) { c; } else { d; } a; This is, of course, the placement when neither c, nor d, change or need the value of a. It's still fully redundant in the example Dan gave, it just needs to be placed earlier. > > > Please don't close this PR, it's correct. > --Dan > >
From: Dan Nicolaescu <dann@godzilla.ICS.UCI.EDU> To: Daniel Berlin <dan@dberlin.org> Cc: rth@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, nobody@gcc.gnu.org, gcc-gnats@gcc.gnu.org Subject: Re: optimization/5738: GCSE missed optimization Date: Wed, 03 Apr 2002 10:01:16 -0800 Daniel Berlin <dan@dberlin.org> writes: > On 3 Apr 2002 rth@gcc.gnu.org wrote: > > > Synopsis: GCSE missed optimization > > > > State-Changed-From-To: open->closed > > State-Changed-By: rth > > State-Changed-When: Wed Apr 3 02:25:09 2002 > > State-Changed-Why: > > That's not how partial redundancy elimination (PRE) works. > > The object with PRE is to minimize the number of evaluations > > of an expression *along a path*. > > Please don't close this PR, it's correct. I want to add one more data point to this: the cfg-branch does a little better on this testcase, it finds some of the common code on the 2 branches of the if, but not all of it. struct foo { unsigned short * p;}; unsigned short foo (struct foo *s, unsigned long *coord, _Bool delta) { unsigned short change; if (delta) { change = *((s->p)++); coord += change; return change; } else { change = *((s->p)++); coord += change; coord += *((s)->p++) << 8; return change; } } the generated SPARC assembly for "foo" is: CVS HEAD gcc -O2 cfg-branch gcc -O2 foo: foo: !#PROLOGUE# 0 !#PROLOGUE# 0 !#PROLOGUE# 1 save %sp, -112, %sp andcc %o2, 0xff, %g0 !#PROLOGUE# 1 be .LL2 andcc %i2, 0xff, %g0 mov %o0, %o3 be .LL2 ld [%o0], %o0 mov %i0, %i3 lduh [%o0], %o2 ld [%i0], %i0 add %o0, 2, %o0 b .LL4 sll %o2, 16, %o1 add %i0, 2, %i2 st %o0, [%o3] .LL2: b .LL1 ld [%i0], %i0 srl %o1, 16, %o0 add %i0, 4, %i2 .LL2: .LL4: ld [%o0], %o0 lduh [%i0], %i1 lduh [%o0], %o2 st %i2, [%i3] add %o0, 4, %o1 sll %i1, 16, %i1 st %o1, [%o3] srl %i1, 16, %i0 mov %o2, %o0 ret .LL1: restore retl nop the function "foo" above should be roughly simplified to: unsigned short bar (struct foo *s, unsigned long *coord, _Bool delta) { unsigned short change; change = *((s->p)++); coord += change; if (delta) ; else coord += *((s)->p++) << 8; return change; } bar: !#PROLOGUE# 0 !#PROLOGUE# 1 ld [%o0], %o1 mov %o0, %o3 lduh [%o1], %o0 andcc %o2, 0xff, %g0 add %o1, 2, %o1 sll %o0, 16, %o0 st %o1, [%o3] srl %o0, 16, %o0 bne .LL6 add %o1, 2, %o2 st %o2, [%o3] .LL6: retl nop
State-Changed-From-To: closed->open State-Changed-Why: .
From: law@redhat.com To: Daniel Berlin <dan@dberlin.org> Cc: rth@gcc.gnu.org, dann@godzilla.ics.uci.edu, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, nobody@gcc.gnu.org, gcc-gnats@gcc.gnu.org Subject: Re: optimization/5738: GCSE missed optimization Date: Thu, 04 Apr 2002 09:11:30 -0700 In message <Pine.LNX.4.44.0204030807080.6886-100000@dberlin.org>, Daniel Berlin > > No, the main object of PRE (besides performing GCSE) is to suppress > > partial redundancies. > > IE expressions that are available along one or more paths, but missing fro > m some path. > > It does so by making it fully redundant, copying it to a block (or > > blocks) such that it reaches all of the paths. It then eliminates the > > other copies. > > See, for instance http://www.cs.rice.edu/~keith/512/Lectures/LCM.pdf, > which explains this quite well in the first few pages. As does Morgan, Muchnick and the various papers in PLDI. Our implementation is directly derived from Morgan. > we have something like > > if (b) > { > a; > c; > } > else > { > a; > d; > } > > which is a job for PRE. If you have something like this, then that's not a job for PRE/LCM as no path through the CFG has more than one evaluation of "a". What you're looking for is actually tail merging, which would drop "a" down the CFG. If you wanted to move "a" up in the CFG you should look at the code hoisting code, which we enable at -Os. Jeff
State-Changed-From-To: open->analyzed State-Changed-Why: Has already been discussed in great length
was reported against 3.3 when it was 3.2 in the cvs.
still happens on the mainline (20030727).
*** Bug 11820 has been marked as a duplicate of this bug. ***
On the mainline on powerpc-apple-darwin GCC for the given testcase GCC produces something like this f() { if(a) goto b ifacode b: ifnotacode return }
I messed up in saying what it does on PPC: That should read instead: f() { if(a) goto b ifacode return b: ifnotacode return } which is not optimial.
On mainline i686-pc-linux-gnu, I get roughly equivalent code with -O2. Interestingly, I see the commom part merged with -Os. tree-ssa generates basically the same code.
gcse_main() in gcse.c says /* It does not make sense to run code hoisting unless we optimizing for code size -- it rarely makes programs faster, and can make them bigger if we did partial redundancy elimination (when optimizing for space, we use a classic gcse algorithm instead of partial redundancy algorithms). */ if (optimize_size)
It does say that, and I expect cases can be constructed where the comment is true, but it is not completely right. When you replace 2 copies of code with 1 copy you are generally making it smaller. Plus, there is a beneficial interaction with RTL invariant hoisting, as it exposes more invariants when you do hoisting inside a loop.
Fixed in GCC 7 by r7-1935. On the gimple level we get: <bb 3> [local count: 536870913]: _2 = pretmp_31 + 2; s_19(D)->p = _2; goto <bb 5>; [100.00%] <bb 4> [local count: 536870913]: _11 = pretmp_31 + 4; s_19(D)->p = _11; _12 = MEM[(short unsigned int *)pretmp_31 + 2B]; _13 = (int) _12; _23 = _13 << 8; _14 = (unsigned int) _23; _15 = _14 + _28; <bb 5> [local count: 1073741824]: # _6 = PHI <_28(3), _15(4)> *coord_21(D) = _6; The store for s_19(D)->p is not done but I think there might be already another bug about that.