Bug 16799 - PowerPC - load reuse opportunity
Summary: PowerPC - load reuse opportunity
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.0.0
Importance: P1 enhancement
Target Milestone: 4.0.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2004-07-28 17:00 UTC by Pete Steinmetz
Modified: 2004-10-28 04:33 UTC

See Also:
Host: powerpc64-linux
Target: powerpc64-linux
Build: powerpc64-linux
Known to work:
Known to fail:
Last reconfirmed:


Description Pete Steinmetz 2004-07-28 17:00:38 UTC
Description:
The code sequence below is non-optimal: a value loaded in one iteration of a loop could be reused on the subsequent iteration, but it is reloaded instead.  Reproduce with gcc 3.5 and the command line:

gcc -O3 -m64 -c test.c

Testcase:
typedef struct {
    unsigned int e;
} str;
char *q;

void foo (char *p) {

  while (1) {
    q = p - ((str *)p)->e;
    if (((str *)q)->e) break;
    p = q;
  }

}

Assembly:
On entry to the loop body, the first "lwz 0,0(9)" reloads the value already loaded into gpr0 by the peeled iteration of the loop.  On subsequent iterations, the value has already been loaded by the second lwz of the previous iteration.  Thus, the first lwz is unnecessary.

.foo:
	lwz 0,0(3)
	ld 11,.LC0@toc(2)
	subf 3,0,3
	std 3,0(11)
	lwz 0,0(3)
	cmpwi 7,0,0
	bnelr- 7
	mr 9,3
.L4:
	lwz 0,0(9)  <-- Unnecessary, value is already in gpr 0.
	subf 9,0,9
	std 9,0(11)
	lwz 0,0(9)
	cmpwi 7,0,0
	beq+ 7,.L4
	blr
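
For illustration only, the reuse described above can be sketched at the source level. This is a hand-written approximation, not compiler output: foo_reuse and check_reuse are hypothetical names that do not appear in the report, and the check assumes data for which the loop exits on its first pass.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned int e;
} str;

char *q;

/* Loop as written in the testcase: e is loaded from p at the top of
   every iteration, although the same location was just read through q. */
void foo(char *p) {
    while (1) {
        q = p - ((str *)p)->e;
        if (((str *)q)->e) break;
        p = q;
    }
}

/* Hypothetical hand-transformed variant (name is illustrative only):
   the value tested at the bottom of one iteration is kept in a local
   and reused as the offset at the top of the next one. */
void foo_reuse(char *p) {
    unsigned int e = ((str *)p)->e;   /* one load ahead of the loop */
    while (1) {
        q = p - e;
        e = ((str *)q)->e;            /* single load per iteration */
        if (e) break;
        p = q;
    }
}

/* Illustrative check: both versions leave the same pointer in q.
   The data is chosen so that the loop exits on its first pass. */
void check_reuse(void) {
    str buf[3] = {{1u}, {0u}, {(unsigned int)(2 * sizeof(str))}};
    char *expected = (char *)&buf[0];

    foo((char *)&buf[2]);
    assert(q == expected);

    q = NULL;
    foo_reuse((char *)&buf[2]);
    assert(q == expected);
}
```

The local e plays the role of gpr0 in the listing: the value loaded for the exit test is the same value the next iteration needs as its offset, so only one lwz per iteration should remain.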

Comment 1 Falk Hueffner 2004-07-28 18:37:01 UTC
Same thing happens on Alpha, so this is not a target bug.
Comment 2 Andrew Pinski 2004-07-29 04:58:21 UTC
Hmm, this seems like a case that -fmodulo-sched should catch but does not.
Comment 3 Andrew Pinski 2004-10-28 04:27:59 UTC
Fixed on the mainline (note that this output is from powerpc64-darwin, but it should closely represent powerpc64-linux):
_foo:
        lwz r0,0(r3)
        .align32 4,0x60000000
L2:
        subf r9,r0,r3
        lwz r0,0(r9)
        mr r3,r9
        cmpdi cr7,r0,0
        beq cr7,L2
        lis r2,ha16(L_q$non_lazy_ptr)
        ld r2,lo16(L_q$non_lazy_ptr)(r2)
        std r9,0(r2)
        blr
.comm _q,8
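
Read at the source level, the mainline listing above appears to do two things: it keeps the loaded value in r0 across iterations, and it performs the store to q once, after the loop. A rough C sketch of that shape follows. It is an interpretation of the assembly, not something taken from the GCC sources, and foo_sunk and check_sunk are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned int e;
} str;

char *q;

/* Hypothetical source-level rendering of the mainline loop. */
void foo_sunk(char *p) {
    unsigned int e = ((str *)p)->e;   /* lwz r0,0(r3), ahead of L2 */
    char *t;
    do {
        t = p - e;                    /* subf r9,r0,r3 */
        e = ((str *)t)->e;            /* lwz r0,0(r9)  */
        p = t;                        /* mr r3,r9      */
    } while (e == 0);                 /* cmpdi, beq L2 */
    q = t;                            /* std r9,0(r2), after the loop */
}

/* Illustrative check with data for which the loop runs exactly once. */
void check_sunk(void) {
    str buf[3] = {{1u}, {0u}, {(unsigned int)(2 * sizeof(str))}};

    q = NULL;
    foo_sunk((char *)&buf[2]);
    assert(q == (char *)&buf[0]);
}
```

Only the final value of q is observable once the loop exits, which is presumably why the store can be moved out of the loop body.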