Description:
A non-optimal code sequence is illustrated below. A value loaded in the loop could be reused on the next iteration of the loop instead of being reloaded.

Reproduce with gcc 3.5 and the command line:

    gcc -O3 -m64 -c test.c

Testcase:

    typedef struct { unsigned int e; } str;
    char *q;
    void foo (char *p)
    {
      while (1)
        {
          q = p - ((str *)p)->e;
          if (((str *)q)->e)
            break;
          p = q;
        }
    }

Assembly:
On entry to the loop body, the first "lwz 0,0(9)" reloads the value already loaded into gpr0 by the peeled iteration of the loop. On subsequent iterations, the value has already been loaded by the second lwz of the previous iteration. Thus, the first lwz is unnecessary.

    .foo:
            lwz 0,0(3)
            ld 11,.LC0@toc(2)
            subf 3,0,3
            std 3,0(11)
            lwz 0,0(3)
            cmpwi 7,0,0
            bnelr- 7
            mr 9,3
    .L4:
            lwz 0,0(9)      <-- Unnecessary, value is already in gpr 0.
            subf 9,0,9
            std 9,0(11)
            lwz 0,0(9)
            cmpwi 7,0,0
            beq+ 7,.L4
            blr
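The reuse described in the report can also be written out at the source level. The sketch below is a hypothetical hand transformation, not GCC output; `foo_reuse` and `check_same` are illustrative names. It keeps the value loaded through q in a local and uses it as ((str *)p)->e on the next iteration. That is valid because p == q at that point and no store touches that address between the two loads.

```c
#include <assert.h>

typedef struct { unsigned int e; } str;

char *q;

/* The testcase from the report, unchanged.  */
void foo (char *p)
{
  while (1)
    {
      q = p - ((str *)p)->e;
      if (((str *)q)->e)
        break;
      p = q;
    }
}

/* Hypothetical hand-transformed version: the value loaded through the new
   pointer at the end of one iteration is kept in 'e' and reused as p->e
   on the next iteration, since p equals that pointer by then.  */
void foo_reuse (char *p)
{
  unsigned int e = ((str *)p)->e;   /* single load before the loop */
  while (1)
    {
      char *t = p - e;
      q = t;
      e = ((str *)t)->e;            /* the only load inside the loop */
      if (e)
        break;
      p = t;
    }
}

/* Hypothetical helper: run both versions from the same start pointer and
   assert that they leave q pointing at the same place.  */
char *check_same (char *p)
{
  char *r;
  foo (p);
  r = q;
  foo_reuse (p);
  assert (q == r);
  return r;
}
```

Calling check_same with a pointer into an array of unsigned int, whose first element read holds a byte offset back to a nonzero element, runs both versions and checks that they agree.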
The same thing happens on Alpha, so this is not a target-specific bug.
Hmm, this seems like a case that -fmodulo-sched should catch, but it does not.
Fixed on the mainline (note: this is powerpc64-darwin output, but it should closely represent powerpc64-linux):

    _foo:
            lwz r0,0(r3)
            .align32 4,0x60000000
    L2:
            subf r9,r0,r3
            lwz r0,0(r9)
            mr r3,r9
            cmpdi cr7,r0,0
            beq cr7,L2
            lis r2,ha16(L_q$non_lazy_ptr)
            ld r2,lo16(L_q$non_lazy_ptr)(r2)
            std r9,0(r2)
            blr
            .comm _q,8
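For comparison, the mainline loop can be read back into C roughly as follows. This is a hypothetical hand translation of the assembly, not compiler output; `foo_mainline_shape` and `check_mainline` are illustrative names. Besides keeping the loaded value in r0 across iterations, the mainline code also performs the store to q once, after the loop (std r9,0(r2) follows the beq), which presumably relies on the loads through str * being known not to read the global q.

```c
#include <assert.h>

typedef struct { unsigned int e; } str;

char *q;

/* The testcase from the report, for reference.  */
void foo (char *p)
{
  while (1)
    {
      q = p - ((str *)p)->e;
      if (((str *)q)->e)
        break;
      p = q;
    }
}

/* Hypothetical C rendering of the mainline assembly: one load before the
   loop, one load per iteration, and a single store to q after the loop.  */
void foo_mainline_shape (char *p)
{
  unsigned int e = ((str *)p)->e;   /* lwz r0,0(r3) */
  char *t;
  do
    {
      t = p - e;                    /* subf r9,r0,r3 */
      e = ((str *)t)->e;            /* lwz r0,0(r9) */
      p = t;                        /* mr r3,r9 */
    }
  while (e == 0);                   /* cmpdi / beq back to L2 */
  q = t;                            /* std r9,0(r2), after the loop */
}

/* Hypothetical helper: both functions must leave the same value in q.  */
char *check_mainline (char *p)
{
  char *r;
  foo (p);
  r = q;
  foo_mainline_shape (p);
  assert (q == r);
  return r;
}
```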