The ivopts pass fails to set the REG_POINTER attribute on a pseudo that is equal to two pseudos, one of which has teh REG_POINTER attribute and the other is just a n offset from it. This leads us to not sort the operands on an indexed load/store instruction causing performance problems on POWER6. The test case used to recreate the problem is: linux> cat > bug.c <<EOF unsigned int *quadrant; unsigned int fullGtU (int i1, int n) { unsigned int s1 = 0; for (; n >= 0; n--) { s1 = quadrant[i1]; i1 += n; } return s1; } EOF linux> /path/to/gcc -O1 -S -da bug.c Resulting assembly: fullGtU: li 0,0 cmpwi 0,4,0 blt 0,.L3 lis 9,quadrant@ha lwz 11,quadrant@l(9) addi 0,4,1 mtctr 0 addi 0,4,-1 cmpwi 7,0,-1 bge+ 7,.L4 li 0,1 mtctr 0 .L4: slwi 9,3,2 lwzx 0,9,11 <- bad operand ordering add 3,3,4 addi 4,4,-1 bdnz .L4 .L3: mr 3,0 blr Bad RTL in bug.c.127r.expand: (insn 21 20 22 bug.c:8 (set (reg/f:SI 129) (plus:SI (reg:SI 128) (reg:SI 121 [ quadrant.0 ]))) -1 (nil)) (insn 22 21 0 bug.c:8 (set (reg/v:SI 120 [ s1 ]) (mem:SI (reg/f:SI 129) [0 S4 A32])) -1 (nil)) Eventually, pseudo 129 )which has the REG_POINTER attribute) gets exchanged with the (plus:SI (reg:SI 128) (reg:SI 121 [ quadrant.0 ])) and pseudo 121 doesn't have the attribute, so we end up failing to swap them.
;; quadrant.0 = quadrant (insn 16 15 17 t.c:8 (set (reg:SI 127) (high:SI (symbol_ref:SI ("quadrant") [flags 0x84] <var_decl 0xf7d9f070 quadrant>))) -1 (nil)) (insn 17 16 18 t.c:8 (set (reg/f:SI 126) (lo_sum:SI (reg:SI 127) (symbol_ref:SI ("quadrant") [flags 0x84] <var_decl 0xf7d9f070 quadrant>))) -1 (expr_list:REG_EQUAL (symbol_ref:SI ("quadrant") [flags 0x84] <var_decl 0xf7d9f070 quadrant>) (nil))) (insn 18 17 0 t.c:8 (set (reg:SI 121 [ quadrant.0 ]) (mem/f/c/i:SI (reg/f:SI 126) [2 quadrant+0 S4 A32])) -1 (nil)) The problem here is that DECL_ARTIFICIAL causes REG_POINTER not to be set. See: http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00020.html Reverting that patch was the next step on my pointer plus work (after fixing up the regressions that still need fixing).
.L4: slwi 0,3,2 lwzx 0,9,0 add 3,3,4 addi 4,4,-1 bdnz .L4 Much better correct?
Created attachment 14008 [details] Patch which needs full testing still Hi, this reverts Jeff's patch in both stmt.c and cfgexpand.c (I don't know how much code of stmt.c's function is still used though). This patch has not been bootstrapped and tested yet and after it gets done, it still needs an extra test on hppa-hpux as that is what Jeff's patch was trying to fix.
Andrew's patch from Comment #3 bootstrapped and regtested with no regressions on powerpc64-linux running the testsuite in both 32-bit and 64-bit modes. I can also confirm that it fixes the test case I posted above. Pat is running SPEC2000 with the patch now.
This patch gives us an additional 2-3% overall on CPU2000, running on Power6. Facerec was the big winner with about 40% improvement, there were a few improvements in the 5-10% range and a few degradations in the 1-2% range that could be attributed to other issues like loop alignment or possibly an issue with r0 getting assigned to the base reg by renaming which I'm currently testing a patch for.
This patch helps more than just PR 28690 as we now have the ablity to use r0 more for the index which causes us to use one less callee saved register in some cases and reduces the stack size. I saw that result while compiling tramp3d.
Subject: Bug 32940 Author: pinskia Date: Mon Aug 20 00:48:09 2007 New Revision: 127634 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=127634 Log: 2007-08-19 Andrew Pinski <andrew_pinski@playstation.sony.com> PR middle-end/32940 * cfgexpand.c (expand_one_register_var): Mark pointer DECL_ARTIFICIAL as REG_POINTER also. * stmt.c (expand_decl): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/stmt.c
Fixed.