We should produce the same code in the two loops unsigned short *q; #define NOSB 10 int last; void h1() { int i; for (i=0;i<last+NOSB;i++) { q[i] = 0; } } ----- unsigned short q[100]; #define NOSB 10 int last; void h1() { int i; for (i=0;i<last+NOSB;i++) { q[i] = 0; } } For the array one we produce: L4: sth r0,0(r2) addi r2,r2,2 bdnz L4 which is optimial but for the pointer one we produce: L4: slwi r2,r9,1 addi r9,r9,1 sthx r0,r2,r11 bdnz L4 which sucks as we have tree instructions in the loop instead of two. I think this is a place where IV-OPTS is failing.
Note with powerpc64 we get much worse than array case which stays the same: L4: sldi r2,r9,1 addi r0,r9,1 sthx r10,r2,r11 extsw r9,r0 bdnz L4
Rewritting the code this way, shows how we ahould be optimizing the code: void h() { int i; unsigned short *q1 = q; for (i=0;i<last+NOSB;i++) { *q1 = 0; q1++; } }
It is not what I thought it was. The array case is optimized at the tree level, the pointer case is optimized at the rtl level.
The problem seems to be that licm does not move load of q from the loop. Ivopts then do not recognize q + 2*i as induction variable, and thus they are optimizing it not like an address of memory reference, but just like a computation of 2*i, for which it obviously does not make sense to perform strength reduction.
Patch here which should fix not pulling the load of q out of the loop: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg00957.html Then the only thing left is for IV-OPTS to be fixed.
Subject: Re: Code for arrays and pointers are not the same > Patch here which should fix not pulling the load of q out of the loop: > http://gcc.gnu.org/ml/gcc-patches/2004-11/msg00957.html > > Then the only thing left is for IV-OPTS to be fixed. no -- once this is done, ivopts work just fine, producing the same code as for array access.
huh a compiler built with that patch gives: L4: slwi r2,r9,1 addi r9,r9,1 sthx r0,r2,r11 bdnz L4 Also pulling the load manually out loop also produce the same asm as I just produced: unsigned short *q; #define NOSB 10 int last; void h1() { int i; unsigned short *q1 = q; for (i=0;i<last+NOSB;i++) { q1[i] = 0; } }
Subject: Re: Code for arrays and pointers are not the same > > Patch here which should fix not pulling the load of q out of the loop: > > http://gcc.gnu.org/ml/gcc-patches/2004-11/msg00957.html > > > > Then the only thing left is for IV-OPTS to be fixed. > > no -- once this is done, ivopts work just fine, producing the same code > as for array access. btw. the patch above might be wrong -- I think ssa names also pass the is_gimple_addressable test: is_gimple_addressable -> is_gimple_id -> is_gimple_variable -> SSA_NAME. So the correct patch would be Index: tree-ssa-loop-im.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-im.c,v retrieving revision 2.21 diff -c -3 -p -r2.21 tree-ssa-loop-im.c *** tree-ssa-loop-im.c 8 Nov 2004 13:54:39 -0000 2.21 --- tree-ssa-loop-im.c 12 Nov 2004 11:29:40 -0000 *************** stmt_cost (tree stmt) *** 367,373 **** /* Hoisting memory references out should almost surely be a win. */ if (!is_gimple_variable (lhs)) cost += 20; ! if (is_gimple_addressable (rhs) && !is_gimple_variable (rhs)) cost += 20; switch (TREE_CODE (rhs)) --- 367,373 ---- /* Hoisting memory references out should almost surely be a win. */ if (!is_gimple_variable (lhs)) cost += 20; ! if (is_gimple_addressable (rhs) && TREE_CODE (rhs) != SSA_NAME) cost += 20; switch (TREE_CODE (rhs))
Subject: Re: Code for arrays and pointers are not the same > huh a compiler built with that patch gives: > L4: > slwi r2,r9,1 > addi r9,r9,1 > sthx r0,r2,r11 > bdnz L4 > > Also pulling the load manually out loop also produce the same asm as I just produced: > unsigned short *q; > #define NOSB 10 > int last; > void h1() > { > int i; > unsigned short *q1 = q; > for (i=0;i<last+NOSB;i++) > { > q1[i] = 0; > } > } not for me: .L4: sth 0,0(9) addi 9,9,2 bdnz .L4 what architecture/flags are you using? Zdenek
powerpc-darwin just -O3 hmm, must be a local modification which changes it.
(In reply to comment #8) > > is_gimple_addressable -> is_gimple_id -> is_gimple_variable -> SSA_NAME. > > So the correct patch would be I did post that patch also before it was rejected as I did not look into the aliasing problem we had.
Subject: Re: Code for arrays and pointers are not the same > powerpc-darwin > > just -O3 > > hmm, must be a local modification which changes it. maybe you are checking 64 bit? That still looks terrible.
The local patch which I had in which caused this was a not so correct for PR 18293 (which we remove an extra copy RTL as we expand it so it looks like the cost analysis is doing something wrong which is why we don't do the IV-OPT when at the RTL either).
Subject: Bug 18431 CVSROOT: /cvs/gcc Module name: gcc Changes by: rakdver@gcc.gnu.org 2004-11-14 18:04:26 Modified files: gcc : ChangeLog tree-flow.h tree-ssa-loop-im.c tree-ssa.c Log message: PR tree-optimization/18431 * tree-flow.h (stmt_references_memory_p): Declare. * tree-ssa-loop-im.c (stmt_cost): Use stmt_references_memory_p. * tree-ssa.c (stmt_references_memory_p): New function. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6332&r2=2.6333 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow.h.diff?cvsroot=gcc&r1=2.62&r2=2.63 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-im.c.diff?cvsroot=gcc&r1=2.21&r2=2.22 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa.c.diff?cvsroot=gcc&r1=2.57&r2=2.58
Subject: Bug 18431 CVSROOT: /cvs/gcc Module name: gcc Changes by: rakdver@gcc.gnu.org 2004-11-15 00:18:37 Modified files: gcc : ChangeLog fold-const.c tree-ssa-loop-niter.c tree.c tree.h gcc/testsuite : ChangeLog Added files: gcc/testsuite/gcc.c-torture/execute: 20041114-1.c Log message: PR tree-optimization/18431 * fold-const.c (associate_trees): Do not produce x + 0. (fold_widened_comparison, fold_sign_changed_comparison): New functions. (fold): Use them. * tree-ssa-loop-niter.c (upper_bound_in_type, lower_bound_in_type): Moved ... * tree.c (upper_bound_in_type, lower_bound_in_type): Here. * tree.h (upper_bound_in_type, lower_bound_in_type): Declare. * testsuite/gcc.c-torture/execute/20041114-1.c: New test. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6340&r2=2.6341 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/fold-const.c.diff?cvsroot=gcc&r1=1.473&r2=1.474 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-niter.c.diff?cvsroot=gcc&r1=2.15&r2=2.16 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree.c.diff?cvsroot=gcc&r1=1.447&r2=1.448 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree.h.diff?cvsroot=gcc&r1=1.654&r2=1.655 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.4602&r2=1.4603 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.c-torture/execute/20041114-1.c.diff?cvsroot=gcc&r1=NONE&r2=1.1
Fixed.
Does this need to be reopened? I see the following for mainline. -m32: .L4: slwi %r9,%r11,1 #, tmp130, i addi %r11,%r11,1 # i, i, sthx %r0,%r9,%r10 #* q.1, tmp132 bdnz .L4 # -m64: .L4: sthx %r0,%r11,%r10 #* ivtmp.11, tmp132 addi %r10,%r10,2 # ivtmp.11, ivtmp.11, bdnz .L4 # Both can be improved to use a non-indexed store and increment of that base reg, similar to what Zdenek said he saw in comment #9.