Take the following two functions. They should produce the same asm, but the second is better on powerpc, at least for the inner loop (6 instructions vs 8, not counting the bdnz):

void daxpy(int n, float da, float dx[], int incx, float dy[], int incy)
{
  int i,ix=0,iy=0,m,mp1;
  mp1 = 0;
  m = 0;
  for (i = 0;i < n; i++){
    dy[iy] = dy[iy] + dx[ix];
    ix = ix + incx;
    iy = iy + incy;
  }
}

void daxpy1(int n, float da, float dx[], int incx, float dy[], int incy)
{
  int i,ix=0,iy=0,m,mp1;
  mp1 = 0;
  m = 0;
  for (i = 0;i < n; i++){
    *(float*)(((char*)dy)+iy) = *(float*)(((char*)dy)+iy) + *(float*)(((char*)dx)+ix);
    ix = ix + incx*4;
    iy = iy + incy*4;
  }
}

Inner loop for the first one:

L4:
        slwi r2,r9,2
        slwi r0,r11,2
        lfsx f13,r5,r0
        add r11,r11,r6
        lfsx f0,r7,r2
        add r9,r9,r8
        fadds f0,f0,f13
        stfsx f0,r7,r2
        bdnz L4

The second one:

L11:
        lfsx f0,r7,r0
        lfsx f13,r5,r2
        add r2,r2,r6
        fadds f0,f0,f13
        stfsx f0,r7,r0
        add r0,r0,r8
        bdnz L11

Yes, this shows up in real code.
Confirmed.
I get different asm for AMD64 as well:

.L4:                            | .L11:
    movslq %r10d,%rax               movslq %r10d,%rax
    movslq %r11d,%rdx               movslq %r11d,%rdx
    incl %r9d                       incl %r9d
    leaq (%rcx,%rax,4), %rax    |   leaq (%rsi,%rax), %rax
    movss (%rbx,%rdx,4), %xmm0  |   movss (%rbx,%rdx), %xmm0
    addl %esi, %r11d            |   addl %ecx, %r11d
    addl %r8d, %r10d                addl %r8d, %r10d
    cmpl %r9d, %edi                 cmpl %r9d, %edi
    addss (%rax), %xmm0             addss (%rax), %xmm0
    movss %xmm0, (%rax)             movss %xmm0, (%rax)
    jg .L4                      |   jg .L11
(In reply to comment #2)
> I get different asm for AMD64 as well:

Since AMD64 and x86 have lea, the difference is not as impressive there as on PPC or any other target that does not have a+b*c instructions.
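To make the point about scaling concrete, here is a minimal sketch of the same strided loop written two ways. The function names are made up for illustration and the unused arguments of the original testcase are dropped. The first form leaves the multiplication by sizeof(float) inside each array access, which costs a shift on a target without scaled addressing. The second folds the scaling into the step by bumping pointers. This is a hand-written illustration of the idea, not necessarily the form the optimizer produces.

```c
/* Index form: every access computes base + index * sizeof(float). */
void add_strided_indexed(int n, const float dx[], int incx,
                         float dy[], int incy)
{
    int ix = 0, iy = 0;
    for (int i = 0; i < n; i++) {
        dy[iy] = dy[iy] + dx[ix];
        ix += incx;
        iy += incy;
    }
}

/* Hand strength-reduced form: the pointers advance by the element
   stride, so no per-iteration scaling is left in the loop body.
   Assumption: the arrays are large enough that the final bump stays
   within one element past the end of each array. */
void add_strided_pointers(int n, const float *dx, int incx,
                          float *dy, int incy)
{
    for (int i = 0; i < n; i++) {
        *dy = *dy + *dx;
        dx += incx;
        dy += incy;
    }
}

/* Returns 1 if both forms give identical results on a small input. */
int strided_forms_agree(void)
{
    float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float a[8] = {0}, b[8] = {0};

    add_strided_indexed(4, x, 2, a, 2);
    add_strided_pointers(4, x, 2, b, 2);
    for (int i = 0; i < 8; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Both forms perform the same loads, adds and stores in the same order, so the results match exactly; only the address arithmetic differs.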
This shows up a lot in Fortran code like the following (and yes, I copied this from somewhere and reduced it):

      SUBROUTINE d ( a, b, ndim )
        IMPLICIT NONE
        INTEGER, INTENT(IN) :: ndim
        REAL,DIMENSION(ndim,ndim) :: a
        REAL,DIMENSION(ndim,ndim) :: b
        b(1,:) = a(2,:)
      END SUBROUTINE d
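For readers who do not read Fortran, a rough C rendering of that assignment may help. It assumes the usual column-major layout, where element (i,j), counted from 1, sits at offset (i-1) + (j-1)*ndim, and it assumes ndim >= 2 so that row 2 exists. The function names are hypothetical, and this is not a claim about how the Fortran front end actually lowers the array section. The point is that the stride between consecutive accesses is ndim, a value known only at run time.

```c
/* b(1,:) = a(2,:) with column-major storage, written with an
   explicit multiply in every access. */
void copy_row_indexed(const float *a, float *b, int ndim)
{
    for (int j = 0; j < ndim; j++)
        b[0 + j * ndim] = a[1 + j * ndim];
}

/* The same copy with the multiply strength-reduced into a running
   offset. The step (ndim) is loop invariant but is not a
   compile-time constant. */
void copy_row_reduced(const float *a, float *b, int ndim)
{
    int off = 0;                /* always equals j * ndim */
    for (int j = 0; j < ndim; j++) {
        b[off] = a[off + 1];
        off += ndim;
    }
}
```

With ndim = 3 and a filled with 0..8, both versions copy a[1], a[4] and a[7] into b[0], b[3] and b[6] and leave the rest of b untouched.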
I should note that XLC does this optimization.
Will be fixed by this patch for PR 19126: http://gcc.gnu.org/ml/gcc-patches/2005-04/msg01959.html
Subject: Bug 19126

CVSROOT:        /cvs/gcc
Module name:    gcc
Changes by:     rakdver@gcc.gnu.org     2005-05-01 08:08:14

Modified files:
        gcc            : ChangeLog tree-scalar-evolution.c
                         tree-scalar-evolution.h tree-ssa-loop-ivopts.c
                         tree-ssa-loop-manip.c tree-ssa-loop-niter.c tree.c
        gcc/testsuite  : ChangeLog
Added files:
        gcc/testsuite/gcc.dg/tree-ssa: loop-8.c

Log message:
        PR tree-optimization/18316
        PR tree-optimization/19126
        * tree.c (build_int_cst_type): Avoid shift by size of type.
        * tree-scalar-evolution.c (simple_iv): Add allow_nonconstant_step
        argument.
        * tree-scalar-evolution.h (simple_iv): Declaration changed.
        * tree-ssa-loop-ivopts.c (struct iv_cand): Add depends_on
        field.
        (dump_cand): Dump depends_on information.
        (determine_biv_step): Add argument to simple_iv call.
        (contains_abnormal_ssa_name_p): Handle case expr == NULL.
        (find_bivs, find_givs_in_stmt_scev): Do not require step to be
        a constant.
        (add_candidate_1): Record depends_on for candidates.
        (tree_int_cst_sign_bit, constant_multiple_of): New functions.
        (get_computation_at, get_computation_cost_at, may_eliminate_iv):
        Handle ivs with nonconstant step.
        (iv_ca_set_remove_invariants, iv_ca_set_add_invariants): New
        functions.
        (iv_ca_set_no_cp, iv_ca_set_cp): Handle cand->depends_on.
        (create_new_iv): Unshare the step before passing it to create_iv.
        (free_loop_data): Free cand->depends_on.
        (build_addr_strip_iref): New function.
        (find_interesting_uses_address): Use build_addr_strip_iref.
        (strip_offset_1): Split the recursive part from strip_offset.
        Strip constant offset component_refs and array_refs.
        (strip_offset): Split the recursive part to strip_offset_1.
        (add_address_candidates): Removed.
        (add_derived_ivs_candidates): Do not use add_address_candidates.
        (add_iv_value_candidates): Add candidates with stripped constant
        offset.  Consider all candidates with initial value 0 important.
        (struct affine_tree_combination): New.
        (aff_combination_const, aff_combination_elt, aff_combination_scale,
        aff_combination_add_elt, aff_combination_add,
        tree_to_aff_combination, add_elt_to_tree, aff_combination_to_tree,
        fold_affine_sum): New functions.
        (get_computation_at): Use fold_affine_sum.
        * tree-ssa-loop-manip.c (create_iv): Handle ivs with nonconstant step.
        * tree-ssa-loop-niter.c (number_of_iterations_exit): Add argument
        to simple_iv call.

        * gcc.dg/tree-ssa/loop-8.c: New test.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.8543&r2=2.8544
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-scalar-evolution.c.diff?cvsroot=gcc&r1=2.21&r2=2.22
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-scalar-evolution.h.diff?cvsroot=gcc&r1=2.3&r2=2.4
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-ivopts.c.diff?cvsroot=gcc&r1=2.65&r2=2.66
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-manip.c.diff?cvsroot=gcc&r1=2.31&r2=2.32
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-niter.c.diff?cvsroot=gcc&r1=2.24&r2=2.25
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree.c.diff?cvsroot=gcc&r1=1.476&r2=1.477
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.5421&r2=1.5422
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/tree-ssa/loop-8.c.diff?cvsroot=gcc&r1=NONE&r2=1.1
Fixed. Thanks Zdenek.