19126 – Missed IV optimization (redundant instruction in loop)

Bug 19126 - Missed IV optimization (redundant instruction in loop)

Summary: Missed IV optimization (redundant instruction in loop)

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	4.0.0

Importance:	P2 enhancement
Target Milestone:	4.1.0
Assignee:	Zdenek Dvorak

URL:
Keywords:	missed-optimization, patch

Depends on:
Blocks:

Reported:	2004-12-22 05:08 UTC by Andrew Pinski
Modified:	2005-05-01 13:46 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:	powerpc-darwin
Build:
Known to work:
Known to fail:
Last reconfirmed:	2005-04-09 20:45:06

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andrew Pinski 2004-12-22 05:08:28 UTC

Take the following two functions, they should produce the same asm, the second is better on powerpc 
at least for the inner loop (6 instructions vs 8):
void daxpy(int n, float  da, float  dx[], int incx, float  dy[], int incy)
{
  int i,ix=0,iy=0,m,mp1;
  
  mp1 = 0;
  m = 0;
  for (i = 0;i < n; i++){
    dy[iy] = dy[iy] + dx[ix];
    ix = ix + incx;
    iy = iy + incy;
  }
}
void daxpy1(int n, float  da, float  dx[], int incx, float  dy[], int incy)
{
  int i,ix=0,iy=0,m,mp1;
  mp1 = 0;
  m = 0;
  for (i = 0;i < n; i++){
    *(float*)(((char*)dy)+iy) = *(float*)(((char*)dy)+iy) + *(float*)(((char*)dx)+ix);
    ix = ix + incx*4;
    iy = iy + incy*4;
  }
}

inner loop for the first one:
L4:
	slwi r2,r9,2
	slwi r0,r11,2
	lfsx f13,r5,r0
	add r11,r11,r6
	lfsx f0,r7,r2
	add r9,r9,r8
	fadds f0,f0,f13
	stfsx f0,r7,r2
	bdnz L4

the second one:
L11:
	lfsx f0,r7,r0
	lfsx f13,r5,r2
	add r2,r2,r6
	fadds f0,f0,f13
	stfsx f0,r7,r0
	add r0,r0,r8
	bdnz L11

Yes this shows up in real code.

Comment 1 Andrew Pinski 2004-12-28 23:54:33 UTC

Confirmed.

Comment 2 Steven Bosscher 2005-01-23 16:03:44 UTC

I get different asm for AMD64 as well: 
 
.L4:                                  | .L11: 
        movslq  %r10d,%rax                      movslq  %r10d,%rax 
        movslq  %r11d,%rdx                      movslq  %r11d,%rdx 
        incl    %r9d                            incl    %r9d 
        leaq    (%rcx,%rax,4), %rax   |         leaq    (%rsi,%rax), %rax 
        movss   (%rbx,%rdx,4), %xmm0  |         movss   (%rbx,%rdx), %xmm0 
        addl    %esi, %r11d           |         addl    %ecx, %r11d 
        addl    %r8d, %r10d                     addl    %r8d, %r10d 
        cmpl    %r9d, %edi                      cmpl    %r9d, %edi 
        addss   (%rax), %xmm0                   addss   (%rax), %xmm0 
        movss   %xmm0, (%rax)                   movss   %xmm0, (%rax) 
        jg      .L4                   |         jg      .L11

Comment 3 Andrew Pinski 2005-03-01 18:09:10 UTC

(In reply to comment #2)
> I get different asm for AMD64 as well: 
Since AMD64 and x86 have lea, it is not as impressive on PPC or any other target that does have a+b*c 
instructions.

Comment 4 Andrew Pinski 2005-04-09 20:45:06 UTC

This shows up a lot in fortran code like the following (and yes I copied this from some where and 
reduced it):
SUBROUTINE d ( a, b, ndim )
IMPLICIT NONE
INTEGER, INTENT(IN) :: ndim
REAL,DIMENSION(ndim,ndim) :: a
REAL,DIMENSION(ndim,ndim) :: b
b(1,:) = a(2,:)
END SUBROUTINE d

Comment 5 Andrew Pinski 2005-04-10 23:20:47 UTC

I should note that XLC does this.

Comment 6 Giovanni Bajo 2005-04-19 09:59:57 UTC

Will be fixed by this patch for PR 19126:
http://gcc.gnu.org/ml/gcc-patches/2005-04/msg01959.html

Comment 7 GCC Commits 2005-05-01 08:08:24 UTC

Subject: Bug 19126

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rakdver@gcc.gnu.org	2005-05-01 08:08:14

Modified files:
	gcc            : ChangeLog tree-scalar-evolution.c 
	                 tree-scalar-evolution.h tree-ssa-loop-ivopts.c 
	                 tree-ssa-loop-manip.c tree-ssa-loop-niter.c 
	                 tree.c 
	gcc/testsuite  : ChangeLog 
Added files:
	gcc/testsuite/gcc.dg/tree-ssa: loop-8.c 

Log message:
	PR tree-optimization/18316
	PR tree-optimization/19126
	* tree.c (build_int_cst_type): Avoid shift by size of type.
	* tree-scalar-evolution.c (simple_iv): Add allow_nonconstant_step
	argument.
	* tree-scalar-evolution.h (simple_iv): Declaration changed.
	* tree-ssa-loop-ivopts.c (struct iv_cand): Add depends_on
	field.
	(dump_cand): Dump depends_on information.
	(determine_biv_step): Add argument to simple_iv call.
	(contains_abnormal_ssa_name_p): Handle case expr == NULL.
	(find_bivs, find_givs_in_stmt_scev): Do not require step to be a
	constant.
	(add_candidate_1): Record depends_on for candidates.
	(tree_int_cst_sign_bit, constant_multiple_of): New functions.
	(get_computation_at, get_computation_cost_at, may_eliminate_iv):
	Handle ivs with nonconstant step.
	(iv_ca_set_remove_invariants, iv_ca_set_add_invariants): New functions.
	(iv_ca_set_no_cp, iv_ca_set_cp): Handle cand->depends_on.
	(create_new_iv): Unshare the step before passing it to create_iv.
	(free_loop_data): Free cand->depends_on.
	(build_addr_strip_iref): New function.
	(find_interesting_uses_address): Use build_addr_strip_iref.
	(strip_offset_1): Split the recursive part from strip_offset.
	Strip constant offset component_refs and array_refs.
	(strip_offset): Split the recursive part to strip_offset_1.
	(add_address_candidates): Removed.
	(add_derived_ivs_candidates): Do not use add_address_candidates.
	(add_iv_value_candidates): Add candidates with stripped constant
	offset.  Consider all candidates with initial value 0 important.
	(struct affine_tree_combination): New.
	(aff_combination_const, aff_combination_elt, aff_combination_scale,
	aff_combination_add_elt, aff_combination_add,
	tree_to_aff_combination, add_elt_to_tree, aff_combination_to_tree,
	fold_affine_sum): New functions.
	(get_computation_at): Use fold_affine_sum.
	* tree-ssa-loop-manip.c (create_iv): Handle ivs with nonconstant step.
	* tree-ssa-loop-niter.c (number_of_iterations_exit): Add argument
	to simple_iv call.
	
	* gcc.dg/tree-ssa/loop-8.c: New test.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.8543&r2=2.8544
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-scalar-evolution.c.diff?cvsroot=gcc&r1=2.21&r2=2.22
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-scalar-evolution.h.diff?cvsroot=gcc&r1=2.3&r2=2.4
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-ivopts.c.diff?cvsroot=gcc&r1=2.65&r2=2.66
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-manip.c.diff?cvsroot=gcc&r1=2.31&r2=2.32
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-niter.c.diff?cvsroot=gcc&r1=2.24&r2=2.25
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree.c.diff?cvsroot=gcc&r1=1.476&r2=1.477
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.5421&r2=1.5422
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/tree-ssa/loop-8.c.diff?cvsroot=gcc&r1=NONE&r2=1.1

Comment 8 Andrew Pinski 2005-05-01 13:46:45 UTC

Fixed.  Thanks Zdenek.