This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] cse: Lookup for both PLUS operands in an address


Hello,

the attached patch fixes a performance regression on S/390 against gcc 4.0.

In the testcase we have an array accessed with an index resulting in the 
following gcc 4.0 pseudo code:

r1 = &x
r2 = r3 + 4
r4 = [r2 + r1]

In find_best_addr there is a code part optimizing addresses like REG + x looking
for a replacement for REG. Afterwards we have:

r1 = &x
(r2 = r3 + 4)         can potentially removed later
r4 = [r3 + r1 + 4]

With gcc 4.1 the address is generated like this:

r1 = &x
r2 = r3 + 4
r4 = [r1 + r2] 

which unfortunately does not trigger that optimization anymore. My guess is that
the order of the operands has changed due to the TARGET_MEM_REF usage in gcc 4.1. 

The attached patch changes find_best_addr to perform the cse lookup for both
operands of the sum what brings us back to the performance of the gcc 4.0 code.


Bootstrapped on i686, s390 and s390x. No testsuite regressions.

OK for mainline?

:ADDPATCH rtl (cse):

Bye,

-Andreas-


2005-09-19  Andreas Krebbel  <krebbel1@de.ibm.com>

	* cse.c (find_best_addr): Perform lookup for both REGs in a
	REG + REG address.


Index: gcc/cse.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cse.c,v
retrieving revision 1.360
diff -p -c -r1.360 cse.c
*** gcc/cse.c	6 Sep 2005 08:52:51 -0000	1.360
--- gcc/cse.c	19 Sep 2005 09:20:53 -0000
*************** find_best_addr (rtx insn, rtx *loc, enum
*** 2954,3036 ****
       have REG+const and the register is another REG+const.  We can often merge
       the constants and eliminate one insn and one register.  It may also be
       that a machine has a cheap REG+REG+const.  Finally, this improves the
!      code on the Alpha for unaligned byte stores.  */
  
    if (flag_expensive_optimizations
        && ARITHMETIC_P (*loc)
        && REG_P (XEXP (*loc, 0)))
      {
!       rtx op1 = XEXP (*loc, 1);
  
        do_not_record = 0;
-       hash = HASH (XEXP (*loc, 0), Pmode);
        do_not_record = save_do_not_record;
        hash_arg_in_memory = save_hash_arg_in_memory;
  
!       elt = lookup (XEXP (*loc, 0), hash, Pmode);
!       if (elt == 0)
  	return;
  
!       /* We need to find the best (under the criteria documented above) entry
! 	 in the class that is valid.  We use the `flag' field to indicate
! 	 choices that were invalid and iterate until we can't find a better
! 	 one that hasn't already been tried.  */
! 
!       for (p = elt->first_same_value; p; p = p->next_same_value)
! 	p->flag = 0;
! 
!       while (found_better)
! 	{
! 	  int best_addr_cost = address_cost (*loc, mode);
! 	  int best_rtx_cost = (COST (*loc) + 1) >> 1;
! 	  struct table_elt *best_elt = elt;
! 	  rtx best_rtx = *loc;
! 	  int count;
! 
! 	  /* This is at worst case an O(n^2) algorithm, so limit our search
! 	     to the first 32 elements on the list.  This avoids trouble
! 	     compiling code with very long basic blocks that can easily
! 	     call simplify_gen_binary so many times that we run out of
! 	     memory.  */
! 
! 	  found_better = 0;
! 	  for (p = elt->first_same_value, count = 0;
! 	       p && count < 32;
! 	       p = p->next_same_value, count++)
! 	    if (! p->flag
! 		&& (REG_P (p->exp)
! 		    || exp_equiv_p (p->exp, p->exp, 1, false)))
! 	      {
! 		rtx new = simplify_gen_binary (GET_CODE (*loc), Pmode,
! 					       p->exp, op1);
! 		int new_cost;
! 		
! 		/* Get the canonical version of the address so we can accept
! 		   more.  */
! 		new = canon_for_address (new);
! 		
! 		new_cost = address_cost (new, mode);
! 
! 		if (new_cost < best_addr_cost
! 		    || (new_cost == best_addr_cost
! 			&& (COST (new) + 1) >> 1 > best_rtx_cost))
  		  {
! 		    found_better = 1;
! 		    best_addr_cost = new_cost;
! 		    best_rtx_cost = (COST (new) + 1) >> 1;
! 		    best_elt = p;
! 		    best_rtx = new;
  		  }
! 	      }
! 
! 	  if (found_better)
! 	    {
! 	      if (validate_change (insn, loc,
! 				   canon_reg (copy_rtx (best_rtx),
! 					      NULL_RTX), 0))
! 		return;
! 	      else
! 		best_elt->flag = 1;
  	    }
  	}
      }
--- 2954,3052 ----
       have REG+const and the register is another REG+const.  We can often merge
       the constants and eliminate one insn and one register.  It may also be
       that a machine has a cheap REG+REG+const.  Finally, this improves the
!      code on the Alpha for unaligned byte stores.
!      If we have an address like REG + REG we try to find replacements for
!      both registers.  */
  
    if (flag_expensive_optimizations
        && ARITHMETIC_P (*loc)
        && REG_P (XEXP (*loc, 0)))
      {
!       rtx ops[2];
!       struct table_elt * elts[2] = {0, 0};
!       int i;
  
        do_not_record = 0;
        do_not_record = save_do_not_record;
        hash_arg_in_memory = save_hash_arg_in_memory;
  
!       ops[0] = XEXP (*loc, 0);
!       ops[1] = XEXP (*loc, 1);
! 
!       elts[0] = lookup (ops[0], HASH (ops[0], Pmode), Pmode);      
!       elts[1] = lookup (ops[1], HASH (ops[1], Pmode), Pmode);
! 
!       if (elts[0] == 0 && elts[1] == 0)
  	return;
  
!       for (i = 0; i < 2; i++)
! 	{
! 	  elt = elts[i];
! 
! 	  if (!elt)
! 	    continue;
! 
! 	  /* We need to find the best (under the criteria documented above) entry
! 	     in the class that is valid.  We use the `flag' field to indicate
! 	     choices that were invalid and iterate until we can't find a better
! 	     one that hasn't already been tried.  */
! 	  
! 	  for (p = elt->first_same_value; p; p = p->next_same_value)
! 	    p->flag = 0;
! 	  
! 	  while (found_better)
! 	    {
! 	      int best_addr_cost = address_cost (*loc, mode);
! 	      int best_rtx_cost = (COST (*loc) + 1) >> 1;
! 	      struct table_elt *best_elt = elt;
! 	      rtx best_rtx = *loc;
! 	      int count;
! 	      
! 	      /* This is at worst case an O(n^2) algorithm, so limit our search
! 		 to the first 32 elements on the list.  This avoids trouble
! 		 compiling code with very long basic blocks that can easily
! 		 call simplify_gen_binary so many times that we run out of
! 		 memory.  */
! 	      
! 	      found_better = 0;
! 	      for (p = elt->first_same_value, count = 0;
! 		   p && count < 32;
! 		   p = p->next_same_value, count++)
! 		if (! p->flag
! 		    && (REG_P (p->exp)
! 			|| exp_equiv_p (p->exp, p->exp, 1, false)))
  		  {
! 		    rtx new = simplify_gen_binary (GET_CODE (*loc), Pmode,
! 						   p->exp, ops[1 - i]);
! 		    int new_cost;
! 		    
! 		    /* Get the canonical version of the address so we can accept
! 		       more.  */
! 		    new = canon_for_address (new);
! 		    
! 		    new_cost = address_cost (new, mode);
! 		    
! 		    if (new_cost < best_addr_cost
! 			|| (new_cost == best_addr_cost
! 			    && (COST (new) + 1) >> 1 > best_rtx_cost))
! 		      {
! 			found_better = 1;
! 			best_addr_cost = new_cost;
! 			best_rtx_cost = (COST (new) + 1) >> 1;
! 			best_elt = p;
! 			best_rtx = new;
! 		      }
  		  }
! 	      
! 	      if (found_better)
! 		{
! 		  if (validate_change (insn, loc,
! 				       canon_reg (copy_rtx (best_rtx),
! 						  NULL_RTX), 0))
! 		    return;
! 		  else
! 		    best_elt->flag = 1;
! 		}
  	    }
  	}
      }


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]