This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Problems with the way we calculate costs


[ Port maintainers, please read the whole thing. ]

This is a summary of a discussion that has so far taken place on a
Redhat local mailing list.  It started out as an attempt to fix loop
optimizer problems for an sh-elf target.  The patch below has had
quite a bit of review, but I have been asked by Jeff not to check it
in yet to give more people a chance to comment on it.

There is one recurring problem with the loop optimizer: we often strength
reduce several givs with the same mult_val, leading to obviously stupid
code (recognizable by the sequence of near-identical add statements near
the end of a loop) with high register pressure.  Once code of this type
has passed through register allocation and reload, all hope for performance
is usually lost unless the machine has a very large number of registers.

While investigating the causes, I noticed several problems.  First of all,
the cost calculations in loop.c are dubious in many cases:

1. We compute add_cost globally as the cost of adding a register to itself.
   On the sh, this is less than the cost of adding a constant to a register;
   this leads to artificially high benefits.  It also does not take into
   account that different constants may have different costs.  I've tried to
   solve this by introducing a new function iv_add_mult_cost.  A simpler but
   less accurate fix would be to change the initialization of add_cost to
   compute the cost of adding an arbitrary constant to a register.
2. In simplify_giv_expr, benefits are increased in a somewhat dubious
   fashion.  This increment leads to inflated benefits in several cases
   I've looked at and causes us to strength reduce although it isn't
   beneficial.
3. We try to guess whether increments of reduced givs will be handled by
   autoinc addressing modes.  As in all cases, increasing the benefit of
   givs can lead to high register pressure; I've found that disabling this
   code in cases where the giv would otherwise have no benefit seems to
   improve code quality.  It seems like the right thing to do given that
   this is only a guess, and it can even make it harder for the following
   passes to create autoinc opportunities - I've seen cases where that
   happens.
4. There are also cases where there is too little strength reduction:
   Earlier passes occasionally create situations where there are several
   blocks that jump back to the loop start, and each of these blocks
   contains a biv increment.  These biv increments, although only one is
   executed for every loop iteration, all count against reducing a giv.  The
   criterion we use reflects differences in code size (more or less)
   accurately, but is not suited for estimates of execution speed in such
   cases.  I have no fix for this at this moment.  The structure of loop.c
   makes detecting such cases hard.
5. It turns out that on the PA, the loop optimizer believes a simple move
   insn has a cost of 4, while an addition has a cost of 2.

The last point is interesting, and it led to some further discoveries.  The
way we implement RTX_COSTS is highly inconsistent between ports.  At a first
glance, the whole mechanism of computing rtx costs seems totally bogus.  For
example,
  COSTS_N_INSNS (2) == 3 * COSTS_N_INSNS (1).

It does make sense in a weird way if you take into account all the other
pieces.  RTX_COSTS is supposed to call rtx_cost recursively for the
operands of the expression that it has been given.  Since pseudo registers
are assigned an arbitrary value of 1, it turns out that
  COSTS_N_INSNS (2) + 2 * 1 == 2 * (COSTS_N_INSNS (1) + 2 * 1)

This means that if a port implements the RTX_COSTS macro with the recursive
calls, it will usually return the right values.  (Needless to say, these
calls are missing in the PA port).  Still, the whole thing is rather
counter-intuitive, and in some situations it's not necessarily clear that a
recursive call of rtx_costs is a good or even correct way to express costs
of a certain operation.  The sh port also didn't get this quite right in
some cases.

It appears that the value of 1 for pseudos was chosen so that cse will
prefer expressions that contain fewer pseudos, this is apparently an
attempt to decrease register pressure.  In the discussion we had, most
people seemed to think that this is misguided, rtx_cost should only
return the true costs of an operation and not try to guess how a given
piece of rtl will affect register pressure - most likely these guesses
will be wrong once rtx_costs is used in other passes such as loop.

The patch below tries to address all these problems (except #4 above).
The cost calculations are changed as follows: the cost of all registers
is 0 and COSTS_N_INSNS is defined as

#define COSTS_N_INSNS(N) (2 * (N))

This happens to make both the PA and other ports do the right thing wrt.
relative costs of operations.

For the benefit of cse, there is a new function to compute the number of
registers in a given expression; this is used as a separate factor when
determining relative costs between expressions.

Port maintainers may need to update their RTX_COSTS macros after this
patch.  For example, on the i386 we were defining the cost of a
multiplication by a power of 2 to be the same as cost as the equivalent
shift.  With the current code it works for some reason, but after this
patch it will cause cse to not replace a multiplication with a cheaper
operation.  Other ports may have similar problems.  Another potential
source of problems are hardcoded constants; everything (including
CONST_COSTS) should be using the COSTS_N_INSNS macro.  Recursive calls
of rtx_costs are still necessary to handle some cases; e.g. if you have
a PLUS of a register and a large constant that can't be added directly,
then the cost of the constant must be taken into account (essentially
returning the cost of two operations: loading a constant and adding
two registers.

Jeff Law wrote:
>      > !   copy_cost = 2;
>      >   
>      >     /* Free the objects we just allocated.  */
>      >     obfree (free_point);
>    You might as well use COSTS_N_INSNS (1) here instead of hardcoding it to
>    "2".  That helps make at least one of the points behind this patch more
>    clear -- to make costs computed by cse & loop more consistent.

Any preference as to which header file should define COSTS_N_INSNS (it's in
cse.c at the moment)?

For the record, the patch has just survived a bootstrap on i586-linux, and
the testsuite results look pretty good (except C++ which seems to have
unrelated problems).


Bernd
 
	* Makefile.in (cse.o): Depend on $(BASIC_BLOCK_H).
	* cse.c: Include "basic-block.h".
	(struct table_elt): New field REGCOST.
	(CHEAP_REG): Delete macro.
	(COST): Return 0 for REGs.
	(approx_reg_cost_1, approx_reg_cost, preferrable): New functions.
	(notreg_cost): Return 0 for appropriate SUBREGs.
	(COSTS_N_INSNS): Return N * 2.
	(rtx_cost): Return 0 for REGs, and use cost of nested rtx for cheap
	SUBREGs.
	(CHEAPER): Use new function preferrable.
	(insert): Initialize REGCOST member.
	(find_best_addr): Use approx_reg_cost for estimation of register
	usage.
	(cse_insn): Likewise.
	* loop.c (iv_add_mult_cost): New function.
	(add_cost, shift_cost, mult_cost): Delete variables.
	(init_loop): Don't initialize add_cost; reduce copy_cost by half.
	(strength_reduce): Use iv_add_mult_cost instead of fixed add_cost.
	Make code that detects autoinc opportunities slightly less optimistic.
	(simplify_giv_expr): If expression contains other reg that is also a
	giv, only increment benefit if this is the only use of that reg.
	(consec_sets_giv): Take that change into account.
	(combine_givs): Slightly more verbose output.

	* i386.h (RTX_COSTS): For MULT, return true cost of multiplication,
	not the cost of an equivalent shift.
	* sh-protos.h (addsubcosts): Declare.
	* sh.c (addsubcosts): New function.
	* sh.h (CONST_COSTS): If CONST_OK_FOR_I, then return 0.
	(RTX_COSTS): Tweak.  Use addsubcosts.
	(ADDRESS_COST): Return higher cost for reg+reg addressing.

Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/egcs/gcc/Makefile.in,v
retrieving revision 1.496
diff -c -p -r1.496 Makefile.in
*** Makefile.in	2000/08/22 16:16:20	1.496
--- Makefile.in	2000/08/23 17:45:21
*************** simplify-rtx.o : simplify-rtx.c $(CONFIG
*** 1319,1325 ****
     hard-reg-set.h flags.h real.h insn-config.h $(RECOG_H) $(EXPR_H) toplev.h \
     output.h function.h cselib.h ggc.h $(OBSTACK_H)
  cse.o : cse.c $(CONFIG_H) system.h $(RTL_H) $(REGS_H) hard-reg-set.h flags.h \
!    real.h insn-config.h $(RECOG_H) $(EXPR_H) toplev.h output.h function.h $(GGC_H)
  gcse.o : gcse.c $(CONFIG_H) system.h $(RTL_H) $(REGS_H) hard-reg-set.h \
     flags.h real.h insn-config.h $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) \
     function.h output.h toplev.h
--- 1319,1326 ----
     hard-reg-set.h flags.h real.h insn-config.h $(RECOG_H) $(EXPR_H) toplev.h \
     output.h function.h cselib.h ggc.h $(OBSTACK_H)
  cse.o : cse.c $(CONFIG_H) system.h $(RTL_H) $(REGS_H) hard-reg-set.h flags.h \
!    real.h insn-config.h $(RECOG_H) $(EXPR_H) toplev.h output.h function.h \
!    $(BASIC_BLOCK_H) $(GGC_H)
  gcse.o : gcse.c $(CONFIG_H) system.h $(RTL_H) $(REGS_H) hard-reg-set.h \
     flags.h real.h insn-config.h $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) \
     function.h output.h toplev.h
Index: cse.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cse.c,v
retrieving revision 1.152
diff -c -p -r1.152 cse.c
*** cse.c	2000/08/23 07:59:06	1.152
--- cse.c	2000/08/23 17:45:31
*************** Boston, MA 02111-1307, USA.  */
*** 28,33 ****
--- 28,34 ----
  #include "tm_p.h"
  #include "regs.h"
  #include "hard-reg-set.h"
+ #include "basic-block.h"
  #include "flags.h"
  #include "real.h"
  #include "insn-config.h"
*************** static int hash_arg_in_memory;
*** 434,439 ****
--- 435,442 ----
     chain is not useful.
  
     The `cost' field stores the cost of this element's expression.
+    The `regcost' field stores the value returned by approx_reg_cost for
+    this element's expression.
  
     The `is_const' flag is set if the element is a constant (including
     a fixed address).
*************** struct table_elt
*** 456,461 ****
--- 459,465 ----
    struct table_elt *first_same_value;
    struct table_elt *related_value;
    int cost;
+   int regcost;
    enum machine_mode mode;
    char in_memory;
    char is_const;
*************** struct table_elt
*** 477,483 ****
    ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X)))	\
    : canon_hash (X, M)) & HASH_MASK)
  
! /* Determine whether register number N is considered a fixed register for CSE.
     It is desirable to replace other regs with fixed regs, to reduce need for
     non-fixed hard regs.
     A reg wins if it is either the frame pointer or designated as fixed.  */
--- 481,488 ----
    ? (((unsigned) REG << 7) + (unsigned) REG_QTY (REGNO (X)))	\
    : canon_hash (X, M)) & HASH_MASK)
  
! /* Determine whether register number N is considered a fixed register for the
!    purpose of approximating register costs.
     It is desirable to replace other regs with fixed regs, to reduce need for
     non-fixed hard regs.
     A reg wins if it is either the frame pointer or designated as fixed.  */
*************** struct table_elt
*** 497,516 ****
     || ((N) < FIRST_PSEUDO_REGISTER					\
         && FIXED_REGNO_P (N) && REGNO_REG_CLASS (N) != NO_REGS))
  
! /* A register is cheap if it is a user variable assigned to the register
!    or if its register number always corresponds to a cheap register.  */
  
- #define CHEAP_REG(N) \
-   ((REG_USERVAR_P (N) && REGNO (N) < FIRST_PSEUDO_REGISTER)	\
-    || CHEAP_REGNO (REGNO (N)))
- 
- #define COST(X)								\
-   (GET_CODE (X) == REG							\
-    ? (CHEAP_REG (X) ? 0							\
-       : REGNO (X) >= FIRST_PSEUDO_REGISTER ? 1				\
-       : 2)								\
-    : notreg_cost(X))
- 
  /* Get the info associated with register N.  */
  
  #define GET_CSE_REG_INFO(N) 			\
--- 502,509 ----
     || ((N) < FIRST_PSEUDO_REGISTER					\
         && FIXED_REGNO_P (N) && REGNO_REG_CLASS (N) != NO_REGS))
  
! #define COST(X) (GET_CODE (X) == REG ? 0 : notreg_cost (X))
  
  /* Get the info associated with register N.  */
  
  #define GET_CSE_REG_INFO(N) 			\
*************** struct cse_basic_block_data
*** 644,649 ****
--- 637,645 ----
     || GET_CODE (X) == ADDRESSOF)
  
  static int notreg_cost		PARAMS ((rtx));
+ static int approx_reg_cost_1	PARAMS ((rtx *, void *));
+ static int approx_reg_cost	PARAMS ((rtx));
+ static int preferrable		PARAMS ((int, int, int, int));
  static void new_basic_block	PARAMS ((void));
  static void make_new_qty	PARAMS ((unsigned int, enum machine_mode));
  static void make_regs_eqv	PARAMS ((unsigned int, unsigned int));
*************** dump_class (classp)
*** 716,721 ****
--- 712,773 ----
      }
  }
  
+ /* Subroutine of approx_reg_cost; called through for_each_rtx.  */
+ static int
+ approx_reg_cost_1 (xp, data)
+      rtx *xp;
+      void *data;
+ {
+   rtx x = *xp;
+   regset set = (regset) data;
+ 
+   if (x && GET_CODE (x) == REG)
+     SET_REGNO_REG_SET (set, REGNO (x));
+   return 0;
+ }
+ 
+ /* Return an estimate of the cost of the registers used in an rtx.
+    This is mostly the number of different REG expressions in the rtx;
+    however for some excecptions like fixed registers we use a cost of
+    0.  */
+ 
+ static int
+ approx_reg_cost (x)
+      rtx x;
+ {
+   regset_head set;
+   int i;
+   int cost = 0;
+ 
+   INIT_REG_SET (&set);
+   for_each_rtx (&x, approx_reg_cost_1, (void *)&set);
+ 
+   EXECUTE_IF_SET_IN_REG_SET
+     (&set, 0, i,
+      {
+        if (! CHEAP_REGNO (i))
+ 	 cost++;
+      });
+ 
+   CLEAR_REG_SET (&set);
+   return cost;
+ }
+ 
+ /* Return a negative value if an rtx A, whose costs are given by COST_A
+    and REGCOST_A, is more desirable than an rtx B.
+    Return a positive value if A is less desirable, or 0 if the two are
+    equally good.  */
+ static int
+ preferrable (cost_a, regcost_a, cost_b, regcost_b)
+      int cost_a, regcost_a, cost_b, regcost_b;
+ {
+   if (cost_a != cost_b)
+     return cost_a - cost_b;
+   if (regcost_a != regcost_b)
+     return regcost_a - regcost_b;
+   return 0;
+ }
+ 
  /* Internal function, to compute cost when X is not a register; called
     from COST macro to keep it simple.  */
  
*************** notreg_cost (x)
*** 732,740 ****
  	   && subreg_lowpart_p (x)
  	   && TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (GET_MODE (x)),
  				     GET_MODE_BITSIZE (GET_MODE (SUBREG_REG (x)))))
! 	  ? (CHEAP_REG (SUBREG_REG (x)) ? 0
! 	     : (REGNO (SUBREG_REG (x)) >= FIRST_PSEUDO_REGISTER ? 1
! 		: 2))
  	  : rtx_cost (x, SET) * 2);
  }
  
--- 784,790 ----
  	   && subreg_lowpart_p (x)
  	   && TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (GET_MODE (x)),
  				     GET_MODE_BITSIZE (GET_MODE (SUBREG_REG (x)))))
! 	  ? 0
  	  : rtx_cost (x, SET) * 2);
  }
  
*************** notreg_cost (x)
*** 742,748 ****
     to make the cost of the corresponding register-to-register instruction
     N times that of a fast register-to-register instruction.  */
  
! #define COSTS_N_INSNS(N) ((N) * 4 - 2)
  
  /* Return an estimate of the cost of computing rtx X.
     One use is in cse, to decide which expression to keep in the hash table.
--- 792,798 ----
     to make the cost of the corresponding register-to-register instruction
     N times that of a fast register-to-register instruction.  */
  
! #define COSTS_N_INSNS(N) ((N) * 2)
  
  /* Return an estimate of the cost of computing rtx X.
     One use is in cse, to decide which expression to keep in the hash table.
*************** rtx_cost (x, outer_code)
*** 800,806 ****
    switch (code)
      {
      case REG:
!       return ! CHEAP_REG (x);
  
      case SUBREG:
        /* If we can't tie these modes, make this expensive.  The larger
--- 850,856 ----
    switch (code)
      {
      case REG:
!       return 0;
  
      case SUBREG:
        /* If we can't tie these modes, make this expensive.  The larger
*************** rtx_cost (x, outer_code)
*** 808,814 ****
        if (! MODES_TIEABLE_P (GET_MODE (x), GET_MODE (SUBREG_REG (x))))
  	return COSTS_N_INSNS (2
  			      + GET_MODE_SIZE (GET_MODE (x)) / UNITS_PER_WORD);
!       return 2;
  #ifdef RTX_COSTS
        RTX_COSTS (x, code, outer_code);
  #endif
--- 858,865 ----
        if (! MODES_TIEABLE_P (GET_MODE (x), GET_MODE (SUBREG_REG (x))))
  	return COSTS_N_INSNS (2
  			      + GET_MODE_SIZE (GET_MODE (x)) / UNITS_PER_WORD);
!       break;
! 
  #ifdef RTX_COSTS
        RTX_COSTS (x, code, outer_code);
  #endif
*************** address_cost (x, mode)
*** 865,870 ****
--- 916,922 ----
    return rtx_cost (x, MEM);
  #endif
  }
+ 
  
  static struct cse_reg_info *
  get_cse_reg_info (regno)
*************** lookup_as_function (x, code)
*** 1479,1485 ****
  
     If necessary, update table showing constant values of quantities.  */
  
! #define CHEAPER(X,Y)   ((X)->cost < (Y)->cost)
  
  static struct table_elt *
  insert (x, classp, hash, mode)
--- 1531,1538 ----
  
     If necessary, update table showing constant values of quantities.  */
  
! #define CHEAPER(X, Y) \
!  (preferrable ((X)->cost, (X)->regcost, (Y)->cost, (Y)->regcost) < 0)
  
  static struct table_elt *
  insert (x, classp, hash, mode)
*************** insert (x, classp, hash, mode)
*** 1526,1531 ****
--- 1579,1585 ----
    elt->exp = x;
    elt->canon_exp = NULL_RTX;
    elt->cost = COST (x);
+   elt->regcost = approx_reg_cost (x);
    elt->next_same_value = 0;
    elt->prev_same_value = 0;
    elt->next_same_hash = table[hash];
*************** find_best_addr (insn, loc, mode)
*** 2717,2723 ****
    int save_hash_arg_in_memory = hash_arg_in_memory;
    int addr_volatile;
    int regno;
-   int folded_cost, addr_cost;
    unsigned hash;
  
    /* Do not try to replace constant addresses or addresses of local and
--- 2771,2776 ----
*************** find_best_addr (insn, loc, mode)
*** 2750,2763 ****
    if (GET_CODE (addr) != REG)
      {
        rtx folded = fold_rtx (copy_rtx (addr), NULL_RTX);
! 
!       folded_cost = address_cost (folded, mode);
!       addr_cost = address_cost (addr, mode);
  
!       if ((folded_cost < addr_cost
! 	   || (folded_cost == addr_cost
! 	       && rtx_cost (folded, MEM) > rtx_cost (addr, MEM)))
! 	  && rtx_cost (folded, MEM) < rtx_cost (addr, MEM)
  	  && validate_change (insn, loc, folded, 0))
  	addr = folded;
      }
--- 2803,2817 ----
    if (GET_CODE (addr) != REG)
      {
        rtx folded = fold_rtx (copy_rtx (addr), NULL_RTX);
!       int addr_folded_cost = address_cost (folded, mode);
!       int addr_cost = address_cost (addr, mode);
  
!       if ((addr_folded_cost < addr_cost
! 	   || (addr_folded_cost == addr_cost
! 	       /* ??? The rtx_cost comparison is left over from an older
! 		  version of this code.  It is probably no longer helpful.  */
! 	       && (rtx_cost (folded, MEM) > rtx_cost (addr, MEM)
! 		   || approx_reg_cost (folded) < approx_reg_cost (addr))))
  	  && validate_change (insn, loc, folded, 0))
  	addr = folded;
      }
*************** cse_insn (insn, libcall_insn)
*** 4764,4769 ****
--- 4818,4825 ----
        struct table_elt *src_const_elt = 0;
        int src_cost = 10000, src_eqv_cost = 10000, src_folded_cost = 10000;
        int src_related_cost = 10000, src_elt_cost = 10000;
+       int src_regcost, src_eqv_regcost, src_folded_regcost;
+       int src_related_regcost, src_elt_regcost;
        /* Set non-zero if we need to call force_const_mem on with the
  	 contents of src_folded before using it.  */
        int src_folded_force_flag = 0;
*************** cse_insn (insn, libcall_insn)
*** 5172,5178 ****
  	  if (rtx_equal_p (src, dest))
  	    src_cost = -1;
  	  else
! 	    src_cost = COST (src);
  	}
  
        if (src_eqv_here)
--- 5228,5237 ----
  	  if (rtx_equal_p (src, dest))
  	    src_cost = -1;
  	  else
! 	    {
! 	      src_cost = COST (src);
! 	      src_regcost = approx_reg_cost (src);
! 	    }
  	}
  
        if (src_eqv_here)
*************** cse_insn (insn, libcall_insn)
*** 5180,5186 ****
  	  if (rtx_equal_p (src_eqv_here, dest))
  	    src_eqv_cost = -1;
  	  else
! 	    src_eqv_cost = COST (src_eqv_here);
  	}
  
        if (src_folded)
--- 5239,5248 ----
  	  if (rtx_equal_p (src_eqv_here, dest))
  	    src_eqv_cost = -1;
  	  else
! 	    {
! 	      src_eqv_cost = COST (src_eqv_here);
! 	      src_eqv_regcost = approx_reg_cost (src_eqv_here);
! 	    }
  	}
  
        if (src_folded)
*************** cse_insn (insn, libcall_insn)
*** 5188,5194 ****
  	  if (rtx_equal_p (src_folded, dest))
  	    src_folded_cost = -1;
  	  else
! 	    src_folded_cost = COST (src_folded);
  	}
  
        if (src_related)
--- 5250,5259 ----
  	  if (rtx_equal_p (src_folded, dest))
  	    src_folded_cost = -1;
  	  else
! 	    {
! 	      src_folded_cost = COST (src_folded);
! 	      src_folded_regcost = approx_reg_cost (src_folded);
! 	    }
  	}
  
        if (src_related)
*************** cse_insn (insn, libcall_insn)
*** 5196,5202 ****
  	  if (rtx_equal_p (src_related, dest))
  	    src_related_cost = -1;
  	  else
! 	    src_related_cost = COST (src_related);
  	}
  
        /* If this was an indirect jump insn, a known label will really be
--- 5261,5270 ----
  	  if (rtx_equal_p (src_related, dest))
  	    src_related_cost = -1;
  	  else
! 	    {
! 	      src_related_cost = COST (src_related);
! 	      src_related_regcost = approx_reg_cost (src_related);
! 	    }
  	}
  
        /* If this was an indirect jump insn, a known label will really be
*************** cse_insn (insn, libcall_insn)
*** 5234,5263 ****
  	      continue;
  	    }
  
! 	  if (elt)
! 	    src_elt_cost = elt->cost;
  
! 	  /* Find cheapest and skip it for the next time.   For items
  	     of equal cost, use this order:
  	     src_folded, src, src_eqv, src_related and hash table entry.  */
! 	  if (src_folded_cost <= src_cost
! 	      && src_folded_cost <= src_eqv_cost
! 	      && src_folded_cost <= src_related_cost
! 	      && src_folded_cost <= src_elt_cost)
  	    {
  	      trial = src_folded, src_folded_cost = 10000;
  	      if (src_folded_force_flag)
  		trial = force_const_mem (mode, trial);
  	    }
! 	  else if (src_cost <= src_eqv_cost
! 		   && src_cost <= src_related_cost
! 		   && src_cost <= src_elt_cost)
  	    trial = src, src_cost = 10000;
! 	  else if (src_eqv_cost <= src_related_cost
! 		   && src_eqv_cost <= src_elt_cost)
  	    trial = copy_rtx (src_eqv_here), src_eqv_cost = 10000;
! 	  else if (src_related_cost <= src_elt_cost)
! 	    trial = copy_rtx (src_related), src_related_cost = 10000;
  	  else
  	    {
  	      trial = copy_rtx (elt->exp);
--- 5302,5344 ----
  	      continue;
  	    }
  
!           if (elt)
! 	    {
! 	      src_elt_cost = elt->cost;
! 	      src_elt_regcost = elt->regcost;
! 	    }
  
!           /* Find cheapest and skip it for the next time.   For items
  	     of equal cost, use this order:
  	     src_folded, src, src_eqv, src_related and hash table entry.  */
! 	  if (preferrable (src_folded_cost, src_folded_regcost,
! 			   src_cost, src_regcost) <= 0
! 	      && preferrable (src_folded_cost, src_folded_regcost,
! 			      src_eqv_cost, src_eqv_regcost) <= 0
! 	      && preferrable (src_folded_cost, src_folded_regcost,
! 			      src_related_cost, src_related_regcost) <= 0
! 	      && preferrable (src_folded_cost, src_folded_regcost,
! 			      src_elt_cost, src_elt_regcost) <= 0)
  	    {
  	      trial = src_folded, src_folded_cost = 10000;
  	      if (src_folded_force_flag)
  		trial = force_const_mem (mode, trial);
  	    }
! 	  else if (preferrable (src_cost, src_regcost,
! 				src_eqv_cost, src_eqv_regcost) <= 0
! 		   && preferrable (src_cost, src_regcost,
! 				   src_related_cost, src_related_regcost) <= 0
! 		   && preferrable (src_cost, src_regcost,
! 				   src_elt_cost, src_elt_regcost) <= 0)
  	    trial = src, src_cost = 10000;
! 	  else if (preferrable (src_eqv_cost, src_eqv_regcost,
! 				src_related_cost, src_related_regcost) <= 0
! 		   && preferrable (src_eqv_cost, src_eqv_regcost,
! 				   src_elt_cost, src_elt_regcost) <= 0)
  	    trial = copy_rtx (src_eqv_here), src_eqv_cost = 10000;
! 	  else if (preferrable (src_related_cost, src_related_regcost,
! 				src_elt_cost, src_elt_regcost) <= 0)
!   	    trial = copy_rtx (src_related), src_related_cost = 10000;
  	  else
  	    {
  	      trial = copy_rtx (elt->exp);
Index: loop.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/loop.c,v
retrieving revision 1.271
diff -c -p -r1.271 loop.c
*** loop.c	2000/08/19 16:34:44	1.271
--- loop.c	2000/08/23 17:45:57
*************** static void try_swap_copy_prop PARAMS ((
*** 311,316 ****
--- 311,317 ----
  static int replace_label PARAMS ((rtx *, void *));
  static rtx check_insn_for_givs PARAMS((struct loop *, rtx, int, int));
  static rtx check_insn_for_bivs PARAMS((struct loop *, rtx, int, int));
+ static int iv_add_mult_cost PARAMS ((rtx, rtx, rtx, rtx));
  
  typedef struct rtx_and_int {
    rtx r;
*************** static int compute_luids PARAMS ((rtx, r
*** 337,349 ****
  static int biv_elimination_giv_has_0_offset PARAMS ((struct induction *,
  						   struct induction *, rtx));
  
- /* Relative gain of eliminating various kinds of operations.  */
- static int add_cost;
- #if 0
- static int shift_cost;
- static int mult_cost;
- #endif
- 
  /* Benefit penalty, if a giv is not replaceable, i.e. must emit an insn to
     copy the value of the strength reduced giv to its original register.  */
  static int copy_cost;
--- 338,343 ----
*************** init_loop ()
*** 357,371 ****
    char *free_point = (char *) oballoc (1);
    rtx reg = gen_rtx_REG (word_mode, LAST_VIRTUAL_REGISTER + 1);
  
-   add_cost = rtx_cost (gen_rtx_PLUS (word_mode, reg, reg), SET);
- 
    reg_address_cost = address_cost (reg, SImode);
- 
-   /* We multiply by 2 to reconcile the difference in scale between
-      these two ways of computing costs.  Otherwise the cost of a copy
-      will be far less than the cost of an add.  */
  
!   copy_cost = 2 * 2;
  
    /* Free the objects we just allocated.  */
    obfree (free_point);
--- 351,359 ----
    char *free_point = (char *) oballoc (1);
    rtx reg = gen_rtx_REG (word_mode, LAST_VIRTUAL_REGISTER + 1);
  
    reg_address_cost = address_cost (reg, SImode);
  
!   copy_cost = 2;
  
    /* Free the objects we just allocated.  */
    obfree (free_point);
*************** strength_reduce (loop, insn_count, flags
*** 3872,3877 ****
--- 3860,3866 ----
    rtx loop_start = loop->start;
    rtx loop_end = loop->end;
    rtx loop_scan_start = loop->scan_start;
+   rtx test_reg = gen_rtx_REG (word_mode, LAST_VIRTUAL_REGISTER + 1);
  
    VARRAY_INT_INIT (reg_iv_type, max_reg_before_loop, "reg_iv_type");
    VARRAY_GENERIC_PTR_INIT (reg_iv_info, max_reg_before_loop, "reg_iv_info");
*************** strength_reduce (loop, insn_count, flags
*** 4482,4492 ****
--- 4471,4485 ----
        for (v = bl->giv; v; v = v->next_iv)
  	{
  	  struct induction *tv;
+ 	  int add_cost;
  
  	  if (v->ignore || v->same)
  	    continue;
  
  	  benefit = v->benefit;
+ 	  PUT_MODE (test_reg, v->mode);
+ 	  add_cost = iv_add_mult_cost (bl->biv->add_val, v->mult_val,
+ 				       test_reg, test_reg);
  
  	  /* Reduce benefit if not replaceable, since we will insert
  	     a move-insn to replace the insn that calculates this giv.
*************** strength_reduce (loop, insn_count, flags
*** 4503,4509 ****
  	    benefit -= copy_cost;
  
  	  /* Decrease the benefit to count the add-insns that we will
! 	     insert to increment the reduced reg for the giv.  */
  	  benefit -= add_cost * bl->biv_count;
  
  	  /* Decide whether to strength-reduce this giv or to leave the code
--- 4496,4509 ----
  	    benefit -= copy_cost;
  
  	  /* Decrease the benefit to count the add-insns that we will
! 	     insert to increment the reduced reg for the giv.
! 	     ??? This can overestimate the run-time cost of the additional
! 	     insns, e.g. if there are multiple basic blocks that increment
! 	     the biv, but only one of these blocks is executed during each
! 	     iteration.  There is no good way to detect cases like this with
! 	     the current structure of the loop optimizer.
! 	     This code is more accurate for determining code size than
! 	     run-time benefits.  */
  	  benefit -= add_cost * bl->biv_count;
  
  	  /* Decide whether to strength-reduce this giv or to leave the code
*************** strength_reduce (loop, insn_count, flags
*** 4515,4520 ****
--- 4515,4524 ----
  	     new add insns; if so, increase BENEFIT (undo the subtraction of
  	     add_cost that was done above).  */
  	  if (v->giv_type == DEST_ADDR
+ 	      /* Increasing the benefit is risky, since this is only a guess.
+ 		 Avoid increasing register pressure in cases where there would
+ 		 be no other benefit from reducing this giv.  */
+ 	      && benefit > 0
  	      && GET_CODE (v->mult_val) == CONST_INT)
  	    {
  	      if (HAVE_POST_INCREMENT
*************** simplify_giv_expr (loop, x, benefit)
*** 6479,6485 ****
  
  	    /* Form expression from giv and add benefit.  Ensure this giv
  	       can derive another and subtract any needed adjustment if so.  */
! 	    *benefit += v->benefit;
  	    if (v->cant_derive)
  	      return 0;
  
--- 6483,6502 ----
  
  	    /* Form expression from giv and add benefit.  Ensure this giv
  	       can derive another and subtract any needed adjustment if so.  */
! 
! 	    /* Increasing the benefit here is risky.  The only case in which it
! 	       is arguably correct is if this is the only use of V.  In other
! 	       cases, this will artificially inflate the benefit of the current
! 	       giv, and lead to suboptimal code.  Thus, it is disabled, since
! 	       potentially not reducing an only marginally beneficial giv is
! 	       less harmful than reducing many givs that are not really
! 	       beneficial.  */
! 	    {
! 	      rtx single_use = VARRAY_RTX (reg_single_usage, REGNO (x));
! 	      if (single_use && single_use != const0_rtx)
! 		*benefit += v->benefit;
! 	    }
! 
  	    if (v->cant_derive)
  	      return 0;
  
*************** consec_sets_giv (loop, first_benefit, p,
*** 6723,6729 ****
  	  count--;
  	  v->mult_val = *mult_val;
  	  v->add_val = *add_val;
! 	  v->benefit = benefit;
  	}
        else if (code != NOTE)
  	{
--- 6740,6746 ----
  	  count--;
  	  v->mult_val = *mult_val;
  	  v->add_val = *add_val;
! 	  v->benefit += benefit;
  	}
        else if (code != NOTE)
  	{
*************** restart:
*** 7138,7145 ****
  
  	      if (loop_dump_stream)
  		fprintf (loop_dump_stream,
! 			 "giv at %d combined with giv at %d\n",
! 			 INSN_UID (g2->insn), INSN_UID (g1->insn));
  	    }
  	}
  
--- 7155,7163 ----
  
  	      if (loop_dump_stream)
  		fprintf (loop_dump_stream,
! 			 "giv at %d combined with giv at %d; new benefit %d + %d, lifetime %d\n",
! 			 INSN_UID (g2->insn), INSN_UID (g1->insn),
! 			 g1->benefit, g1_add_benefit, g1->lifetime);
  	    }
  	}
  
*************** emit_iv_add_mult (b, m, a, reg, insert_b
*** 7652,7657 ****
--- 7670,7703 ----
    else if (GET_CODE (seq) == SET
  	   && GET_CODE (SET_DEST (seq)) == REG)
      record_base_value (REGNO (SET_DEST (seq)), SET_SRC (seq), 0);
+ }
+ 
+ /* Similar to emit_iv_add_mult, but compute cost rather than emitting
+    insns.  */
+ static int
+ iv_add_mult_cost (b, m, a, reg)
+      rtx b;          /* initial value of basic induction variable */
+      rtx m;          /* multiplicative constant */
+      rtx a;          /* additive constant */
+      rtx reg;        /* destination register */
+ {
+   int cost = 0;
+   rtx last, result;
+ 
+   start_sequence ();
+   result = expand_mult_add (b, reg, m, a, GET_MODE (reg), 0);
+   if (reg != result)
+     emit_move_insn (reg, result);
+   last = get_last_insn ();
+   while (last)
+     {
+       rtx t = single_set (last);
+       if (t)
+ 	cost += rtx_cost (SET_SRC (t), SET);
+       last = PREV_INSN (last);
+     }
+   end_sequence ();
+   return cost;
  }
  
  /* Test whether A * B can be computed without
Index: config/i386/i386.h
===================================================================
RCS file: /cvs/gcc/egcs/gcc/config/i386/i386.h,v
retrieving revision 1.122
diff -c -p -r1.122 i386.h
*** i386.h	2000/08/05 18:24:19	1.122
--- config/i386/i386.h	2000/08/23 17:46:13
*************** while (0)
*** 1966,1987 ****
  	unsigned HOST_WIDE_INT value = INTVAL (XEXP (X, 1));		\
  	int nbits = 0;							\
  									\
- 	if (value == 2)							\
- 	  TOPLEVEL_COSTS_N_INSNS (ix86_cost->add);			\
- 	if (value == 4 || value == 8)					\
- 	  TOPLEVEL_COSTS_N_INSNS (ix86_cost->lea);			\
- 									\
  	while (value != 0)						\
  	  {								\
  	    nbits++;							\
  	    value >>= 1;						\
  	  } 								\
  									\
! 	if (nbits == 1)							\
! 	  TOPLEVEL_COSTS_N_INSNS (ix86_cost->shift_const);		\
! 	else								\
! 	  TOPLEVEL_COSTS_N_INSNS (ix86_cost->mult_init			\
! 			          + nbits * ix86_cost->mult_bit);	\
        }									\
      else			/* This is arbitrary */			\
        TOPLEVEL_COSTS_N_INSNS (ix86_cost->mult_init			\
--- 1966,1979 ----
  	unsigned HOST_WIDE_INT value = INTVAL (XEXP (X, 1));		\
  	int nbits = 0;							\
  									\
  	while (value != 0)						\
  	  {								\
  	    nbits++;							\
  	    value >>= 1;						\
  	  } 								\
  									\
! 	TOPLEVEL_COSTS_N_INSNS (ix86_cost->mult_init			\
! 			        + nbits * ix86_cost->mult_bit);		\
        }									\
      else			/* This is arbitrary */			\
        TOPLEVEL_COSTS_N_INSNS (ix86_cost->mult_init			\
Index: config/sh/sh-protos.h
===================================================================
RCS file: /cvs/gcc/egcs/gcc/config/sh/sh-protos.h,v
retrieving revision 1.6
diff -c -p -r1.6 sh-protos.h
*** sh-protos.h	2000/04/05 23:12:52	1.6
--- config/sh/sh-protos.h	2000/08/23 17:46:13
*************** extern int prepare_move_operands PARAMS 
*** 51,56 ****
--- 51,57 ----
  extern void from_compare PARAMS ((rtx *, int));
  extern int shift_insns_rtx PARAMS ((rtx));
  extern int shiftcosts PARAMS ((rtx));
+ extern int addsubcosts PARAMS ((rtx));
  extern int andcosts PARAMS ((rtx));
  extern int multcosts PARAMS ((rtx));
  extern void gen_ashift PARAMS ((int, int, rtx));
Index: config/sh/sh.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/config/sh/sh.c,v
retrieving revision 1.59
diff -c -p -r1.59 sh.c
*** sh.c	2000/08/22 19:39:56	1.59
--- config/sh/sh.c	2000/08/23 17:46:19
*************** andcosts (x)
*** 981,986 ****
--- 981,1005 ----
    return 3;
  }
  
+ /* Return the cost of an addition or a subtraction.  */
+ 
+ int
+ addsubcosts (x)
+      rtx x;
+ {
+   /* Adding a register is a single cycle insn.  */
+   if (GET_CODE (XEXP (x, 1)) != CONST_INT)
+     return 1;
+ 
+   /* Likewise for small constants.  */
+   if (CONST_OK_FOR_I (INTVAL (XEXP (x, 1))))
+     return 1;
+ 
+   /* Any other constant requires a 2 cycle pc-relative load plus an
+      addition.  */
+   return 3;
+ }
+ 
  /* Return the cost of a multiply.  */
  int
  multcosts (x)
Index: config/sh/sh.h
===================================================================
RCS file: /cvs/gcc/egcs/gcc/config/sh/sh.h,v
retrieving revision 1.67
diff -c -p -r1.67 sh.h
*** sh.h	2000/08/07 09:58:29	1.67
--- config/sh/sh.h	2000/08/23 17:46:22
*************** extern int current_function_anonymous_ar
*** 1630,1644 ****
  #define Pmode  SImode
  #define FUNCTION_MODE  Pmode
  
! /* The relative costs of various types of constants.  Note that cse.c defines
!    REG = 1, SUBREG = 2, any node = (2 + sum of subnodes).  */
  
  #define CONST_COSTS(RTX, CODE, OUTER_CODE)	\
    case CONST_INT:				\
!     if (INTVAL (RTX) == 0)			\
        return 0;					\
-     else if (CONST_OK_FOR_I (INTVAL (RTX)))	\
-       return 1;					\
      else if (((OUTER_CODE) == AND || (OUTER_CODE) == IOR || (OUTER_CODE) == XOR) \
  	     && CONST_OK_FOR_L (INTVAL (RTX)))	\
        return 1;					\
--- 1630,1641 ----
  #define Pmode  SImode
  #define FUNCTION_MODE  Pmode
  
! /* The relative costs of various types of constants.  */
  
  #define CONST_COSTS(RTX, CODE, OUTER_CODE)	\
    case CONST_INT:				\
!     if (CONST_OK_FOR_I (INTVAL (RTX)))		\
        return 0;					\
      else if (((OUTER_CODE) == AND || (OUTER_CODE) == IOR || (OUTER_CODE) == XOR) \
  	     && CONST_OK_FOR_L (INTVAL (RTX)))	\
        return 1;					\
*************** extern int current_function_anonymous_ar
*** 1653,1662 ****
  
  #define RTX_COSTS(X, CODE, OUTER_CODE)			\
    case PLUS:						\
!     return (COSTS_N_INSNS (1)				\
! 	    + rtx_cost (XEXP ((X), 0), PLUS)		\
! 	    + (rtx_equal_p (XEXP ((X), 0), XEXP ((X), 1))\
! 	       ? 0 : rtx_cost (XEXP ((X), 1), PLUS)));\
    case AND:						\
      return COSTS_N_INSNS (andcosts (X));		\
    case MULT:						\
--- 1650,1656 ----
  
  #define RTX_COSTS(X, CODE, OUTER_CODE)			\
    case PLUS:						\
!     return COSTS_N_INSNS (addsubcosts (X));		\
    case AND:						\
      return COSTS_N_INSNS (andcosts (X));		\
    case MULT:						\
*************** extern int current_function_anonymous_ar
*** 1664,1676 ****
    case ASHIFT:						\
    case ASHIFTRT:					\
    case LSHIFTRT:					\
!     /* Add one extra unit for the matching constraint.	\
!        Otherwise loop strength reduction would think that\
!        a shift with different sourc and destination is	\
!        as cheap as adding a constant to a register.  */	\
!     return (COSTS_N_INSNS (shiftcosts (X))		\
! 	    + rtx_cost (XEXP ((X), 0), (CODE))		\
! 	    + 1);					\
    case DIV:						\
    case UDIV:						\
    case MOD:						\
--- 1658,1664 ----
    case ASHIFT:						\
    case ASHIFTRT:					\
    case LSHIFTRT:					\
!     return COSTS_N_INSNS (shiftcosts (X));		\
    case DIV:						\
    case UDIV:						\
    case MOD:						\
*************** extern int current_function_anonymous_ar
*** 1709,1719 ****
      && get_attr_is_sfunc (X)))
  
  /* Compute the cost of an address.  For the SH, all valid addresses are
!    the same cost.  */
! /* ??? Perhaps we should make reg+reg addresses have higher cost because
!    they add to register pressure on r0.  */
  
! #define ADDRESS_COST(RTX) 1
  
  /* Compute extra cost of moving data between one register class
     and another.  */
--- 1697,1707 ----
      && get_attr_is_sfunc (X)))
  
  /* Compute the cost of an address.  For the SH, all valid addresses are
!    the same cost.  Use a slightly higher cost for reg + reg addressing,
!    since it increases pressure on r0.  */
  
! #define ADDRESS_COST(X) (GET_CODE (X) == PLUS && ! CONSTANT_P (XEXP (X, 1)) \
! 			 ? 1 : 0)
  
  /* Compute extra cost of moving data between one register class
     and another.  */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]