RFA: Tweak RA cost calculation for -O0

Sat Jul 19 21:36:00 GMT 2014

IRA (like the old allocators) calculates a "best" class and an "alternative"
class for each allocno.  The best class is the one with the lowest cost while
the alternative class is the biggest class whose cost is smaller than the
cost of spilling.  When optimising we adjust the costs so that the best
class is preferred, but when not optimising we effectively use the
alternative class:

      if (optimize && ALLOCNO_CLASS (a) != pref[i])
	{
	  n = ira_class_hard_regs_num[aclass];
	  ALLOCNO_HARD_REG_COSTS (a)
	    = reg_costs = ira_allocate_cost_vector (aclass);
	  for (j = n - 1; j >= 0; j--)
	    {
	      hard_regno = ira_class_hard_regs[aclass][j];
	      if (TEST_HARD_REG_BIT (reg_class_contents[pref[i]], hard_regno))
		reg_costs[j] = ALLOCNO_CLASS_COST (a);
	      else
		{
		  rclass = REGNO_REG_CLASS (hard_regno);
		  num = cost_classes_ptr->index[rclass];
		  if (num < 0)
		    {
		      num = cost_classes_ptr->hard_regno_index[hard_regno];
		      ira_assert (num >= 0);
		    }
		  reg_costs[j] = COSTS (costs, i)->cost[num];
		}
	    }
	}

In some cases the alternative class can be significantly more costly
than the preferred class.  If reg_alloc_order lists a member of the
alternative class that isn't in the best class, then for -O0 we'll tend
to use it even if many members of the best class are still free.
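
As a toy model of that effect (not GCC code: the helper name and the
small register numbering used with it are made up for illustration), treat
every free member of the allocno class as equally cheap and take the first
one in allocation order:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not GCC code.  Return the first register in ORDER that is
   free and belongs to the allocno class, or -1 if there is none.
   IN_ACLASS and IS_FREE are indexed by register number, so every entry
   of ORDER must be a valid index into both arrays.  All members of the
   class are treated as equally cheap, which is roughly the -O0
   situation in which the per-register costs are not adjusted.  */
static int
pick_first_free (const int *order, size_t n,
		 const bool *in_aclass, const bool *is_free)
{
  assert (order != NULL && in_aclass != NULL && is_free != NULL);
  for (size_t i = 0; i < n; i++)
    {
      int regno = order[i];
      if (in_aclass[regno] && is_free[regno])
	return regno;
    }
  return -1;
}
```

If register 0 stands for an accumulator that comes first in the order and
registers 1-3 stand for GPRs, a class that includes register 0 always
yields register 0 while it is free, however many GPRs are also free.
Restricting the class to the GPRs yields register 1 instead.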

Obviously we don't want to spend too much time on RA at -O0.  But since
the code above is disabled at -O0, I think we should use the best class as
the allocno class if it isn't likely to be spilled.  The patch below does this.
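
The intended rule can be sketched in isolation as follows.  This is only a
model: the enum, the function name and the COMMON argument (standing in for
whatever the existing "common class" code would compute from the best and
alternative classes) are assumptions made for the example, not names from
ira-costs.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for real register classes.  */
enum toy_class { TOY_NO_REGS, TOY_GPRS, TOY_GPRS_AND_ACC };

/* Toy model of the allocno-class choice, not GCC code.  BEST_COST and
   MEM_COST are the cost of the best class and of memory.  COMMON is the
   class the existing code would pick from the best and alternative
   classes.  */
static enum toy_class
choose_allocno_class (bool optimize, int best_cost, int mem_cost,
		      bool best_likely_spilled,
		      enum toy_class best, enum toy_class common)
{
  /* Sanity check for the toy model only: costs are non-negative.  */
  assert (best_cost >= 0 && mem_cost >= 0);

  if (best_cost > mem_cost)
    return TOY_NO_REGS;		/* Memory is cheaper than any register.  */
  if (!optimize && !best_likely_spilled)
    return best;		/* New -O0 case: ignore the alternative.  */
  return common;		/* Existing behaviour.  */
}
```

With optimisation off and a best class that is not likely to be spilled,
the model returns the best class.  In every other case it behaves as
before.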

One case where the problem occurs is on MIPS with:

  (set (reg:DI y) (sign_extend:DI (reg:SI x)))
  ...(reg:DI y)...

where y is used in a context that requires a GPR and where the sign
extension is done by:

(define_insn_and_split "extendsidi2"
  [(set (match_operand:DI 0 "register_operand" "=d,l,d")
        (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "0,0,m")))]
  "TARGET_64BIT"
  "@
   #
   #
   lw\t%0,%1"
  "&& reload_completed && register_operand (operands[1], VOIDmode)"
  [(const_int 0)]
{
  emit_note (NOTE_INSN_DELETED);
  DONE;
}
  [(set_attr "move_type" "move,move,load")
   (set_attr "mode" "DI")])

("l" is the LO register.)  Because a "d" register would satisfy both
the definition and use of "y", it is the best class and has a cost of 0.
But the costs are deliberately set up so that using "l" is cheaper
than memory (which might not be true on all processors, but that's
a question of -mtune-specific costs).  So the alternative class is
GR_AND_MD?_REGS.  The problem is that the LO register comes first
in the allocation order:

#define REG_ALLOC_ORDER							\
{ /* Accumulator registers.  When GPRs and accumulators have equal	\
     cost, we generally prefer to use accumulators.  For example,	\
     a division or multiplication result is better allocated to LO,	\
     so that we put the MFLO at the point of use instead of at the	\
     point of definition.  It's also needed if we're to take advantage	\
     of the extra accumulators available with -mdspr2.  In some cases,	\
     it can also help to reduce register pressure.  */			\
  64, 65,176,177,178,179,180,181,					\

So we end up picking LO even though it is significantly more costly
than GPRs in this case (but still less costly than memory, again
according to the chosen cost model).  This is the cause of a regression
in gcc.target/mips/branch-7.c after:

2014-06-24  Catherine Moore  <clm@codesourcery.com>
	    Sandra Loosemore  <sandra@codesourcery.com>

	* config/mips/mips.c (mips_order_regs_for_local_alloc): Delete.
	* config/mips/mips.h (ADJUST_REG_ALLOC_ORDER): Delete.
	* config/mips/mips-protos.h (mips_order_regs_for_local_alloc): Delete.

But I think it's a general problem.  I tried the patch on CSiBE for:

  MIPS -mips64r2 -mabi=32 -mips16
  MIPS -mips64r2 -mabi=32
  MIPS -mips64r2 -mabi=n32
  MIPS -mips64r2 -mabi=64
  MIPS -mips3 -mabi=64
  x86_64 -m32
  x86_64

all at -O0, and it was a size win in all cases.  The biggest win was for
MIPS -mips64r2 -mabi=32 (which isn't affected by the sign_extend case above,
since the pattern is restricted to 64-bit targets).  The saving there was 10%.
The saving for x86_64 -m32 was just 0.05%, but the saving for x86_64 was a
more worthwhile 0.5%.

Tested on mips64-linux-gnu and x86_64-linux-gnu.  It fixes the
gcc.target/mips/branch-7.c regression for MIPS.  OK to install?

Thanks,
Richard


gcc/
	* ira-costs.c (find_costs_and_classes): For -O0, use the best class
	as the allocation class if it isn't likely to be spilled.

Index: gcc/ira-costs.c
===================================================================

--- gcc/ira-costs.c	2014-07-19 12:22:53.182907048 +0100
+++ gcc/ira-costs.c	2014-07-19 21:02:08.000441494 +0100
@@ -1753,6 +1753,20 @@ find_costs_and_classes (FILE *dump_file)
 	  alt_class = ira_allocno_class_translate[alt_class];
 	  if (best_cost > i_mem_cost)
 	    regno_aclass[i] = NO_REGS;
+	  else if (!optimize && !targetm.class_likely_spilled_p (best))
+	    /* Registers in the alternative class are likely to need
+	       longer or slower sequences than registers in the best class.
+	       When optimizing we make some effort to use the best class
+	       over the alternative class where possible, but at -O0 we
+	       effectively give the alternative class equal weight.
+	       We then run the risk of using slower alternative registers
+	       when plenty of registers from the best class are still free.
+	       This is especially true because live ranges tend to be very
+	       short in -O0 code and so register pressure tends to be low.
+
+	       Avoid that by ignoring the alternative class if the best
+	       class has plenty of registers.  */
+	    regno_aclass[i] = best;
 	  else
 	    {
 	      /* Make the common class the biggest class of best and