This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: RFA: Tweak RA cost calculation for -O0

From: Jeff Law <law at redhat dot com>
To: gcc-patches at gcc dot gnu dot org, vmakarov at redhat dot com, rdsandiford at googlemail dot com
Date: Fri, 25 Jul 2014 16:23:04 -0600
Subject: Re: RFA: Tweak RA cost calculation for -O0
References: <87lhrpnmg0 dot fsf at talisman dot default>

On 07/19/14 14:05, Richard Sandiford wrote:

IRA (like the old allocators) calculates a "best" class and an "alternative"
class for each allocno.  The best class is the one with the lowest cost while
the alternative class is the biggest class whose cost is smaller than the
cost of spilling.  When optimising we adjust the costs so that the best
class is preferred, but when not optimising we effectively use the
alternative class:

       if (optimize && ALLOCNO_CLASS (a) != pref[i])
	{
	  n = ira_class_hard_regs_num[aclass];
	  ALLOCNO_HARD_REG_COSTS (a)
	    = reg_costs = ira_allocate_cost_vector (aclass);
	  for (j = n - 1; j >= 0; j--)
	    {
	      hard_regno = ira_class_hard_regs[aclass][j];
	      if (TEST_HARD_REG_BIT (reg_class_contents[pref[i]], hard_regno))
		reg_costs[j] = ALLOCNO_CLASS_COST (a);
	      else
		{
		  rclass = REGNO_REG_CLASS (hard_regno);
		  num = cost_classes_ptr->index[rclass];
		  if (num < 0)
		    {
		      num = cost_classes_ptr->hard_regno_index[hard_regno];
		      ira_assert (num >= 0);
		    }
		  reg_costs[j] = COSTS (costs, i)->cost[num];
		}
	    }
	}
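The best/alternative distinction above can be modelled in a few lines. This is a simplified sketch, not IRA's code: the `cost_class` struct, its fields and both helper functions are hypothetical, and they follow the description above literally (lowest-cost class; biggest class cheaper than spilling) without any of IRA's extra bookkeeping.

```c
#include <assert.h>

/* Hypothetical model of a register class, for illustration only;
   this is not GCC's internal representation.  */
struct cost_class
{
  const char *name;
  int n_regs;   /* number of hard registers in the class */
  int cost;     /* cost of putting the pseudo in this class */
};

/* Return the index of the lowest-cost ("best") class.  */
static int
best_class (const struct cost_class *classes, int n)
{
  assert (n > 0);
  int best = 0;
  for (int i = 1; i < n; i++)
    if (classes[i].cost < classes[best].cost)
      best = i;
  return best;
}

/* Return the index of the biggest class whose cost is below MEM_COST
   (the "alternative" class), or -1 if no class beats spilling.  */
static int
alternative_class (const struct cost_class *classes, int n, int mem_cost)
{
  int alt = -1;
  for (int i = 0; i < n; i++)
    if (classes[i].cost < mem_cost
        && (alt < 0 || classes[i].n_regs > classes[alt].n_regs))
      alt = i;
  return alt;
}
```

With hypothetical costs loosely modelled on the MIPS case discussed later (GPRs cost 0, LO costs something but less than memory), the best class is the GPR class while the alternative class is the wider GPR-plus-LO class.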

In some cases the alternative class can be significantly more costly
than the preferred class.  If reg_alloc_order lists a member of the
alternative class that isn't in the best class ahead of the best-class
registers, then at -O0 we'll tend to use it even if many members of the
best class are still free.
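To see why the allocation order matters at -O0, here is a deliberately naive first-fit model: walk the allocation order and take the first free register in the allocno's class. The function and the 8-register file are hypothetical; IRA's real assignment also weighs per-register costs, which is what the cost adjustment shown above supplies when optimising.

```c
#include <assert.h>
#include <stdbool.h>

#define N_HARD_REGS 8   /* hypothetical tiny register file */

/* Return the first register in ALLOC_ORDER that is a member of the
   class (IN_CLASS is a membership mask) and is not in use, or -1 if
   every member of the class is busy.  Naive first-fit model only.  */
static int
first_free_in_order (const int *alloc_order, int n_order,
                     const bool *in_class, const bool *in_use)
{
  for (int i = 0; i < n_order; i++)
    {
      int regno = alloc_order[i];
      assert (regno >= 0 && regno < N_HARD_REGS);
      if (in_class[regno] && !in_use[regno])
        return regno;
    }
  return -1;
}
```

If register 7 is listed first but belongs only to the wider class, a search over the alternative class returns 7 even though registers 0 to 6 are all free; the same search restricted to the best class returns 0.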

Obviously we don't want to spend too much compile time on RA at -O0.
But with the code above disabled, I think we should use the best class
as the allocno class if it isn't likely to be spilled.  The patch below
does this.
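The proposed -O0 rule can be sketched as a small decision function. The patch itself is not reproduced in this archive, so this is a hypothetical illustration of the policy as described: `choose_allocno_class` is an invented name, and `best_likely_spilled` stands in for whatever target query the patch actually uses (GCC does provide a `TARGET_CLASS_LIKELY_SPILLED_P` hook for this kind of question).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the policy described above, not the patch.
   BEST and ALT are class identifiers (non-negative).  */
static int
choose_allocno_class (bool optimize, int best, int alt,
                      bool best_likely_spilled)
{
  assert (best >= 0 && alt >= 0);
  if (optimize)
    /* When optimising, the per-register cost adjustment already
       steers the allocno towards the best class within ALT.  */
    return alt;
  /* At -O0 that adjustment is skipped, so restrict the allocno to the
     best class unless its registers are likely to be spilled.  */
  return best_likely_spilled ? alt : best;
}
```

Falling back to the alternative class for likely-spilled classes keeps the old behaviour exactly where narrowing the class could force extra spills.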

One case where this occurs is on MIPS with:

   (set (reg:DI y) (sign_extend:DI (reg:SI x)))
   ...(reg:DI y)...

where y is used in a context that requires a GPR and where the sign
extension is done by:

(define_insn_and_split "extendsidi2"
   [(set (match_operand:DI 0 "register_operand" "=d,l,d")
         (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "0,0,m")))]
   "TARGET_64BIT"
   "@
    #
    #
    lw\t%0,%1"
   "&& reload_completed && register_operand (operands[1], VOIDmode)"
   [(const_int 0)]
{
   emit_note (NOTE_INSN_DELETED);
   DONE;
}
   [(set_attr "move_type" "move,move,load")
    (set_attr "mode" "DI")])

("l" is the LO register.)  Because a "d" register would satisfy both
the definition and use of "y", it is the best class and has a cost of 0.
But the costs are deliberately set up so that using "l" is cheaper
than memory (which might not be true on all processors, but that's
a question of -mtune-specific costs).  So the alternative class is
GR_AND_MD?_REGS.  The problem is that the LO register comes first
in the allocation order:

#define REG_ALLOC_ORDER							\
{ /* Accumulator registers.  When GPRs and accumulators have equal	\
      cost, we generally prefer to use accumulators.  For example,	\
      a division of multiplication result is better allocated to LO,	\
      so that we put the MFLO at the point of use instead of at the	\
      point of definition.  It's also needed if we're to take advantage	\
      of the extra accumulators available with -mdspr2.  In some cases,	\
      it can also help to reduce register pressure.  */			\
   64, 65,176,177,178,179,180,181,					\

So we end up picking LO even though it is significantly more costly
than GPRs in this case (but still less costly than memory, again
according to the chosen cost model).  This is the cause of a regression
in gcc.target/mips/branch-7.c after:

2014-06-24  Catherine Moore  <clm@codesourcery.com>
	    Sandra Loosemore  <sandra@codesourcery.com>

	* config/mips/mips.c (mips_order_regs_for_local_alloc): Delete.
	* config/mips/mips.h (ADJUST_REG_ALLOC_ORDER): Delete.
	* config/mips/mips-protos.h (mips_order_regs_for_local_alloc): Delete.

But I think it's a general problem.  I tried the patch on CSiBE for:

   MIPS -mips64r2 -mabi=32 -mips16
   MIPS -mips64r2 -mabi=32
   MIPS -mips64r2 -mabi=n32
   MIPS -mips64r2 -mabi=64
   MIPS -mips3 -mabi=64
   x86_64 -m32
   x86_64

all at -O0, and it was a size win in all cases.  The biggest win was for
MIPS -mips64r2 -mabi=32 (which isn't affected by the sign_extend case above,
since the pattern is restricted to 64-bit targets).  The saving there was 10%.
The saving for x86_64 -m32 was just 0.05%, but the saving for x86_64 was a
more worthwhile 0.5%.

Tested on mips64-linux-gnu and x86_64-linux-gnu.  It fixes the
gcc.target/mips/branch-7.c regression for MIPS.  OK to install?

Thanks,
Richard


gcc/
	* ira-costs.c (find_costs_and_classes): For -O0, use the best class
	as the allocation class if it isn't likely to be spilled.

Seems reasonable. Please keep an eye out for fallout on other targets (I'm thinking ARM in particular).


Thanks,
jeff

References:
- RFA: Tweak RA cost calculation for -O0
  - From: Richard Sandiford
