This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

RE: Register allocation: caller-save vs spilling

From: "Wilco Dijkstra" <wdijkstr at arm dot com>
To: <vmakarov at redhat dot com>
Cc: <gcc at gcc dot gnu dot org>
Date: Thu, 4 Sep 2014 19:37:41 +0100
Subject: RE: Register allocation: caller-save vs spilling

Hi Vlad,

I added you directly in case you hadn't spotted my original post.

A simple example for AArch64 trunk is as follows:

// Compile with: -O2 -fomit-frame-pointer -ffixed-d8 -ffixed-d9 -ffixed-d10 -ffixed-d11 -ffixed-d12
//               -ffixed-d13 -ffixed-d14 -ffixed-d15 -f(no-)caller-saves
void g(void);

float f(float x)
{
  x += 3.0;
  g();
  x *= 3.0;
  return x;
}

It seems that reload only ever considers rematerialization for spilled live ranges, not for
caller-saved ones. That means the caller-save code should either reject constants outright, or the
memory spill cost for constants should always be lower than that of a caller-save. Given
memory_move_cost=4 and register_move_cost=2, as commonly used by targets, anything that can be
rematerialized should have less than half the cost of being spilled or caller-saved.
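
To make the comparison concrete, here is a rough sketch of the arithmetic. It is a toy model, not GCC's actual cost code: the function names and the per-call and per-use accounting are my own assumptions, and only the two cost values come from the paragraph above.

```c
#include <assert.h>

/* Cost values quoted above as "commonly used by targets"; illustrative only. */
enum { MEMORY_MOVE_COST = 4, REGISTER_MOVE_COST = 2 };

/* Hypothetical model: a caller-saved value needs one store before and
   one load after every call it is live across. */
static int caller_save_cost(int calls)
{
    assert(calls >= 0);
    return calls * 2 * MEMORY_MOVE_COST;
}

/* Hypothetical model: a rematerialized value needs no store at all;
   each use after a call simply recreates it at cost_per_remat. */
static int remat_cost(int uses_after_call, int cost_per_remat)
{
    assert(uses_after_call >= 0 && cost_per_remat >= 0);
    return uses_after_call * cost_per_remat;
}
```

Under these assumptions, one call with one later use costs 8 for caller-save, against 4 for reloading the constant from the literal pool and 2 if it can be recreated at register-move cost.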

Wilco

> -----Original Message-----
> From: Wilco Dijkstra [mailto:wdijkstr@arm.com]
> Sent: 27 August 2014 17:25
> To: 'gcc@gcc.gnu.org'
> Subject: Register allocation: caller-save vs spilling
> 
> Hi,
> 
> I'm investigating various register allocation inefficiencies. The first thing that stands out
> is that GCC supports both caller-saves and spilling. Spilling seems to spill all
> definitions and all uses of a live range. This means you often end up with multiple reloads
> close together, while it would be more efficient to do a single load and then reuse the loaded
> value several times. Caller-save does better in that case, but it is inefficient in that it
> repeatedly stores registers across every call even if they are unchanged. If both were fixed to
> minimise the number of loads/stores, I can't see how one could beat the other, so you'd no
> longer need both.
> 
> Anyway, due to the current implementation there are clearly cases where caller-save is best and
> cases where spilling is best. However, I do not see it making the correct decision despite
> trying to account for the costs: some code is significantly faster with -fno-caller-saves,
> other code wins with -fcaller-saves. As an example, I see code like this on AArch64:
> 
>         ldr     s4, .LC20
>         fmul    s0, s0, s4
>         str     s4, [x29, 104]
>         bl      f
>         ldr     s4, [x29, 104]
>         fmul    s0, s0, s4
> 
> With -fno-caller-saves it spills and rematerializes the constant as you'd expect:
> 
>         ldr     s1, .LC20
>         fmul    s0, s0, s1
>         bl      f
>         ldr     s5, .LC20
>         fmul    s0, s0, s5
> 
> So given this, is the cost calculation correct and does it include rematerialization? The
> spill code understands how to rematerialize so it should take this into account in the costs.
> I did find some code in ira-costs.c in scan_one_insn() that attempts something that looks like
> an adjustment for rematerialization but it doesn't appear to handle all cases (simple
> immediates, 2-instruction immediates, address-constants, non-aliased loads such as literal
> pool and const data loads).
> 
> Also the hook CALLER_SAVE_PROFITABLE appears to have disappeared - overall performance
> improves significantly if I add this (basically the default heuristic used on instruction
> frequencies):
> 
> --- a/gcc/ira-costs.c
> +++ b/gcc/ira-costs.c
> @@ -2230,6 +2230,8 @@ ira_tune_allocno_costs (void)
>                            * ALLOCNO_FREQ (a)
>                            * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
>  #endif
> +                  if (ALLOCNO_FREQ (a) < 4 * ALLOCNO_CALL_FREQ (a))
> +                    cost = INT_MAX;
>                 }
>               if (INT_MAX - cost < reg_costs[j])
>                 reg_costs[j] = INT_MAX;
> 
> If such a simple heuristic can beat the costs, they can't be quite right.

Note that using if (ALLOCNO_FREQ (a) < 2 * ALLOCNO_CALL_FREQ (a)) instead turns out to be best overall.
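
For anyone who wants to experiment with the threshold outside GCC, the check can be sketched as a standalone function. This is only an illustration: the function and parameter names are hypothetical, freq and call_freq stand in for ALLOCNO_FREQ (a) and ALLOCNO_CALL_FREQ (a), and the saturating add is my guess at what the surrounding INT_MAX guard in ira_tune_allocno_costs is for.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the added check: if the allocno is used
   fewer than factor times per call it is live across, make the
   call-clobbered register prohibitively expensive.  factor is 4 in the
   patch and 2 in the follow-up note. */
static int tune_call_clobbered_cost(int cost, int freq, int call_freq, int factor)
{
    assert(freq >= 0 && call_freq >= 0 && factor > 0);
    if ((long long) freq < (long long) factor * call_freq)
        return INT_MAX;
    return cost;
}

/* Saturating add modelled on the "INT_MAX - cost < reg_costs[j]" guard;
   assumes cost is non-negative. */
static int add_cost_saturating(int reg_cost, int cost)
{
    assert(cost >= 0);
    if (INT_MAX - cost < reg_cost)
        return INT_MAX;
    return reg_cost + cost;
}
```

With freq = 100 and call_freq = 30, a factor of 4 rejects the call-clobbered register (100 < 120), while a factor of 2 keeps the original cost (100 >= 60).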

> Is there anyone who understands the cost calculations?
> 
> Wilco
