This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RE: Register allocation: caller-save vs spilling
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: <vmakarov at redhat dot com>
- Cc: <gcc at gcc dot gnu dot org>
- Date: Thu, 4 Sep 2014 19:37:41 +0100
- Subject: RE: Register allocation: caller-save vs spilling
Hi Vlad,
I added you directly in case you hadn't spotted my original post.
A simple example for AArch64 trunk is as follows:
// Compile with: -O2 -fomit-frame-pointer -ffixed-d8 -ffixed-d9 -ffixed-d10 -ffixed-d11 -ffixed-d12
//   -ffixed-d13 -ffixed-d14 -ffixed-d15 -f(no-)caller-saves
void g(void);

float f(float x)
{
  x += 3.0;
  g();
  x *= 3.0;
  return x;
}
It seems that reload only ever considers rematerialization of spilled live ranges, not
caller-saved ones. That means the caller-save code should either reject constants outright, or
the memory spill cost for them should always be lower than the cost of a caller-save. Given
memory_move_cost=4 and register_move_cost=2, as commonly used by targets, anything that can be
rematerialized should cost less than half as much as spilling or caller-saving it.
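
To make the arithmetic concrete, here is a rough sketch of that comparison. It is not GCC code: the function names and the cost model (one store plus one load per call crossed for a save/restore, one recomputation per call crossed for a rematerialized value) are assumptions for illustration. Only the two cost constants come from the paragraph above.

```c
#include <assert.h>

/* Illustrative cost units quoted in the message; real targets supply
   their own values through their cost hooks.  */
enum { MEMORY_MOVE_COST = 4, REGISTER_MOVE_COST = 2 };

/* Assumed model: keeping a value live across a call through a stack
   slot needs one store before the call and one load after it.  */
int save_restore_cost (int calls_crossed)
{
  assert (calls_crossed >= 0);
  return calls_crossed * 2 * MEMORY_MOVE_COST;
}

/* Assumed model: a rematerializable value needs no store; it is simply
   recomputed once per call crossed, at RECOMPUTE_COST each time.  */
int remat_cost (int calls_crossed, int recompute_cost)
{
  assert (calls_crossed >= 0 && recompute_cost >= 0);
  return calls_crossed * recompute_cost;
}
```

Under these assumptions a value live across one call costs 8 units to save and restore, a constant reloaded from the literal pool costs 4 (one memory move), and a constant that can be rebuilt for the price of a register move costs 2.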
Wilco
> -----Original Message-----
> From: Wilco Dijkstra [mailto:wdijkstr@arm.com]
> Sent: 27 August 2014 17:25
> To: 'gcc@gcc.gnu.org'
> Subject: Register allocation: caller-save vs spilling
>
> Hi,
>
> I'm investigating various register allocation inefficiencies. The first thing that stands out
> is that GCC supports both caller-saves and spilling. Spilling seems to spill all
> definitions and all uses of a live range. This means you often end up with multiple reloads
> close together, while it would be more efficient to do a single load and then reuse the loaded
> value several times. Caller-save does better in that case, but it is inefficient in that it
> repeatedly stores registers across every call even if unchanged. If both were fixed to
> minimise the number of loads/stores I can't see how one could beat the other, so you'd no
> longer need both.
>
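
The trade-off described in the paragraph above can be sketched with a deliberately simplified counting model. This is an assumption for illustration, not how GCC computes anything: spilling is modelled as one store per definition and one load per use, and caller-save as one store plus one load per call the value is live across. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Number of memory operations a strategy inserts for one live range.  */
struct mem_ops
{
  int stores;
  int loads;
};

/* Assumed model of "spill all definitions and all uses".  */
struct mem_ops spill_everywhere (int defs, int uses)
{
  assert (defs >= 0 && uses >= 0);
  return (struct mem_ops) { defs, uses };
}

/* Assumed model of caller-save: save before and restore after every
   call crossed, whether or not the value changed in between.  */
struct mem_ops caller_save (int calls_crossed)
{
  assert (calls_crossed >= 0);
  return (struct mem_ops) { calls_crossed, calls_crossed };
}

int total_ops (struct mem_ops m)
{
  return m.stores + m.loads;
}
```

In this model a value with one definition and four uses around a single call needs 5 memory operations when spilled but only 2 when caller-saved, while a value with one definition and one use that stays live across three calls needs 2 when spilled and 6 when caller-saved.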
> Anyway, due to the current implementation there are clearly cases where caller-save is best and
> cases where spilling is best. However, I do not see it making the correct decision despite
> trying to account for the costs - some code is significantly faster with -fno-caller-saves,
> other code wins with -fcaller-saves. As an example, I see code like this on AArch64:
>
>         ldr     s4, .LC20
>         fmul    s0, s0, s4
>         str     s4, [x29, 104]
>         bl      f
>         ldr     s4, [x29, 104]
>         fmul    s0, s0, s4
>
> With -fno-caller-saves it spills and rematerializes the constant as you'd expect:
>
>         ldr     s1, .LC20
>         fmul    s0, s0, s1
>         bl      f
>         ldr     s5, .LC20
>         fmul    s0, s0, s5
>
> So given this, is the cost calculation correct and does it include rematerialization? The
> spill code understands how to rematerialize so it should take this into account in the costs.
> I did find some code in ira-costs.c in scan_one_insn() that attempts something that looks like
> an adjustment for rematerialization but it doesn't appear to handle all cases (simple
> immediates, 2-instruction immediates, address-constants, non-aliased loads such as literal
> pool and const data loads).
>
> Also the hook CALLER_SAVE_PROFITABLE appears to have disappeared - overall performance
> improves significantly if I add this (basically the default heuristic used on instruction
> frequencies):
>
> --- a/gcc/ira-costs.c
> +++ b/gcc/ira-costs.c
> @@ -2230,6 +2230,8 @@ ira_tune_allocno_costs (void)
>                    * ALLOCNO_FREQ (a)
>                    * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
>  #endif
> +                 if (ALLOCNO_FREQ (a) < 4 * ALLOCNO_CALL_FREQ (a))
> +                   cost = INT_MAX;
>                  }
>                if (INT_MAX - cost < reg_costs[j])
>                  reg_costs[j] = INT_MAX;
>
> If such a simple heuristic can beat the costs, they can't be quite right.
Note that a factor of 2, i.e. if (ALLOCNO_FREQ (a) < 2 * ALLOCNO_CALL_FREQ (a)), turns out to be best overall.
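
For readers who want to experiment with the threshold outside of GCC, the heuristic in the patch can be sketched as a standalone function. This is only an illustration: caller_save_cost and add_saturating are hypothetical names, the factor parameter is introduced here so that 4 and 2 can be compared, and the addition in the non-saturating branch is an assumption about what follows the INT_MAX guard shown in the diff.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the patched test: when the allocno is
   referenced less than FACTOR times as often as the calls it is live
   across, make the caller-saved register prohibitively expensive.
   Frequencies are assumed small enough that the product cannot
   overflow.  */
int caller_save_cost (int cost, int allocno_freq, int call_freq, int factor)
{
  assert (allocno_freq >= 0 && call_freq >= 0 && factor > 0);
  if (allocno_freq < factor * call_freq)
    return INT_MAX;
  return cost;
}

/* Saturating accumulation in the style of the INT_MAX guard in the
   diff, so that an "infinite" cost does not wrap around.  */
int add_saturating (int reg_cost, int cost)
{
  assert (reg_cost >= 0 && cost >= 0);
  if (INT_MAX - cost < reg_cost)
    return INT_MAX;
  return reg_cost + cost;
}
```

With a reference frequency of 100 and a call frequency of 30, a factor of 4 rejects the caller-saved register (100 < 120) while a factor of 2 keeps the computed cost (100 >= 60).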
> Is there anyone who understands the cost calculations?
>
> Wilco