This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav: Message Nav: [Date Index] [Subject Index] [Author Index] [Thread Index] [Date Prev] [Date Next] [Thread Prev] [Thread Next] [Raw text]

# Move costs, allocno costs, and per-register costs

• From: Richard Sandiford <rdsandiford at googlemail dot com>
• To: gcc-patches at gcc dot gnu dot org
• Cc: vmakarov at redhat dot com, schwab at suse dot de
• Date: Sun, 09 Nov 2008 12:36:49 +0000
• Subject: Move costs, allocno costs, and per-register costs

[Andreas, I'm cc:ing you as m68k maintainer; scroll down for why]

This patch helps to fix the MIPS dspr2-MULT{,U}.c regressions
that were introduced by my MIPS IRA patch.  I think the underlying
problem is a general one.

MIPS defines the following REGISTER_MOVE_COSTS:

GR_REGS  -> GR_REGS  : 2
GR_REGS  -> ACC_REGS : 6
ACC_REGS -> GR_REGS  : 6
ACC_REGS -> ACC_REGS : 12  (because this move has to through a GPR)

[ At this point I should insert my usual caveat that these costs are
historical, and are not accurate for all MIPS processors.  They should
really be part of the cost structure, so that we can define processor-
specific costs instead.  But that's not relevant to this dicussion. ]

MIPS also has a union and cover class called GR_AND_ACC_REGS.
The problem relates to the allocno costs of this class.

First of all, move_cost[MODE][FROM][TO] is the maximum of all:

REGISTER_MOVE_COST (MODE, FROM', TO')

such that FROM' \subseteq FROM and TO \subseteq TO.  So we have:

move_cost[MODE][GR_AND_ACC_REGS][GR_REGS] = 6    (= ACC_REGS -> GR_REGS)
move_cost[MODE][GR_AND_ACC_REGS][ACC_REGS] = 12  (= ACC_REGS -> ACC_REGS)
move_cost[MODE][GR_REGS][GR_AND_ACC_REGS] = 6    (= GR_REGS  -> ACC_REGS)
move_cost[MODE][ACC_REGS][GR_AND_ACC_REGS] = 12  (= ACC_REGS -> ACC_REGS)

and so on.  This is correct from a MIPS point of view.

We also have:

/* Similar, but here we don't have to move if the first index is a subset
of the second so in that case the cost is zero.  */
move_table *may_move_in_cost[MAX_MACHINE_MODE];

/* Similar, but here we don't have to move if the first index is a superset
of the second so in that case the cost is zero.  */
move_table *may_move_out_cost[MAX_MACHINE_MODE];

So:

may_move_in_cost[MODE][FROM][TO]
= { 0 if FROM \subseteq TO
{ move_cost[MODE][FROM][TO] otherwise

This is intended to be used in cases where:

- FROM is a class that the register allocator might use for
a pseudo register; and

- TO is the class required by a particular alternative of a
particular insn operand.

So the cost is rightly zero when the source register already belongs
to TO; no input reload would be needed in that case .  However,
I think the "otherwise" case is an overestimate.

To explain why, suppose FROM == GR_AND_ACC_REGS and TO == ACC_REGS.
value into an ACC_REGS operand.  At the moment:

may_move_in_cost[MODE][GR_AND_ACC_REGS][ACC_REGS]
= move_cost[MODE][GR_AND_ACC_REGS][ACC_REGS]

And, from above, move_cost[MODE][GR_AND_ACC_REGS][ACC_REGS] == 12,
because ACC_REGS to ACC_REGS moves have a cost of 12.  But no reload
is needed if the source is already a member of ACC_REGS, so a more
accurate value would be:

may_move_in_cost[MODE][GR_AND_ACC_REGS][ACC_REGS]
= move_cost[MODE][GR_REGS][ACC_REGS]

which is 6 rather than 12.  In other words, if we have an insn that
requires an accumulator register, we will overestimate the cost of
GR_AND_ACC_REGS by 6.

More generally, a tighter definition of may_move_in_cost would be:

may_move_in_cost[MODE][FROM][TO]
= { 0 if FROM \subseteq TO
{ move_cost[MODE][FROM \ TO][TO] otherwise

(Note that the "if" case is still needed: move_cost[MODE][NO_REGS][TO]
is the maximum cost because NO_REGS has no registers of mode MODE.
Also, FROM \ TO might not exist as a class, so this would be a
"superdiff"; c.f. our current "superunions".)

The same reasoning applies to may_move_out_cost.

Although this definition would solve the overestimate for a single insn,
there's still an inherent overestimate in the combined costs.
E.g. consider a pseudo register P that is used in two instructions:

- I1, which writes to a GPR
- I2, which reads from an accumulator

where I1 and I2 have the same frequency F.  The cost of using
GR_AND_ACC_REGS for P is:

may_move_out_cost[MODE][GR_REGS][GR_AND_ACC_REGS] * F
+ may_move_in_cost[MODE][GR_AND_ACC_REGS][ACC_REGS] * F

which after the suggested change would be equivalent to:

move_cost[MODE][GR_REGS][ACC_REGS] * 2 * F

But only one of the two moves is needed.  The allocator will either
pick a GPR (in which case the second move is needed) or an accumulator
(in which case the first move is needed).  This will be reflected
in the costs of P as follows:

GR_REGS : 6 * F               (for I2)
ACC_REGS : 6 * F              (for I1)
GR_AND_ACC_REGS : 12 * F      (for both I1 and I2)

(Before the changed may-move costs we would have:

GR_AND_ACC_REGS : 24 * F

Now the old regclass.c code recognised this problem, and picked the
"preferred" class as the union of all classes with the minimum cost.
So P's preferred class is the union of GR_REGS and ACC_REGS,
i.e. GR_AND_ACC_REGS, even though GR_AND_ACC_REGS itself was
given a much higher cost than the individual classes.

The regclass.c code was adopted for IRA, so it too makes the
same (correct) choice of preferred class.

However -- getting to the point at last -- once it has decided to
use this union class, IRA sets the ALLOCNO_COVER_CLASS_COST to the
cost of the _union class_, not the cost of the classes whose unions
were taken.  So in the example above, ALLOCNO_COVER_CLASS_COST would be
12 * F or 24 * F rather than 6 * F.  This is an overestimate for the
reasons given above.

In dspr2-MULTU.c, the overestimate is such that the GR_AND_ACC_REGS
cost exceeds the memory cost, even though the individual GR_REGS
and ACC_REGS costs are much lower than the memory cost.  So we end
up thinking that it is cheaper to spill the register to memory.

Things are slightly more complicated with IRA than the old allocators
because we have two sets of costs:

- per-allocno (allocno_costs)
- per-register (total_costs)

The "preferred" class is chosen using total_costs, but the
ALLOCNO_COVER_CLASS_COST is taken from allocno_costs.  So although
find_allocno_class_costs knows what the minimum *total_cost* is --
namely "best_cost" -- this isn't what we want the ALLOCNO_COVER_CLASS_COST
to be.  We want a value derived from allocno_costs instead.

One fix would be work out a set of partitions PP for each class C such that
C's allocno cost would be no greater than:

MIN_{P \in PP} (MAX_{C' \in P} (allocno_cost of C'))

That would be a little complicated though, and the calculation for
each C would be quadratic in the number of register classes.

Instead, let's suppose that the preferred class is calculated as
the union of C1 ... Cn, where each Ci has the cheapest total_cost.
The approach I've taken in the patch is to calculate the maximum
of the allocno_costs for each such Ci, and use that value as the
ALLOCNO_COVER_CLASS_COST.  Then the cost of any register in the
"preferred" class should be that of the ALLOCNO_COVER_CLASS_COST.
(The costs of other registers are the same as before the patch.)

This fixes the MIPS problem: the allocno above would now have
cost 6 * F rather than 12 * F or 24 * F, and we no longer spill
accumulator values to memory for dspr2-MULT{,U}.c.  I also think
it makes conceptual sense, for the reasons given above.

I tested the patch by compiling the gcc.c-torture, gcc.dg and g++.dg
testsuites at -O2 for the targets listed below and then compared the
before and after assembly code.  The differences were as follows:

alpha-linux-gnu	no change
arc-elf		no change
arm-eabi		no change
avr-rtems		improvements in 2 tests
bfin-elf		some tests get better, some get worse; see below
cris-elf		no change
fr30-elf		no change
frv-elf		like-for-like changes in 1 test
h8300-elf		no change
hppa2.0-linux-gnu	no change
i686-linux-gnu	changes in 2 tests, both improvements by my reading
iq2000-elf		no change
m32c-elf		no change
m32r-elf		no change
m68hc11-elf		no change
m68k-elf		some tests get better, some get worse; see below
mcore-elf		no change
mipsisa64-elf		changes in 3 tests, all improvements
mmix-knuth-mmixware	no change
pdp11-bsd		no change
powerpc64-linux-gnu	no change
powerpc-linux-gnu	no change
s390-linux-gnu	no change
score-elf		no change
sh64-elf		no change
sh-elf		lots of like-for-like changes
sparc-sun-solaris2.8	no change
spu-elf		no change
v850-elf		no change
vax-netbsdelf		no change
x86_64-linux-gnu	like-for-like changes in 1 test
xstormy16-elf		like-for-like changes in 32 tests
xtensa-elf		no change

(Some of these targets haven't been ported to IRA yet.)

I've attached the diffs for all targets that saw some change.

bfin-elf:

There are a lot of like-for-like changes, a few improvements, and a
few regressions.  One of the regressions is in struct-return-2_x.c,
in which one problem is repeated in several similar functions.
Although the test is hardly representative of typical code,
the problem is interesting.

Reload decides it has to free up a register by spilling a pseudo,
so IRA has to pick the best candidate.  The two cheapest candidates are:

P1: a register that is set once and used as the source address
in a rep_movhi instruction

P2: a register that is set once and used as the R1 argument
to a function.

So P1 and P2 are set and used the same number of times, but P2 has
the additional constraint that its value must be moved into a specific
register -- R1 -- at the point P2 dies.  So:

- P2:R1 has an extra negative cost, to make R1 seem very cheap
- IRA has duly allocated R1 to P2, as hoped
- this negative cost is added to the ALLOCNO_COVER_CLASS_COST of P2
(probably correctly)

ALLOCNO_COVER_CLASS_COST, adding the negative R1 cost to P2's
ALLOCNO_COVER_CLASS_COST would make it seem cheaper to spill P1.
And that's exactly what we want to do in this case.

However, the rep_movhi instruction that uses P1 also sets P1 to

(parallel [
...
(set (reg:SI P1)    <---
(plus:SI (plus:SI (reg:SI P1) ..)))
(set (mem:BLK (...))
(mem:BLK (reg:SI P1)))
...)

This updated P1 is not used, and there is a REG_UNUSED note to
indicate this.  However, ira-costs.c still takes the destination
into account when calculating the cost of P1.  That is, we calculate
the cost of _three_ operands:

- the output operand for the original P1 definition
- the input operand for the rep_movhi
- the output operand for the unused rep_movhi destination

We're really considering the cost of potential reloads here,
so I don't think the last operand should be counted.  As it stands,
P1 has an inflated ALLOCNO_MEMORY_COST, because we wrongly assume
that we'd need to copy P1 to memory twice.

Now P1 can use either IREGS or PREGS, so its preferred class
is IPREGS.  The cost of IPREGS itself is overestimated for
the reason given above, and before the patch, we would set
ALLOCNO_COVER_CLASS_COST to this overestimated value.
This overestimate of the cover class cost goes some way
to cancelling out the overestimate of the memory cost,
so before the patch, spilling P1 had the same cost as
spilling P2.  We ended up choosing to spill P1.

After the patch, the ALLOCNO_COVER_CLASS_COST is the cost of using
either IREGS or PREGS, so the gap between the ALLOCNO_COVER_CLASS_COST
and the ALLOCNO_MEMORY_COST is greater.  This makes it seem cheaper to
spill P2.

In summary, I think ira-costs.c:record_reg_classes should not add
ira_may_move_out_cost and ira_memory_cost for unused output operands.
I'd like to test that at some point, but I've got a lot on my plate
and probably won't find time.  (An agreement in principle might
add a bit of incentive though.)

m68k-elf:

The tests that get worse have XFmode registers that only occur in move
instructions.  The situation here is similar to the MIPS one that I'm
trying to fix: GENERAL_REGS and FP_REGS have a cost of 0 for an XFmode
pseudo register P, but the union class (ALL_REGS) has a higher cost
than memory.  (Move instructions have memory-to-memory alternatives,
so memory looks cheap.  And FPR<->GPR moves have a cost of 4 rather
than 2.)

So before the patch, IRA would decide that spilling P to memory was
cheaper than allocating a register (which is not in fact true).  We then
rely on reload inheritance rather than register allocation to handle P
efficiently.

In the problem cases, reload inheritance is able to treat P as though
it had been allocated a register: the definition and uses are in the
same block, and reload is able to inherit the definition's output
reload.  The difference is that reload sees the "!" markers in the
movxf instruction, so it only considers using an FPR, never a data
or address register.  IRA (and the old reg-class.c code) ignore "!"
markers, so GPRs and FPRs seem equally cheap.

Thus if we allocate the register using IRA, the reg_alloc_order causes
us to prefer a GPR.  If we don't allocate a register and rely on reload
inheritance, we would only ever use an FPR.  And GPR XFmode moves are
more expensive than FPR moves, which leads to the code-quality regression.

Andrew recently raised a similar issue for PowerPC:

http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01323.html

and I believe the decision reached on that thread is that IRA and
reg-class.c are doing the right thing by ignoring "!".  If so, I think
this is a bug in the m68k port.  The movxf GPR alternatives are more
expensive than the FPR ones, so they should have "?" markers as well
as "!" markers.  (Or maybe "*" markers, but they've always seemed like
a bit of a hack.)

Bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?

Richard

gcc/
* ira-costs.c (find_allocno_class_costs): Work out the maximum
allocno_costs value of the classees with the lowest total_costs
value.  Use this to set ALLOCNO_COVER_CLASS_COST here...
(setup_allocno_cover_class_and_costs): ...rather than here.
Use the ALLOCNO_COVER_CLASS_COST for all registers in the
preferred class.

Index: gcc/ira-costs.c
===================================================================
--- gcc/ira-costs.c	2008-11-02 15:37:40.000000000 +0000
+++ gcc/ira-costs.c	2008-11-07 23:58:50.000000000 +0000
@@ -1149,7 +1149,7 @@ find_allocno_class_costs (void)
ira_allocno_t a, parent_a;
int rclass, a_num, parent_a_num;
ira_loop_tree_node_t parent;
-	  int best_cost;
+	  int best_cost, allocno_cost;
enum reg_class best, alt_class, common_class;
#ifdef FORBIDDEN_INC_DEC_CLASSES
int inc_dec_p = false;
@@ -1255,6 +1255,7 @@ find_allocno_class_costs (void)
/* Finding best class which is subset of the common
class.  */
best_cost = (1 << (HOST_BITS_PER_INT - 2)) - 1;
+		  allocno_cost = best_cost;
best = ALL_REGS;
for (k = 0; k < cost_classes_num; k++)
{
@@ -1279,12 +1280,21 @@ find_allocno_class_costs (void)
{
best_cost
= COSTS_OF_ALLOCNO (total_costs, a_num)->cost[k];
+			  allocno_cost
+			    = COSTS_OF_ALLOCNO (allocno_costs, a_num)->cost[k];
best = (enum reg_class) rclass;
}
else if (COSTS_OF_ALLOCNO (total_costs, a_num)->cost[k]
== best_cost)
-			best = ira_reg_class_union[best][rclass];
+			{
+			  best = ira_reg_class_union[best][rclass];
+			  allocno_cost
+			    = MAX (allocno_cost,
+				   COSTS_OF_ALLOCNO (allocno_costs,
+						     a_num)->cost[k]);
+			}
}
+		  ALLOCNO_COVER_CLASS_COST (a) = allocno_cost;
}
if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL
&& (pass == 0 || allocno_pref[a_num] != best))
@@ -1398,6 +1408,7 @@ setup_allocno_cover_class_and_costs (voi
int *reg_costs;
enum reg_class cover_class, rclass;
enum machine_mode mode;
+  HARD_REG_SET *pref;
ira_allocno_t a;
ira_allocno_iterator ai;

@@ -1412,10 +1423,7 @@ setup_allocno_cover_class_and_costs (voi
if (cover_class == NO_REGS)
continue;
ALLOCNO_AVAILABLE_REGS_NUM (a) = ira_available_class_regs[cover_class];
-      num = cost_class_nums[allocno_pref[i]];
-      ira_assert (num >= 0);
-      ALLOCNO_COVER_CLASS_COST (a)
-	= COSTS_OF_ALLOCNO (allocno_costs, i)->cost[num];
+      pref = &reg_class_contents[allocno_pref[i]];
if (optimize && ALLOCNO_COVER_CLASS (a) != allocno_pref[i])
{
n = ira_class_hard_regs_num[cover_class];
@@ -1424,17 +1432,23 @@ setup_allocno_cover_class_and_costs (voi
for (j = n - 1; j >= 0; j--)
{
regno = ira_class_hard_regs[cover_class][j];
-	      rclass = REGNO_REG_CLASS (regno);
-	      num = cost_class_nums[rclass];
-	      if (num < 0)
+	      if (TEST_HARD_REG_BIT (*pref, regno))
+		reg_costs[j] = ALLOCNO_COVER_CLASS_COST (a);
+	      else
{
-		  /* The hard register class is not a cover class or a
-		     class not fully inside in a cover class -- use
-		     the allocno cover class.  */
-		  ira_assert (ira_hard_regno_cover_class[regno] == cover_class);
-		  num = cost_class_nums[cover_class];
+		  rclass = REGNO_REG_CLASS (regno);
+		  num = cost_class_nums[rclass];
+		  if (num < 0)
+		    {
+		      /* The hard register class is not a cover class or a
+			 class not fully inside in a cover class -- use
+			 the allocno cover class.  */
+		      ira_assert (ira_hard_regno_cover_class[regno]
+				  == cover_class);
+		      num = cost_class_nums[cover_class];
+		    }
+		  reg_costs[j] = COSTS_OF_ALLOCNO (allocno_costs, i)->cost[num];
}
-	      reg_costs[j] = COSTS_OF_ALLOCNO (allocno_costs, i)->cost[num];
}
}
}



Index Nav: Message Nav: [Date Index] [Subject Index] [Author Index] [Thread Index] [Date Prev] [Date Next] [Thread Prev] [Thread Next]