This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: MIPS: 2'nd pass of ira, causes weird register allocation for 2-op mult
- From: Richard Sandiford <rdsandiford at googlemail dot com>
- To: Klaus Pedersen <projectu at gmail dot com>
- Cc: gcc at gnu dot org, vmakarov at redhat dot com
- Date: Mon, 28 May 2012 20:09:53 +0100
- Subject: Re: MIPS: 2'nd pass of ira, causes weird register allocation for 2-op mult
- References: <CANbV6W==XSF_kjhbikAkyNy8WCMy29ScWoUcnH35FnJaq-gwnw@mail.gmail.com>
Klaus Pedersen <projectu@gmail.com> writes:
> The summery goes something like this:
>
> It is possible for the second pass of ira to get confused and decide that
> NO_REGS or a hard float register are better choices for the result of the
> 2 operand mult. First pass already optimally allocated in GR_AND_MD1_REGS.
Yeah. I'm afraid this is something I've been sitting on for a while now.
I think the problem boils down to this:
> Pass 0 for finding pseudo/allocno costs
>
> a2 (r209,l0) best GR_REGS, allocno GR_REGS
> a4 (r208,l0) best GR_REGS, allocno GR_REGS
> a5 (r207,l0) best GR_REGS, allocno GR_REGS
> a1 (r200,l0) best GR_REGS, allocno GR_REGS
> a3 (r199,l0) best GR_AND_MD1_REGS, allocno GR_AND_MD1_REGS
> a0 (r194,l0) best GR_REGS, allocno GR_REGS
>
> ...
> a3(r199,l0) costs: M16_REGS:6000,6000 T_REG:6000,6000
> M16_T_REGS:6000,6000 PIC_FN_ADDR_REG:6000,6000 V1_REG:6000,6000
> LEA_REGS:6000,6000 GR_REGS:6000,6000 FP_REGS:14000,14000
> MD1_REG:6000,6000 MD_REGS:2000000,2000000 ACC_REGS:2000000,2000000
> GR_AND_MD0_REGS:2000000,2000000 GR_AND_MD1_REGS:18000,18000
> GR_AND_MD_REGS:2000000,2000000 GR_AND_ACC_REGS:2000000,2000000
> ALL_REGS:2000000,2000000 MEM:14000,14000
Here the cost of using GR_REGS is 6000 and the cost of using MD1_REG
is also 6000. But the cost of using _either_ GR_REGS _or_ MD1_REG
is 18000! That doesn't make conceptual sense.
This doesn't matter during the first pass because we collect the
union of the cheapest classes:
if (i_costs[k] < best_cost)
{
best_cost = i_costs[k];
best = (enum reg_class) rclass;
}
else if (i_costs[k] == best_cost)
best = ira_reg_class_subunion[best][rclass];
I.e. if the cost of using X is C and the cost of using Y is C, we assume
that the cost of using either X or Y is also C. We don't care in the
"else" case about the cost of ira_reg_class_subunion[best][rclass],
because if that cost is something other than C, it's wrong.
So despite what the dump says, the first pass is effectively
(and correctly) giving GR_AND_MD1_REGS a cost of 6000.
But the second pass goes out of its way to avoid calculating the costs
of these constituent classes, and just calculates the cost of
GR_AND_MD1_REGS itself:
/* We exclude classes from consideration which are subsets of
ACLASS only if ACLASS is a pressure class or subset of a
pressure class. It means by the definition of pressure classes
that cost of moving between susbets of ACLASS is cheaper than
load or store. */
for (i = 0; i < ira_pressure_classes_num; i++)
{
cl = ira_pressure_classes[i];
if (cl == aclass)
break;
COPY_HARD_REG_SET (temp2, reg_class_contents[cl]);
AND_COMPL_HARD_REG_SET (temp2, ira_no_alloc_regs);
if (hard_reg_set_subset_p (temp, temp2))
break;
}
exclude_p = i < ira_pressure_classes_num;
classes.num = 0;
for (i = 0; i < ira_important_classes_num; i++)
{
cl = ira_important_classes[i];
if (exclude_p)
{
/* Exclude no-pressure classes which are subsets of
ACLASS. */
COPY_HARD_REG_SET (temp2, reg_class_contents[cl]);
AND_COMPL_HARD_REG_SET (temp2, ira_no_alloc_regs);
if (! ira_reg_pressure_class_p[cl]
&& hard_reg_set_subset_p (temp2, temp) && cl != aclass)
continue;
}
classes.classes[classes.num++] = cl;
}
So we get:
> a3(r199,l0) costs: FP_REGS:14000,14000 MD_REGS:2000000,2000000
> ACC_REGS:2000000,2000000 GR_AND_MD0_REGS:2000000,2000000
> GR_AND_MD1_REGS:18000,18000 GR_AND_MD_REGS:2000000,2000000
> GR_AND_ACC_REGS:2000000,2000000 ALL_REGS:2000000,2000000
> MEM:14000,14000
I.e. the same incorrect value for GR_AND_MD1_REGS, but without the
individual GR_REGS and MD1_REG costs that would give the correct value.
Part of the problem is that may_move_{in,out}_cost is too conservative.
It is either the full cost of a move or 0. E.g. take:
MMOC == may_move_out_cost[mode][X][Y]
which is used where an output constraint specifies class X and the
allocation class is Y. MMOC is 0 if Y is a subset of X, because no move
is ever needed in that case. But in other cases MMOC is the worst-case
cost of moving from X to _anything_ in Y. So if Y is X \union X', say,
then MMOC will include the cost of an X->X move, even though a move is
only needed for registers in the X' part of Y.
We could do better by making:
may_move_out_cost[mode][X][Y] = move_cost[mode][X][superdiff[Y][X]]
where superdiff is a new array specifying the smallest class that contains
Y - X. But even that's too pessimistic. When we see an X constraint,
we'll count the cost of:
move_cost[mode][X][X']
And when we see an X' constraint, we'll count the cost of:
move_cost[mode][X'][X]
And we'll sum these costs together, even though they represent
mutually-exclusive scenarios. In cases like yours (which are common
on MIPS) we'll end up with a GR_AND_MD1_REGS cost of 12000 instead
of 18000, but the value really ought to be 6000.
I think the only practical way of calculating accurate costs for things
like GR_AND_MD_REGS really is to count the cost of the constituent classes
and then take their MAX.
Vlad, what do you think? Is the above exclude_p code "just" a compile-time
speed-up? Or is it a conceptual part of the algorithm? More generally,
what kind of situations does the second pass help with? I.e. how does
it improve on the first pass?
Richard