This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RE: Understanding IRA
Jeff Law wrote:
> On 11/16/09 10:33, Ian Bolton wrote:
> > The question is: how to fix this? I have initially put the
> REG_ALLOC_ORDER
> > back to how it was and changed the operand constraints in our MD
> file,
> > so each of the "apathetic" instructions, will either take a 't'
> (TOP_CREG)
> > or '?b' (BOTTOM_REG). The '?' shows that this alternative is
> slightly more
> > costly than using 't'. On the benchmark that benefitted the most
> from
> > the new REG_ALLOC_ORDER, these constraints are almost achieving the
> same
> > thing. It is only "almost" there because I am struggling with how to
> show
> > two alternatives for loads and stores, which currently have an 'm'
> > constraint.
> >
> I'm not aware of any way to describe this to IRA. In theory I guess
> IRA
> could be twiddled to use TARGET_ADDRESS_COST to derive some kind of
> cost difference based on the registers used in the MEM, but it seems
> rather hackish.
I found somewhere in record_address_reg to achieve what I needed:
for (k = 0; k < cost_classes_num; k++)
{
i = cost_classes[k];
pp->cost[k]
+= (ira_get_may_move_cost (Pmode, i, rclass, true) * scale) / 2;
/* Slightly nudge memory addresses away from using BOTTOM_REGS and
C_REGS, so they take TOP_CREGS instead - should this pseudo later
need BOTTOM_REGS, there will be a higher cost to use TOP_CREGS
and it will still get BOTTOM_REGS. This is equivalent to adding a
?b on each instruction that currently has a 'm' constraint.
Writing this generically might look something like:
pp->cost[k] += TARGET_ADDRESS_EXTRA_COST_P(cost_classes[k])
? (scale/2) : 0;
*/
if (cost_classes[k] == BOTTOM_REGS || cost_classes[k] == C_REGS)
pp->cost[k] += (scale) / 2;
}
I was then able to alter all our register-agnostic instructions in our
.md file to take either a 't' for TOP_CREGS for a '?b' for BOTTOM_REGS.
Initial results showed that IRA was moving input arguments out of their
BOTTOM_REGS (e.g. $c1) into TOP_CREGS to do work on them, since it
thought TOP_CREGS were less costly to use, despite the cost of the move
instruction to get the input argument into a TOP_CREG.
I wondered if this was because we add 2 to the alt_cost for each '?' and
our REG_MOVE_COST is also 2, but this becomes irrelevant anyway if you
do a lot of work with the input argument, since each potential use in a
BOTTOM_REG incurs some kind of penalty (one per '?' seen) which will
eventually persuade IRA that leaving the input argument in the
BOTTOM_REG it arrived in is more costly than moving it to a TOP_CREG.
I addressed this problem by splitting my register bank a little
differently: instead of making a distinction between BOTTOM_REGS and
TOP_CREGS, I made it so there was only a penalty if you used one of the
non-argument BOTTOM_REGS (i.e. a callee-save BOTTOM_REG). This meant
that IRA was happy to leave input arguments in their BOTTOM_REGS but
erred towards using TOP_CREGS once the caller-save BOTTOM_REGS had run
out. This was an improvement, but there was still a case where these
'?' penalties were not aligned with reality:
T1 = A + B; // can use any register, TOP_CREGS appears cheaper
T2 = A - C; // can use any register, TOP_CREGS appears cheaper
T3 = A & D; // must use BOTTOM_REGS
The constraints for the first two instructions show that TOP_CREGS is
cheaper, but then you have to plant a move to get A into a BOTTOM_REG
to do the AND; in reality, we know it cheaper to have A in a BOTTOM_REG
all along, but the '?' constraint suggests there is a cost in doing this
for the ADD and SUB and so IRA will put A in a TOP_CREG at first and
incur the cost of the move because it is still cheaper than the costs I
have defined in with my constraints. I don't believe there is a way to
communicate a conditional cost, so I'm thinking that constraints are not
the solution for me at this time. What are your thoughts?
> You might try something like this:
>
> 1. Crank up the callee-saved register cost adjustment in
> assign_hard_reg so that it's scaled based on REG_FREQ. That will
> probably lead to some regressions based on my experiments.
>
> 2. Then add a check that completely avoids the cost adjustment in
> cases where we pushed a MAY_SPILL_P allocno. This was on my todo list,
> but I haven't got to it yet.
>
> If you wanted to get fancy, you could track the maximum number of
> neighbors in each class as allocnos are pushed and use that to adjust
> how many registers are cost adjusted in assign_hard_reg. The idea
> being
> the more neighbors the allocno has, the more callee-saved regsiters
> we're likely to need.
>
> You could also try to account for the fact that once allocated, the
> callee saved regsiter is available "for free" to non-conflicting
> allocnos. So you could iterate over those and decrease the penalty
> for
> using a callee saved register on the current allocno. Given the
> interfaces provided by IRA, this could be compile-time expensive.
>
Your "small improvement to IRA allocations" patch posted yesterday
achieves something very similar to this, right?
In the case of our BOTTOM_REGS and TOP_CREGS split, whether this pays
off depends on which allocno is colored first. If we color the one that
needs BOTTOM_REGS first then we will later reuse that BOTTOM_REG to do
work that can be done in any register. But if we color the one that can
be done in any register first, my REG_ALLOC_ORDER is giving out TOP_CREGS
first, which then can't be reused later to do the BOTTOM_REGS work.
I guess when we are coloring an agnostic allocno that will take any reg,
we could check if any of the non-conflicting allocnos want a BOTTOM_REG
and therefore give out a BOTTOM_REG since it can be reused later.
Alternatively, we could walk over all conflicting allocnos when coloring
this one and check for how many conflicting allocnos require BOTTOM_REGS
and make sure we don't give one out now if we will otherwise run out.
Obviously, this will make things even more expensive for compilation, but
I'm not sure I can get what I require without some kind of "look-ahead"
mechanism like in the two ideas above. Unless I simply change the
coloring algorithm so that we always color those that need BOTTOM_REGS
first? I think this is effectively what I was achieving with my early
hacks involving the cost adjustments of 6 and 20.
Thanks for reading,
Ian