This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: Understanding IRA


Jeff Law wrote: 
> On 11/16/09 10:33, Ian Bolton wrote:
> > The question is: how to fix this?  I have initially put the
> REG_ALLOC_ORDER
> > back to how it was and changed the operand constraints in our MD
> file,
> > so each of the "apathetic" instructions, will either take a 't'
> (TOP_CREG)
> > or '?b' (BOTTOM_REG).  The '?' shows that this alternative is
> slightly more
> > costly than using 't'.  On the benchmark that benefitted the most
> from
> > the new REG_ALLOC_ORDER, these constraints are almost achieving the
> same
> > thing.  It is only "almost" there because I am struggling with how to
> show
> > two alternatives for loads and stores, which currently have an 'm'
> > constraint.
> >
> I'm not aware of any way to describe this to IRA.  In theory I guess
> IRA
> could be twiddled to use  TARGET_ADDRESS_COST to derive some kind of
> cost difference based on the registers used in the MEM, but it seems
> rather hackish.

I found somewhere in record_address_reg to achieve what I needed:

  for (k = 0; k < cost_classes_num; k++)
  {
    i = cost_classes[k];
    pp->cost[k]
      += (ira_get_may_move_cost (Pmode, i, rclass, true) * scale) / 2;

    /* Slightly nudge memory addresses away from using BOTTOM_REGS and
       C_REGS, so they take TOP_CREGS instead - should this pseudo later
       need BOTTOM_REGS, there will be a higher cost to use TOP_CREGS
       and it will still get BOTTOM_REGS. This is equivalent to adding a
       ?b on each instruction that currently has a 'm' constraint.

       Writing this generically might look something like:

       pp->cost[k] += TARGET_ADDRESS_EXTRA_COST_P(cost_classes[k])
                      ? (scale/2) : 0;
    */
    if (cost_classes[k] == BOTTOM_REGS || cost_classes[k] == C_REGS)
      pp->cost[k] += (scale) / 2;
  }

I was then able to alter all our register-agnostic instructions in our
.md file to take either a 't' for TOP_CREGS for a '?b' for BOTTOM_REGS.
Initial results showed that IRA was moving input arguments out of their
BOTTOM_REGS (e.g. $c1) into TOP_CREGS to do work on them, since it
thought TOP_CREGS were less costly to use, despite the cost of the move
instruction to get the input argument into a TOP_CREG.

I wondered if this was because we add 2 to the alt_cost for each '?' and
our REG_MOVE_COST is also 2, but this becomes irrelevant anyway if you
do a lot of work with the input argument, since each potential use in a
BOTTOM_REG incurs some kind of penalty (one per '?' seen) which will
eventually persuade IRA that leaving the input argument in the
BOTTOM_REG it arrived in is more costly than moving it to a TOP_CREG.

I addressed this problem by splitting my register bank a little
differently: instead of making a distinction between BOTTOM_REGS and
TOP_CREGS, I made it so there was only a penalty if you used one of the
non-argument BOTTOM_REGS (i.e. a callee-save BOTTOM_REG).  This meant
that IRA was happy to leave input arguments in their BOTTOM_REGS but
erred towards using TOP_CREGS once the caller-save BOTTOM_REGS had run
out.  This was an improvement, but there was still a case where these
'?' penalties were not aligned with reality:

T1 = A + B; // can use any register, TOP_CREGS appears cheaper
T2 = A - C; // can use any register, TOP_CREGS appears cheaper
T3 = A & D; // must use BOTTOM_REGS

The constraints for the first two instructions show that TOP_CREGS is
cheaper, but then you have to plant a move to get A into a BOTTOM_REG
to do the AND; in reality, we know it cheaper to have A in a BOTTOM_REG
all along, but the '?' constraint suggests there is a cost in doing this
for the ADD and SUB and so IRA will put A in a TOP_CREG at first and
incur the cost of the move because it is still cheaper than the costs I
have defined in with my constraints.  I don't believe there is a way to
communicate a conditional cost, so I'm thinking that constraints are not
the solution for me at this time.  What are your thoughts?

> You might try something like this:
> 
>    1. Crank up the callee-saved register cost adjustment in
> assign_hard_reg so that it's scaled based on REG_FREQ.  That will
> probably lead to some regressions based on my experiments.
> 
>    2. Then add a check that completely avoids the cost adjustment in
> cases where we pushed a MAY_SPILL_P allocno.  This was on my todo list,
> but I haven't got to it yet.
> 
> If you wanted to get fancy, you could track the maximum number of
> neighbors in each class as allocnos are pushed and use that to adjust
> how many registers are cost adjusted in assign_hard_reg.  The idea
> being
> the more neighbors the allocno has, the  more callee-saved regsiters
> we're likely to need.
> 
> You could also try to account for the fact that once allocated, the
> callee saved regsiter is available "for free" to non-conflicting
> allocnos.   So you could iterate over those and decrease the penalty
> for
> using a callee saved register on the current allocno.  Given the
> interfaces provided by IRA, this could be compile-time expensive.
> 

Your "small improvement to IRA allocations" patch posted yesterday
achieves something very similar to this, right?

In the case of our BOTTOM_REGS and TOP_CREGS split, whether this pays
off depends on which allocno is colored first.  If we color the one that
needs BOTTOM_REGS first then we will later reuse that BOTTOM_REG to do
work that can be done in any register.  But if we color the one that can
be done in any register first, my REG_ALLOC_ORDER is giving out TOP_CREGS
first, which then can't be reused later to do the BOTTOM_REGS work.

I guess when we are coloring an agnostic allocno that will take any reg,
we could check if any of the non-conflicting allocnos want a BOTTOM_REG
and therefore give out a BOTTOM_REG since it can be reused later.
Alternatively, we could walk over all conflicting allocnos when coloring
this one and check for how many conflicting allocnos require BOTTOM_REGS
and make sure we don't give one out now if we will otherwise run out.

Obviously, this will make things even more expensive for compilation, but
I'm not sure I can get what I require without some kind of "look-ahead"
mechanism like in the two ideas above.  Unless I simply change the
coloring algorithm so that we always color those that need BOTTOM_REGS
first?  I think this is effectively what I was achieving with my early
hacks involving the cost adjustments of 6 and 20.

Thanks for reading, 
Ian


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]