This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: Understanding IRA


(NOTE: this is a repost to gcc@gcc.gnu.org of a private message
originally sent to just Jeff Law and Vladimir Makarov.)

> Vladimir Makarov wrote:
> Jeff Law wrote:
> > On 11/03/09 09:29, Ian Bolton wrote:
> >> Hi again, Vladimir,
> >>
> >> I am pleased to report some performance improvements after altering
> >> ira-costs.c.  A key benchmark for us has improved by 5%.
> >>
> >> Specifically, in record_reg_classes(), after the alt_cost has been
> >> calculated and it will be applied to pp->mem_cost and pp->cost[k],
I
> >> check whether this particular operand wanted one of our BOTTOM_REGS
> >> (r0-r15) and I further increase the pp->mem_cost by an arbitrary
> >> amount and also increase pp->cost[k] by an arbitrary amount if k
> >> does not represent the BOTTOM_REGS class.  My aim here is to nudge
> >> IRA in the right direction for operands that just want BOTTOM_REGS.
> >>
> >> After experimenting with different values for my "arbitrary
> >> amounts", I discovered some that successfully made IRA more likely
> >> to give BOTTOM_REGS to those instructions/operands that want
> >> BOTTOM_REGS, since any other regs and memory ended up with high
> >> enough costs for IRA to try and avoid using them.
> >>
> >> I have included a snippet from my version of record_reg_classes()
> >> below:
> >>
> >>
> > What I don't understand at this point is why the current mechanisms
> in
> > IRA aren't showing a lower cost for using BOTTOM_REGS (or a higher
> > cost for TOP_REGS).  i.e.  I don't think any of this should be
> > necessary as IRA should already be doing something similar.
> >
> > This may be a case where your backend hasn't indicated that TOP_REGS
> > has a higher cost than BOTTOM_REGS in particular situations.
> >
> I am agree with Jeff.  It is hard to understand what you are doing
> without the architecture knowledge and some macro values in your port
> (IRA_COVER_CLASSES, MEMORY_MOVE_COST, and REGISTER_MOVE_COST).
> 
> I'd also add that besides right macro value definitions, you could use
> insn alternative hints near register constraints like ? or even *.
>

FYI, my IRA_COVER_CLASSES is simply this:

#define IRA_COVER_CLASSES	\
{					\
  GENERAL_REGS,  			\
  LIM_REG_CLASSES 		\
}

REGISTER_MOVE_COST is currently set to the default of 2, in defaults.h.
On our architecture, it takes 1 cycle to move from one register to
another, so I assume 2 would be the correct number if this is meant to
represent the cost of moving there and back again?  Or is the cost 2
to represent the cycle impact (1) plus the size impact (1)?

MEMORY_MOVE_COST is set to be 4 + memory_move_secondary_cost in
reload.h.  I've not delved into what is happening in that function for
my particular case, but I see it mentions REGISTER_MOVE_COST in there
too.

(As you can see from my previous email, I have increased the costs for
allocnos where we want BOTTOM_REGS and get TOP_CREGS or MEM by 6*freq
and 20*freq, respectively.  I do not know whether these numbers are
sensible at this stage, since I do not fully understand what the costs
are meant to represent.)

After Jeff mentioned that IRA should already be doing the right thing,
I had another look at the 176r.ira dump file and realised that I had
originally only seen "Pass 0 for finding allocno costs" in that file.
In fact, some costs for allocnos are added in Pass 1.

For example, in unmodified IRA with priority algorithm and mixed
region:

a0(r230,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,0 C_REGS:0,0 MEM:32000
a1(r203,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,8000 C_REGS:0,8000
MEM:8000
...
a29(r203,l1) costs: BOTTOM_REGS:0,0 TOP_CREGS:8000,8000 C_REGS:8000,8000
MEM:20000

From looking at 174r.sched1, I can see that r203 is defined in l0 and
used as an operand that needed BOTTOM_REGS in l1.  Notice that the
allocno cost is 0 for l0, but the total cost is 8000, since this has
been passed on from the 8000 for allocno in l1.

r203 ends up getting register #23, which is not a BOTTOM_REG, so we need
to generate a move in region l1 before we can use this value.

What is interesting is that in my modified version of IRA, both lines
that mention r203 have an allocno cost:

a0(r230,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,0 C_REGS:0,0 MEM:32000
a1(r203,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:12000,50000
C_REGS:12000,50000 MEM:48000
a29(r203,l1) costs: BOTTOM_REGS:0,0 TOP_CREGS:38000,38000
C_REGS:38000,38000 MEM:138000

I'm not really sure how my change in record_reg_classes has ended up
altering the allocno cost in l0, but that does seem to be the key.
I will continue investigating this.

Another thing you can notice from the above cost output is that allocnos
that are going to be used for memory operations are getting their MEM
cost set high, to prevent them being put in MEM.  (r230 gets used
several
times to load in values and store out values.)

But in my modified IRA, my grossly inflated MEM costs for allocnos that
want BOTTOM_REGS are causing some of those allocnos to get a higher
priority than these ones that are to be used as memory address.  See
that a29/r203 has a MEM score of 138000, which means it gets first pick
of the available registers, so more chance of getting the BOTTOM_REG
that it wants.

I wonder if the value being used for these memory address allocnos could
be tuned.  I also wonder if we could persuade them to prefer TOP_CREGS,
thus leaving more BOTTOM_REGS for those that want them.

Best regards,
Ian


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]