This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: Understanding IRA


> Ian Bolton wrote:
> 
> (NOTE: this is a repost to gcc@gcc.gnu.org of a private message
> originally sent to just Jeff Law and Vladimir Makarov.)
> 
> > Vladimir Makarov wrote:
> > Jeff Law wrote:
> > > On 11/03/09 09:29, Ian Bolton wrote:
> > >> Hi again, Vladimir,
> > >>
> > >> I am pleased to report some performance improvements after
> altering
> > >> ira-costs.c.  A key benchmark for us has improved by 5%.
> > >>
> > >> Specifically, in record_reg_classes(), after the alt_cost has
been
> > >> calculated and it will be applied to pp->mem_cost and
pp->cost[k],
> I
> > >> check whether this particular operand wanted one of our
> BOTTOM_REGS
> > >> (r0-r15) and I further increase the pp->mem_cost by an arbitrary
> > >> amount and also increase pp->cost[k] by an arbitrary amount if k
> > >> does not represent the BOTTOM_REGS class.  My aim here is to
nudge
> > >> IRA in the right direction for operands that just want
> BOTTOM_REGS.
> > >>
> > >> After experimenting with different values for my "arbitrary
> > >> amounts", I discovered some that successfully made IRA more
likely
> > >> to give BOTTOM_REGS to those instructions/operands that want
> > >> BOTTOM_REGS, since any other regs and memory ended up with high
> > >> enough costs for IRA to try and avoid using them.
> > >>
> > >> I have included a snippet from my version of record_reg_classes()
> > >> below:
> > >>
> > >>
> > > What I don't understand at this point is why the current
mechanisms
> > in
> > > IRA aren't showing a lower cost for using BOTTOM_REGS (or a higher
> > > cost for TOP_REGS).  i.e.  I don't think any of this should be
> > > necessary as IRA should already be doing something similar.
> > >
> > > This may be a case where your backend hasn't indicated that
> TOP_REGS
> > > has a higher cost than BOTTOM_REGS in particular situations.
> > >
> > I am agree with Jeff.  It is hard to understand what you are doing
> > without the architecture knowledge and some macro values in your
port
> > (IRA_COVER_CLASSES, MEMORY_MOVE_COST, and REGISTER_MOVE_COST).
> >
> > I'd also add that besides right macro value definitions, you could
> use
> > insn alternative hints near register constraints like ? or even *.
> >
> 
> FYI, my IRA_COVER_CLASSES is simply this:
> 
> #define IRA_COVER_CLASSES	\
> {					\
>   GENERAL_REGS,  			\
>   LIM_REG_CLASSES 		\
> }
> 
> REGISTER_MOVE_COST is currently set to the default of 2, in
defaults.h.
> On our architecture, it takes 1 cycle to move from one register to
> another, so I assume 2 would be the correct number if this is meant to
> represent the cost of moving there and back again?  Or is the cost 2
> to represent the cycle impact (1) plus the size impact (1)?
> 
> MEMORY_MOVE_COST is set to be 4 + memory_move_secondary_cost in
> reload.h.  I've not delved into what is happening in that function for
> my particular case, but I see it mentions REGISTER_MOVE_COST in there
> too.
> 
> (As you can see from my previous email, I have increased the costs for
> allocnos where we want BOTTOM_REGS and get TOP_CREGS or MEM by 6*freq
> and 20*freq, respectively.  I do not know whether these numbers are
> sensible at this stage, since I do not fully understand what the costs
> are meant to represent.)
> 
> After Jeff mentioned that IRA should already be doing the right thing,
> I had another look at the 176r.ira dump file and realised that I had
> originally only seen "Pass 0 for finding allocno costs" in that file.
> In fact, some costs for allocnos are added in Pass 1.
> 
> For example, in unmodified IRA with priority algorithm and mixed
> region:
> 
> a0(r230,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,0 C_REGS:0,0 MEM:32000
> a1(r203,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,8000 C_REGS:0,8000
> MEM:8000
> ...
> a29(r203,l1) costs: BOTTOM_REGS:0,0 TOP_CREGS:8000,8000
> C_REGS:8000,8000
> MEM:20000
> 
> From looking at 174r.sched1, I can see that r203 is defined in l0 and
> used as an operand that needed BOTTOM_REGS in l1.  Notice that the
> allocno cost is 0 for l0, but the total cost is 8000, since this has
> been passed on from the 8000 for allocno in l1.
> 
> r203 ends up getting register #23, which is not a BOTTOM_REG, so we
> need
> to generate a move in region l1 before we can use this value.
> 
> What is interesting is that in my modified version of IRA, both lines
> that mention r203 have an allocno cost:
> 
> a0(r230,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,0 C_REGS:0,0 MEM:32000
> a1(r203,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:12000,50000
> C_REGS:12000,50000 MEM:48000
> a29(r203,l1) costs: BOTTOM_REGS:0,0 TOP_CREGS:38000,38000
> C_REGS:38000,38000 MEM:138000
> 
> I'm not really sure how my change in record_reg_classes has ended up
> altering the allocno cost in l0, but that does seem to be the key.
> I will continue investigating this.
> 
> Another thing you can notice from the above cost output is that
> allocnos
> that are going to be used for memory operations are getting their MEM
> cost set high, to prevent them being put in MEM.  (r230 gets used
> several
> times to load in values and store out values.)
> 
> But in my modified IRA, my grossly inflated MEM costs for allocnos
that
> want BOTTOM_REGS are causing some of those allocnos to get a higher
> priority than these ones that are to be used as memory address.  See
> that a29/r203 has a MEM score of 138000, which means it gets first
pick
> of the available registers, so more chance of getting the BOTTOM_REG
> that it wants.
> 
> I wonder if the value being used for these memory address allocnos
> could
> be tuned.  I also wonder if we could persuade them to prefer
TOP_CREGS,
> thus leaving more BOTTOM_REGS for those that want them.
> 

I think I may have made a breakthrough!

As mentioned above, IRA is correctly increasing the cost for TOP_REGS
when an allocno in region 1 is being used in one of our special
instructions that needs BOTTOM_REGS.  We can also see that IRA is
passing that allocno cost up to the associated allocno in region 0 and
altering the total_cost for that allocno.

However, it does not appear as though IRA is doing anything with the
total_costs, apart from using it to determine the preferred class for
the allocno.  When the main logic of the algorithms comes into play,
we only look at allocno_costs, which do not take into account the
future usage of that register in the allocno in region 1.

I believe that if the decisions were made based on total_costs then I
would get a better allocation with existing IRA.  Before I experiment
with this, I was wondering what the motivation is for only looking
at allocno_costs?

BTW, I did look into using the ! and * constraints, but I don't think
they could help.  Our architecture is like Motorola 68k in that we have
some instructions that can use all the registers and some that can
only use the subset (BOTTOM_REGS).  The latter type can only specify
"b" for their constraint, since they can only go in BOTTOM_REGS.  The
former type might benefit from "b,!t", so we are more likely to get
things in BOTTOM_REGS for when they are later needed there, but the
flip-side is that we might also fill up BOTTOM_REGS when actually there
was no subsequent need for the value to be in BOTTOM_REGS.  I may have
misunderstood how all this works, but it looks like constraints will
not help here and, in fact, the total_costs calculations that I
mention above are precisely the way IRA passes information about
register requirements upwards.

Best regards,
Ian


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]