This is the mail archive of the mailing list for the GCC project.

Re: MIPS: "bal" micro-optimization

At Tue, 3 May 2005 19:27:26 +0000 (UTC), "Maciej W. Rozycki" wrote:
>  MIPS startup files use "bal . + 8" to get the value of the PC.  This 
> clearly shows the code in question is not interested in actually taking 
> the branch as a side effect.  Replacing these instructions with "bltzal 
> $0, . + 8" keeps the code retrieving the PC into $ra, yet the branch is 
> not taken saving an unnecessary pipeline flush.
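
For reference, here is a minimal Python sketch of the two encodings and what each one does to $ra and the PC. It assumes pre-Release 6 MIPS32 semantics: BLTZAL/BGEZAL write $ra whether or not they branch, and 'bal' is 'bgezal $0'. The helper names and the address are hypothetical; this is an illustration, not assembler or simulator code.

```python
# Hypothetical model of two MIPS REGIMM branch-and-link encodings
# (pre-Release 6 semantics assumed); not taken from any toolchain.

REGIMM = 0x01     # primary opcode shared by BLTZ/BGEZ/BLTZAL/BGEZAL
RT_BLTZAL = 0x10  # rt field: branch if rs < 0, and link
RT_BGEZAL = 0x11  # rt field: branch if rs >= 0, and link ("bal" == "bgezal $0")


def encode_regimm(rt: int, rs: int, pc: int, target: int) -> int:
    """Encode a REGIMM branch located at `pc` that branches to `target`."""
    # The 16-bit offset counts words relative to the delay-slot address.
    offset = (target - (pc + 4)) >> 2
    return (REGIMM << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


def execute_link_branch(word: int, pc: int, regs: dict) -> tuple:
    """Return (taken, value written to $ra, address executed after the delay slot)."""
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    offset = word & 0xFFFF
    if offset & 0x8000:  # sign-extend the 16-bit offset
        offset -= 0x10000
    value = 0 if rs == 0 else regs[rs]  # $0 always reads as zero
    taken = value < 0 if rt == RT_BLTZAL else value >= 0
    ra = pc + 8  # link address is written whether or not the branch is taken
    target = pc + 4 + (offset << 2)
    return taken, ra, (target if taken else pc + 8)


pc = 0x80000100  # hypothetical address of the "get the PC" instruction
bal = encode_regimm(RT_BGEZAL, 0, pc, pc + 8)     # bal . + 8
bltzal = encode_regimm(RT_BLTZAL, 0, pc, pc + 8)  # bltzal $0, . + 8

print(f"{bal:08x} {bltzal:08x}")  # 04110001 04100001
print(execute_link_branch(bal, pc, {}))
print(execute_link_branch(bltzal, pc, {}))
```

Both forms leave the same value in $ra and continue at the same address after the delay slot. They differ only in whether the branch counts as taken, which is exactly what the proposed change relies on.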

Is this really desirable?

Have you actually measured improvements on modern processors with this
change?

Have you actually measured improvements on ancient processors with
this change?

I ask because...

* modern processors often predict branches, and ifetch/decode is often
  decoupled from actual instruction execution.

  -> i.e., the cost of taking that branch may be entirely hidden by
     front-end prediction and prefetching code.

* it's entirely reasonable for an implementation to hard-code 'bal'
  (i.e., 'bgezal $0') as always taken, thereby making the prediction
  easier and possibly saving an entry in the prediction table.  (Doing
  the same for every other compare-against-zero branch that the
  architecture does not call out as a special case may be less
  desirable.)  So, this change could actually lose.

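* if the goal is just to get the PC and performance is an issue, using
  many of these instructions (esp. with RA as the destination) may
  corrupt return prediction stacks: a front end that treats each
  linking branch as a call pushes an entry that no matching return
  ever pops, so later returns can be mispredicted...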
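
To make the last two points concrete, here is a toy front-end model in Python. It is hypothetical and describes no particular MIPS core. It assumes a design that hard-codes 'bal' as always taken, allocates a prediction-table entry for any other REGIMM branch (including 'bltzal $0'), and treats every linking branch as a call that pushes the return-address stack. Counter training is deliberately omitted.

```python
# Toy front-end model (hypothetical; not any real MIPS core) illustrating
# a static "bal is always taken" rule versus a table entry for "bltzal $0",
# and return-address-stack (RAS) pollution from PC-reading link branches.
from typing import Optional

RT_BLTZAL = 0x10  # REGIMM rt field for BLTZAL
RT_BGEZAL = 0x11  # REGIMM rt field for BGEZAL ("bal" when rs == 0)


class ToyFrontEnd:
    def __init__(self, ras_depth: int = 4) -> None:
        self.table = {}  # pc -> 2-bit saturating counter (training omitted)
        self.ras = []    # return-address stack, top at the end
        self.ras_depth = ras_depth

    def push_return(self, addr: int) -> None:
        self.ras.append(addr)
        if len(self.ras) > self.ras_depth:
            self.ras.pop(0)  # the oldest entry falls off the bottom

    def predict_regimm(self, pc: int, word: int) -> bool:
        """Predict a REGIMM branch; linking forms are assumed to push the RAS."""
        rs = (word >> 21) & 0x1F
        rt = (word >> 16) & 0x1F
        if rs == 0 and rt == RT_BGEZAL:
            taken = True  # "bal": hard-coded always taken, no table entry
        else:
            counter = self.table.setdefault(pc, 1)  # allocate, weakly not taken
            taken = counter >= 2
        if rt in (RT_BLTZAL, RT_BGEZAL):
            self.push_return(pc + 8)  # assumption: every link branch looks like a call
        return taken

    def predict_return(self) -> Optional[int]:
        """Predict the target of "jr $ra" by popping the RAS."""
        return self.ras.pop() if self.ras else None


fe = ToyFrontEnd()
fe.push_return(0x1008)                 # a real "jal" at 0x1000 calls some function
fe.predict_regimm(0x2000, 0x04100001)  # that function runs "bltzal $0, . + 8"
predicted = fe.predict_return()        # ...and later returns with "jr $ra"
print(hex(predicted), len(fe.table))   # 0x2008 1: wrong target, one table entry used

fe2 = ToyFrontEnd()
print(fe2.predict_regimm(0x2000, 0x04110001), fe2.table)  # True {}: "bal" needs no entry
```

In this model both forms push a stale entry, so the return-stack concern applies to 'bal' as well; only the table-entry cost differs between them.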

So, back to the basic question:

Is this desirable, and does it actually win?  Or is it just churn?