MIPS: "bal" micro-optimization

cgd@broadcom.com
Tue May 3 19:39:00 GMT 2005


At Tue, 3 May 2005 19:27:26 +0000 (UTC), "Maciej W. Rozycki" wrote:
>  MIPS startup files use "bal . + 8" to get the value of the PC.  This 
> clearly shows the code in question is not interested in actually taking 
> the branch as a side effect.  Replacing these instructions with "bltzal 
> $0, . + 8" keeps the code retrieving the PC into $ra, yet the branch is 
> not taken saving an unnecessary pipeline flush.

Is this really desirable?

Have you actually measured improvements on modern processors with this
change?

Have you actually measured improvements on ancient processors with
this change?


I ask because...

* modern processors often predict branches, and instruction
  fetch/decode is often decoupled from actual instruction execution.

  -> i.e., the cost of taking that branch may be entirely hidden by
     front-end branch prediction and instruction prefetch.

* it's entirely reasonable for an implementation to hard-code 'bal'
  as always taken -- thereby making the prediction easier, and
  possibly saving an entry in the prediction table...  (Doing this
  for all cases of branch-compare-vs.-0 not called out as special
  cases in the architecture may be less desirable...)  On such an
  implementation 'bltzal $0' would presumably be predicted like any
  other conditional branch, so this change could actually lose.

* if the goal is just to get the PC and performance is an issue, using
  many of these instructions (esp. with RA as the destination) may
  mess with return prediction stacks, since a branch-and-link that is
  never matched by a return can leave them out of sync...



So, back to the basic question:

Is this desirable, and does it actually win?  Or is it just churn?




chris



More information about the Gcc-patches mailing list