MIPS: "bal" micro-optimization
cgd@broadcom.com
Tue May 3 19:39:00 GMT 2005
At Tue, 3 May 2005 19:27:26 +0000 (UTC), "Maciej W. Rozycki" wrote:
> MIPS startup files use "bal . + 8" to get the value of the PC. This
> clearly shows the code in question is not interested in actually taking
> the branch as a side effect. Replacing these instructions with "bltzal
> $0, . + 8" keeps the code retrieving the PC into $ra, yet the branch is
> not taken, saving an unnecessary pipeline flush.
Is this really desirable?
Have you actually measured improvements on modern processors with this
change?
Have you actually measured improvements on ancient processors with
this change?
I ask because...
* modern processors often predict branches, and ifetch/decode is often
decoupled from actual instruction execution.
-> i.e., the cost of taking that branch may be entirely hidden by
front-end prediction and prefetching code.
* it's entirely reasonable to hard-code 'bal' as always taken --
thereby making the prediction easier, and possibly saving an entry
in the prediction table... (Doing this for all cases of
branch-compare-vs.-0 not called out as special cases in the
architecture may be less desirable...) So, this change could
actually lose.
* if the goal is to just get the PC and performance is an issue, using
many of these instructions (esp. with RA as the destination) may
mess with return prediction stacks...
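
To make the return-stack concern concrete, here is a toy model in Python. It is only a sketch under a loud assumption: that the front end treats every branch-and-link as a call and pushes its link address onto a return-address stack. Real predictors differ, and the class, function, and addresses below are hypothetical, not taken from any particular core.

```python
class ReturnStack:
    """Toy return-address predictor: linking branches push, returns pop."""

    def __init__(self):
        self.entries = []

    def on_link(self, return_address):
        # Assumption: any branch-and-link is treated as a call and pushed.
        self.entries.append(return_address)

    def predict_return(self):
        # Predict the most recently pushed address, if there is one.
        return self.entries.pop() if self.entries else None


def return_predicted_correctly(reads_pc_with_link):
    """Model one function call, optionally reading the PC via a linking branch."""
    ras = ReturnStack()
    caller_return = 0x1008          # hypothetical address following the real call
    ras.on_link(caller_return)      # the real call into the function
    if reads_pc_with_link:
        ras.on_link(0x2008)         # linking branch used only to read the PC
    # The function's return is predicted from the top of the stack.
    return ras.predict_return() == caller_return


print(return_predicted_correctly(False))  # True
print(return_predicted_correctly(True))   # False
```

In this model the extra push has no matching return, so the function's own return pops the wrong entry. Whether a given processor behaves this way is exactly the kind of thing that needs measuring.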
So, back to the basic question:
Is this desirable, and does it actually win? Or is it just churn?
chris