This is the mail archive of the
mailing list for the GCC project.
Re: MIPS: "bal" micro-optimization
- From: cgd at broadcom dot com
- To: macro at linux-mips dot org
- Cc: gcc-patches at gcc dot gnu dot org
- Date: 03 May 2005 12:38:22 -0700
- Subject: Re: MIPS: "bal" micro-optimization
- References: <Pine.LNX.4.61L.email@example.com><mailpost.1115148445.22550@news-sj1-1>
At Tue, 3 May 2005 19:27:26 +0000 (UTC), "Maciej W. Rozycki" wrote:
> MIPS startup files use "bal . + 8" to get the value of the PC. This
> clearly shows the code in question is not interested in actually taking
> the branch as a side effect. Replacing these instructions with "bltzal
> $0, . + 8" keeps the code retrieving the PC into $ra, yet the branch is
> not taken saving an unnecessary pipeline flush.
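For reference, a minimal sketch of the two idioms being compared. It assumes GNU as syntax and o32-style register names; the local labels and the copy into $t0 are illustrative only and are not taken from the actual startup files.

```asm
	.set	noreorder

	# Current idiom: "bal" is an always-taken branch-and-link.
	# $ra receives the address of the instruction after the delay slot.
	bal	1f			# same target as "bal . + 8"
	 nop				# branch delay slot
1:	move	$t0, $ra		# $t0 = address of label 1

	# Proposed idiom: "$0 < 0" is never true, so the branch is never
	# taken, but the link into $ra still happens unconditionally.
	bltzal	$0, 2f			# same target as "bltzal $0, . + 8"
	 nop				# branch delay slot
2:	move	$t0, $ra		# $t0 = address of label 2

	.set	reorder
```

Either way execution continues at the same instruction and $ra holds the same value; the only difference is whether the hardware treats the branch as taken.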
Is this really desirable?
Have you actually measured improvements on modern processors with this
change?
Have you actually measured improvements on ancient processors with
this change?
I ask because...
* modern processors often predict branches, and ifetch/decode is often
  decoupled from actual instruction execution.
  -> i.e., the cost of taking that branch may be entirely hidden by
     front-end prediction and prefetching code.
* it's entirely reasonable to hard-code 'bal' as always taken --
  thereby making the prediction easier, and possibly saving an entry
  in the prediction table... (Doing this for all cases of
  branch-compare-vs.-0 not called out as special cases in the
  architecture may be less desirable...) So, this change could
* if the goal is to just get the PC and performance is an issue, using
  many of these instructions (esp. with RA as the destination) may
  mess with return prediction stacks...
So, back to the basic question:
Is this desirable, and does it actually win? or is it just churn?