This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: predicated instructions in ARM
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Arvind Krishnaswamy <arvind at CS dot Arizona dot EDU>
- Cc: gcc at gcc dot gnu dot org, Richard dot Earnshaw at arm dot com
- Date: Fri, 03 May 2002 10:20:54 +0100
- Subject: Re: predicated instructions in ARM
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
> The following is a sequence of predicated instructions being generated by
> gcc for the ARM:
>
> ...
> ...
> rsble r3, r6, #0
> movle r4, r3, asl #19
> movle r4, r4, lsr #19
> .L11:
> movgt r3, r6, asl #16
> movgt r4, r3, asr #16
> .L12:
> ...
> ...
>
> So for a given condition either the 3 instructions before .L11 will become
> no-ops (if gt is true) else the 2 instructions between .L11 and .L12 will
> be no-ops. Isn't it better to just use branches to the appropriate code
> since branches will introduce just 1 stall in the pipeline?
On most ARM chips, a branch instruction takes one cycle to execute and
is followed by three stalls as the pipeline refills. So it is profitable
to conditionally execute up to four instructions (four instructions not
executed will take the same time as one executed branch instruction, but
the sequence will be one instruction shorter). We can do this for both
the true and false arms, since the equivalent branching sequence would always
contain two branch instructions, one of which will always be executed. So the above
code is optimal for most processors. (Note that static branch prediction
will always fail a forward branch, so even ARM cores with that capability
will incur a stall.)
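To make the arithmetic above concrete, here is a rough sketch of the cost comparison in Python. The cycle figures are the ones quoted above (one cycle for the branch plus three refill stalls); the function names are made up for illustration, and the assumption that every conditional instruction whose condition fails costs exactly one cycle is a simplification that will not hold on every core.

```python
# Rough cost model for "skip N conditional instructions" versus
# "take one branch". Figures come from the explanation above and are
# not measurements of any particular core.
BRANCH_EXECUTE_CYCLES = 1   # the branch instruction itself
PIPELINE_REFILL_STALLS = 3  # stalls while the pipeline refills


def taken_branch_cost():
    """Cycles spent on one executed branch, including the refill stalls."""
    return BRANCH_EXECUTE_CYCLES + PIPELINE_REFILL_STALLS


def skipped_block_cost(n_instructions):
    """Cycles spent stepping over n conditional instructions.

    Assumption: each instruction whose condition fails costs one cycle.
    """
    return n_instructions


def conditional_execution_pays_off(n_instructions):
    """True if skipping the block is no slower than branching around it."""
    return skipped_block_cost(n_instructions) <= taken_branch_cost()


if __name__ == "__main__":
    for n in range(1, 7):
        choice = "predicate" if conditional_execution_pays_off(n) else "branch"
        print(f"{n} skipped: {skipped_block_cost(n)} cycles "
              f"vs {taken_branch_cost()} for a branch -> {choice}")
```

Under these assumptions the break-even point is four instructions, so both the three-instruction and the two-instruction blocks in the quoted code fall on the "predicate" side.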
StrongARM, on the other hand, has accelerated branch handling, so the
conditional sequences should be shorter (unless optimizing for space); in
that case it's probably best to limit the code to two instructions. You
get this behaviour from the compiler if you have -mtune=strongarm (or
-mcpu=strongarm).
> One more question. In the above code, control is never transferred to .L11
> or .L12. Why are these labels introduced?
I'm assuming you are using gcc-2.95 (or earlier). The labels are there
because the sequence was generated late in the output process, and it
wasn't really feasible to eliminate them at that time.
> Also, where can I find the arm and thumb specific code under the gcc source tree?
gcc/config/arm/*
R.