This is the mail archive of the mailing list for the GCC project.
Re: cr logical insn implementation for rs6000
> From: Michael Meissner <email@example.com>
> Date: Wed, 6 Sep 2000 15:43:07 -0400
> Cc: firstname.lastname@example.org
> On Tue, Sep 05, 2000 at 11:08:25PM -0700, Geoff Keating wrote:
> > This patch represents the CR logical operations for rs6000
> > in RTL, explains them (well, somewhat) to the scheduler,
> > and changes the branch-emitting logic and sCOND-emitting logic to use
> > them.
> > There are still a few pieces to do:
> > - Remove the %D output_operand modifier
> > - Update the call architecture to use the new insns
> > - Perhaps even try to make combine use the new insns
> > to reduce the number of branches.
> > I hope even this much is a significant improvement.
> Have you tested the changes on real machines to make sure that it is indeed a
> win? I'm not trying to be a grump or anything, but when I was doing the
> original 750 work, I was surprised to find that cror and friends (on the 750
> and 604 variants, but not on the 603 IIRC) were serializing instructions, while
> branches were not, so you could have sequences that were slower in real life
> than what they replaced. Of course given the horrid code that is generated
> when BRANCH_COST is > 1 (generate two SCC instructions, & or | them together,
> and then do another comparison), it might still be a win.
> You might want to look at finishing up the -foptimize-comparisons work and
> enable it for the rs/6000 as well.
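A rough source-level sketch of the BRANCH_COST > 1 shape described above, placed next to the plain two-branch form of `(a == b) || (c == d)`. This is a hypothetical Python model, not GCC code. The function names and the abstract operation labels ("scc", "or", "compare", "branch") are illustrative assumptions and do not map one-to-one onto rs6000 instructions. For `&&`, the `|` would become `&`.

```python
def two_branches(a, b, c, d):
    """Short-circuit form of (a == b) || (c == d): one compare+branch per test."""
    ops = ["compare", "branch"]
    if a == b:
        # First branch taken: the second test is never executed.
        return True, ops
    ops += ["compare", "branch"]
    return c == d, ops


def scc_then_branch(a, b, c, d):
    """BRANCH_COST > 1 form: materialise both tests, OR them, then test again."""
    t1 = int(a == b)  # first sCOND: 0 or 1 in an integer register
    t2 = int(c == d)  # second sCOND
    t = t1 | t2       # combine the two flags with integer OR
    ops = ["scc", "scc", "or", "compare", "branch"]
    return t != 0, ops  # one more comparison feeds the single branch


if __name__ == "__main__":
    for args in [(1, 1, 2, 3), (1, 2, 3, 3), (1, 2, 3, 4)]:
        print(args, two_branches(*args), scc_then_branch(*args))
```

The branchy form can finish after a single compare and branch. That is the case the reply below aims for when it suggests rearranging branches so that usually only one is executed.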
The changes don't alter the generated code; they just allow it to be
scheduled differently. Previously, we emitted the cror operations as
part of the branch (we'd generate code like
"cror 3,1,2;bso- cr0,foo"); now, we can schedule other operations
between the cror and the branch, which is pretty much guaranteed to
win even in the presence of execute-serialisation; for that sequence
on the 750, it'll take about four cycles before the branch is fully
resolved, which means that if there's another branch anywhere in the
next 10 instructions the machine could stall.
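A minimal Python model of what "cror 3,1,2;bso- cr0,foo" computes. It assumes the preceding compare was an unordered floating-point compare (fcmpu) into cr0; the message does not say which compare set cr0. Bits 0-3 of cr0 are LT, GT, EQ and SO, and after a float compare SO means "unordered". `cror BT,BA,BB` sets CR bit BT to bit BA OR bit BB. The helper names are hypothetical. The model covers only the bit logic, not the scheduling or serialisation behaviour discussed above.

```python
# Bit positions within one 4-bit condition-register field (cr0 here,
# so CR bit numbers 0-3 in the cror operands map directly onto them).
LT, GT, EQ, SO = 0, 1, 2, 3  # after a float compare, bit 3 means "unordered"


def float_compare_field(a, b):
    """Mimic an unordered float compare: exactly one bit ends up set."""
    field = [False, False, False, False]
    if a != a or b != b:  # only NaN compares unequal to itself
        field[SO] = True
    elif a < b:
        field[LT] = True
    elif a > b:
        field[GT] = True
    else:
        field[EQ] = True
    return field


def cror(field, bt, ba, bb):
    """cror BT,BA,BB: set bit BT to (bit BA OR bit BB); return a new field."""
    out = list(field)
    out[bt] = field[ba] or field[bb]
    return out


def branch_if_ge(a, b):
    """'cror 3,1,2; bso- cr0,foo': True if the branch to foo is taken.

    The trailing '-' on bso is only a static "predict not taken" hint.
    """
    cr0 = float_compare_field(a, b)
    cr0 = cror(cr0, SO, GT, EQ)  # bit 3 := GT | EQ; the old bit 3 is overwritten
    return cr0[SO]               # bso: branch if bit 3 is set


if __name__ == "__main__":
    nan = float("nan")
    for a, b in [(2.0, 1.0), (1.0, 1.0), (0.0, 1.0), (nan, 1.0)]:
        print(a, b, branch_if_ge(a, b))
```

Testing "not LT" would be wrong here. With a NaN operand LT is clear, so that test would branch, whereas GT|EQ correctly does not. That is why a separate CR logical op is needed before the branch.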
I don't believe that combine can use the new insn; it exceeds the
3-insn limit for combinations (by one insn). I should look into
fixing that by adding the right define_insn_split.
I'm pretty sure that using cr* insns would be a win over doing the
same thing in integer arithmetic. I think it's at best marginal
whether it'd be better than using branches; I suspect in most cases
it'd be better to leave the branches in and try to rearrange them so
that most of the time only one branch is executed (by looking at the
- Geoffrey Keating <email@example.com>