This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: Exception causing insns in delay slots


 In message <200204262146.g3QLk4Hp007025@hiauly1.hia.nrc.ca>, "John David 
Anglin" writes:
 > Yes, the branch actually just updates iaoq_next.  However, when a branch
 > follows a branch, the second branch modifies the insn queue so that
 > one insn at the target of the first branch is executed, then the flow
 > transfers to the target of the second branch.  This differs from what
 > happens with the return pointer optimization where the effect doesn't
 > take place until the return of the function called.  You can create
 > a two insn timer loop using this capability and there is a significant
 > performance improvement on older machines without branch prediction.
 > 
 > I still think that a branch in the delay slot of another branch (call)
 > on the PA is not equivalent to the return pointer optimization (assuming
 > the adjustment can be made).
Maybe this is the source of the confusion.  The return address twiddling isn't
meant to optimize a branch in the delay slot of the call, but instead a branch
after the call (and after the call's delay slot).  Maybe code would make it
clearer.

Let's assume we have something like this before delay slot optimizations
are run:

  bl foo,%r2
    <delay slot>
  bl newtarget,%r0
    <delay slot>


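We can arrange for the return from "foo" to resume execution at "newtarget"
by twiddling the value in %r2 in the delay slot of the call to foo.  The bl
leaves the address of the insn after its delay slot in %r2, so adding the
distance from there to newtarget does the trick.  It turns the code into
something like this: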

temp:
  bl foo,%r2
  ldo newtarget-temp-8(%r2),%r2
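
To make the arithmetic behind that ldo concrete, here is a rough sketch in
Python.  The addresses and helper names are made up for illustration; the
range check assumes ldo's 14-bit signed displacement, which is what limits
when the adjustment can be made at all.

```python
# Sketch of the return-pointer adjustment arithmetic.
# All addresses and function names here are hypothetical examples.
INSN_SIZE = 4  # PA-RISC instructions are 4 bytes wide


def return_pointer_after_call(call_addr):
    """Value bl leaves in %r2: the insn after the call's delay slot."""
    return call_addr + 2 * INSN_SIZE


def ldo_displacement(call_addr, newtarget_addr):
    """The newtarget-temp-8 displacement used by the ldo in the delay slot."""
    return newtarget_addr - call_addr - 8


def fits_ldo(displacement):
    """Assumes ldo takes a 14-bit signed displacement."""
    return -8192 <= displacement <= 8191


temp = 0x1000        # hypothetical address of "bl foo,%r2"
newtarget = 0x1234   # hypothetical target of the second branch

r2 = return_pointer_after_call(temp)       # bl foo,%r2
disp = ldo_displacement(temp, newtarget)   # newtarget-temp-8
if fits_ldo(disp):
    r2 += disp                             # ldo newtarget-temp-8(%r2),%r2

print(hex(r2))
```

With these example addresses, foo's return lands directly on newtarget, so
the second bl and its delay slot are no longer needed.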


Note carefully that we are not trying to optimize the case of a branch in
another branch's delay slot.  To the best of my knowledge we do not generate
such code, and such code is not generally useful (you get a single instruction
executed from the target of the first branch, if I recall correctly).  It's
also the case that any time you have a branch in the delay slot of another
branch, all branch prediction is turned off, so even if you could come up with
an obscure way to use this feature, it'd probably perform poorly, even on
older HPs.
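
For the curious, the "single instruction from the target of the first branch"
behavior falls out of a simple two-entry model of the instruction address
queue.  The sketch below is only a toy: it treats every branch as
unconditional and taken, ignores nullification and space registers, and the
addresses and names are invented.

```python
# Toy model of the PA instruction address queue (front = current insn,
# back = next insn).  A taken branch only replaces the value that will
# enter the back of the queue.  Addresses and names are hypothetical.

def run(program, start, steps):
    """program maps address -> branch target, or is missing for non-branches."""
    front, back = start, start + 4
    trace = []
    for _ in range(steps):
        trace.append(front)                 # "execute" the insn at front
        target = program.get(front)         # None means not a branch
        next_back = target if target is not None else back + 4
        front, back = back, next_back       # advance the queue
    return trace


program = {
    0x00: 0x100,  # b T1
    0x04: 0x200,  # b T2, sitting in the delay slot of the first branch
}

trace = run(program, 0x00, 5)
print([hex(a) for a in trace])
```

In this model the trace visits 0x0, 0x4, then exactly one insn at 0x100,
and then continues sequentially from 0x200.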



 > > call/return stack for predicting branches.  This is why we disable it
 > > anytime we're optimizing for a PA8000 or newer machine.
 > 
 > I looked at output_call again and I couldn't see that this is disabled
 > for PA8000 or newer machines.  In the dw2 testing, we were definitely
 > getting unconditional branches in the delay slot of calls on PA8000
 > or newer machines.
It's handled elsewhere -- output_call is way too late to catch this.  You
have to prevent reorg from filling the delay slot of the call with the
return address adjustment.

Also remember, this is controlled by optimizing for the PA8000
(-mschedule=8000).  If you didn't use -mschedule=8000, then you'd still get
the return pointer adjustments.

jeff







