This is the mail archive of the
java@gcc.gnu.org
mailing list for the Java project.
Re: Exception causing insns in delay slots
- From: law at redhat dot com
- To: "John David Anglin" <dave at hiauly1 dot hia dot nrc dot ca>
- Cc: jason at redhat dot com, davem at redhat dot com, gcc-patches at gcc dot gnu dot org, mark at codesourcery dot com, java at gcc dot gnu dot org, dave dot anglin at nrc dot ca
- Date: Sat, 27 Apr 2002 09:24:21 -0600
- Subject: Re: Exception causing insns in delay slots
- Reply-to: law at redhat dot com
In message <200204262146.g3QLk4Hp007025@hiauly1.hia.nrc.ca>, "John David
Anglin" writes:
> Yes, the branch actually just updates iaoq_next. However, when a branch
> follows a branch, the second branch modifies the insn queue so that
> one insn at the target of the first branch is executed, then the flow
> transfers to the target of the second branch. This differs from what
> happens with the return pointer optimization where the effect doesn't
> take place until the return of the function called. You can create
> a two insn timer loop using this capability and there is a significant
> performance improvement on older machines without branch prediction.
>
> I still think that a branch in the delay slot of another branch (call)
> on the PA is not equivalent to the return pointer optimization (assuming
> the adjustment can be made).
Maybe this is your confusion. The return address twiddling isn't meant
to optimize a branch in the delay slot of the call, but instead a branch
after the call (and after the call's delay slot). Maybe code would be better.
Let's assume we have something like this before delay slot optimizations
are run:
        bl foo,%r2
        <delay slot>
        bl newtarget,%r0
        <delay slot>
We can arrange for the return from "foo" to resume execution at "newtarget"
by twiddling the value in %r2 in the delay slot of the call to foo, which
makes the separate branch to newtarget unnecessary. It turns the code into
something like this:
temp:
        bl foo,%r2
        ldo newtarget-temp-8(%r2),%r2
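
The -8 in that displacement comes from the fact that the bl at "temp" leaves
the address of the insn after its delay slot, temp+8, in %r2. A minimal sketch
of the arithmetic in Python, assuming 4-byte insns and 32-bit addresses; the
helper names and the addresses used with them are made up for illustration and
are not anything GCC emits:

```python
INSN_SIZE = 4          # PA-RISC insns are 4 bytes wide
MASK32 = 0xFFFFFFFF    # assumption: model 32-bit addresses only


def link_value_after_bl(call_addr):
    """Value bl leaves in %r2: the address of the insn after the delay slot."""
    return (call_addr + 2 * INSN_SIZE) & MASK32


def ldo(displacement, base):
    """Model of ldo disp(base),t: t = base + disp (range limits ignored)."""
    return (base + displacement) & MASK32


def adjusted_return(temp, newtarget):
    """Return address seen by foo after the ldo in the call's delay slot."""
    r2 = link_value_after_bl(temp)           # temp + 8
    return ldo(newtarget - temp - 8, r2)     # (temp + 8) + (newtarget - temp - 8)
```

For any pair of addresses, adjusted_return(temp, newtarget) comes out equal to
newtarget, which is why foo returns straight to newtarget.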
Note carefully we are not trying to optimize the case of a branch in another
branch's delay slot. We do not generate such code to the best of my knowledge,
and such code is not generally useful (you get a single instruction executed
from the target of the first branch if I recall correctly). It's also the
case that any time you have a branch in the delay slot of another branch, all
branch prediction is turned off, so even if you could come up with an obscure
way to use this feature, it'd probably perform poorly, even on older HPs.
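
The "single instruction from the target of the first branch" behavior can be
sketched with a toy model of the two-entry instruction address queue. This is
only an illustration in Python, assuming unconditional taken branches with no
nullification; the function name and the addresses are hypothetical:

```python
INSN_SIZE = 4  # PA-RISC insns are 4 bytes wide


def trace(program, start, steps):
    """Return the addresses executed, in order.

    program maps an address to a branch target; addresses that are not in
    the map are treated as ordinary non-branch insns.
    """
    front, back = start, start + INSN_SIZE   # the two queue entries
    executed = []
    for _ in range(steps):
        executed.append(front)
        target = program.get(front)
        # A taken branch only replaces the entry that follows "back";
        # the insn already sitting in "back" (the delay slot) still runs.
        next_back = target if target is not None else back + INSN_SIZE
        front, back = back, next_back
    return executed
```

With a branch to 0x100 at address 0 and a branch to 0x200 in its delay slot,
trace({0x00: 0x100, 0x04: 0x200}, 0x00, 5) gives 0x00, 0x04, 0x100, 0x200,
0x204: exactly one insn at the first target runs before control moves to the
second target.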
> > call/return stack for predicting branches. This is why we disable it
> > anytime we're optimizing for a PA8000 or newer machine.
>
> I looked at output_call again and I couldn't see that this is disabled
> for PA8000 or newer machines. In the dw2 testing, we were definitely
> getting unconditional branches in the delay slot of calls on PA8000
> or newer machines.
It's handled elsewhere -- output_call is way too late to catch this. You
have to prevent reorg from filling the delay slot of the call with the
return address adjustment.
Also remember, this is controlled by optimizing for the PA8000
(-mschedule=8000). If you didn't use -mschedule=8000, then you'd still get the
return pointer adjustments.
jeff