This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Exception causing insns in delay slots


>  > I still think that a branch in the delay slot of another branch (call)
>  > on the PA is not equivalent to the return pointer optimization (assuming
>  > the adjustment can be made).
> Maybe this is your confusion.  The return address twiddling isn't meant
> to optimize a branch in the delay slot of the call, but instead a branch
> after the call (and after the call's delay slot).  Maybe code would be
> better.
> 
> Let's assume we have something like this before delay slot optimizations
> are run:
> 
>   bl foo,%r2
>     <delay slot>
>   bl newtarget,%r0
>     <delay slot>

The point that I was trying to make is that in output_call this would
appear as a sequence with the second branch in the "delay slot" of the
first branch.  We check for this and do the return adjustment if we
can.  If we can't, we just put a nop in the delay slot.

If the insn in the "delay slot" is not a branch, we just output it.
This is often an insn to load an argument register for the branch
to foo.  So, clearly in this case we assume that the second insn
executes before the transfer to foo occurs.

As you have noted before, it is generally a good idea to make the
rtl match the hardware as closely as possible.  Maybe these sequences
are created at such a late stage that the above inconsistency doesn't
matter, but I think there are a number of reasons not to allow it:

1)  As you noted below, there is no gain on PA8000 and above.
2)  We have to disable the return pointer optimization under hppa-linux
    for correct operation of the dwarf2 exception support.
3)  If we didn't allow an unconditional branch as the second insn in
    a delayed branch sequence, then the compiler might be able to find
    some other insn to fit in the sequence.  I guess we could allow
    an unconditional jump as the last alternative if there are no
    others.

At a minimum, we should turn off allowing branches in the "delay slot"
of sequences under hppa-linux at the same point as we do for the PA8000.

> We can arrange for the return from "foo" to resume execution at "newtarget"
> by twiddling the value in %r2 in the delay slot of foo.  It turns the code
> into something like this:
> 
> temp:
>   bl foo,%r2
>   ldo newtarget-temp-8(%r2),%r2

I understand this part.

> 
> Note carefully we are not trying to optimize the case of a branch in another
> branch's delay slot.  We do not generate such code to the best of my
> knowledge, and such code is not generally useful (you get a single
> instruction executed

The loop that I had was something like

extern __inline__ void __delay(unsigned long loops) {
  asm volatile(
	"	addib,UV,n      -1,%0,.\n"
	"	addib,NUV,n     -1,%0,.+8\n"
	"	nop"
	: "=r" (loops) : "0" (loops));
}

Don't know if this is generally useful.  It does run much faster on a
7100 (more or less one cycle per decrement and test).

> Also remember, this is controlled by optimizing for the PA8000
> (-msched=8000).  If you didn't use -msched=8000, then you'd still get the
> return pointer adjustments.

I wasn't using -msched=8000, so that explains why this was occurring.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

