This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: Exception causing insns in delay slots


 In message <200204271632.g3RGWIfV013676@hiauly1.hia.nrc.ca>, "John David 
Anglin" writes:
 > The point that I was trying to make is that in output_call this would
 > appear as a sequence with the second branch in the "delay slot" of the
 > first branch.  We check for this and do the return adjust if we
 > can.  If we can't we just put a nop in the delay slot.
Right.  So what's the problem?  The vast majority of the time if we've put
the JUMP_INSN in the delay slot of a CALL_INSN, then we'll be able to
apply the return pointer adjustment trick.

Yes you get sub-optimal code if the branch target is out of range, but what
would you propose to do?  Re-run the delay slot optimizer again?  It's not
that common and in the cases where it does happen I seriously doubt the
call/jump is on the critical path.
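
As a rough sketch of the range question above (an illustration only, not
the actual output_call code): assume the return pointer adjustment is a
single add-immediate with a signed 14-bit displacement, and that the
call's return address is the call address plus 8.  The names
choose_delay_slot, fits_signed_14 and slot_fill are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the adjustment insn carries a signed 14-bit displacement. */
static bool fits_signed_14(int64_t v)
{
    return v >= -8192 && v <= 8191;
}

/* What ends up in the delay slot of the call. */
enum slot_fill { FILL_RETURN_ADJUST, FILL_NOP };

/* Decide how to fill the call's delay slot when the call is followed by
   an unconditional jump to target_addr.  If the distance from the return
   address to the target fits, the return pointer is adjusted so the
   callee returns straight to the target; otherwise fall back to a nop
   and leave the jump to be emitted separately.  */
static enum slot_fill choose_delay_slot(int64_t call_addr,
                                        int64_t target_addr,
                                        int64_t *adjust)
{
    /* Instructions are assumed to be 4-byte aligned. */
    assert((call_addr & 3) == 0 && (target_addr & 3) == 0);

    /* Assumption: return address = call insn + delay slot insn. */
    int64_t return_addr = call_addr + 8;
    int64_t delta = target_addr - return_addr;

    if (fits_signed_14(delta)) {
        *adjust = delta;
        return FILL_RETURN_ADJUST;
    }
    *adjust = 0;
    return FILL_NOP;
}
```

With a target a few hundred bytes past the call the adjustment fits; a
target 64K away falls back to the nop, which is the sub-optimal case
described above.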

 > If the insn in the "delay slot" is not a branch, we just output it.
 > This is often an insn to load an argument register for the branch
 > to foo.  So, clearly in this case we assume that the second insn
 > executes before the transfer to foo occurs.
No, that's not a safe assumption at all.  In fact, it would be a very unsafe
assumption.  Consider

	copy %r1,%r10
	movb,= %r10,%r11,target
	nop

You cannot transform that into

	movb,= %r10,%r11,target
	copy %r1,%r10

Remember, only the side effect of control transfer is delayed.
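
To make the difference concrete, here is a minimal C model of the two
sequences.  It is a simplification, not a PA simulator: it assumes that
movb,= copies the source to the destination and decides the branch from
the moved value when the branch itself executes, with only the transfer
of control deferred past the delay slot.  The struct and function names
are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical three-register machine state. */
struct regs {
    uint32_t r1, r10, r11;
};

/* Final registers plus whether the branch to "target" was taken. */
struct outcome {
    struct regs regs;
    bool taken;
};

/* Simplified "movb,= src,dst,target": the register read, the move and
   the branch decision all happen when the branch executes.  */
static bool movb_eq(uint32_t src, uint32_t *dst)
{
    *dst = src;
    return src == 0;
}

/* copy %r1,%r10 ; movb,= %r10,%r11,target ; nop */
static struct outcome run_original(struct regs r)
{
    struct outcome o;
    r.r10 = r.r1;                      /* copy executes before the branch */
    o.taken = movb_eq(r.r10, &r.r11);  /* branch reads the new %r10 */
    o.regs = r;                        /* delay slot holds a nop */
    return o;
}

/* movb,= %r10,%r11,target ; copy %r1,%r10 (in the delay slot) */
static struct outcome run_transformed(struct regs r)
{
    struct outcome o;
    o.taken = movb_eq(r.r10, &r.r11);  /* branch reads the OLD %r10 */
    r.r10 = r.r1;                      /* delay slot runs after that read */
    o.regs = r;
    return o;
}

/* With %r1 = 0 and a stale nonzero %r10 the two orders disagree on both
   the value moved into %r11 and the branch decision.  */
static void check_orders_differ(void)
{
    struct regs start = { .r1 = 0, .r10 = 7, .r11 = 99 };
    struct outcome a = run_original(start);
    struct outcome b = run_transformed(start);
    assert(a.taken && a.regs.r11 == 0);
    assert(!b.taken && b.regs.r11 == 7);
}
```

The two orders only agree when %r10 already happens to hold the value
of %r1, which is not something the compiler can rely on.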





 > As you have noted before, it is generally a good idea to make the
 > rtl match the hardware as closely as is possible.  Maybe these sequences
 > are created at such a late stage that the above inconsistency doesn't
 > matter but I think there are a number of reasons not to allow it:
I still don't see the inconsistency.  The RTL describes to the fullest
extent possible what is happening.  We have a branch of some kind and
an insn in the delay slot of the branch.  The specifics of what we can 
safely put into a delay slot are handled by a combination of generic code
which tracks dataflow and target code to deal with testing candidate insns
for validity.

The delay slot does _not_ execute before the branch, this is precisely why
you can't (for example) take an insn from before the branch which sets a
resource used by the branch.  The generic code in reorg which tracks dataflow
ensures this can't happen.
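
A minimal sketch of that kind of dataflow test, assuming each insn is
summarized by bitmasks of the general registers it sets and uses.  This
is not the code in reorg; insn_res, reg_bit and can_fill_from_before are
hypothetical names, and a real check also has to track memory, condition
state and other resources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-insn summary: bit i stands for general register i. */
struct insn_res {
    uint32_t sets;   /* registers written */
    uint32_t uses;   /* registers read */
};

static uint32_t reg_bit(unsigned n)
{
    assert(n < 32);
    return (uint32_t)1 << n;
}

/* Can cand, which currently sits before branch, be moved into the
   branch's delay slot?  After the move the branch executes first, so:
     - cand must not set anything the branch reads,
     - cand must not read anything the branch sets,
     - cand and branch must not set the same register.  */
static bool can_fill_from_before(struct insn_res cand, struct insn_res branch)
{
    if (cand.sets & branch.uses)
        return false;
    if (cand.uses & branch.sets)
        return false;
    if (cand.sets & branch.sets)
        return false;
    return true;
}
```

Applied to the earlier example, copy %r1,%r10 sets %r10 and the movb
uses %r10, so the first test rejects it.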

Are you saying it would be good if we put an adjustment of %r2/%r31 in the
delay slot instead of the JUMP_INSN as that more accurately represents what
assembly code we will produce?  If so, maybe, but you have some technical
difficulties to consider for the representation and ultimately I don't think
it's going to be that useful.


 > 1)  As you noted below, there is no gain on PA8000 and above.
Right.

 > 2)  We have to disable the return pointer optimization under hppa-linux
 >     for correct operation of the dwarf2 exception support.
Right.

 > 3)  If we didn't allow an unconditional branch as the second insn in
 >     a delayed branch sequence, then the compiler might be able to find
 >     some other insn to fit in the sequence.  I guess we could allow
 >     an unconditional jump as the last alternative if there are no
 >     others.
For targets where it's useful (older PAs running HPUX), it is far
more profitable to do the return address hack than any other filling
of those delay slots.  That's why it's done first.  If we fall outside
the confines of the systems where it is useful, then we should not do
the return address hack at all (i.e. #1 & #2 above).

 > At a minimum, we should turn off allowing branches in the "delay slot"
 > of sequences under hppa-linux at the same point as we do for the PA8000.
Yup.  It's somewhere in pa.c, following_call or something like that.
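
Points #1 and #2 together amount to a simple predicate.  The sketch
below only restates them; the enum, the struct and the function name
are hypothetical and are not taken from pa.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU ordering, oldest first. */
enum pa_cpu { CPU_PA7000, CPU_PA7100, CPU_PA7200, CPU_PA8000 };

/* Hypothetical description of the target being compiled for. */
struct target_cfg {
    enum pa_cpu cpu;
    bool hppa_linux;   /* true when dwarf2 exception support is in play */
};

/* Should a jump be allowed in the "delay slot" of a call so that the
   return pointer adjustment can be applied?  */
static bool allow_jump_in_call_delay_slot(struct target_cfg t)
{
    assert(t.cpu <= CPU_PA8000);
    if (t.cpu >= CPU_PA8000)
        return false;   /* #1: no gain on PA8000 and above */
    if (t.hppa_linux)
        return false;   /* #2: breaks dwarf2 exception support */
    return true;
}
```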

 > > Note carefully we are not trying to optimize the case of a branch in
 > > another branch's delay slot.  We do not generate such code to the best
 > > of my knowledge, and such code is not generally useful (you get a
 > > single instruction executed
 > 
 > The loop that I had was something like
 > 
 > extern __inline__ void __delay(unsigned long loops) {
 >   /* The second addib sits in the "delay slot" of the first. */
 >   asm volatile(
 > "	addib,UV,n      -1,%0,.\n"
 > "	addib,NUV,n     -1,%0,.+8\n"
 > "	nop"
 >  : "=r" (loops) : "0" (loops));
 > }
 > 
 > Don't know if this is generally useful.  It does run much faster on a
 > 7100 (more or less one cycle per decrement and test).
Much faster than what?


jeff

