This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PATCH for branch-shortening on MIPS


The patch looks mostly reasonable.

I suspect it will break the mips16 support, but mips16 is broken anyways, so
there is no need to worry about that.  We can fix it when we need it.

The patch will break all mips ports other than Irix6, because it is using
irix6 specific assembly language constructs.  This should be fixed.
We may need different fixes for irix5 (pic) and embedded mips/irix4/ultrix
(non-pic).

After that, I think it is OK if the patch goes in now, and then we hunt down
all of the little nits with instruction lengths as necessary.

For irix5 pic, there is some non-obvious stuff with gp register handling.
gp is callee saved in irix6, so it gets saved/restored in the prologue/
epilogue.  gp is caller saved in irix5, so it needs to be restored after
every call.  This happens via macro hell.  We emit the .cprestore pseudo op
in the prologue giving a stack slot offset as an argument, and then the
assembler translates every jal into a three instruction sequence jalr, nop,
load gp from stack slot.  Thus the call pattern lengths are underestimates
for irix5.  This only affects the TARGET_ABICALLS patterns, and only for
irix5 (i.e. ABI_32).
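
To make the size of the underestimate concrete, here is a minimal sketch in
Python that takes the description above at face value.  It is a model, not
GCC code: the function names and the two boolean parameters are made up for
illustration, and the only facts it encodes are the three instruction
expansion described above and the 4 byte size of a non-mips16 instruction.

```python
# Hypothetical model, not GCC code: all names here are illustrative.
INSN_BYTES = 4  # a non-mips16 MIPS instruction is 4 bytes


def jal_expansion(abicalls, gp_caller_saved):
    """Instructions assumed to be emitted for one 'jal' written by the compiler."""
    if abicalls and gp_caller_saved:
        # irix5 case: .cprestore is in effect, so gp is reloaded after the call.
        return ["jalr", "nop", "lw $gp, <cprestore slot>($sp)"]
    # Otherwise the jal is left as a single instruction in this model.
    return ["jal"]


def call_length_bytes(abicalls, gp_caller_saved):
    """Length in bytes of the expanded call sequence."""
    return INSN_BYTES * len(jal_expansion(abicalls, gp_caller_saved))
```

Under these assumptions, a call pattern that claims a length of 4 is short by
8 bytes for every call on irix5.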

While I was just looking at the call patterns, I noticed that some of them
have obviously wrong lengths.  The call_internal1 pattern for instance has
a length of 4, but emits two instructions if the input is a CONST_INT.

There is some trickery with addressing modes to get efficient code.  This
is macro hell again.  SYMBOL_REF+REG is considered a valid address for
non-irix6 non-pic targets, because the lack of assembler syntax for lui
relocations means that the compiler must emit "la symbol address, add register
to it, load" which is a 4 instruction sequence, whereas if we emit "load
symbol(register)" the assembler gives us the desired 3 instruction sequence
"lui upper, add, load lower+reg".  GNU as has a syntax for this, the irix6
assembler has a syntax for this, the original MIPS assembler does not.
The load patterns have a length of 8, so we are underestimating the length in
this case.  You may have to use -mno-split-addresses to reproduce this.

This does not affect the irix6 target, because the early irix6 assembler had a
bug (feature?) that prevented it from accepting these addresses, so I disabled
them, and never got around to re-enabling them.

For irix5, which is PIC, we get a slightly different result.  We get "load from
got, nop, add reg to it, load" which is even worse, because it is 16 bytes
long.
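
The arithmetic for the three sequences mentioned above can be tabulated with
a small Python sketch.  The sequence contents are copied from the
descriptions in this message; the dictionary keys and function name are
invented for illustration, and the assumed pattern length of 8 is the load
length quoted above.

```python
# Illustrative table only: keys and helper names are hypothetical.
INSN_BYTES = 4            # size of one non-mips16 instruction
ASSUMED_LOAD_LENGTH = 8   # length the load patterns claim, per the text above

# Sequences as described in this message, one list entry per instruction.
SEQUENCES = {
    "non-pic, compiler emits la": ["la (1st insn)", "la (2nd insn)", "add reg", "load"],
    "non-pic, load symbol(reg)": ["lui upper", "add", "load lower+reg"],
    "irix5 pic": ["load from got", "nop", "add reg", "load"],
}


def underestimate_bytes(name):
    """Real sequence length minus the length the pattern claims."""
    return INSN_BYTES * len(SEQUENCES[name]) - ASSUMED_LOAD_LENGTH
```

With these numbers the "load symbol(register)" form is short by 4 bytes, and
the irix5 PIC form is short by 8 bytes.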

This raises a related problem, which is that the assembler may be silently
adding nops to the code emitted by gcc, if it thinks that the target is one
of the old mips chips that doesn't have interlocks.  In this case, we would
be underestimating instruction lengths because we don't know about the extra
nops.  This wouldn't be a problem for irix6, but could be a problem for all
other mips targets.

I suspect there are other problems which aren't immediately obvious.

	Exception on the MIPS16, where there are no delay slots.

Not entirely true.  Some branches have delay slots, some don't.
I believe the rule is that unconditional branches always have delay slots,
but conditional branches have delay slots only when not mips16.  The reasoning
behind this is that filling a delay slot for a conditional branch may
increase code size, so mips16 took them out.  Filling a delay slot for an
unconditional branch should (almost?) never increase code size, so they were
OK to keep.

The define_delay patterns in mips.md distinguish between branches (conditional)
and jumps (unconditional).
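
As a sketch of the rule as I stated it above (and I only believe it is the
rule, so treat it as an assumption), in Python.  The "branch" and "jump"
strings mirror the conditional/unconditional distinction made by the
define_delay patterns; the function itself is hypothetical and is not taken
from mips.md.

```python
def has_delay_slot(kind, mips16):
    """Return True if an instruction of this kind has a delay slot.

    kind is "jump" (unconditional) or "branch" (conditional).
    This encodes the rule as stated in the text, not a verified ISA table.
    """
    if kind == "jump":
        # Unconditional: always has a delay slot, mips16 or not.
        return True
    if kind == "branch":
        # Conditional: has a delay slot only when not mips16.
        return not mips16
    raise ValueError("kind must be 'jump' or 'branch'")
```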

Typo: juse -> just

	the MIPS assembler

This traditionally means the original MIPS assembler, which was used by
everyone up to Irix5, and which is still used by many embedded compilers,
and which is very different from the SGI irix6 assembler.  GNU as is
compatible with "the MIPS assembler", but is not compatible with the SGI
Irix6 assembler (yet).  I'd use SGI Irix6 assembler instead, as that is
presumably what you mean.

Typo: restort -> resort

+             lw $at,%got_page(target)($gp)
+             daddiu $at,$at,%got_ofst(target)

This will work only with the SGI Irix6 assembler.  It will not work with
any other MIPS assembler.  It especially will not work with the assembler
that is traditionally called "the MIPS assembler".

This assumes that we are generating PIC code.  irix5/irix6 code is always PIC,
but irix4/ultrix code is not PIC, and most embedded systems are either not PIC,
or use a different kind of PIC.

I suspect we can get by with just two versions of this code.  One for the
Irix6 assembler, and one for all of the other, older assemblers.

! ;; a location within a signed 18-bit offset of the delay slot.  There's
! ;; also an unconditional jump with a 28-bit value, but that value is not an
! ;; offset.  Instead, it's bitwise-ored with the high-order four bits
! ;; of the instruction in the delay slot, which means it's not useful
! ;; unless you already know the absolute address of your code.

Gcc already uses the 'j' instruction for unconditional branches, so it must
be safe.  You should assume that functions are smaller than 256M, and never
cross a 256M boundary.  If we find a case where this is not true, then we
can always add a -mhuge or whatever option that avoids the j instruction.
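
The 256M assumption can be written down as a check.  This is a Python sketch
of the standard MIPS encoding rules, not code from the patch: a conditional
branch reaches a signed 18-bit byte offset from the delay slot, and 'j'
takes its top four address bits from the address of the delay slot
instruction.  The function names are made up for illustration.

```python
REGION_MASK = 0xF0000000  # top four address bits, i.e. the 256M region


def branch_can_reach(pc, target):
    """Conditional branch at pc: signed 18-bit byte offset from the delay slot."""
    offset = target - (pc + 4)
    return offset % 4 == 0 and -(1 << 17) <= offset < (1 << 17)


def j_can_reach(pc, target):
    """'j' at pc: target must lie in the same 256M region as the delay slot."""
    return target % 4 == 0 and ((pc + 4) ^ target) & REGION_MASK == 0


def encode_j(target):
    """Encode 'j target': opcode 2 in the top 6 bits, word index in the low 26."""
    return (0x02 << 26) | ((target >> 2) & 0x03FFFFFF)
```

For example, j_can_reach(0x0FFFFFF8, 0x10000000) is False, because the delay
slot at 0x0FFFFFFC is in the region below the target.  That is the boundary
crossing case the compiler assumes never occurs within a function.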

Perhaps you can get rid of the %got_page and %got_ofst stuff if you use j?

Jim

