This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] MIPS16/GCC: Optimise `__call_stub_' call stubs
- From: "Maciej W. Rozycki" <macro at codesourcery dot com>
- To: <gcc-patches at gcc dot gnu dot org>
- Cc: Catherine Moore <clm at codesourcery dot com>, Eric Christopher <echristo at gmail dot com>, Matthew Fortune <matthew dot fortune at imgtec dot com>
- Date: Mon, 29 Dec 2014 23:38:10 +0000
- Subject: [PATCH] MIPS16/GCC: Optimise `__call_stub_' call stubs
- References: <alpine dot DEB dot 1 dot 10 dot 1411191209550 dot 2881 at tp dot orcam dot me dot uk>
On Wed, 19 Nov 2014, Maciej W. Rozycki wrote:
> I have a second optimisation to make here too, but that triggers a
> surprising bug in GNU LD where BFD code meant to discard unused stubs
> appears not to work at all. So that'll have to be fixed first and it
> also means the other optimisation is unsafe to include in 5.0. I plan
> to post it shortly anyway for discussion, once I have the linker bug
> fixed.
For posterity -- optimise plain call `__call_stub_' MIPS16 stubs (where
no FP value is returned) that just jump to (tail-call) the actual standard
MIPS function. There is no need to jump via a register here as we know
that:
1. By definition the jump target is going to be standard MIPS code (no
need to relax to JALX ever).
2. We are not linked into PIC code, as PIC code uses libgcc.a's indirect
stubs instead, so $25 doesn't have to be valid on the function's entry.
3. If the target function has been compiled to PIC code, then a PIC stub
will be prepended to the function by LD to load $25 on entry, as usual
with non-PIC code.
This shortens the stub from code like:
Disassembly of section .mips16.call.callee_af7:

00000000 <__call_stub_callee_af7>:
   0:   3c190000    lui     t9,0x0
                    0: R_MIPS_HI16  callee_af7
   4:   27390000    addiu   t9,t9,0
                    4: R_MIPS_LO16  callee_af7
   8:   44846000    mtc1    a0,$f12
   c:   44877000    mtc1    a3,$f14
  10:   44867800    mtc1    a2,$f15
  14:   03200008    jr      t9
  18:   00000000    nop

(taken from the `pr23324' test case at `-O0' which also implies `-O1' for
GAS, i.e. no branch swapping and hence the delay-slot NOP) to code like:
Disassembly of section .mips16.call.callee_af7:

00000000 <__call_stub_callee_af7>:
   0:   44846000    mtc1    a0,$f12
   4:   44877000    mtc1    a3,$f14
   8:   44867800    mtc1    a2,$f15
   c:   08000000    j       0 <__call_stub_callee_af7>
                    c: R_MIPS_26    callee_af7
  10:   00000000    nop

and also helps branch prediction (instruction prefetching at the target of
the jump) where available by avoiding an indirect jump.
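
As a cross-check of the two listings above, here is a minimal C sketch that
decodes the instruction words quoted there and compares the stub sizes. It
is illustration only and not part of the patch: the helper names
(`mips_opcode', `stub_insn_name', `stub_has_indirect_jump' and so on) are
made up for this example, and the classifier knows only the handful of
instructions that occur in these stubs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field extractors for a 32-bit standard MIPS instruction word.  */
unsigned mips_opcode(uint32_t insn) { return insn >> 26; }
unsigned mips_rs(uint32_t insn)     { return (insn >> 21) & 0x1f; }
unsigned mips_rt(uint32_t insn)     { return (insn >> 16) & 0x1f; }
unsigned mips_rd(uint32_t insn)     { return (insn >> 11) & 0x1f; }
unsigned mips_funct(uint32_t insn)  { return insn & 0x3f; }

/* Name only the instructions that appear in the two stubs (hypothetical
   helper, not a general disassembler).  */
const char *stub_insn_name(uint32_t insn)
{
    if (insn == 0)
        return "nop";
    switch (mips_opcode(insn)) {
    case 0x00: return mips_funct(insn) == 0x08 ? "jr" : "?";
    case 0x02: return "j";
    case 0x09: return "addiu";
    case 0x0f: return "lui";
    case 0x11: return mips_rs(insn) == 0x04 ? "mtc1" : "?";
    default:   return "?";
    }
}

/* Instruction words copied from the listings (relocations unresolved).  */
const uint32_t old_stub[] = {
    0x3c190000, 0x27390000, 0x44846000, 0x44877000,
    0x44867800, 0x03200008, 0x00000000
};
const uint32_t new_stub[] = {
    0x44846000, 0x44877000, 0x44867800, 0x08000000, 0x00000000
};

/* Return 1 if the stub contains a register-indirect jump (`jr').  */
int stub_has_indirect_jump(const uint32_t *stub, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(stub_insn_name(stub[i]), "jr") == 0)
            return 1;
    return 0;
}

/* Tie the tables back to the listings: 7 vs 5 instructions.  */
void check_stub_tables(void)
{
    assert(sizeof old_stub == 28);
    assert(sizeof new_stub == 20);
    assert(stub_has_indirect_jump(old_stub, 7));
    assert(!stub_has_indirect_jump(new_stub, 5));
}
```

Calling `check_stub_tables ()' confirms that the new stub is two
instructions (8 bytes) shorter and no longer contains a `jr'.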
As noted in the previous message cited above this however triggers a BFD
bug, which I tracked down to a missing feature: call stubs are meant to be
discarded where not needed -- which is where the actual function called is
MIPS16 code -- but that has never been implemented where the actual
function called is local (i.e. the symbol referred to binds locally). This
is because the global symbol hash is used internally by MIPS BFD linker
code to check which MIPS16 stubs have to stay and which ought to be
discarded. Consequently any such stubs associated with local symbols are
left untouched and get through to linker output, wasting both storage and
run-time memory. When the actual function is MIPS16 code, the linker
then fails as it cannot relax the jump associated with the R_MIPS_26
relocation and bails out:
mips-linux-gnu-ld: pr23324.o: .mips16.call.callee_af7+0xc: Unsupported jump between ISA modes; consider recompiling with interlinking enabled.
mips-linux-gnu-ld: final link failed: Bad value
even though the stub will obviously never execute. With the indirect jump
currently produced, the useless stub makes its way to linker output
successfully, with the mode switch taken into account in the HI16/LO16
relocations associated with the LUI/ADDIU instruction pair.
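
To illustrate why the direct jump cannot simply absorb a mode switch the
way an address load can, here is a rough C model of resolving a 26-bit
jump field. This is not BFD code: the function name, the status enum and
the convention of representing the ISA mode as bit 0 of the target address
are all assumptions made for this sketch. The region check reflects the
architectural rule that `j' takes the upper four address bits from the
address of its delay slot.

```c
#include <assert.h>
#include <stdint.h>

enum j_reloc_status {
    J_RELOC_OK,
    J_RELOC_OUT_OF_REGION,   /* target outside the 256MB jump region */
    J_RELOC_MODE_SWITCH      /* target is MIPS16; plain `j' cannot switch */
};

/* Hypothetical model: fill the 26-bit field of the `j' word INSN located
   at PC so that it jumps to TARGET.  Bit 0 of TARGET set means "MIPS16
   code" in this sketch.  On success the patched word is stored in *OUT.  */
enum j_reloc_status resolve_j26(uint32_t insn, uint32_t pc,
                                uint32_t target, uint32_t *out)
{
    /* A plain `j' has no way to change the ISA mode, so a MIPS16
       target cannot be encoded.  */
    if (target & 1u)
        return J_RELOC_MODE_SWITCH;

    /* The upper four bits come from the delay-slot address.  */
    if (((pc + 4) ^ target) & 0xf0000000u)
        return J_RELOC_OUT_OF_REGION;

    *out = (insn & 0xfc000000u) | ((target >> 2) & 0x03ffffffu);
    return J_RELOC_OK;
}

/* Usage example with made-up addresses.  */
void check_resolve_j26(void)
{
    uint32_t word = 0;

    /* Standard MIPS target in the same region: field is target >> 2.  */
    assert(resolve_j26(0x08000000u, 0x00400000u, 0x00400100u, &word)
           == J_RELOC_OK);
    assert(word == 0x08100040u);

    /* MIPS16 target: rejected, as in the linker error quoted above.  */
    assert(resolve_j26(0x08000000u, 0x00400000u, 0x00400101u, &word)
           == J_RELOC_MODE_SWITCH);
}
```

By contrast, an address materialised with LUI/ADDIU is just a 32-bit value,
so any mode information can travel in it and `jr' will honour it.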
Therefore the BFD issue needs to be fixed before this optimisation can be
made. Right now I cannot dive into implementing the missing bit noted
above, so I'm just sharing this change so that it can be used in the
future once BFD has been corrected.
2014-12-29 Maciej W. Rozycki <macro@codesourcery.com>
gcc/
* config/mips/mips.c (mips16_build_call_stub): Emit a direct
jump (and omit the address load) rather than a jump-register
instruction in the tail-call case.
Maciej
gcc-mips16-call-stub-j.patch
Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.c
===================================================================
--- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.c 2014-11-18 23:33:10.917768628 +0000
+++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.c 2014-11-18 23:33:32.417976370 +0000
@@ -6957,19 +6957,6 @@ mips16_build_call_stub (rtx retval, rtx
reg_names[GP_REG_FIRST + 18],
reg_names[RETURN_ADDR_REGNUM]);
}
- else
- {
- /* Load the address of the MIPS16 function into $25. Do this
- first so that targets with coprocessor interlocks can use
- an MFC1 to fill the delay slot. */
- if (TARGET_EXPLICIT_RELOCS)
- {
- output_asm_insn ("lui\t%^,%%hi(%0)", &fn);
- output_asm_insn ("addiu\t%^,%^,%%lo(%0)", &fn);
- }
- else
- output_asm_insn ("la\t%^,%0", &fn);
- }
/* Move the arguments from general registers to floating-point
registers. */
@@ -7037,10 +7024,7 @@ mips16_build_call_stub (rtx retval, rtx
fprintf (asm_out_file, "\t.cfi_endproc\n");
}
else
- {
- /* Jump to the previously-loaded address. */
- output_asm_insn ("jr\t%^", NULL);
- }
+ output_asm_insn (MIPS_CALL ("j", &fn, 0, -1), &fn);
#ifdef ASM_DECLARE_FUNCTION_SIZE
ASM_DECLARE_FUNCTION_SIZE (asm_out_file, stubname, stubdecl);