Summary: | Generate indirect jump instruction on x86-64 | ||
---|---|---|---|
Product: | gcc | Reporter: | Adam Warner <adam.warner.nz> |
Component: | target | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | NEW --- | ||
Severity: | enhancement | CC: | areg.melikadamyan, hjl.tools, ktietz, rth, ubizjak |
Priority: | P3 | Keywords: | missed-optimization |
Version: | 4.9.1 | ||
Target Milestone: | --- | ||
Host: | Target: | x86-64 | |
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2021-11-27 00:00:00 |
Description
Adam Warner
2010-10-28 22:52:15 UTC
    (define_insn "*sibcall_1_rex64"
      [(call (mem:QI (match_operand:DI 0 "sibcall_insn_operand" "s,U"))
             (match_operand 1 "" ""))]
      "TARGET_64BIT && SIBLING_CALL_P (insn)"
      "@
       jmp\t%P0
       jmp\t%A0"
      [(set_attr "type" "call")])

I think "m" needs to be added as a constraint in the above instruction.

Other than changing GCC, there is no way.

For some reason, memory operand is prohibited in a sibcall, see predicates.md:

    ;; Test for a valid operand for a call instruction.
    (define_predicate "call_insn_operand"
      (ior (match_operand 0 "constant_call_address_operand")
           (match_operand 0 "call_register_no_elim_operand")
           (match_operand 0 "memory_operand")))

    ;; Similarly, but for tail calls, in which we cannot allow memory references.
    (define_predicate "sibcall_insn_operand"
      (ior (match_operand 0 "constant_call_address_operand")
           (match_operand 0 "register_no_elim_operand")))

That would be because we have no good way to say: global memory is fine, but the on-stack memory that we just deallocated is not.

In addition for this case, we have to ensure that the registers used to do the indexing are still valid after call-saved registers have been restored, and avoid any call-clobbered registers that might be needed to execute the epilogue.

In general I don't think this is solvable, but for this specific case we could add a peephole.

Author: ktietz
Date: Thu Jun 5 17:03:52 2014
New Revision: 211283

URL: http://gcc.gnu.org/viewcvs?rev=211283&root=gcc&view=rev
Log:

    2014-06-05  Kai Tietz  <ktietz@redhat.com>
                Richard Henderson  <rth@redhat.com>

        PR target/46219
        * config/i386/predicates.md (memory_nox32_operand): Add
        memory_operand checking for !TARGET_X32.
        * config/i386/i386.md (UNSPEC_PEEPSIB): New unspec constant.
        (sibcall_intern): New define_insn, plus required peepholes.
        (sibcall_pop_intern): Likewise.
        (sibcall_value_intern): Likewise.
        (sibcall_value_pop_intern): Likewise.

    2014-06-05  Kai Tietz  <ktietz@redhat.com>

        PR target/46219
        * gcc.target/i386/sibcall-4.c: Remove xfail.

Modified:

    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/predicates.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/sibcall-4.c

Fixed.

Great work thanks Kai Tietz and Richard Henderson!

I've come across a situation where complex jmp is not generated and crafted a simplified test case:

    $ cat gcc_bug_no_complex_indirect_jmp.c
    #include <stdint.h>

    typedef void (*fn0_t)(uint8_t *rdi);
    typedef void (*fn1_t)(uint8_t *rdi, fn0_t *rsi);

    fn0_t fn0_dispatch[256];
    fn1_t fn1_dispatch[256];

    void fn0_test(uint8_t *rdi) {
      fn0_t *rsi = fn0_dispatch;
      fn1_dispatch[rdi[1]](rdi, rsi);
    }

    int main(void) {
      asm volatile ("ret; jmpq *0x601140(,%rax,8)");
      return 0;
    }

    $ gcc --version
    gcc (Debian 4.9.1-4) 4.9.1
    Copyright (C) 2014 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    $ gcc -O3 gcc_bug_no_complex_indirect_jmp.c && objdump -d -m i386:x86-64:intel a.out |less
    ...
    00000000004003c0 <main>:
      4003c0:  c3                      ret
      4003c1:  ff 24 c5 40 11 60 00    jmp    QWORD PTR [rax*8+0x601140]
    ...
    00000000004004c0 <fn0_test>:
      4004c0:  0f b6 47 01             movzx  eax,BYTE PTR [rdi+0x1]
      4004c4:  be 40 09 60 00          mov    esi,0x600940
      4004c9:  48 8b 04 c5 40 11 60    mov    rax,QWORD PTR [rax*8+0x601140]
      4004d0:  00
      4004d1:  ff e0                   jmp    rax
    ...

The last two instructions should be merged into JMP QWORD PTR [rax*8+0x601140]. This is a 7 byte instruction. Fortuitously fn0_test would become 16 bytes total (no more than 16 bytes of machine code can be decoded in one clock cycle on Intel Core 2).
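
For anyone who wants to exercise the same dispatch pattern at run time, here is a minimal sketch of the test case with the tables populated. It assumes the same `fn0_t`/`fn1_t` typedefs and `fn0_test` as in the report. The names `record_handler`, `install_handlers`, `check_dispatch`, `last_index` and `last_table` are hypothetical helpers added only for illustration and are not part of the reported test case. Whether a given GCC version emits the merged `jmp` for `fn0_test` is exactly what this bug is about, so the sketch makes no claim about the generated code.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*fn0_t)(uint8_t *rdi);
typedef void (*fn1_t)(uint8_t *rdi, fn0_t *rsi);

fn0_t fn0_dispatch[256];
fn1_t fn1_dispatch[256];

/* Hypothetical instrumentation (not in the report): remembers what the
   handler was called with, so the dispatch can be checked. */
int last_index = -1;
fn0_t *last_table = 0;

/* Hypothetical handler: records the index byte and the table pointer. */
void record_handler(uint8_t *rdi, fn0_t *rsi)
{
    last_index = rdi[1];
    last_table = rsi;
}

/* Hypothetical setup: point every slot of fn1_dispatch at the handler. */
void install_handlers(void)
{
    for (int i = 0; i < 256; i++)
        fn1_dispatch[i] = record_handler;
}

/* Same function as in the report: a tail call through a table indexed
   by rdi[1]. This is the function whose assembly is of interest. */
void fn0_test(uint8_t *rdi)
{
    fn0_t *rsi = fn0_dispatch;
    fn1_dispatch[rdi[1]](rdi, rsi);
}

/* Hypothetical driver: dispatches on index 42 and returns what the
   handler recorded. */
int check_dispatch(void)
{
    uint8_t buf[2] = { 0, 42 };
    install_handlers();
    fn0_test(buf);
    assert(last_table == fn0_dispatch);
    return last_index;
}
```

Compiling this file with `gcc -O2 -S` and reading the body of `fn0_test` shows whether the load and the jump were combined. Going by the objdump listing in the report, the function is currently 4 + 5 + 8 + 2 = 19 bytes; replacing the 8-byte `mov` and 2-byte `jmp rax` with the 7-byte memory-operand `jmp` gives 4 + 5 + 7 = 16 bytes.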