Bug 46219 - Generate indirect jump instruction on x86-64
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.9.1
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2010-10-28 22:52 UTC by Adam Warner
Modified: 2021-11-28 05:48 UTC
CC List: 5 users

See Also:
Host:
Target: x86-64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-11-27 00:00:00


Description Adam Warner 2010-10-28 22:52:15 UTC
Is there a less brutal way to coax gcc into generating an indirect jump instruction on x86-64?

typedef void (*dispatch_t)(long offset);

dispatch_t dispatch[256];

void make_indirect_jump(long offset) {
  dispatch[offset](offset);
}

void force_use_of_indirect_jump_instruction(long offset) {
  asm ("jmp *dispatch( ,%0, 8)\n" : : "r" (offset));
  __builtin_unreachable();
}

int main() {
  return 0;
}

$ gcc-snapshot.sh -std=gnu99 -O3 use-indirect-jump-instruction.c && objdump -d -m i386:x86-64:intel a.out|less

0000000000400480 <make_indirect_jump>:
  400480:       48 8b 04 fd 20 12 60    mov    rax,QWORD PTR [rdi*8+0x601220]
  400487:       00 
  400488:       ff e0                   jmp    rax
  40048a:       66 0f 1f 44 00 00       nop    WORD PTR [rax+rax*1+0x0]

0000000000400490 <force_use_of_indirect_jump_instruction>:
  400490:       ff 24 fd 20 12 60 00    jmp    QWORD PTR [rdi*8+0x601220]
  400497:       66 0f 1f 84 00 00 00    nop    WORD PTR [rax+rax*1+0x0]
  40049e:       00 00 

This combination of inline assembly and __builtin_unreachable() is not a generally usable architecture-specific solution: there needs to be a way to ensure that the results of modified input arguments end up in the correct registers for the opaque tail call. It works in this case only because offset remains unmodified, so it is still in the register the ABI requires for a dispatch_t call.
Comment 1 Andrew Pinski 2010-10-28 22:58:27 UTC
(define_insn "*sibcall_1_rex64"
  [(call (mem:QI (match_operand:DI 0 "sibcall_insn_operand" "s,U"))
         (match_operand 1 "" ""))]
  "TARGET_64BIT && SIBLING_CALL_P (insn)"
  "@
   jmp\t%P0
   jmp\t%A0"
  [(set_attr "type" "call")])

I think "m" needs to be added as a constraint in the above instruction.
Other than changing GCC, there is no way.
Comment 2 Uroš Bizjak 2010-10-29 08:17:17 UTC
For some reason, a memory operand is prohibited in a sibcall; see predicates.md:

;; Test for a valid operand for a call instruction.
(define_predicate "call_insn_operand"
  (ior (match_operand 0 "constant_call_address_operand")
       (match_operand 0 "call_register_no_elim_operand")
       (match_operand 0 "memory_operand")))

;; Similarly, but for tail calls, in which we cannot allow memory references.
(define_predicate "sibcall_insn_operand"
  (ior (match_operand 0 "constant_call_address_operand")
       (match_operand 0 "register_no_elim_operand")))
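
To make the difference between the two predicates concrete, here is a minimal C sketch. It is not taken from the GCC sources or testsuite, and the names op_t, ops, twice, plus_one, plain_call and tail_call are hypothetical. In plain_call the result is still used after the callee returns, so the call is an ordinary call, for which call_insn_operand accepts a memory operand. In tail_call the call is in tail position and is a sibcall candidate, for which sibcall_insn_operand as quoted above accepts only a constant address or a register. Which instructions a particular compiler version and option set actually emits is not guaranteed.

```c
/* Hypothetical dispatch table used only for illustration. */
typedef long (*op_t)(long);

static long twice(long x)    { return 2 * x; }
static long plus_one(long x) { return x + 1; }

op_t ops[2] = { twice, plus_one };

/* Ordinary call: the caller adds 1 after the callee returns, so this
   cannot be a sibling call. A memory operand is allowed here. */
long plain_call(long i) {
  return ops[i & 1](i) + 1;
}

/* Tail position: nothing happens after the callee returns, so the
   compiler may turn this into a sibling call (a jmp). */
long tail_call(long i) {
  return ops[i & 1](i);
}
```

Comparing the output of objdump -d for the two functions at -O2 is one way to see whether the loaded address goes through a register before the jump.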
Comment 3 Richard Henderson 2010-10-29 16:45:47 UTC
That would be because we have no good way to say: global memory is fine,
but the on-stack memory that we just deallocated is not.

In addition, for this case, we have to ensure that the registers used to
do the indexing are still valid after call-saved registers have been
restored, and avoid any call-clobbered registers that might be needed
to execute the epilogue.

In general I don't think this is solvable, but for this specific case
we could add a peephole.
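
The two kinds of memory mentioned above can be sketched in C as follows. This is only an illustration under stated assumptions: handler_t, record, negate, global_table, call_via_global and call_via_local are hypothetical names, and the volatile qualifier is used merely to encourage the compiler to keep the local table in the stack frame. In call_via_global the target lives in global memory, which stays valid after the epilogue. In call_via_local the target is read from the caller's own frame, so a jump through that memory after the frame has been deallocated would not be safe, and the pointer has to be loaded into a register first.

```c
typedef long (*handler_t)(long offset);

static long last_seen = -1;  /* records the most recent argument */

static long record(long offset) { last_seen = offset; return offset * 2; }
static long negate(long offset) { last_seen = offset; return -offset; }

/* Global memory: still valid after the caller's frame is torn down. */
handler_t global_table[2] = { record, negate };

long call_via_global(long offset) {
  return global_table[offset & 1](offset);
}

/* On-stack memory: the table belongs to this function's frame, so the
   target must be read before the frame is released. */
long call_via_local(long offset) {
  volatile handler_t local_table[2] = { record, negate };
  handler_t target = local_table[offset & 1];
  return target(offset);
}
```

Both functions return the same results for the same input; the difference is only in where the function pointer is read from.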
Comment 4 Kai Tietz 2014-06-05 17:04:24 UTC
Author: ktietz
Date: Thu Jun  5 17:03:52 2014
New Revision: 211283

URL: http://gcc.gnu.org/viewcvs?rev=211283&root=gcc&view=rev
Log:
2014-06-05  Kai Tietz  <ktietz@redhat.com>
	    Richard Henderson  <rth@redhat.com>

	PR target/46219
	* config/i386/predicates.md (memory_nox32_operand): Add memory_operand
	checking for !TARGET_X32.
	* config/i386/i386.md (UNSPEC_PEEPSIB): New unspec constant.
	(sibcall_intern): New define_insn, plus required peepholes.
	(sibcall_pop_intern): Likewise.
	(sibcall_value_intern): Likewise.
	(sibcall_value_pop_intern): Likewise.

2014-06-05  Kai Tietz  <ktietz@redhat.com>

	PR target/46219
	* gcc.target/i386/sibcall-4.c: Remove xfail.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/predicates.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/sibcall-4.c
Comment 5 Kai Tietz 2014-06-05 17:05:51 UTC
Fixed.
Comment 6 Adam Warner 2014-09-05 00:29:10 UTC
Great work, thanks Kai Tietz and Richard Henderson! I've come across a situation where the complex jmp is not generated, and have crafted a simplified test case:

$ cat gcc_bug_no_complex_indirect_jmp.c 
#include <stdint.h>

typedef void (*fn0_t)(uint8_t *rdi);
typedef void (*fn1_t)(uint8_t *rdi, fn0_t *rsi);

fn0_t fn0_dispatch[256];
fn1_t fn1_dispatch[256];

void fn0_test(uint8_t *rdi) {
  fn0_t *rsi = fn0_dispatch;
  fn1_dispatch[rdi[1]](rdi, rsi);
}

int main(void) {
  asm volatile ("ret; jmpq *0x601140(,%rax,8)");
  return 0;
}

$ gcc --version
gcc (Debian 4.9.1-4) 4.9.1
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -O3 gcc_bug_no_complex_indirect_jmp.c && objdump -d -m i386:x86-64:intel a.out |less

...
00000000004003c0 <main>:
  4003c0:       c3                      ret    
  4003c1:       ff 24 c5 40 11 60 00    jmp    QWORD PTR [rax*8+0x601140]
...
00000000004004c0 <fn0_test>:
  4004c0:       0f b6 47 01             movzx  eax,BYTE PTR [rdi+0x1]
  4004c4:       be 40 09 60 00          mov    esi,0x600940
  4004c9:       48 8b 04 c5 40 11 60    mov    rax,QWORD PTR [rax*8+0x601140]
  4004d0:       00 
  4004d1:       ff e0                   jmp    rax
...

The last two instructions should be merged into JMP QWORD PTR [rax*8+0x601140].
This is a 7-byte instruction. Fortuitously, fn0_test would then be exactly 16 bytes in total (4 + 5 + 7), and no more than 16 bytes of machine code can be decoded in one clock cycle on Intel Core 2.