[Bug target/46219] Generate indirect jump instruction on x86-64

adam at consulting dot net.nz gcc-bugzilla@gcc.gnu.org
Fri Sep 5 00:29:00 GMT 2014


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46219

Adam Warner <adam at consulting dot net.nz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
            Version|4.6.0                       |4.9.1
         Resolution|FIXED                       |---

--- Comment #6 from Adam Warner <adam at consulting dot net.nz> ---
Great work, thanks Kai Tietz and Richard Henderson! I've come across a situation
where the complex indirect jmp (a jmp through a memory operand) is not generated,
and crafted a simplified test case:

$ cat gcc_bug_no_complex_indirect_jmp.c 
#include <stdint.h>

typedef void (*fn0_t)(uint8_t *rdi);
typedef void (*fn1_t)(uint8_t *rdi, fn0_t *rsi);

fn0_t fn0_dispatch[256];
fn1_t fn1_dispatch[256];

void fn0_test(uint8_t *rdi) {
  fn0_t *rsi = fn0_dispatch;
  fn1_dispatch[rdi[1]](rdi, rsi);
}

int main(void) {
  asm volatile ("ret; jmpq *0x601140(,%rax,8)");
  return 0;
}

$ gcc --version
gcc (Debian 4.9.1-4) 4.9.1
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -O3 gcc_bug_no_complex_indirect_jmp.c && objdump -d -m i386:x86-64:intel a.out | less

...
00000000004003c0 <main>:
  4003c0:       c3                      ret    
  4003c1:       ff 24 c5 40 11 60 00    jmp    QWORD PTR [rax*8+0x601140]
...
00000000004004c0 <fn0_test>:
  4004c0:       0f b6 47 01             movzx  eax,BYTE PTR [rdi+0x1]
  4004c4:       be 40 09 60 00          mov    esi,0x600940
  4004c9:       48 8b 04 c5 40 11 60    mov    rax,QWORD PTR [rax*8+0x601140]
  4004d0:       00 
  4004d1:       ff e0                   jmp    rax
...

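The last two instructions of fn0_test should be merged into a single
JMP QWORD PTR [rax*8+0x601140], the same form as the hand-written jmpq in main.
That is a 7-byte instruction replacing the 8-byte mov plus the 2-byte jmp.
Fortuitously, fn0_test would then total 16 bytes (4 + 5 + 7) instead of the
current 19 (no more than 16 bytes of machine code can be decoded in one clock
cycle on Intel Core 2).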


