[Bug target/53623] New: [4.7 Regression] sign extension is effectively split into two x86-64 instructions

adam at consulting dot net.nz gcc-bugzilla@gcc.gnu.org
Sun Jun 10 02:50:00 GMT 2012


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53623

             Bug #: 53623
           Summary: [4.7 Regression] sign extension is effectively split
                    into two x86-64 instructions
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: adam@consulting.net.nz


Test case (sign_extension_regression.c):

#include <stdint.h>

/* Handler type; explicit void return instead of implicit int. */
typedef void (*inst_t)(int64_t rdi, int64_t rsi, int64_t rdx);

int16_t code[256];
inst_t dispatch[256];

void an_inst(int64_t rdi, int64_t rsi, int64_t rdx) {
  rdx = code[rdx];
  uint8_t inst = (uint8_t) rdx;
  rdx >>= 8;
  dispatch[inst](rdi, rsi, rdx);
}

int main(void) {
  return 0;
}

$ gcc-4.6 -O3 sign_extension_regression.c && objdump -d -m i386:x86-64 a.out | less

00000000004004a0 <an_inst>:
  4004a0:       48 0f bf 94 12 20 1a    movswq 0x601a20(%rdx,%rdx,1),%rdx
  4004a7:       60 00 
  4004a9:       0f b6 c2                movzbl %dl,%eax
  4004ac:       48 c1 fa 08             sar    $0x8,%rdx
  4004b0:       48 8b 04 c5 20 12 60    mov    0x601220(,%rax,8),%rax
  4004b7:       00 
  4004b8:       ff e0                   jmpq   *%rax

The int16_t is loaded into RDX with sign extension by a single movswq. After
DL has been extracted into EAX, RDX is arithmetically shifted right by 8.
Result: RDX contains the high byte of the int16_t as a sign-extended 8-bit
value.

$ gcc-4.7 -O3 sign_extension_regression.c && objdump -d -m i386:x86-64 a.out | less

00000000004004b0 <an_inst>:
  4004b0:       0f b7 84 12 60 1a 60    movzwl 0x601a60(%rdx,%rdx,1),%eax
  4004b7:       00 
  4004b8:       48 0f bf d0             movswq %ax,%rdx
  4004bc:       0f b6 c0                movzbl %al,%eax
  4004bf:       48 c1 fa 08             sar    $0x8,%rdx
  4004c3:       48 8b 04 c5 60 12 60    mov    0x601260(,%rax,8),%rax
  4004ca:       00 
  4004cb:       ff e0                   jmpq   *%rax

The int16_t is loaded into EAX with zero extension (movzwl). The low 16 bits
of EAX are then copied into RDX with sign extension (movswq), and RDX is
arithmetically shifted right by 8. Result: RDX contains the same sign-extended
8-bit value as before.
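
The two listings can be compared by modelling each instruction as one C
statement. This is only a rough sketch: the names struct regs, seq_gcc46,
seq_gcc47 and sequences_agree are made up for illustration, and the code
assumes GCC's behaviour for right shifts of negative values (arithmetic shift)
and for narrowing conversions (wrap-around).

```c
#include <stdint.h>

/* Register state at the indirect jump: rax indexes dispatch[], rdx is the
   third argument passed on to the handler. */
struct regs {
    uint64_t rax;
    int64_t rdx;
};

/* gcc-4.6: one sign-extending load, then extract the low byte and shift. */
static struct regs seq_gcc46(int16_t mem16) {
    struct regs r;
    r.rdx = (int64_t) mem16;            /* movswq mem16,%rdx */
    r.rax = (uint8_t) r.rdx;            /* movzbl %dl,%eax   */
    r.rdx >>= 8;                        /* sar $0x8,%rdx (arithmetic shift assumed) */
    return r;
}

/* gcc-4.7: zero-extending load followed by a separate sign extension. */
static struct regs seq_gcc47(int16_t mem16) {
    struct regs r;
    r.rax = (uint16_t) mem16;           /* movzwl mem16,%eax */
    r.rdx = (int64_t) (int16_t) r.rax;  /* movswq %ax,%rdx   */
    r.rax = (uint8_t) r.rax;            /* movzbl %al,%eax   */
    r.rdx >>= 8;                        /* sar $0x8,%rdx     */
    return r;
}

/* Returns 1 if both models agree on every 16-bit input, 0 otherwise. */
static int sequences_agree(void) {
    for (int32_t w = INT16_MIN; w <= INT16_MAX; w++) {
        struct regs a = seq_gcc46((int16_t) w);
        struct regs b = seq_gcc47((int16_t) w);
        if (a.rax != b.rax || a.rdx != b.rdx)
            return 0;
    }
    return 1;
}
```

The model only checks the arithmetic; it says nothing about instruction count,
latency or encoding size.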

This is a regression. gcc-4.6 achieved the same result with one fewer
instruction.

Note: The quality of the generated code is affected by Bug 45434 and Bug 46219.
Suggested optimal approach with four instructions:

1. movzwl mem16 -> edx
2. movzbl dl -> eax
3. movsbq dh -> rdx
4. complex indirect jmp (combining mov mem64 -> rax; jmp rax)
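
The arithmetic behind steps 1 to 3 can be checked against the semantics of
an_inst() with a small C model. This is a sketch under stated assumptions, not
a statement about what the compiler can emit: struct decoded, decode_reference,
decode_suggested and count_mismatches are hypothetical names, the shift and
narrowing conversions rely on GCC's implementation-defined behaviour, and
whether step 3 is encodable as a single x86-64 instruction is not examined.

```c
#include <stdint.h>

/* Values handed to the dispatched handler: the opcode byte that indexes
   dispatch[] and the sign-extended high byte passed in rdx. */
struct decoded {
    uint8_t inst;
    int64_t rdx;
};

/* Reference semantics, mirroring the body of an_inst() in the test case. */
static struct decoded decode_reference(int16_t word) {
    struct decoded d;
    int64_t rdx = word;                      /* rdx = code[rdx]; sign-extending load */
    d.inst = (uint8_t) rdx;                  /* uint8_t inst = (uint8_t) rdx;        */
    d.rdx = rdx >> 8;                        /* rdx >>= 8; arithmetic shift assumed  */
    return d;
}

/* Model of the suggested sequence, one statement per step. */
static struct decoded decode_suggested(int16_t word) {
    struct decoded d;
    uint32_t edx = (uint16_t) word;          /* 1. movzwl mem16 -> edx */
    d.inst = (uint8_t) edx;                  /* 2. movzbl dl -> eax    */
    d.rdx = (int64_t) (int8_t) (edx >> 8);   /* 3. movsbq dh -> rdx    */
    return d;
}

/* Counts the 16-bit inputs on which the two models disagree. */
static int count_mismatches(void) {
    int mismatches = 0;
    for (int32_t w = INT16_MIN; w <= INT16_MAX; w++) {
        struct decoded a = decode_reference((int16_t) w);
        struct decoded b = decode_suggested((int16_t) w);
        if (a.inst != b.inst || a.rdx != b.rdx)
            mismatches++;
    }
    return mismatches;
}
```

Step 4 (the memory-indirect jump) does not change any of the values computed
here, so it is left out of the model.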


