Bug 34653 - operation performed unnecessarily in 64-bit mode
Summary: operation performed unnecessarily in 64-bit mode
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 4.3.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2008-01-03 19:07 UTC by dean
Modified: 2021-12-25 07:19 UTC

See Also:
Host: x86_64-unknown-linux-gnu
Target: x86_64-unknown-linux-gnu
Build: x86_64-unknown-linux-gnu
Known to work:
Known to fail: 4.3.5, 4.4.5, 4.5.2, 4.6.0
Last reconfirmed: 2011-02-01 16:45:31


Description dean 2008-01-03 19:07:26 UTC
for this code:

extern unsigned long table[];

unsigned long foo(unsigned char *p) {
  unsigned long long tag = *p;
  return table[tag >> 4];
}

gcc generates:

0000000000000000 <foo>:
   0:   0f b6 07                movzbl (%rdi),%eax
   3:   48 c1 e8 04             shr    $0x4,%rax
   7:   48 8b 04 c5 00 00 00    mov    0x0(,%rax,8),%rax
   e:   00
                        b: R_X86_64_32S table
   f:   c3                      retq

that "shr $0x4,%rax" would be better as "shr $0x4,%eax": it produces the same result (the dominating movzbl has already zeroed the upper bits) and, without the REX.W prefix, it's one byte shorter, which favours both code size and the narrow decoder on the core2.

thanks
-dean

/home/odo/gcc/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --prefix=/home/odo/gcc --disable-multilib --disable-biarch x86_64-unknown-linux-gnu --enable-languages=c
Thread model: posix
gcc version 4.3.0 20071128 (experimental) (GCC)
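
The equivalence claimed in the description can be checked exhaustively, since the tag only ever holds a byte value. The following is a minimal sketch in C, not GCC's own logic: index_shift64 and index_shift32 are hypothetical helper names that model the 64-bit shift GCC emits and the proposed 32-bit shift.

```c
#include <assert.h>
#include <stdint.h>

/* Models "movzbl (%rdi),%eax; shr $4,%rax": shift done in 64 bits. */
static uint64_t index_shift64(const unsigned char *p)
{
    uint64_t tag = *p;              /* zero-extended byte, 0..255 */
    return tag >> 4;
}

/* Models the proposed "shr $4,%eax": shift done in 32 bits,
   result zero-extended to 64 bits afterwards. */
static uint64_t index_shift32(const unsigned char *p)
{
    uint32_t tag = *p;
    return (uint64_t)(tag >> 4);
}

/* Compare both forms for every possible byte value. */
static void check_all_bytes(void)
{
    for (unsigned v = 0; v < 256; v++) {
        unsigned char b = (unsigned char)v;
        assert(index_shift64(&b) == index_shift32(&b));
    }
}
```

Calling check_all_bytes() passes silently when the two forms agree for all 256 inputs, which is what the "dominating movzbl" argument predicts.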
Comment 1 dean 2008-01-03 19:27:31 UTC
oops i should have used an "unsigned long" for the tag rather than unsigned long long, not that it matters much.

here's an expanded example showing another unnecessary REX:

extern unsigned long table[];

unsigned long foo(unsigned char *p) {
  unsigned long tag = *p;
  return table[tag >> 4] + table[tag & 0xf];
}

which generates:

   0:   0f b6 17                movzbl (%rdi),%edx
   3:   48 89 d0                mov    %rdx,%rax
   6:   48 c1 ea 04             shr    $0x4,%rdx
   a:   83 e0 0f                and    $0xf,%eax
   d:   48 8b 04 c5 00 00 00    mov    0x0(,%rax,8),%rax
  14:   00
                        11: R_X86_64_32S        table
  15:   48 03 04 d5 00 00 00    add    0x0(,%rdx,8),%rax
  1c:   00
                        19: R_X86_64_32S        table
  1d:   c3                      retq

and in this case the "mov %rdx,%rax" could be "mov %edx,%eax" because of the dominating movzbl.
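
The same argument applies to the copy: both the shift and the mask only see bits 0..7 of the tag, so doing all intermediate work in 32 bits gives the same sum. A rough sketch, assuming a 16-entry stand-in table (demo_table and its values are made up for illustration, as are the function names foo64 and foo32):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the extern "table" in the testcase. */
static const uint64_t demo_table[16] = {
    3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3
};

/* Models the generated code: 64-bit copy and 64-bit shift. */
static uint64_t foo64(const unsigned char *p)
{
    uint64_t tag = *p;
    uint64_t lo = tag;              /* mov %rdx,%rax */
    uint64_t hi = tag >> 4;         /* shr $4,%rdx   */
    return demo_table[lo & 0xf] + demo_table[hi];
}

/* Models the suggested code: 32-bit copy and 32-bit shift. */
static uint64_t foo32(const unsigned char *p)
{
    uint32_t tag = *p;
    uint32_t lo = tag;              /* mov %edx,%eax */
    uint32_t hi = tag >> 4;         /* shr $4,%edx   */
    return demo_table[lo & 0xf] + demo_table[hi];
}

/* Compare both forms for every possible byte value. */
static void check_foo(void)
{
    for (unsigned v = 0; v < 256; v++) {
        unsigned char b = (unsigned char)v;
        assert(foo64(&b) == foo32(&b));
    }
}
```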
Comment 2 Uroš Bizjak 2009-09-17 09:50:00 UTC
(In reply to comment #1)

> and in this case the "mov %rdx,%rax" could be "mov %edx,%eax" because of the
> dominating movzbl.

32bit moves and other instructions _SIGN_EXTEND_ results to 64bits on x86_64.

Comment 3 dean 2009-09-17 10:23:11 UTC
(In reply to comment #2)
> (In reply to comment #1)
> 
> > and in this case the "mov %rdx,%rax" could be "mov %edx,%eax" because of the
> > dominating movzbl.
> 
> 32bit moves and other instructions _SIGN_EXTEND_ results to 64bits on x86_64.
> 

every single data type in my example was unsigned.
Comment 4 dean 2009-09-17 10:27:59 UTC
(In reply to comment #2)
> 32bit moves and other instructions _SIGN_EXTEND_ results to 64bits on x86_64

wait i just reread your statement.

the amd64 ISA zero-extends 32-bit register writes out to 64-bits.  please go read the documentation.

-dean
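
For readers following the disagreement, here is a small sketch of the two behaviours being discussed. The helper names are hypothetical. widen_zero and widen_sign are plain C models; mov32 uses GCC-style inline assembly on x86-64 only and falls back to the C model elsewhere, so it is an illustration under those assumptions rather than a portable test.

```c
#include <assert.h>
#include <stdint.h>

/* C model of a 32-bit register write on x86-64: upper 32 bits become zero. */
static uint64_t widen_zero(uint64_t x)
{
    return (uint64_t)(uint32_t)x;
}

/* C model of sign extension, for contrast. The uint32_t -> int32_t
   conversion is implementation-defined before C23; GCC and Clang wrap. */
static uint64_t widen_sign(uint64_t x)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)x;
}

/* Executes a real 32-bit "movl" where available. */
static uint64_t mov32(uint64_t in)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t out;
    /* %k0/%k1 select the 32-bit names of the allocated registers. */
    __asm__("movl %k1, %k0" : "=r"(out) : "r"(in));
    return out;
#else
    return widen_zero(in);          /* fallback: C model only */
#endif
}
```

With an input whose bit 31 is set, such as 0xFFFFFFFF80000000, mov32 and widen_zero should both give 0x80000000, while widen_sign gives 0xFFFFFFFF80000000.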
Comment 5 H.J. Lu 2009-09-17 13:47:49 UTC
This is a dup of PR 17387.

*** This bug has been marked as a duplicate of 17387 ***
Comment 6 Paolo Bonzini 2009-09-24 06:06:01 UTC
Not a dup, since this is about shortening the mode, rather than about eliminating zero extensions.
Comment 7 Uroš Bizjak 2009-10-03 12:48:16 UTC
We regressed on the example from comment #1.

gcc-4.3 with -O2 generates:

foo:
	movzbl	(%rdi), %edx
	movq	%rdx, %rax
	shrq	$4, %rdx
	andl	$15, %eax
	movq	table(,%rax,8), %rax
	addq	table(,%rdx,8), %rax
	ret

And gcc-4.4+ -O2 generates:

foo:
	movzbl	(%rdi), %eax
	movq	%rax, %rcx
>>	movq	%rax, %rdx
	andl	$15, %ecx
	shrq	$4, %rdx
	movq	table(,%rcx,8), %rax
	addq	table(,%rdx,8), %rax
	ret

Please note the extra move (marked with >>).
Comment 8 Tony 2011-02-01 16:45:31 UTC
Confirmed that both the example in the description and the example in comment #1 apply to GCC 4.3.5, 4.4.5, 4.5.2 and 4.6.0 (20110129).

Also confirmed the regression noted in comment #7, where an extra register is used (ecx), resulting in an additional mov instruction.  This regression is present in versions 4.4.5, 4.5.2 and 4.6.0 (20110129).  This regression could possibly be related to PR47521, which also first appeared in 4.4.x.
Comment 9 Tony 2013-04-10 02:01:27 UTC
GCC 4.8.0 with -O2 produces something similar to the original, so the regression noted in comment #7 and comment #8 is now resolved.

        movzbl  (%rdi), %eax
        shrq    $4, %rax
        movq    table(,%rax,8), %rax
        ret

However the original bug from comment #1 is still present.
Comment 10 Andrew Pinski 2021-12-25 07:19:50 UTC
For the original testcase here is what other compilers do:
LLVM trunk:

        movzbl  (%rdi), %eax
        shrq    %rax
        andl    $120, %eax
        movq    table(%rax), %rax
        retq

ICC 2021.3.0:
        movzbl    (%rdi), %eax                                  #5.29
        shrq      $4, %rax                                      #6.23
        movq      table(,%rax,8), %rax                          #6.10
        ret 

MSVC:
        movzx   eax, BYTE PTR [rcx]
        lea     rcx, OFFSET FLAT:unsigned long * table            ; table
        shr     rax, 4
        mov     eax, DWORD PTR [rcx+rax*4]
        ret     0


GCC trunk:
        movzbl  (%rdi), %eax
        shrq    $4, %rax
        movq    table(,%rax,8), %rax



Trying 7 -> 8:
    7: r89:DI=zero_extend([r92:DI])
      REG_DEAD r92:DI
    8: {r90:DI=r89:DI 0>>0x4;clobber flags:CC;}
      REG_DEAD r89:DI
      REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:DI 90)
            (zero_extract:DI (mem:QI (reg:DI 92) [0 *p_4(D)+0 S1 A8])
                (const_int 4 [0x4])
                (const_int 4 [0x4])))
        (clobber (reg:CC 17 flags))
    ])
Failed to match this instruction:
(set (reg:DI 90)
    (zero_extract:DI (mem:QI (reg:DI 92) [0 *p_4(D)+0 S1 A8])
        (const_int 4 [0x4])
        (const_int 4 [0x4])))
Failed to match this instruction:
(set (reg:DI 90)
    (and:DI (subreg:DI (lshiftrt:QI (mem:QI (reg:DI 92) [0 *p_4(D)+0 S1 A8])
                (const_int 4 [0x4])) 0)
        (const_int 15 [0xf])))
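
As a reading aid for the dump above: the patterns combine tried all describe "take the upper four bits of the loaded byte". The sketch below models them in C, assuming zero_extract's operands are (width, start bit) counted from the least significant bit, as on x86. The helper names are hypothetical and this is not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (zero_extract:DI value (const_int width) (const_int pos)).
   Assumes width < 64 and LSB-first bit numbering. */
static uint64_t zero_extract(uint64_t value, unsigned width, unsigned pos)
{
    return (value >> pos) & ((UINT64_C(1) << width) - 1);
}

/* Model of (and:DI (subreg:DI (lshiftrt:QI byte 4) 0) 15). */
static uint64_t and_of_shift(uint8_t b)
{
    return (uint64_t)(uint8_t)(b >> 4) & 0xf;
}

/* Both models should match the plain (b >> 4) of the source code. */
static void check_patterns(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint8_t b = (uint8_t)v;
        assert(zero_extract(b, 4, 4) == (uint64_t)(b >> 4));
        assert(and_of_shift(b) == (uint64_t)(b >> 4));
    }
}
```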