for this code:

    extern unsigned long table[];

    unsigned long foo(unsigned char *p)
    {
      unsigned long long tag = *p;
      return table[tag >> 4];
    }

gcc generates:

    0000000000000000 <foo>:
       0:   0f b6 07                movzbl (%rdi),%eax
       3:   48 c1 e8 04             shr    $0x4,%rax
       7:   48 8b 04 c5 00 00 00    mov    0x0(,%rax,8),%rax
       e:   00
                            b: R_X86_64_32S table
       f:   c3                      retq

that "shr $0x4,%rax" would be better as "shr $0x4,%eax" because it produces the same result (due to dominating movzbl) and it's one byte shorter which favours both space and the narrow decoder on the core2.

thanks
-dean

    /home/odo/gcc/bin/gcc -v
    Using built-in specs.
    Target: x86_64-unknown-linux-gnu
    Configured with: ../gcc/configure --prefix=/home/odo/gcc --disable-multilib --disable-biarch x86_64-unknown-linux-gnu --enable-languages=c
    Thread model: posix
    gcc version 4.3.0 20071128 (experimental) (GCC)
oops i should have used an "unsigned long" for the tag rather than unsigned long long, not that it matters much.

here's an expanded example showing another unnecessary REX:

    extern unsigned long table[];

    unsigned long foo(unsigned char *p)
    {
      unsigned long tag = *p;
      return table[tag >> 4] + table[tag & 0xf];
    }

which generates:

       0:   0f b6 17                movzbl (%rdi),%edx
       3:   48 89 d0                mov    %rdx,%rax
       6:   48 c1 ea 04             shr    $0x4,%rdx
       a:   83 e0 0f                and    $0xf,%eax
       d:   48 8b 04 c5 00 00 00    mov    0x0(,%rax,8),%rax
      14:   00
                            11: R_X86_64_32S table
      15:   48 03 04 d5 00 00 00    add    0x0(,%rdx,8),%rax
      1c:   00
                            19: R_X86_64_32S table
      1d:   c3                      retq

and in this case the "mov %rdx,%rax" could be "mov %edx,%eax" because of the dominating movzbl.
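To make the "one byte shorter" point concrete, here is a small sketch that compares the 64-bit encodings quoted in the listings with their 32-bit counterparts. The 32-bit byte sequences are an assumption in the sense that they do not appear in the report; they are the quoted encodings with the leading REX.W prefix byte (0x48) dropped, which is valid here because only the low registers rax and rdx are involved. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encodings taken from the disassembly in the report. */
const unsigned char shr_rax[]     = {0x48, 0xc1, 0xe8, 0x04}; /* shr $0x4,%rax */
const unsigned char mov_rdx_rax[] = {0x48, 0x89, 0xd0};       /* mov %rdx,%rax */

/* Assumed 32-bit forms: the same bytes without the REX.W prefix. */
const unsigned char shr_eax[]     = {0xc1, 0xe8, 0x04};       /* shr $0x4,%eax */
const unsigned char mov_edx_eax[] = {0x89, 0xd0};             /* mov %edx,%eax */

/* Returns 1 if 'wide' is exactly 'narrow' preceded by the REX.W byte 0x48. */
int differs_only_by_rex_w(const unsigned char *wide, size_t wide_len,
                          const unsigned char *narrow, size_t narrow_len)
{
    if (wide_len != narrow_len + 1 || wide[0] != 0x48)
        return 0;
    return memcmp(wide + 1, narrow, narrow_len) == 0;
}

/* Asserts that each 64-bit form costs exactly one extra prefix byte. */
void check_encodings(void)
{
    assert(differs_only_by_rex_w(shr_rax, sizeof shr_rax,
                                 shr_eax, sizeof shr_eax));
    assert(differs_only_by_rex_w(mov_rdx_rax, sizeof mov_rdx_rax,
                                 mov_edx_eax, sizeof mov_edx_eax));
}
```

Calling check_encodings() from a test program should pass silently. This only compares byte sequences; it says nothing about decoder throughput.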
(In reply to comment #1)
> and in this case the "mov %rdx,%rax" could be "mov %edx,%eax" because of the
> dominating movzbl.

32bit moves and other instructions _SIGN_EXTEND_ results to 64bits on x86_64.
(In reply to comment #2)
> (In reply to comment #1)
>
> > and in this case the "mov %rdx,%rax" could be "mov %edx,%eax" because of the
> > dominating movzbl.
>
> 32bit moves and other instructions _SIGN_EXTEND_ results to 64bits on x86_64.
>

every single data type in my example was unsigned.
(In reply to comment #2)
> 32bit moves and other instructions _SIGN_EXTEND_ results to 64bits on x86_64

wait i just reread your statement. the amd64 ISA zero-extends 32-bit register writes out to 64-bits. please go read the documentation.

-dean
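The zero-extension argument can be checked with a portable C model. This is only a sketch: it does not execute the instructions themselves, it models a 32-bit register write as truncation to 32 bits followed by zero extension, which is the amd64 behaviour described above. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of writing a 32-bit register on x86-64:
   the upper 32 bits of the full register become zero. */
uint64_t write32(uint64_t value)
{
    return (uint64_t)(uint32_t)value;
}

/* 64-bit form, as currently emitted: shr $4,%rax */
uint64_t shift64(uint64_t tag)
{
    return tag >> 4;
}

/* Proposed 32-bit form: shr $4,%eax */
uint64_t shift32(uint64_t tag)
{
    return write32((uint32_t)tag >> 4);
}

/* Counts the byte values for which the 64-bit and 32-bit forms disagree. */
unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned b = 0; b <= 0xff; b++) {
        uint64_t tag = write32(b);      /* movzbl (%rdi),%eax */
        if (shift64(tag) != shift32(tag))
            bad++;                      /* shr on %rax vs. %eax */
        if (tag != write32(tag))
            bad++;                      /* mov %rdx,%rax vs. mov %edx,%eax */
    }
    return bad;
}

/* Asserts that narrowing is safe for every value movzbl can produce. */
void check_narrowing(void)
{
    assert(count_mismatches() == 0);
}
```

Because the value produced by movzbl always fits in 8 bits, the upper 32 bits are already zero and both forms agree for all 256 inputs.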
This is a dup for PR 17387.

*** This bug has been marked as a duplicate of 17387 ***
Not a dup, since this is about shortening the mode, rather than about eliminating zero extensions.
We regressed on the example from comment #1.

gcc-4.3 with -O2 generates:

    foo:
            movzbl  (%rdi), %edx
            movq    %rdx, %rax
            shrq    $4, %rdx
            andl    $15, %eax
            movq    table(,%rax,8), %rax
            addq    table(,%rdx,8), %rax
            ret

And gcc-4.4+ -O2 generates:

    foo:
            movzbl  (%rdi), %eax
            movq    %rax, %rcx
    >>      movq    %rax, %rdx
            andl    $15, %ecx
            shrq    $4, %rdx
            movq    table(,%rcx,8), %rax
            addq    table(,%rdx,8), %rax
            ret

Please note extra move.
Confirmed that both the example in the description and the example in comment #1 apply to GCC 4.3.5, 4.4.5, 4.5.2 and 4.6.0 (20110129). Also confirmed the regression noted in comment #7, where an extra register is used (ecx), resulting in an additional mov instruction. This regression is present in versions 4.4.5, 4.5.2 and 4.6.0 (20110129). This regression could possibly be related to PR47521, which also first appeared in 4.4.x.
GCC 4.8.0 with -O2 produces something similar to the original, so the regression noted in comment #7 and comment #8 is now resolved.

            movzbl  (%rdi), %eax
            shrq    $4, %rax
            movq    table(,%rax,8), %rax
            ret

However the original bug from comment #1 is still present.
For the original testcase here is what other compilers do:

LLVM trunk:

            movzbl  (%rdi), %eax
            shrq    %rax
            andl    $120, %eax
            movq    table(%rax), %rax
            retq

ICC 2021.3.0:

            movzbl    (%rdi), %eax                                  #5.29
            shrq      $4, %rax                                      #6.23
            movq      table(,%rax,8), %rax                          #6.10
            ret

MSVC:

            movzx   eax, BYTE PTR [rcx]
            lea     rcx, OFFSET FLAT:unsigned long * table ; table
            shr     rax, 4
            mov     eax, DWORD PTR [rcx+rax*4]
            ret     0

GCC trunk:

            movzbl  (%rdi), %eax
            shrq    $4, %rax
            movq    table(,%rax,8), %rax

    Trying 7 -> 8:
        7: r89:DI=zero_extend([r92:DI])
          REG_DEAD r92:DI
        8: {r90:DI=r89:DI 0>>0x4;clobber flags:CC;}
          REG_DEAD r89:DI
          REG_UNUSED flags:CC
    Failed to match this instruction:
    (parallel [
            (set (reg:DI 90)
                (zero_extract:DI (mem:QI (reg:DI 92) [0 *p_4(D)+0 S1 A8])
                    (const_int 4 [0x4])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ])
    Failed to match this instruction:
    (set (reg:DI 90)
        (zero_extract:DI (mem:QI (reg:DI 92) [0 *p_4(D)+0 S1 A8])
            (const_int 4 [0x4])
            (const_int 4 [0x4])))
    Failed to match this instruction:
    (set (reg:DI 90)
        (and:DI (subreg:DI (lshiftrt:QI (mem:QI (reg:DI 92) [0 *p_4(D)+0 S1 A8])
                    (const_int 4 [0x4])) 0)
            (const_int 15 [0xf])))
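The LLVM sequence looks different but computes the same byte offset into table. The sketch below checks that, and also checks that the zero_extract form in the dump (4 bits at position 4) is the same value as tag >> 4 for a byte input. It assumes an LP64 target where unsigned long is 8 bytes, hence the scale of 8, and it assumes bit position 0 is the least significant bit, as on x86. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/ICC: shrq $4,%rax, then a scale of 8 in the addressing mode. */
uint64_t offset_shift_scale(uint64_t tag)
{
    return (tag >> 4) * 8;
}

/* LLVM: shrq %rax ; andl $120,%eax, then an unscaled address. */
uint64_t offset_shift_mask(uint64_t tag)
{
    return (uint64_t)(uint32_t)((tag >> 1) & 120);
}

/* Extracts 'len' bits starting at bit 'pos' (LSB = 0), zero-extended. */
uint64_t extract_bits(uint64_t x, unsigned len, unsigned pos)
{
    return (x >> pos) & ((UINT64_C(1) << len) - 1);
}

/* Counts the byte values for which any of the forms disagree. */
unsigned offsets_disagree(void)
{
    unsigned bad = 0;
    for (uint64_t tag = 0; tag <= 0xff; tag++) {
        if (offset_shift_scale(tag) != offset_shift_mask(tag))
            bad++;
        if (extract_bits(tag, 4, 4) != (tag >> 4))
            bad++;
    }
    return bad;
}

/* Asserts that all forms agree for every possible byte. */
void check_offsets(void)
{
    assert(offsets_disagree() == 0);
}
```

The mask 120 is 0x78, which keeps bits 3 to 6 of tag >> 1. Those are bits 4 to 7 of the original byte, already shifted left by 3, so the multiplication by 8 is folded into the shift and mask.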