Consider the three functions check, test0 and test1: (Compiler Explorer link: https://gcc.godbolt.org/z/Sh4GpR ) #include <string.h> #define LARGE_POWER_OF_TWO (1UL << 40) int check(unsigned long m) { return m >= LARGE_POWER_OF_TWO; } void g(int); void test0(unsigned long m) { if (m >= LARGE_POWER_OF_TWO) g(0); } void test1(unsigned long m) { if (m >= LARGE_POWER_OF_TWO) g(m); } At least in the case of check and test0, the optimal way to compare m to 1<<40 is to shift m by 40 and compare the result to 0. This is the code generated for these functions by Clang 10: check: # @check xorl %eax, %eax shrq $40, %rdi setne %al retq test0: # @test0 shrq $40, %rdi je .LBB1_1 xorl %edi, %edi jmp g # TAILCALL .LBB1_1: retq In contrast, GCC 9.3 uses a 64-bit constant that needs to be loaded in a register with movabsq: check: movabsq $1099511627775, %rax cmpq %rax, %rdi seta %al movzbl %al, %eax ret test0: movabsq $1099511627775, %rax cmpq %rax, %rdi ja .L5 ret .L5: xorl %edi, %edi jmp g In the case of the function test1 the comparison is between these two version, because the shift is destructive: Clang10: test1: # @test1 movq %rdi, %rax shrq $40, %rax je .LBB2_1 jmp g # TAILCALL .LBB2_1: retq GCC9.3: test1: movabsq $1099511627775, %rax cmpq %rax, %rdi ja .L8 ret .L8: jmp g It is less obvious which approach is better in the case of the function test1, but generally speaking the shift approach should still be faster. The register-register move can be free on Skylake (in the sense of not needing any execution port), whereas movabsq requires an execution port and also it's a 10-byte instruction!
Confirmed.
Created attachment 48315 [details] Prototype patch Using this patch, the following asm is created (-O2): --cut here-- check: xorl %eax, %eax shrq $40, %rdi setne %al ret test0: shrq $40, %rdi jne .L5 ret .L5: xorl %edi, %edi jmp g test1: movq %rdi, %rax shrq $40, %rax jne .L8 ret .L8: jmp g --cut here--
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>: https://gcc.gnu.org/g:8ea03e9016cbca5a7ee2b4befa4d5c32467b0982 commit r11-37-g8ea03e9016cbca5a7ee2b4befa4d5c32467b0982 Author: Uros Bizjak <ubizjak@gmail.com> Date: Mon May 4 13:49:14 2020 +0200 i386: Use SHR to compare with large power-of-two constants [PR94650] Convert unsigned compares where m >= LARGE_POWER_OF_TWO and LARGE_POWER_OF_TWO represent an immediate where bit 33+ is set to use a SHR instruction and compare the result to 0. This avoids loading a large immediate with MOVABS insn. movabsq $1099511627775, %rax cmpq %rax, %rdi ja .L5 gets converted to: shrq $40, %rdi jne .L5 PR target/94650 * config/i386/predicates.md (shr_comparison_operator): New predicate. * config/i386/i386.md (compare->shr splitter): New splitters. testsuite/ChangeLog: PR target/94650 * gcc.targeti/i386/pr94650.c: New test.
Implemented for gcc-11.