Consider the functions (Compiler Explorer link: https://gcc.godbolt.org/z/Uzd6nd ):

#define POWER_OF_TWO (1UL << 20)

int check(unsigned long m, unsigned long n)
{
    return m >= POWER_OF_TWO || n >= POWER_OF_TWO;
}

void g(unsigned long, unsigned long);

void test1(unsigned long m, unsigned long n)
{
    if (m >= POWER_OF_TWO || n >= POWER_OF_TWO)
        g(m, 0);
}

void test2(unsigned long m, unsigned long n)
{
    if (m >= POWER_OF_TWO || n >= POWER_OF_TWO)
        g(m, n);
}

At least for the test1 and test2 functions, it seems that code implementing (m|n) >= POWER_OF_TWO will be faster on average, for most input distributions, than code with two comparisons on pretty much every modern architecture.

This is what Clang 10 generates:

check:                                  # @check
        orq     %rsi, %rdi
        xorl    %eax, %eax
        cmpq    $1048575, %rdi          # imm = 0xFFFFF
        seta    %al
        retq
test1:                                  # @test1
        orq     %rdi, %rsi
        cmpq    $1048576, %rsi          # imm = 0x100000
        jb      .LBB1_1
        xorl    %esi, %esi
        jmp     g                       # TAILCALL
.LBB1_1:
        retq
test2:                                  # @test2
        movq    %rsi, %rax
        orq     %rdi, %rax
        cmpq    $1048576, %rax          # imm = 0x100000
        jb      .LBB2_1
        jmp     g                       # TAILCALL
.LBB2_1:
        retq

GCC 9.3 performs one comparison after the other. On x86 this requires extra instructions afterwards in check to combine the two flag results, although it saves one register-register move in test2:

check:
        cmpq    $1048575, %rdi
        seta    %al
        cmpq    $1048575, %rsi
        seta    %dl
        orl     %edx, %eax
        movzbl  %al, %eax
        ret
test1:
        cmpq    $1048575, %rdi
        ja      .L6
        cmpq    $1048575, %rsi
        ja      .L6
        ret
.L6:
        xorl    %esi, %esi
        jmp     g
test2:
        cmpq    $1048575, %rdi
        ja      .L10
        cmpq    $1048575, %rsi
        ja      .L10
        ret
.L10:
        jmp     g
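Why the transformation is valid here: for unsigned x and a power-of-two bound 2^k, x >= 2^k exactly when x has a bit set at position k or above, and m|n has precisely the union of the set bits of m and n. A minimal standalone sketch (the harness below, including the names check_orig and check_or, is made up for illustration and is not part of the report) that checks the equivalence near the boundary and shows why a power-of-two bound is required:

/* Hypothetical test harness demonstrating the OR trick. */
#include <assert.h>

#define POWER_OF_TWO (1UL << 20)

/* The form GCC 9.3 compiles as two comparisons. */
static int check_orig(unsigned long m, unsigned long n)
{
    return m >= POWER_OF_TWO || n >= POWER_OF_TWO;
}

/* The single-comparison form Clang 10 generates: valid because
   x >= (1UL << 20) iff x has a set bit at position 20 or higher. */
static int check_or(unsigned long m, unsigned long n)
{
    return (m | n) >= POWER_OF_TWO;
}

int main(void)
{
    /* Sample points around the boundary. */
    unsigned long s[] = { 0, 1, POWER_OF_TWO - 1, POWER_OF_TWO,
                          POWER_OF_TWO + 1, ~0UL };
    for (unsigned i = 0; i < sizeof s / sizeof s[0]; i++)
        for (unsigned j = 0; j < sizeof s / sizeof s[0]; j++)
            assert(check_orig(s[i], s[j]) == check_or(s[i], s[j]));

    /* The trick needs a power-of-two bound: with bound 3,
       m = 1, n = 2 gives (m|n) == 3 >= 3 even though neither
       operand alone is >= 3. */
    assert(!(1UL >= 3 || 2UL >= 3) && ((1UL | 2UL) >= 3));
    return 0;
}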
Fixed in GCC 11+:

_Z5checkmm:
        or      rdi, rsi
        xor     eax, eax
        cmp     rdi, 1048575
        seta    al
        ret
_Z5test1mm:
        or      rsi, rdi
        cmp     rsi, 1048575
        ja      .L5
        ret
.L5:
        xor     esi, esi
        jmp     _Z1gmm
_Z5test2mm:
        mov     rax, rdi
        or      rax, rsi
        cmp     rax, 1048575
        ja      .L8
        ret
.L8:
        jmp     _Z1gmm

This was done by the fix for PR 56719, which is an exact duplicate.

*** This bug has been marked as a duplicate of bug 56719 ***
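For code that still has to build with GCC 10 or earlier, the same rewrite can be done by hand in the source. A minimal sketch (test2_manual is a made-up name; this assumes the operands have no side effects, so losing the short-circuit evaluation of || is harmless):

/* Hypothetical manual rewrite for compilers without the optimization. */
#define POWER_OF_TWO (1UL << 20)

void g(unsigned long, unsigned long);

void test2_manual(unsigned long m, unsigned long n)
{
    /* Single compare-and-branch instead of two, as in the GCC 11 output. */
    if ((m | n) >= POWER_OF_TWO)
        g(m, n);
}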