94651 – Missed peephole optimization: m >= POWER_OF_TWO || n >= POWER_OF_TWO

Status:	RESOLVED DUPLICATE of bug 56719

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	9.3.0

Importance:	P3 enhancement
Target Milestone:	11.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2020-04-18 18:47 UTC by Pascal Cuoq
Modified:	2024-02-22 22:58 UTC
CC List:	1 user

See Also:
Host:
Target:	x86_64-- i?86--
Build:
Known to work:
Known to fail:
Last reconfirmed:

Description Pascal Cuoq 2020-04-18 18:47:18 UTC

Consider the functions:

(Compiler Explorer link: https://gcc.godbolt.org/z/Uzd6nd )

#define POWER_OF_TWO (1UL << 20)

int check(unsigned long m, unsigned long n)
{
    return m >= POWER_OF_TWO || n >= POWER_OF_TWO;
}

void g(unsigned long, unsigned long);

void test1(unsigned long m, unsigned long n)
{
    if (m >= POWER_OF_TWO || n >= POWER_OF_TWO) g(m, 0);
}

void test2(unsigned long m, unsigned long n)
{
    if (m >= POWER_OF_TWO || n >= POWER_OF_TWO) g(m, n);
}

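At least for test1 and test2, code that implements (m|n) >= POWER_OF_TWO seems likely to be faster on average, for most input distributions, than code with two comparisons, on virtually every modern architecture. The rewrite is valid because POWER_OF_TWO is a power of two: an unsigned value is >= 2^20 exactly when one of its bits 20 and above is set, and m|n has such a bit set exactly when m or n does. This is what Clang 10 generates: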

check:                                  # @check
        orq     %rsi, %rdi
        xorl    %eax, %eax
        cmpq    $1048575, %rdi          # imm = 0xFFFFF
        seta    %al
        retq
test1:                                  # @test1
        orq     %rdi, %rsi
        cmpq    $1048576, %rsi          # imm = 0x100000
        jb      .LBB1_1
        xorl    %esi, %esi
        jmp     g                       # TAILCALL
.LBB1_1:
        retq
test2:                                  # @test2
        movq    %rsi, %rax
        orq     %rdi, %rax
        cmpq    $1048576, %rax          # imm = 0x100000
        jb      .LBB2_1
        jmp     g                       # TAILCALL
.LBB2_1:
        retq


GCC 9.3 performs the two comparisons one after the other. For check on x86, this requires extra instructions afterwards to combine the two results, although in test2 it saves one register-to-register move:

check:
        cmpq    $1048575, %rdi
        seta    %al
        cmpq    $1048575, %rsi
        seta    %dl
        orl     %edx, %eax
        movzbl  %al, %eax
        ret
test1:
        cmpq    $1048575, %rdi
        ja      .L6
        cmpq    $1048575, %rsi
        ja      .L6
        ret
.L6:
        xorl    %esi, %esi
        jmp     g
test2:
        cmpq    $1048575, %rdi
        ja      .L10
        cmpq    $1048575, %rsi
        ja      .L10
        ret
.L10:
        jmp     g

Comment 1 Andrew Pinski 2021-08-01 18:10:51 UTC

Fixed in GCC 11+ (output below is in Intel syntax and was compiled as C++, hence the mangled names):
_Z5checkmm:
        or      rdi, rsi
        xor     eax, eax
        cmp     rdi, 1048575
        seta    al
        ret
_Z5test1mm:
        or      rsi, rdi
        cmp     rsi, 1048575
        ja      .L5
        ret
.L5:
        xor     esi, esi
        jmp     _Z1gmm
_Z5test2mm:
        mov     rax, rdi
        or      rax, rsi
        cmp     rax, 1048575
        ja      .L8
        ret
.L8:
        jmp     _Z1gmm

This was done by PR 56719; this report is an exact duplicate of it.

*** This bug has been marked as a duplicate of bug 56719 ***