[Bug rtl-optimization/104950] GCC does not emit branchless code for load next to each other

crazylht at gmail dot com gcc-bugzilla@gcc.gnu.org
Wed Mar 16 09:27:06 GMT 2022


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> Ah, on aarch64 we get
> 
>         cmp     w0, 0
>         add     x0, x1, 4
>         csel    x0, x0, x1, eq
>         ldr     w0, [x0]
> 
> so we do not load from the possibly trapping mem.  With the testcase I
> provided and -fno-tree-sink on x86_64 we get

Not for this one:

float
foo (float a, float b, float *c, int i, int j)
{
    return a > b ? c[i] : c[j];
}

gcc:
        vcomiss xmm0, xmm1
        jbe     .L6
        movsx   rsi, esi
        vmovss  xmm0, DWORD PTR [rdi+rsi*4]
        ret
.L6:
        movsx   rdx, edx
        vmovss  xmm0, DWORD PTR [rdi+rdx*4]
        ret
llvm:
        vucomiss        xmm0, xmm1
        cmovbe  esi, edx
        movsxd  rax, esi
        vmovss  xmm0, dword ptr [rdi + 4*rax]
        ret
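
For illustration only, the llvm sequence above corresponds roughly to selecting the index first and then doing a single load. A minimal sketch of that shape at the source level follows; foo_select_index is a hypothetical name, not something from the bug report, and nothing here claims that any particular GCC version emits a cmov for it. Only one of c[i] / c[j] is ever loaded, as in the original, so no extra possibly trapping access is introduced.

```c
/* Original form from the testcase: two loads, one per arm of the conditional. */
float
foo (float a, float b, float *c, int i, int j)
{
    return a > b ? c[i] : c[j];
}

/* Hypothetical rewrite (name is an assumption): pick the index, then load once.
   The comparison is false for unordered operands (NaN), so j is chosen in that
   case, exactly as in foo.  */
float
foo_select_index (float a, float b, float *c, int i, int j)
{
    int idx = a > b ? i : j;
    return c[idx];
}
```

Both functions return the same value for the same arguments; the second only makes the "select, then load" structure explicit in the source.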

