[Bug rtl-optimization/104950] GCC does not emit branchless code for load next to each other
crazylht at gmail dot com
gcc-bugzilla@gcc.gnu.org
Wed Mar 16 09:27:06 GMT 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950
Hongtao.liu <crazylht at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> Ah, on aarch64 we get
>
> cmp w0, 0
> add x0, x1, 4
> csel x0, x0, x1, eq
> ldr w0, [x0]
>
> so we do not load from the possibly trapping mem. With the testcase I
> provided and -fno-tree-sink on x86_64 we get
Not for this one
float
foo (float a, float b, float *c, int i, int j)
{
  return a > b ? c[i] : c[j];
}
gcc
        vcomiss xmm0, xmm1
        jbe     .L6
        movsx   rsi, esi
        vmovss  xmm0, DWORD PTR [rdi+rsi*4]
        ret
.L6:
        movsx   rdx, edx
        vmovss  xmm0, DWORD PTR [rdi+rdx*4]
        ret
llvm
        vucomiss xmm0, xmm1
        cmovbe  esi, edx
        movsxd  rax, esi
        vmovss  xmm0, dword ptr [rdi + 4*rax]
        ret
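
For reference, the llvm sequence above amounts to selecting the index first and then doing a single load, so neither arm's load is executed speculatively. A minimal C sketch of that shape follows. The names foo_select_index and check_same are made up here for illustration and do not come from the bug report, and nothing in this comment establishes what GCC emits for the rewritten form.

```c
#include <assert.h>

/* Original form from the comment: each arm of the conditional has its own load. */
static float foo(float a, float b, float *c, int i, int j)
{
    return a > b ? c[i] : c[j];
}

/* Hypothetical rewrite: pick the index first, then perform exactly one load.
   Only the selected element is ever read, as in the original. */
static float foo_select_index(float a, float b, float *c, int i, int j)
{
    int k = a > b ? i : j;
    return c[k];
}

/* Hypothetical helper: both forms should agree for any in-bounds i and j. */
static void check_same(float a, float b, float *c, int i, int j)
{
    assert(foo(a, b, c, i, j) == foo_select_index(a, b, c, i, j));
}
```

Since only the index is selected conditionally, a compiler can use a cmov on the integer register without touching memory that the original code would not have accessed.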