[Bug target/82582] not quite optimal code for -2*x*y - 3*z: could use one less LEA for smaller code without increasing critical path latency for any input
pinskia at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Aug 19 18:17:10 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82582
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-08-19
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
Note aarch64 does this now too:
        mneg    w0, w1, w0              ; w0 = -(w1*w0)
        sub     w2, w2, w2, lsl 2       ; w2 = w2 - w2*4 = -3*w2
        add     w0, w2, w0, lsl 1       ; w0 = w2 + w0*2
clang is able to produce:
imull %esi, %edi
leal (%rdx,%rdx,2), %eax
leal (%rax,%rdi,2), %eax
negl %eax
MSVC is able to produce:
imul ecx, edx
imul eax, r8d, -3
add ecx, ecx
sub eax, ecx
GCC x86_64 produces:
imull %esi, %edi
leal 0(,%rdx,4), %eax
subl %eax, %edx
negl %edi
leal (%rdx,%rdi,2), %eax