[Bug target/82582] not quite optimal code for -2*x*y - 3*z: could use one less LEA for smaller code without increasing critical path latency for any input

pinskia at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Aug 19 18:17:10 GMT 2021


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82582

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-08-19

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
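
For reference, a minimal reproducer for the sequences below (signature chosen
for illustration; only the expression itself comes from the summary):

int f(int x, int y, int z)
{
    return -2*x*y - 3*z;
}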

Note that aarch64 does this now too:

        mneg    w0, w1, w0             ; w0 = -(w1*w0)
        sub     w2, w2, w2, lsl 2      ; w2 = w2 - w2*4
        add     w0, w2, w0, lsl 1      ; w0 = w2 + w0*2
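
In C terms the above corresponds to the following decomposition (a sketch for
illustration, assuming the usual AAPCS64 mapping w0=x, w1=y, w2=z on entry):

int f_aarch64_style(int x, int y, int z)
{
    int t0 = -(y * x);   /* mneg  w0, w1, w0        */
    int t1 = z - z * 4;  /* sub   w2, w2, w2, lsl 2 */
    return t1 + t0 * 2;  /* add   w0, w2, w0, lsl 1 => -2*x*y - 3*z */
}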


clang is able to produce:
        imull   %esi, %edi             # edi = x*y
        leal    (%rdx,%rdx,2), %eax    # eax = z + z*2 = 3*z
        leal    (%rax,%rdi,2), %eax    # eax = 3*z + (x*y)*2
        negl    %eax                   # eax = -2*x*y - 3*z

MSVC is able to produce:
        imul    ecx, edx        ; ecx = x*y
        imul    eax, r8d, -3    ; eax = -3*z
        add     ecx, ecx        ; ecx = 2*(x*y)
        sub     eax, ecx        ; eax = -3*z - 2*x*y

GCC x86_64 produces:
        imull   %esi, %edi             # edi = x*y
        leal    0(,%rdx,4), %eax       # eax = z*4
        subl    %eax, %edx             # edx = z - z*4 = -3*z
        negl    %edi                   # edi = -(x*y)
        leal    (%rdx,%rdi,2), %eax    # eax = -3*z + (-(x*y))*2
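
For completeness, a quick equivalence sketch (function names are illustrative;
unsigned arithmetic is used only to sidestep signed-overflow UB, the low 32
bits are identical) showing that all three decompositions above compute the
original expression:

#include <assert.h>
#include <stdint.h>

static uint32_t ref(uint32_t x, uint32_t y, uint32_t z)
{
    return -2u * x * y - 3u * z;            /* expression from the summary */
}

static uint32_t clang_style(uint32_t x, uint32_t y, uint32_t z)
{
    return -((x * y) * 2u + (z + z * 2u));  /* imull + 2x leal + negl */
}

static uint32_t msvc_style(uint32_t x, uint32_t y, uint32_t z)
{
    return z * -3u - (x * y) * 2u;          /* imul, imul -3, add, sub */
}

static uint32_t gcc_style(uint32_t x, uint32_t y, uint32_t z)
{
    return (z - z * 4u) + (-(x * y)) * 2u;  /* current GCC sequence */
}

int main(void)
{
    for (uint32_t x = 0; x < 64; x++)
        for (uint32_t y = 0; y < 64; y++)
            for (uint32_t z = 0; z < 64; z++) {
                assert(clang_style(x, y, z) == ref(x, y, z));
                assert(msvc_style(x, y, z) == ref(x, y, z));
                assert(gcc_style(x, y, z) == ref(x, y, z));
            }
    return 0;
}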

