This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/80819] [6/7/8 regression] Useless store to the stack in _mm_set_epi64x with SSE4 -mno-avx


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80819

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Honza, thoughts on this?  Given:
        movq    %rdi, -16(%rsp)
        movq    -16(%rsp), %xmm0
        pinsrq  $1, %rsi, %xmm0
I'd say that if pinsrq $1, %rsi, %xmm0 is not too slow on recent AMD, then either
movq %rdi, %xmm0 should also not be too slow, or pinsrq $0, %rdi, %xmm0 should
be the way to go.
Note current trunk still emits a dead store with -mtune=intel -O2 -msse4:
        movq    %rsi, -16(%rsp)
        movq    %rdi, %xmm0
        pinsrq  $1, %rsi, %xmm0
and with -mtune=generic -O2 -msse4:
        movq    %rdi, -16(%rsp)
        movq    %rsi, -24(%rsp)
        movq    -16(%rsp), %xmm0
        pinsrq  $1, %rsi, %xmm0
I wonder why DSE doesn't eliminate the dead store.
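
For readers who want to look at the code generation themselves, here is a minimal sketch of a reproducer. It is not the testcase from the bug report: the function names set_lo_hi and check_set_lo_hi are hypothetical, and the argument order is only inferred from the listings above, where %rdi ends up in the low element and %rsi is inserted into element 1.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_set_epi64x, _mm_storeu_si128 */

/* Hypothetical reproducer (not the bug report's testcase).
   Under the SysV x86-64 ABI, lo arrives in %rdi and hi in %rsi. */
__m128i set_lo_hi(long long lo, long long hi)
{
    /* _mm_set_epi64x takes the high element first, then the low one. */
    return _mm_set_epi64x(hi, lo);
}

/* Sanity check that the lanes land where the comment above says. */
void check_set_lo_hi(void)
{
    long long out[2];
    _mm_storeu_si128((__m128i *)out, set_lo_hi(1, 2));
    assert(out[0] == 1);  /* low element comes from the first argument */
    assert(out[1] == 2);  /* high element comes from the second argument */
}
```

Compiling this to assembly with gcc -S and the options mentioned in this report (-O2 -msse4 -mno-avx, plus -mtune=generic or -mtune=intel) should show whether a given compiler version still emits the stores to the stack; the exact output will depend on the GCC version.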
