[Bug middle-end/70434] New: adding an extraneous cast to vector type results in different code

Tue Mar 29 10:56:00 GMT 2016

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70434

            Bug ID: 70434
           Summary: adding an extraneous cast to vector type results in
                    different code
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zsojka at seznam dot cz
  Target Milestone: ---

Created attachment 38119
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38119&action=edit
testcases

Originally observed in PR70421

When the attached code is compiled, barN() produces different code than
fooN(), even though the only difference is a redundant cast of 'vNsi v' to its
own type, 'vNsi'.
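
The attachment itself is not reproduced in this message. As a rough
illustration only, one foo4/bar4 pair could look something like the sketch
below. The function bodies, the parameter order and the typedef are
assumptions inferred from the v4si assembly in this report, not a copy of the
attached testcase.

```c
/* Hypothetical reconstruction, NOT the attached testcase.
   Uses the GCC vector extension; v4si is four 32-bit ints.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Modify element 0, then read an element by a runtime index.  */
int
foo4 (v4si v, int i)
{
  v[0] ^= v[1];
  return v[i];
}

/* Same as foo4, except for the redundant cast of v to its own type.  */
int
bar4 (v4si v, int i)
{
  v[0] ^= v[1];
  return ((v4si) v)[i];
}
```

Both functions are expected to return the same value for any in-range index,
so any difference in the generated code would come from how the cast is
handled, not from the semantics.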

For example, for v4si on x86_64 with -O3 -mavx512f -masm=intel:

foo4:
        vpextrd edx, xmm0, 1
        vmovd   eax, xmm0
        movsx   rdi, edi
        xor     eax, edx
        vpinsrd xmm1, xmm0, eax, 0
        vmovaps XMMWORD PTR [rsp-24], xmm1
        mov     eax, DWORD PTR [rsp-24+rdi*4]
        ret

bar4:
        vmovaps XMMWORD PTR [rsp-24], xmm0
        movsx   rdi, edi
        mov     eax, DWORD PTR [rsp-20]
        xor     DWORD PTR [rsp-24], eax
        mov     eax, DWORD PTR [rsp-24+rdi*4]
        ret

I haven't benchmarked which one is faster, but why does the code differ at
all?
In the foo32/bar32 case, bar32 is certainly faster, because foo32 creates an
extra copy of the variable on the stack.