[Bug middle-end/70434] New: adding an extraneous cast to vector type results in different code
zsojka at seznam dot cz
gcc-bugzilla@gcc.gnu.org
Tue Mar 29 10:56:00 GMT 2016
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70434
Bug ID: 70434
Summary: adding an extraneous cast to vector type results in different code
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: zsojka at seznam dot cz
Target Milestone: ---
Created attachment 38119
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38119&action=edit
testcases
Originally observed in PR70421
When the attached code is compiled, barN() produces different code from
fooN(), even though the only difference is a useless cast of 'vNsi v' to
'vNsi'.
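The attachment is not reproduced in this message, so the following is only a guess at the shape of the v4si pair, inferred from the foo4/bar4 assembly in this report. The function bodies, the argument order, and the exact position of the cast are assumptions; check_equivalent is a hypothetical helper added here only to show that the two functions are semantically identical.

```c
#include <assert.h>

/* GCC vector extension: four 32-bit ints in one 16-byte vector. */
typedef int v4si __attribute__ ((vector_size (16)));

/* Assumed shape of foo4: modify element 0, then read element i. */
int
foo4 (int i, v4si v)
{
  v[0] ^= v[1];
  return v[i];
}

/* Same as foo4 except for the redundant cast of v to its own type.
   Where the real testcase places the cast is an assumption.  */
int
bar4 (int i, v4si v)
{
  v[0] ^= v[1];
  return ((v4si) v)[i];
}

/* Hypothetical helper: both functions must return the same value
   for every valid index.  */
void
check_equivalent (v4si v)
{
  for (int i = 0; i < 4; i++)
    assert (foo4 (i, v) == bar4 (i, v));
}
```

Since the cast converts a value to the type it already has, one would expect both functions to go through the same optimization path and end up with identical code.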
For example, v4si on x86_64 -O3 -mavx512f -masm=intel:
foo4:
        vpextrd edx, xmm0, 1
        vmovd   eax, xmm0
        movsx   rdi, edi
        xor     eax, edx
        vpinsrd xmm1, xmm0, eax, 0
        vmovaps XMMWORD PTR [rsp-24], xmm1
        mov     eax, DWORD PTR [rsp-24+rdi*4]
        ret
bar4:
        vmovaps XMMWORD PTR [rsp-24], xmm0
        movsx   rdi, edi
        mov     eax, DWORD PTR [rsp-20]
        xor     DWORD PTR [rsp-24], eax
        mov     eax, DWORD PTR [rsp-24+rdi*4]
        ret
I haven't benchmarked which one is faster, but why is the code different at
all?
For the foo32/bar32 case, bar32 is certainly faster, because foo32 creates an
extra copy of the variable on the stack.