[Bug fortran/102510] New: Function call has unnecessary aliasing check
dwwork at gmail dot com
gcc-bugzilla@gcc.gnu.org
Tue Sep 28 02:17:15 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510
Bug ID: 102510
Summary: Function call has unnecessary aliasing check
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: dwwork at gmail dot com
Target Milestone: ---
The following two procedures do the same thing semantically: they add two
fixed-size arrays and store the result in a third. When compiled with
"-O3 -mavx" for x86_64, I expect to see a single AVX add instruction. The first
version does this correctly, while the second has an aliasing check with a
vectorized branch and a scalar branch (I think). The second version is
suboptimal and should produce vectorized assembly similar to the first, as
Fortran does not allow function arguments to alias. I could be wrong of course,
but that is my understanding.
subroutine add2vecs1(a,b,c)
    use iso_fortran_env, only: r32 => real32
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8), intent(out) :: c
    c = a + b
end subroutine
Output Assembly (from godbolt.org, https://godbolt.org/z/aedEe7rGM):
add2vecs1_:
        vmovups ymm0, YMMWORD PTR [rdi]
        vaddps ymm0, ymm0, YMMWORD PTR [rsi]
        vmovups YMMWORD PTR [rdx], ymm0
        vzeroupper
        ret
function add2vecs2(a,b)
    use iso_fortran_env, only: r32 => real32
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8) :: add2vecs2
    add2vecs2 = a + b
end function
Output Assembly:
add2vecs2_:
        mov rax, QWORD PTR [rdi+40]
        mov rcx, QWORD PTR [rdi]
        test rax, rax
        je .L5
        cmp rax, 1
        jne .L11
.L5:
        vmovups ymm0, YMMWORD PTR [rdx]
        vaddps ymm0, ymm0, YMMWORD PTR [rsi]
        vmovups YMMWORD PTR [rcx], ymm0
        vzeroupper
        ret
.L11:
        vmovups xmm1, XMMWORD PTR [rdx]
        vaddps xmm0, xmm1, XMMWORD PTR [rsi]
        lea rdi, [rcx+rax*8]
        mov r8, rax
        sal r8, 4
        vmovss DWORD PTR [rcx], xmm0
        vextractps DWORD PTR [rcx+rax*4], xmm0, 1
        vextractps DWORD PTR [rcx+rax*8], xmm0, 2
        vextractps DWORD PTR [rdi+rax*4], xmm0, 3
        vmovups xmm0, XMMWORD PTR [rdx+16]
        vaddps xmm0, xmm0, XMMWORD PTR [rsi+16]
        lea rdi, [rcx+r8]
        lea rdx, [rdi+rax*8]
        vmovss DWORD PTR [rcx+r8], xmm0
        vextractps DWORD PTR [rdi+rax*4], xmm0, 1
        vextractps DWORD PTR [rdi+rax*8], xmm0, 2
        vextractps DWORD PTR [rdx+rax*4], xmm0, 3
        ret
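For anyone reproducing this, the sketch below may help confirm that the two procedures return the same values, so that only the generated code differs. The program name check_add2vecs, the input values, and the explicit interface block are assumptions added for illustration; the two procedures are copied from this report.

```fortran
! Hypothetical driver: compares the subroutine and function variants.
program check_add2vecs
    use iso_fortran_env, only: r32 => real32
    implicit none

    ! An explicit interface is required because add2vecs2 returns an array.
    interface
        subroutine add2vecs1(a, b, c)
            import :: r32
            real(r32), dimension(8), intent(in) :: a, b
            real(r32), dimension(8), intent(out) :: c
        end subroutine add2vecs1
        function add2vecs2(a, b)
            import :: r32
            real(r32), dimension(8), intent(in) :: a, b
            real(r32), dimension(8) :: add2vecs2
        end function add2vecs2
    end interface

    real(r32), dimension(8) :: a, b, c1, c2
    integer :: i

    ! Arbitrary test values (assumption, not from the report).
    a = [(real(i, r32), i = 1, 8)]
    b = 10.0_r32

    call add2vecs1(a, b, c1)
    c2 = add2vecs2(a, b)

    if (all(c1 == c2)) then
        print *, 'results match'
    else
        print *, 'results differ'
    end if
end program check_add2vecs

! The two procedures from the report, unchanged apart from whitespace.
subroutine add2vecs1(a, b, c)
    use iso_fortran_env, only: r32 => real32
    real(r32), dimension(8), intent(in) :: a, b
    real(r32), dimension(8), intent(out) :: c
    c = a + b
end subroutine

function add2vecs2(a, b)
    use iso_fortran_env, only: r32 => real32
    real(r32), dimension(8), intent(in) :: a, b
    real(r32), dimension(8) :: add2vecs2
    add2vecs2 = a + b
end function
```

Building this with the same "-O3 -mavx" options should allow the assembly for both procedures to be compared locally instead of on godbolt.org.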