This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Caused by unknown alignment, was: Re: On the x86_64, does one have to zero a vector register before filling it completely ?
- From: Toon Moene <toon at moene dot org>
- To: gcc mailing list <gcc at gcc dot gnu dot org>
- Date: Sun, 29 Nov 2009 19:12:34 +0100
- Subject: Re: Caused by unknown alignment, was: Re: On the x86_64, does one have to zero a vector register before filling it completely ?
- References: <4B1107A6.6010205@moene.org> <4B113B18.5070304@aol.com> <4B11740B.2020702@moene.org> <4B1280B7.1090604@moene.org>
Toon Moene wrote:
This is where IPA could help. I created the following main program:
      real a(10), b(10), c(10)
      a = 0.
      b = 1.
      print '(3(1x,z16))', loc(a), loc(b), loc(c)
      call sum(a, b, c, 10)
      print *, c(5)
      end
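The subroutine being called (sum.f) is not reproduced in this message. As a minimal sketch, assuming it is a plain element-wise add with the argument order taken from the call above, it might look like this; the body is a guess, not the original source:

```fortran
      ! Hypothetical sketch of sum.f. The name and argument list
      ! follow the call in the main program; the body is assumed.
      subroutine sum(a, b, c, n)
      integer i, n
      real a(n), b(n), c(n)
      ! Only dummy arguments are visible here, so nothing in this
      ! routine tells the compiler where a, b and c start in memory.
      do i = 1, n
         c(i) = a(i) + b(i)
      enddo
      end
```

Compiled on its own, such a routine carries no alignment information about its arguments; that information exists only at the call site in the main program.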
So the alignment of a, b and c is known and is correct for vectorization.
Still, the loop in the subroutine looks like this (objdump -S a.out):
Inlining the "sum.f" subroutine by hand:
      integer i
      real a(10), b(10), c(10)
      a = 0.
      b = 1.
      print '(3(1x,z16))', loc(a), loc(b), loc(c)
      do i = 1, 10
         c(i) = a(i) + b(i)
      enddo
      print *, c(5)
      end
*does* lead to better code:
movaps 1056(%rsp), %xmm0
movq %rbp, %rdi
addps 1008(%rsp), %xmm0
movq $.LC2, 488(%rsp)
movaps %xmm0, 960(%rsp)
movl $9, 496(%rsp)
movaps 1072(%rsp), %xmm0
movl $128, 480(%rsp)
addps 1024(%rsp), %xmm0
movl $6, 484(%rsp)
movaps %xmm0, 976(%rsp)
movss 1088(%rsp), %xmm0
addss 1040(%rsp), %xmm0
movss %xmm0, 992(%rsp)
movss 1092(%rsp), %xmm0
addss 1044(%rsp), %xmm0
movss %xmm0, 996(%rsp)
i.e., completely unrolled and (SLP) vectorized code.
So the potential is there. What we need is an Alignment Propagation Pass
(analogous to the Constant Propagation and Range Propagation passes).
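
To make the idea concrete, here is a toy sketch of the rule such a pass could apply to one dummy argument: the callee may assume only the weakest alignment that any caller guarantees. The call-site table and the program name are invented for illustration, and this does not describe how GCC implements or would implement such a pass.

```fortran
      program alignprop
      implicit none
      ! Assumed facts, not taken from GCC: the alignment in bytes
      ! known for the actual argument at each call site of one
      ! subroutine. 16 stands for a local array such as a, b or c.
      integer, parameter :: nsites = 3
      integer :: known(nsites)
      integer :: callee

      known = (/ 16, 16, 16 /)
      ! The callee gets the weakest guarantee over all call sites.
      callee = minval(known)
      print *, 'all callers aligned, callee may assume', callee

      ! A caller passing a section such as a(2:) only guarantees
      ! the 4-byte alignment of one real; the minimum follows it.
      known(3) = 4
      callee = minval(known)
      print *, 'one unaligned caller, callee may assume', callee
      end
```

In the first case the propagated value is 16, which is what aligned loads such as movaps require; in the second it drops to 4, so the callee would have to fall back to unaligned or scalar accesses.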
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html