This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/51179] poor vectorization on interlagos.
- From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 22 Nov 2011 22:00:36 +0000
- Subject: [Bug tree-optimization/51179] poor vectorization on interlagos.
- Auto-submitted: auto-generated
- References: <bug-51179-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179
--- Comment #7 from Uros Bizjak <ubizjak at gmail dot com> 2011-11-22 22:00:36 UTC ---
(In reply to comment #3)
> Your testcase doesn't resemble the original, the inner for cycles need
> clearing of the iteration variable.
Ah, indeed... fingers were too fast.
One additional data point: with -O2 -ftree-vectorize -mfma4 -mavx and all
loops present, the generated code is:
movslq %r8d, %rax
movl $C+32, %edx
xorl %esi, %esi
leaq B(,%rax,8), %rcx
movl $C, %eax
.L3:
>> vmovsd 80(%rcx), %xmm1
addl $2, %esi
vmovapd A(%rdi), %ymm0
>> vmovddup %xmm1, %xmm1
vbroadcastsd (%rcx), %ymm2
addq $160, %rcx
>> vinsertf128 $1, %xmm1, %ymm1, %ymm1
vfmaddpd (%rax), %ymm2, %ymm0, %ymm2
vmovapd %ymm2, (%rax)
addq $64, %rax
vfmaddpd (%rdx), %ymm1, %ymm0, %ymm0
vmovapd %ymm0, (%rdx)
addq $64, %rdx
cmpl $10, %esi
jne .L3
The three instructions marked with ">>" (vmovsd, vmovddup, vinsertf128) could
be just "vbroadcastsd 80(%rcx), %ymm1". For some reason the combine pass does
not form it.
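To illustrate why the marked sequence is equivalent to a single broadcast, here is a minimal sketch in portable C that models the register lanes as plain arrays. The type ymm_t and all model_* function names are hypothetical and exist only for this illustration; they are not GCC internals or intrinsics. The sketch assumes the VEX-encoded forms zero the upper bits of the destination, as they do for vmovsd (load form) and vmovddup on an xmm register.

```c
#include <string.h>

/* Hypothetical model: a 256-bit YMM register as four double lanes.
   An XMM register is the low two lanes of the same struct. */
typedef struct { double lane[4]; } ymm_t;

/* vmovsd mem, %xmm1: load one double into lane 0, zero all other lanes. */
ymm_t model_vmovsd(const double *mem) {
    ymm_t r = {{ *mem, 0.0, 0.0, 0.0 }};
    return r;
}

/* vmovddup %xmm1, %xmm1: duplicate lane 0 into lane 1, zero the upper half. */
ymm_t model_vmovddup(ymm_t src) {
    ymm_t r = {{ src.lane[0], src.lane[0], 0.0, 0.0 }};
    return r;
}

/* vinsertf128 $1, %xmm_src, %ymm_base, %ymm_dst:
   keep the low half of base, place the low half of src in the upper half. */
ymm_t model_vinsertf128_hi(ymm_t base, ymm_t src) {
    ymm_t r = base;
    r.lane[2] = src.lane[0];
    r.lane[3] = src.lane[1];
    return r;
}

/* vbroadcastsd mem, %ymm1: copy one double into all four lanes. */
ymm_t model_vbroadcastsd(const double *mem) {
    ymm_t r = {{ *mem, *mem, *mem, *mem }};
    return r;
}

/* The three instructions marked with ">>" in the listing, applied in order. */
ymm_t three_insn_sequence(const double *mem) {
    ymm_t x = model_vmovsd(mem);         /* vmovsd 80(%rcx), %xmm1 */
    x = model_vmovddup(x);               /* vmovddup %xmm1, %xmm1 */
    return model_vinsertf128_hi(x, x);   /* vinsertf128 $1, %xmm1, %ymm1, %ymm1 */
}

/* Bitwise comparison of two modeled registers; returns 1 when equal. */
int ymm_equal(ymm_t a, ymm_t b) {
    return memcmp(a.lane, b.lane, sizeof a.lane) == 0;
}
```

Calling three_insn_sequence(p) and model_vbroadcastsd(p) with the same pointer yields identical lanes in this model, which is the equivalence the combine pass would need to recognize.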