This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 08 Jun 2016 13:47:49 +0000
- Auto-submitted: auto-generated
- References: <bug-54939-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
With AVX2 we indeed generate
.L4:
	vmovupd	(%rdx,%rax), %ymm3
	addl	$1, %r9d
	vpermpd	$177, %ymm3, %ymm4
	vmovapd	%ymm3, %ymm2
	vmulpd	%ymm6, %ymm4, %ymm4
	vfmsub132pd	%ymm5, %ymm4, %ymm2
	vfmadd132pd	%ymm5, %ymm4, %ymm3
	vshufpd	$10, %ymm3, %ymm2, %ymm2
	vaddpd	(%rcx,%rax), %ymm2, %ymm2
	vmovupd	%ymm2, (%rcx,%rax)
	addq	$32, %rax
	cmpl	%esi, %r9d
	jb	.L4
Thus either there is no addsub for %ymm or there is insufficient pattern
support for it. Note that with AVX2 the above is what is generated even with
the cost model enabled, as it is now considered a profitable vectorization.
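
For readers following the listing, here is a rough lane-by-lane model in plain C of what one .L4 iteration appears to compute. It is only a sketch under assumptions: the loop preheader is not shown, so the source loop is assumed to be y[i] += a * x[i] on complex doubles, with %ymm5 holding the real part of a and %ymm6 the imaginary part, each broadcast to all lanes. The function names caxpy_ref and caxpy_lane_model are made up for illustration, and the fused multiply-adds are modelled as a separate multiply and add, so rounding can differ from the real instructions.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Assumed source loop (not shown in the comment): y[i] += a * x[i]. */
void caxpy_ref(size_t n, double complex a,
               const double complex *x, double complex *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Lane-by-lane model of the .L4 body: each pass handles 4 doubles,
   i.e. 2 complex values stored as interleaved (re, im) pairs.
   n counts complex elements; ar/ai are the assumed contents of
   %ymm5/%ymm6 (real and imaginary part of a, broadcast). */
void caxpy_lane_model(size_t n, double ar, double ai,
                      const double *x, double *y)
{
    assert(n % 2 == 0);  /* this model has no scalar epilogue */
    for (size_t i = 0; i < 2 * n; i += 4) {
        double swapped[4], t[4], sub[4], add[4];

        /* vpermpd $177: swap re and im within each pair */
        swapped[0] = x[i + 1];
        swapped[1] = x[i];
        swapped[2] = x[i + 3];
        swapped[3] = x[i + 2];

        for (int k = 0; k < 4; k++) {
            t[k]   = ai * swapped[k];       /* vmulpd      */
            sub[k] = ar * x[i + k] - t[k];  /* vfmsub132pd */
            add[k] = ar * x[i + k] + t[k];  /* vfmadd132pd */
        }

        /* vshufpd $10: even lanes from sub, odd lanes from add,
           followed by vaddpd with y and the store back to y */
        for (int k = 0; k < 4; k++)
            y[i + k] += (k % 2 == 0) ? sub[k] : add[k];
    }
}
```

In this model the full-width subtract and the full-width add are both computed and then half of each result is thrown away by the shuffle. That alternating subtract/add selection is the part an addsub-style pattern would be expected to cover in a single operation.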