This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
With AVX2 we indeed generate

.L4:
        vmovupd (%rdx,%rax), %ymm3
        addl    $1, %r9d
        vpermpd $177, %ymm3, %ymm4
        vmovapd %ymm3, %ymm2
        vmulpd  %ymm6, %ymm4, %ymm4
        vfmsub132pd     %ymm5, %ymm4, %ymm2
        vfmadd132pd     %ymm5, %ymm4, %ymm3
        vshufpd $10, %ymm3, %ymm2, %ymm2
        vaddpd  (%rcx,%rax), %ymm2, %ymm2
        vmovupd %ymm2, (%rcx,%rax)
        addq    $32, %rax
        cmpl    %esi, %r9d
        jb      .L4

Thus either there is no addsub for %ymm, or the pattern support for it is
insufficient.  Note that with AVX2 the above is what is generated even with the
cost model enabled, as it is now considered a profitable vectorization.
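
For readers following the listing, here is a minimal C sketch of what the loop body appears to compute. It is a reading of the assembly, not the testcase from the bug report. The kernel shape y[i] += a * x[i], the function names caxpy_ref and caxpy_lanes, and the guess that %ymm5 and %ymm6 hold the broadcast real and imaginary parts of a are all assumptions. The second function walks through the per-lane steps (swap, multiply, subtract in even lanes, add in odd lanes) on interleaved (re, im) doubles. That subtract/add combination is the part an addsub-style instruction would perform in one step.

```c
#include <complex.h>
#include <stddef.h>

/* Assumed shape of the source loop: complex y[i] += a * x[i].
   This is a guess inferred from the assembly, not the original testcase. */
void caxpy_ref(size_t n, double complex a,
               const double complex *x, double complex *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Same computation on interleaved (re, im) doubles, written so that each
   statement corresponds to one step of the vector code.  ar/ai stand for
   the values assumed to be broadcast in %ymm5/%ymm6. */
void caxpy_lanes(size_t n, double ar, double ai,
                 const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double xr = x[2 * i];       /* even lane: real part      */
        double xi = x[2 * i + 1];   /* odd lane:  imaginary part */

        /* vpermpd $177: swap the two doubles within each pair. */
        double sr = xi, si = xr;

        /* vmulpd: swapped vector times ai. */
        double tr = sr * ai, ti = si * ai;

        /* vfmsub132pd gives x*ar - t, vfmadd132pd gives x*ar + t;
           vshufpd $10 keeps the fmsub result in even lanes and the
           fmadd result in odd lanes. */
        double re = xr * ar - tr;   /* xr*ar - xi*ai */
        double im = xi * ar + ti;   /* xi*ar + xr*ai */

        /* vaddpd with the y operand, then vmovupd store. */
        y[2 * i]     += re;
        y[2 * i + 1] += im;
    }
}
```

Both functions should agree for finite inputs, up to rounding differences when the compiler contracts the multiply-add into an FMA.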
