This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/69882] [6 regression] Excessive reduction statements generated by SLP


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69882

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, I think what goes wrong is

t.f90:13:0: note: Detected interleaving load *a_19(D)[_54] and *a_19(D)[_18]
t.f90:13:0: note: Detected interleaving load of size 4 starting with _55 = *a_19(D)[_54];
t.f90:13:0: note: There is a gap of 2 elements after the group

but we end up loading 4 elements without handling the gap!
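
For illustration, here is a scalar C sketch of the access pattern the dump appears to describe. It assumes that the reported size of 4 is the group stride in elements with the gap included, so only the first two elements of each group of four are actually read. The function name, the max reduction over doubles and the init parameter are hypothetical stand-ins, not the Fortran testcase from the bug.

```c
#include <stddef.h>

/* Hypothetical stand-in for the testcase: reduce over elements 0 and 1
   of every group of 4 doubles. Elements 2 and 3 of each group are the
   gap and are never read here; a full 4-element vector load per group
   would fetch them as well. */
double max_skipping_gap(const double *a, size_t ngroups, double init)
{
    double m = init;
    for (size_t i = 0; i < ngroups; i++) {
        const double *g = a + 4 * i;   /* start of group i */
        if (g[0] > m) m = g[0];
        if (g[1] > m) m = g[1];
    }
    return m;
}
```

With the array {1, 2, 100, 200, 3, 4, 300, 400} and two groups, only 1, 2, 3 and 4 take part in the reduction.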

I have a patch (which also makes vectorizing the testcase no longer
profitable).
Without the cost model we get

.L5:
        vmovupd (%rcx), %xmm0
        addl    $1, %r9d
        addq    $64, %rcx
        vmovupd -32(%rcx), %xmm1
        vinsertf128     $0x1, -48(%rcx), %ymm0, %ymm0
        vinsertf128     $0x1, -16(%rcx), %ymm1, %ymm1
        cmpl    %edx, %r9d
        vinsertf128     $1, %xmm1, %ymm0, %ymm0
        vmaxpd  %ymm0, %ymm2, %ymm2
        jb      .L5

which indeed does not look too profitable to me.
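
Reading the operands in AT&T order, the loop body seems to fetch eight doubles per iteration while only four of them (indices 0, 1, 4 and 5) reach the vmaxpd. A plain C emulation of one iteration, as a rough sketch: the function name is hypothetical, p models %rcx before the addq, and acc models the four lanes of %ymm2.

```c
#include <string.h>

/* Emulates the data movement of one iteration of .L5 on plain arrays.
   Offsets in the comments are relative to %rcx after "addq $64, %rcx". */
void l5_iteration(const double *p, double acc[4])
{
    double ymm0[4] = {0}, ymm1[4] = {0};

    memcpy(ymm0,     p,     2 * sizeof(double)); /* vmovupd (%rcx), %xmm0 */
    memcpy(ymm1,     p + 4, 2 * sizeof(double)); /* vmovupd -32(%rcx), %xmm1 */
    memcpy(ymm0 + 2, p + 2, 2 * sizeof(double)); /* vinsertf128 $0x1, -48(%rcx), %ymm0, %ymm0 */
    memcpy(ymm1 + 2, p + 6, 2 * sizeof(double)); /* vinsertf128 $0x1, -16(%rcx), %ymm1, %ymm1 */

    /* vinsertf128 $1, %xmm1, %ymm0, %ymm0: the upper half of ymm0
       (doubles 2 and 3) is overwritten by doubles 4 and 5. */
    memcpy(ymm0 + 2, ymm1, 2 * sizeof(double));

    /* vmaxpd %ymm0, %ymm2, %ymm2 */
    for (int i = 0; i < 4; i++)
        acc[i] = acc[i] > ymm0[i] ? acc[i] : ymm0[i];
}
```

In this model the loads of doubles 2, 3, 6 and 7 are overwritten or never used, which is consistent with two of the four 16-byte loads being wasted work.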
