[Bug tree-optimization/94828] Loop fusion is not implemented outside of ISL

Wed Apr 29 07:02:27 GMT 2020

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94828

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Loop fusion is not          |Loop fusion is not
                   |implemented                 |implemented outside of ISL

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Both loops are vectorized:

> ./cc1 -quiet y.c -O3 -fopt-info-vec
y.c:6:11: optimized: loop vectorized using 16 byte vectors
y.c:3:7: optimized: loop vectorized using 16 byte vectors

GCC fuses the loops with -floop-nest-optimize (the ISL-based Graphite optimizer):

[scheduler] original ast:
{
  for (int c0 = 0; c0 < P_20; c0 += 1)
    S_3(c0);
  for (int c0 = 0; c0 < P_20; c0 += 1)
    S_4(c0);
}

[scheduler] AST generated by isl:
for (int c0 = 0; c0 < P_20; c0 += 1) {
  S_3(c0);
  S_4(c0);
}

producing

.L4:
        movdqu  (%rdi,%rax), %xmm0
        movdqu  (%rsi,%rax), %xmm2
        paddd   %xmm2, %xmm0
        paddd   %xmm2, %xmm0
        movups  %xmm0, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L4

But it's true that GCC does not implement classical loop fusion outside of that ISL-based path.
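
For readers without the test case at hand, here is a minimal sketch of the transformation shown in the scheduler dump. The source of y.c is not quoted in this comment, so the functions below are a hypothetical stand-in chosen only to be consistent with the two paddd instructions in the assembly. The names two_loops, fused and check_fusion are made up for illustration, and the restrict qualifiers are an assumption: they state that a and b do not overlap, without which fusing the loops could change the result.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for y.c: two adjacent loops with the same bounds.
   restrict is an assumption (a and b must not overlap). */
void two_loops(int *restrict a, const int *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] += b[i];           /* plays the role of S_3 */
    for (int i = 0; i < n; i++)
        a[i] += b[i];           /* plays the role of S_4 */
}

/* Hand-fused form, mirroring the "AST generated by isl" dump:
   both statements run in a single loop body. */
void fused(int *restrict a, const int *restrict b, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] += b[i];
        a[i] += b[i];
    }
}

/* Checks that both forms compute the same result on a small input. */
void check_fusion(void)
{
    int x[6] = {1, 2, 3, 4, 5, 6};
    int y[6] = {1, 2, 3, 4, 5, 6};
    const int b[6] = {10, 20, 30, 40, 50, 60};

    two_loops(x, b, 6);
    fused(y, b, 6);
    assert(memcmp(x, y, sizeof x) == 0);
}
```

In the fused form each a[i] is loaded and stored once instead of twice, which matches the single movups per iteration in the loop at .L4.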