[Bug tree-optimization/94828] Loop fusion is not implemented outside of ISL
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed Apr 29 07:02:27 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94828
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Loop fusion is not          |Loop fusion is not
                   |implemented                 |implemented outside of ISL
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Both loops are vectorized:
> ./cc1 -quiet y.c -O3 -fopt-info-vec
y.c:6:11: optimized: loop vectorized using 16 byte vectors
y.c:3:7: optimized: loop vectorized using 16 byte vectors
GCC fuses the loops with -floop-nest-optimize:
[scheduler] original ast:
{
  for (int c0 = 0; c0 < P_20; c0 += 1)
    S_3(c0);
  for (int c0 = 0; c0 < P_20; c0 += 1)
    S_4(c0);
}
[scheduler] AST generated by isl:
for (int c0 = 0; c0 < P_20; c0 += 1) {
  S_3(c0);
  S_4(c0);
}
producing
.L4:
        movdqu  (%rdi,%rax), %xmm0
        movdqu  (%rsi,%rax), %xmm2
        paddd   %xmm2, %xmm0
        paddd   %xmm2, %xmm0
        movups  %xmm0, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L4
but it's true that GCC does not implement classical loop fusion.
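
For readers without the testcase at hand, here is a rough sketch of the transformation being discussed. y.c itself is not quoted in this comment, so the loop bodies below are only an assumption inferred from the assembly above (one load from each array, two adds, one store). The names two_loops, fused and results_match are hypothetical, and the arrays are assumed not to overlap, since fusion is only valid when no dependence between the two loops is violated.

```c
#include <string.h>

enum { N = 37 };  /* arbitrary length, deliberately not a multiple of 4 */

/* Assumed shape of the original code: two adjacent loops with the same
   iteration space (S_3 and S_4 in the isl dump). */
static void two_loops(int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++)   /* S_3 */
        a[i] += b[i];
    for (int i = 0; i < n; i++)   /* S_4 */
        a[i] += b[i];
}

/* Hand-fused equivalent: one loop executing both statements per
   iteration, matching the AST generated by isl. */
static void fused(int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] += b[i];             /* S_3 */
        a[i] += b[i];             /* S_4 */
    }
}

/* Returns 1 when both versions leave identical contents in a.
   The input arrays are distinct objects, so they do not alias. */
static int results_match(void)
{
    int a1[N], a2[N], b[N];

    for (int i = 0; i < N; i++) {
        a1[i] = a2[i] = i;
        b[i] = 3 * i + 1;
    }
    two_loops(a1, b, N);
    fused(a2, b, N);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

With non-overlapping inputs, results_match() returns 1. The fused form loads and stores each element of a once per iteration instead of once per loop, which is the saving visible in the single vector loop above.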