This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/61338] too many permutation in a vectorized "reverse loop"
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 28 May 2014 10:33:27 +0000
- Subject: [Bug tree-optimization/61338] too many permutation in a vectorized "reverse loop"
- Auto-submitted: auto-generated
- References: <bug-61338-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61338
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-05-28
             Blocks|                            |53947
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. We fail to detect that all DRs (data references) are accessed in
reverse, which is the case where we can drop the permutes entirely. We also
fail to reverse the positive-step vectors instead when they are fewer in number:
float x[1024];
float y[1024];
float z[1024];
void foo() {
  for (int i=0; i<512; ++i)
    x[i] += y[1023-i]*z[512-i];
}
produces
.L2:
        vpermd  (%rdx), %ymm1, %ymm0
        subq    $32, %rdx
        vpermd  (%rcx), %ymm1, %ymm2
        addq    $32, %rax
        vfmadd213ps     -32(%rax), %ymm2, %ymm0
        subq    $32, %rcx
        vmovaps %ymm0, -32(%rax)
        cmpq    $z-28, %rdx
        jne     .L2
instead of permuting the result before storing it.
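
To illustrate the idea, here is a minimal scalar sketch of one possible reading
of "permuting the result": the two reverse-accessed operands are read as
ordinary forward chunks, multiplied lane by lane, and only the product is
reversed before it is accumulated into x. That needs one reversal per block
instead of two. This is not the vectorizer's actual output; the block width VF,
the helper names foo_ref, foo_blocked and results_match, and passing x as a
pointer are assumptions made for the illustration.

```c
#include <assert.h>
#include <string.h>

#define N  1024
#define VF 8   /* assumed lanes per vector (8 floats in a 256-bit register) */

float y[N], z[N];

/* Reference: the loop from the report, with x passed in as a pointer. */
void foo_ref(float *x) {
    for (int i = 0; i < 512; ++i)
        x[i] += y[1023 - i] * z[512 - i];
}

/* Blocked variant: forward "loads" of y and z, one reversal of the product. */
void foo_blocked(float *x) {
    assert(512 % VF == 0);
    for (int i = 0; i < 512; i += VF) {
        /* Lowest address of the chunk that lanes i .. i+VF-1 read from. */
        const float *yv = &y[1023 - i - (VF - 1)];
        const float *zv = &z[512 - i - (VF - 1)];
        float prod[VF];
        for (int j = 0; j < VF; ++j)      /* multiply in memory order */
            prod[j] = yv[j] * zv[j];
        for (int k = 0; k < VF; ++k)      /* the single reversal */
            x[i + k] += prod[VF - 1 - k];
    }
}

/* Returns 1 if both versions give identical results. Small integer-valued
   inputs keep every product and sum exact, so bitwise comparison is safe. */
int results_match(void) {
    float xa[N], xb[N];
    for (int i = 0; i < N; ++i) {
        y[i] = (float)(i % 7);
        z[i] = (float)(i % 5);
        xa[i] = xb[i] = (float)(i % 3);
    }
    foo_ref(xa);
    foo_blocked(xb);
    return memcmp(xa, xb, sizeof xa) == 0;
}
```

Calling results_match() checks that the blocked form computes the same values
as the original loop. Whether a single permute of the product is cheaper than
the fused multiply-add it displaces depends on the target.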