This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/61338] too many permutation in a vectorized "reverse loop"


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61338

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-05-28
             Blocks|                            |53947
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  We fail to detect that all DRs (data references) are accessed in
"reverse", which is the case where we can drop the permutes altogether.  We
also fail to reverse the vectors accessed with a positive step instead when
they happen to be fewer in number:

float x[1024];
float y[1024];
float z[1024];

void foo() {
    for (int i=0; i<512; ++i)
      x[i] += y[1023-i]*z[512-i];
}

produces

.L2:
        vpermd  (%rdx), %ymm1, %ymm0
        subq    $32, %rdx
        vpermd  (%rcx), %ymm1, %ymm2
        addq    $32, %rax
        vfmadd213ps     -32(%rax), %ymm2, %ymm0
        subq    $32, %rcx
        vmovaps %ymm0, -32(%rax)
        cmpq    $z-28, %rdx
        jne     .L2

instead of permuting the result before storing it.
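
To make the alternative concrete, here is a minimal scalar sketch in C of one
8-lane block of the loop above. It is only an illustration of one possible
reading of the comment, not what the vectorizer emits or would emit: the names
VF, reverse_block, block_two_permutes and block_one_permute are hypothetical,
and y_blk/z_blk stand for the 8 consecutive floats that one vpermd loads from
memory. The first function permutes both reversed inputs, as the generated
code does; the second multiplies in memory order and permutes the product
once.

```c
#include <stddef.h>

enum { VF = 8 };  /* lanes per vector: 8 floats in one %ymm register */

/* Reverse the lane order of one VF-element block; this models the effect of
   vpermd with a lane-reversing index vector. */
static void reverse_block(const float *src, float *dst)
{
    for (size_t k = 0; k < VF; ++k)
        dst[k] = src[VF - 1 - k];
}

/* Shape of the generated code: permute each reversed input, then
   multiply-add into x.  Two permutes per block. */
void block_two_permutes(float *x, const float *y_blk, const float *z_blk)
{
    float yr[VF], zr[VF];
    reverse_block(y_blk, yr);
    reverse_block(z_blk, zr);
    for (size_t k = 0; k < VF; ++k)
        x[k] += yr[k] * zr[k];
}

/* Hypothetical alternative: multiply the inputs in memory order, then
   permute the product once before accumulating into x.  One permute. */
void block_one_permute(float *x, const float *y_blk, const float *z_blk)
{
    float prod[VF], pr[VF];
    for (size_t k = 0; k < VF; ++k)
        prod[k] = y_blk[k] * z_blk[k];
    reverse_block(prod, pr);
    for (size_t k = 0; k < VF; ++k)
        x[k] += pr[k];
}
```

Both functions pair the same elements, because reversing commutes with an
element-wise operation. Note that in this particular form the permute sits
between the multiply and the add, so the two may no longer fuse into a single
FMA; whether one permute plus separate multiply and add beats two permutes
plus an FMA is a target cost question that this sketch does not answer.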

