This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/68482] New: No vectorization for x86-64


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68482

            Bug ID: 68482
           Summary: No vectorization for x86-64
           Product: gcc
           Version: 5.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lvqcl.mail at gmail dot com
  Target Milestone: ---

GCC ver: 5.2.0 and 4.9.2
Arch: x86-64
Options: -S -O2 -ftree-vectorize -msse2
Code:

#include <stdint.h>

void test(int32_t* input, int32_t* out, unsigned x1, unsigned x2)
{
        unsigned i, j;
        unsigned end = x1;

        for(i = j = 0; i < 1000; i++) {
                int32_t sum = 0;
                end += x2;
                for( ; j < end; j++)
                        sum += input[j];
                out[i] = sum;
        }
}

GCC vectorizes the innermost loop when targeting IA32, but not when targeting x86-64.

The innermost loop for IA32:
L4:
        movdqu  (%ecx), %xmm1
        addl    $1, %ebx
        addl    $16, %ecx
        cmpl    %ebx, 4(%esp)
        paddd   %xmm1, %xmm0
        ja      L4

The innermost loop for x86-64:
.L3:
        movl    %eax, %r10d
        addl    $1, %eax
        addl    (%rcx,%r10,4), %edx
        cmpl    %eax, %r8d
        jne     .L3
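
A possible experiment to narrow this down (not part of the original report): the x86-64 loop copies the 32-bit index into %r10d before every load, so the 32-bit unsigned counter j may be what keeps the vectorizer from treating the access as a simple linear one on a 64-bit target. That is only a hypothesis. The sketch below changes nothing except the type of j; test_size_t, variants_agree and N are hypothetical names introduced here for illustration, and whether the variant actually gets vectorized is not confirmed by this report.

```c
#include <stddef.h>
#include <stdint.h>

/* Input length needed for x1 = 3, x2 = 4 over 1000 outer iterations. */
#define N (3u + 1000u * 4u)

/* Original function from the report, unchanged. */
void test(int32_t* input, int32_t* out, unsigned x1, unsigned x2)
{
        unsigned i, j;
        unsigned end = x1;

        for(i = j = 0; i < 1000; i++) {
                int32_t sum = 0;
                end += x2;
                for( ; j < end; j++)
                        sum += input[j];
                out[i] = sum;
        }
}

/* Hypothetical variant: j is size_t instead of unsigned.
   j never exceeds end, so the results match the original. */
void test_size_t(int32_t* input, int32_t* out, unsigned x1, unsigned x2)
{
        unsigned i;
        size_t j = 0;
        unsigned end = x1;

        for(i = 0; i < 1000; i++) {
                int32_t sum = 0;
                end += x2;
                for( ; j < end; j++)
                        sum += input[j];
                out[i] = sum;
        }
}

/* Returns 1 if both variants produce identical output, 0 otherwise. */
int variants_agree(void)
{
        static int32_t input[N];
        static int32_t out_a[1000], out_b[1000];
        unsigned k;

        /* Small mixed-sign values, so the sums cannot overflow. */
        for(k = 0; k < N; k++)
                input[k] = (int32_t)(k % 17) - 8;

        test(input, out_a, 3, 4);
        test_size_t(input, out_b, 3, 4);

        for(k = 0; k < 1000; k++)
                if(out_a[k] != out_b[k])
                        return 0;
        return 1;
}
```

Compiling this with the options above (-S -O2 -ftree-vectorize -msse2) and comparing the innermost loops of test and test_size_t on x86-64 would show whether the index type matters. Adding -fopt-info-vec-missed should also make GCC print its reason for skipping a loop.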
