This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/68482] New: No vectorization for x86-64
- From: "lvqcl.mail at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sun, 22 Nov 2015 12:43:53 +0000
- Subject: [Bug target/68482] New: No vectorization for x86-64
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68482
Bug ID: 68482
Summary: No vectorization for x86-64
Product: gcc
Version: 5.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: lvqcl.mail at gmail dot com
Target Milestone: ---
GCC ver: 5.2.0 and 4.9.2
Arch: x86-64
Options: -S -O2 -ftree-vectorize -msse2
Code:
#include <stdint.h>

void test(int32_t* input, int32_t* out, unsigned x1, unsigned x2)
{
    unsigned i, j;
    unsigned end = x1;

    for(i = j = 0; i < 1000; i++) {
        int32_t sum = 0;
        end += x2;
        for( ; j < end; j++)
            sum += input[j];
        out[i] = sum;
    }
}
With these options, GCC vectorizes the innermost loop when targeting IA32, but not when targeting x86-64.
The innermost loop for IA32:
L4:
        movdqu  (%ecx), %xmm1
        addl    $1, %ebx
        addl    $16, %ecx
        cmpl    %ebx, 4(%esp)
        paddd   %xmm1, %xmm0
        ja      L4
The innermost loop for x86-64:
.L3:
        movl    %eax, %r10d
        addl    $1, %eax
        addl    (%rcx,%r10,4), %edx
        cmpl    %eax, %r8d
        jne     .L3
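
For comparison, here is a sketch of a variant that may be worth testing. It rests on an assumption this report does not confirm: that the 32-bit unsigned index j, which is allowed to wrap and has to be zero-extended to form a 64-bit address, is what blocks the vectorizer on x86-64. The function name test_size_t is hypothetical. It is the same loop with pointer-width indices, shown next to the original so the two can be compiled with identical options.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop from the report, unchanged: 32-bit unsigned indices. */
void test(int32_t *input, int32_t *out, unsigned x1, unsigned x2)
{
    unsigned i, j;
    unsigned end = x1;

    for (i = j = 0; i < 1000; i++) {
        int32_t sum = 0;
        end += x2;
        for ( ; j < end; j++)
            sum += input[j];
        out[i] = sum;
    }
}

/* Hypothetical variant (not part of the report): the same loop with
 * size_t indices, so that on x86-64 no 32-to-64-bit zero extension is
 * needed when forming the address of input[j]. Whether this changes
 * the vectorizer's decision is an assumption to be checked. */
void test_size_t(const int32_t *input, int32_t *out, size_t x1, size_t x2)
{
    size_t i, j = 0;
    size_t end = x1;

    for (i = 0; i < 1000; i++) {
        int32_t sum = 0;
        end += x2;
        for ( ; j < end; j++)
            sum += input[j];
        out[i] = sum;
    }
}
```

The two functions produce the same results as long as x1 + 1000 * x2 does not exceed UINT_MAX. Beyond that the unsigned version wraps and the size_t version does not, so this is a diagnostic aid rather than a drop-in replacement. Adding -fopt-info-vec-missed to the options above should make GCC report why it rejected a loop.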