[Bug tree-optimization/41464] New: vector loads are unnecessarily split into high and low loads
nmiell at comcast dot net
gcc-bugzilla@gcc.gnu.org
Thu Sep 24 23:14:00 GMT 2009
gcc (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
The testcase (built with -Wall -O3):
#include <math.h>
void MulPi(float * __attribute__((aligned(16))) i,
           float * __attribute__((aligned(16))) f, int n)
{
    for (int j = 0; j < n; j++)
        f[j] = (float) M_PI * i[j];
}
produces the following for the vectorized version of the loop:
.L7:
movaps %xmm1, %xmm0 # copy XMM1 (presumably zero) into XMM0
incl %ecx
movlps (%rdi,%rax), %xmm0 # load the low half into XMM0
movhps 8(%rdi,%rax), %xmm0 # load the high half into XMM0
mulps %xmm2, %xmm0 # multiply by pi
movaps %xmm0, (%rsi,%rax) # store to memory
addq $16, %rax
cmpl %r8d, %ecx
jb .L7
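For comparison, here is a minimal sketch of a variant that states the alignment of the pointed-to data explicitly. It rests on several assumptions that are not part of the report: the name MulPiAligned and the constant kPi are hypothetical, the aligned attribute as written in the testcase may apply to the pointer variables rather than to the floats they point at, and __builtin_assume_aligned is only available in GCC 4.7 and later (so not in the 4.4.1 build reported here). With the alignment known, the vectorizer may be able to use a single aligned load instead of the movlps/movhps pair, but that outcome is not verified here.

```c
/* Hypothetical variant of the MulPi testcase (name and approach are
   assumptions, not taken from the bug report). */

/* Local copy of pi so the sketch does not depend on M_PI being visible. */
static const float kPi = 3.14159265358979323846f;

void MulPiAligned(const float *i, float *f, int n)
{
    /* Promise the compiler that both buffers start on a 16-byte boundary.
       __builtin_assume_aligned requires GCC 4.7 or later (or Clang).
       Passing a pointer that is not actually 16-byte aligned is undefined
       behavior. */
    const float *in = __builtin_assume_aligned(i, 16);
    float *out = __builtin_assume_aligned(f, 16);

    for (int j = 0; j < n; j++)
        out[j] = kPi * in[j];
}
```

The caller has to guarantee the alignment, for example by declaring the arrays with __attribute__((aligned(16))) or by allocating them with posix_memalign.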
--
Summary: vector loads are unnecessarily split into high and low
loads
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: nmiell at comcast dot net
GCC build triplet: x86_64-linux-gnu
GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464