[Bug tree-optimization/41464] New: vector loads are unnecessarily split into high and low loads

Thu Sep 24 23:14:00 GMT 2009

gcc (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)

The testcase (built with -Wall -O3):

#include <math.h>

void MulPi(float * __attribute__((aligned(16))) i,
           float * __attribute__((aligned(16))) f, int n)
{
        for (int j = 0; j < n; j++)
                f[j] = (float) M_PI * i[j];
}

produces the following for the vectorized version of the loop:

.L7:
        movaps  %xmm1, %xmm0            # clear XMM0 (copy of XMM1, presumably zero)
        incl    %ecx
        movlps  (%rdi,%rax), %xmm0      # load the low half into XMM0
        movhps  8(%rdi,%rax), %xmm0     # load the high half into XMM0
        mulps   %xmm2, %xmm0            # multiply by pi
        movaps  %xmm0, (%rsi,%rax)      # store to memory
        addq    $16, %rax
        cmpl    %r8d, %ecx
        jb      .L7
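
For comparison, here is a minimal hand-written sketch of the loop body I would expect, using SSE intrinsics. The function name MulPiSSE, the local constant pi_f and the scalar tail loop are illustrative and not part of the testcase; the sketch assumes both pointers really do reference 16-byte-aligned storage. The point is only that each iteration needs a single aligned load (_mm_load_ps, which corresponds to movaps) rather than a movlps/movhps pair.

```c
#include <xmmintrin.h>  /* SSE intrinsics; x86/x86-64 only */

/* Hypothetical hand-vectorized counterpart of MulPi (not from the testcase).
   Both pointers are assumed to reference 16-byte-aligned storage. */
void MulPiSSE(const float *i, float *f, int n)
{
    const float pi_f = 3.14159265358979323846f;
    const __m128 pi = _mm_set1_ps(pi_f);        /* pi in all four lanes */
    int j = 0;

    for (; j + 4 <= n; j += 4) {
        __m128 v = _mm_load_ps(i + j);          /* one aligned 16-byte load */
        _mm_store_ps(f + j, _mm_mul_ps(v, pi)); /* multiply, aligned store */
    }
    for (; j < n; j++)                          /* scalar tail for leftovers */
        f[j] = pi_f * i[j];
}
```

Whether a given compiler emits exactly movaps for the load here depends on the version and flags, so treat this as a statement of the expected shape rather than a guaranteed listing.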

-- 
           Summary: vector loads are unnecessarily split into high and low
                    loads
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: nmiell at comcast dot net
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464