This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/60408] New: ARM: inefficient code for vget_lane_f32 intrinsic


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408

            Bug ID: 60408
           Summary: ARM: inefficient code for vget_lane_f32 intrinsic
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mans at mansr dot com

Consider this trivial function:

#include <arm_neon.h>
float foo(float32x2_t v)
{
    return vget_lane_f32(v, 0) + vget_lane_f32(v, 1);
}
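For comparison, the same computation can be written with GCC's generic vector extension instead of the NEON intrinsic. That version compiles on any target, so its -S output can be compared against the intrinsic version. This is a sketch only: the type name v2sf and the function foo_generic are hypothetical, and nothing here is claimed to generate better ARM code than vget_lane_f32.

```c
/* GCC generic vector extension: two packed floats in 8 bytes, the same
   size as NEON's float32x2_t. "v2sf" is a hypothetical name. */
typedef float v2sf __attribute__((vector_size(8)));

/* Same computation as foo() above. It uses vector subscripts (a GCC
   extension) instead of vget_lane_f32, so it builds on any target. */
float foo_generic(v2sf v)
{
    return v[0] + v[1];
}
```

For example, foo_generic((v2sf){1.5f, 2.25f}) returns 3.75f.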

Compiling with gcc 4.9 trunk from 2014-03-02 yields this (non-code output
removed):

$ gcc -O3 -march=armv7-a -mfpu=neon -S -o - test.c
foo:
        vmov.32 r3, d0[0]
        vmov.32 r2, d0[1]
        fmsr    s15, r3
        fmsr    s0, r2
        fadds   s0, s0, s15
        bx      lr

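A single "fadds s0, s0, s1" is what one would expect from code like this. With the VFP calling convention used above, v arrives in d0, and its two lanes are the single-precision registers s0 and s1. No round trip through the core registers r2/r3 is needed.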
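The expectation rests on register aliasing. On ARM, D<n> overlaps S<2n> (its low 32 bits) and S<2n+1> (its high 32 bits), so lane i of a float32x2_t held in d0 is simply s0 or s1. The portable C model below treats d0 as a 64-bit bit pattern, assuming little-endian lane order as in the report. Reading a lane is only a matter of selecting a half. The helper names (dreg_bits, d_from_lanes, s_from_d, foo_model) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 64-bit D register as raw bits. Assumes
   little-endian lane order: lane 0 = low half (S<2n>), lane 1 = high
   half (S<2n+1>). */
typedef uint64_t dreg_bits;

/* Pack two floats into a D register image, lane 0 in the low half. */
static dreg_bits d_from_lanes(float lane0, float lane1)
{
    uint32_t lo, hi;
    memcpy(&lo, &lane0, sizeof lo);
    memcpy(&hi, &lane1, sizeof hi);
    return ((dreg_bits)hi << 32) | lo;
}

/* Read lane 0 or 1 of the D register image by selecting a 32-bit half.
   In hardware this is just naming s0 or s1; no core-register transfer. */
static float s_from_d(dreg_bits d, int lane)
{
    uint32_t half = (uint32_t)(d >> (32 * lane));
    float f;
    memcpy(&f, &half, sizeof f);
    return f;
}

/* What "fadds s0, s0, s1" computes when v arrives in d0. */
static float foo_model(dreg_bits d0)
{
    return s_from_d(d0, 0) + s_from_d(d0, 1);
}
```

For example, foo_model(d_from_lanes(1.5f, 2.25f)) returns 3.75f, matching foo() on the same lanes.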

