This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/60408] New: ARM: inefficient code for vget_lane_f32 intrinsic
- From: "mans at mansr dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 04 Mar 2014 10:57:44 +0000
- Subject: [Bug target/60408] New: ARM: inefficient code for vget_lane_f32 intrinsic
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408
Bug ID: 60408
Summary: ARM: inefficient code for vget_lane_f32 intrinsic
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mans at mansr dot com
Consider this trivial function:
#include <arm_neon.h>

float foo(float32x2_t v)
{
    return vget_lane_f32(v, 0) + vget_lane_f32(v, 1);
}
Compiling with gcc 4.9 trunk from 2014-03-02 yields this (non-code output
removed):
$ gcc -O3 -march=armv7-a -mfpu=neon -S -o - test.c
foo:
        vmov.32 r3, d0[0]
        vmov.32 r2, d0[1]
        fmsr    s15, r3
        fmsr    s0, r2
        fadds   s0, s0, s15
        bx      lr
The argument arrives in d0, whose two lanes alias s0 and s1, so a simple
"fadds s0, s0, s1" is what one would expect from code like this.
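
For anyone experimenting with this test case, the following is a minimal sketch of the same function wrapped so that its result can be checked on any host. It is not part of the original report: the names vec2f, make_vec2f, lane0 and lane1 are hypothetical helpers, and the plain-struct fallback for non-NEON targets is an assumption made only so the file builds elsewhere. No claim is made about the code GCC generates for it.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* On NEON targets, use the real vector type and intrinsics. */
typedef float32x2_t vec2f;

static inline vec2f make_vec2f(float a, float b)
{
    const float tmp[2] = { a, b };
    return vld1_f32(tmp);           /* load two floats into a D register */
}

static inline float lane0(vec2f v) { return vget_lane_f32(v, 0); }
static inline float lane1(vec2f v) { return vget_lane_f32(v, 1); }

#else

/* Hypothetical fallback (assumption): a two-element struct that models
 * the two lanes, so the arithmetic can be checked on non-ARM hosts. */
typedef struct { float lane[2]; } vec2f;

static inline vec2f make_vec2f(float a, float b)
{
    vec2f v = { { a, b } };
    return v;
}

static inline float lane0(vec2f v) { return v.lane[0]; }
static inline float lane1(vec2f v) { return v.lane[1]; }

#endif

/* Same computation as foo in the report: sum of lane 0 and lane 1. */
float foo(vec2f v)
{
    return lane0(v) + lane1(v);
}
```

Calling foo(make_vec2f(1.5f, 2.25f)) should return 3.75 on either path. On an ARM target, adding -S to the command line from the report allows the assembly for foo to be inspected.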