This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug target/60408] New: ARM: inefficient code for vget_lane_f32 intrinsic

From: "mans at mansr dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Tue, 04 Mar 2014 10:57:44 +0000
Subject: [Bug target/60408] New: ARM: inefficient code for vget_lane_f32 intrinsic
Auto-submitted: auto-generated

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408

            Bug ID: 60408
           Summary: ARM: inefficient code for vget_lane_f32 intrinsic
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mans at mansr dot com

Consider this trivial function:

#include <arm_neon.h>
float foo(float32x2_t v)
{
    return vget_lane_f32(v, 0) + vget_lane_f32(v, 1);
}

Compiling with gcc 4.9 trunk from 2014-03-02 yields this (non-code output
removed):

$ gcc -O3 -march=armv7-a -mfpu=neon -S -o - test.c
foo:
        vmov.32 r3, d0[0]
        vmov.32 r2, d0[1]
        fmsr    s15, r3
        fmsr    s0, r2
        fadds   s0, s0, s15
        bx      lr

The argument already arrives in d0, which overlaps s0 and s1, so a single
"fadds s0, s0, s1" is what one would expect from code like this.
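
To illustrate why one instruction would suffice, here is a minimal portable C sketch of the register overlap, in which d0 holds s0 in its low half and s1 in its high half. The type d0_model and every model_* function are hypothetical names made up for this illustration, not GCC or ARM APIs. The sketch only models the data movement of the two instruction sequences and says nothing about how GCC selects instructions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model, not GCC or ARM code: the 64 bits of d0 viewed as the
   two 32-bit registers that overlap it, s0 (low half) and s1 (high half).
   Assumes float is 32 bits wide, as it is on ARM. */
typedef struct {
    uint32_t s[2];              /* s[0] models s0, s[1] models s1 */
} d0_model;

/* Reinterpret raw register bits as a float without violating aliasing rules. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a float as raw register bits. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Build a model d0 from two lane values, lane 0 first. */
d0_model model_d0(float lane0, float lane1)
{
    d0_model d0 = { { float_to_bits(lane0), float_to_bits(lane1) } };
    return d0;
}

/* Data movement of the reported sequence: both lanes take a round trip
   through core registers before the add. */
float model_foo_reported(d0_model d0)
{
    uint32_t r3 = d0.s[0];            /* vmov.32 r3, d0[0]   */
    uint32_t r2 = d0.s[1];            /* vmov.32 r2, d0[1]   */
    float s15 = bits_to_float(r3);    /* fmsr    s15, r3     */
    float s0  = bits_to_float(r2);    /* fmsr    s0, r2      */
    return s0 + s15;                  /* fadds   s0, s0, s15 */
}

/* Data movement of the expected sequence: the lanes are read directly
   from the halves of d0, with no intermediate copies. */
float model_foo_expected(d0_model d0)
{
    return bits_to_float(d0.s[0]) + bits_to_float(d0.s[1]);  /* fadds s0, s0, s1 */
}
```

Both model functions compute the same sum for any input; the reported sequence only adds four register transfers that the overlap makes unnecessary.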

Follow-Ups:
- [Bug target/60408] ARM: inefficient code for vget_lane_f32 intrinsic
  - From: ktkachov at gcc dot gnu.org
