This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/60408] ARM: inefficient code for vget_lane_f32 intrinsic
- From: "wilson at tuliptree dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 23 Mar 2015 16:16:57 +0000
- Subject: [Bug target/60408] ARM: inefficient code for vget_lane_f32 intrinsic
- Auto-submitted: auto-generated
- References: <bug-60408-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408
--- Comment #3 from Jim Wilson <wilson at tuliptree dot org> ---
Even if we could fix the vec_extract constraints, we would still end up with
three instructions, because the optimizer cannot do anything useful with the
vec_extract RTL.
For a 32-bit SFmode value, though, we can simply use a subreg instead of a
vector extract. The ARM port models the vector registers as 32-bit registers,
so a subreg in a 32-bit mode is always valid. Using a subreg instead of a
vector extract here, I get 2 instructions:
    vmov.f32 s15, s0
    vadd.f32 s0, s1, s15
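The reason a 32-bit subreg is always valid can be illustrated with a small C model of the register aliasing. This is a minimal sketch, assuming the standard VFP/Advanced SIMD layout (d<n> overlaps s<2n> and s<2n+1> for d0-d15) and little-endian lane numbering; the helper name sreg_for_dlane is hypothetical and nothing here is taken from the ARM port itself.

```c
#include <assert.h>

/* Hypothetical model, not GCC source: on ARM VFP/Advanced SIMD, the
   64-bit register d<n> (n = 0..15) overlaps the 32-bit registers
   s<2n> (low half) and s<2n+1> (high half). */

/* Return the number of the S register that holds 32-bit lane `lane`
   of D register `dreg`, assuming little-endian lane numbering. */
static int sreg_for_dlane(int dreg, int lane)
{
    assert(dreg >= 0 && dreg < 16); /* d16-d31 have no S-register alias */
    assert(lane == 0 || lane == 1); /* a D register holds two 32-bit lanes */
    return 2 * dreg + lane;
}
```

With the vector in d0, lane 0 maps to s0 and lane 1 to s1, which are the registers read by the two instructions above. Every 32-bit lane of an S-aliased D register is a whole S register, so the lane can be named directly instead of extracted.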
That is because the register allocator thinks it needs a temporary register,
since the inputs and outputs partially overlap. That is a harder problem to fix.
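To make "partially overlap" concrete: assuming, as the assembly above suggests, that the vector argument arrives in d0 and the float result is returned in s0, the output register covers half of the input register. The sketch below only classifies that overlap in units of 32-bit S slots. The struct and function names are hypothetical, this is not how GCC's register allocator is implemented, and it does not model the allocator's decision to use s15.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch, not GCC's register allocator: describe a hard
   register as a run of 32-bit S-register slots. */
struct reg_range {
    int first; /* first S slot: s<n> starts at n, d<n> starts at 2n */
    int count; /* 1 for an S register, 2 for a D register */
};

/* True if the two registers share at least one S slot. */
static bool overlaps(struct reg_range a, struct reg_range b)
{
    assert(a.count > 0 && b.count > 0);
    return a.first < b.first + b.count && b.first < a.first + a.count;
}

/* True if the registers share some slots but are not the same register,
   e.g. an SFmode output in s0 and a V2SFmode input in d0. */
static bool partially_overlaps(struct reg_range a, struct reg_range b)
{
    return overlaps(a, b) && !(a.first == b.first && a.count == b.count);
}
```

Here d0 is {0, 2} and s0 is {0, 1}, so partially_overlaps(d0, s0) is true, while the temporary s15 ({15, 1}) does not overlap d0 at all.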
Subregs should also work for 64-bit modes.
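The same aliasing argument applies one size up: a 64-bit lane of a 128-bit Q register is a whole D register, which is presumably why subregs should also work for 64-bit modes. A minimal sketch under the same assumptions (little-endian lane numbering, all 32 D registers present); the helper name dreg_for_qlane is hypothetical.

```c
#include <assert.h>

/* Hypothetical model, not GCC source: the 128-bit register q<n>
   (n = 0..15) overlaps d<2n> (low half) and d<2n+1> (high half). */

/* Return the D register holding 64-bit lane `lane` of Q register `qreg`,
   assuming little-endian lane numbering. */
static int dreg_for_qlane(int qreg, int lane)
{
    assert(qreg >= 0 && qreg < 16); /* q0-q15 cover d0-d31 */
    assert(lane == 0 || lane == 1); /* a Q register holds two 64-bit lanes */
    return 2 * qreg + lane;
}
```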
I have an experimental patch, which is mostly untested. I don't know whether it
works for both big-endian and little-endian, or for all 32-bit modes and all
vector types, among other open questions. All I know is that it seems to work
for this testcase.