This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/60408] ARM: inefficient code for vget_lane_f32 intrinsic


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408

--- Comment #3 from Jim Wilson <wilson at tuliptree dot org> ---
Even if we could fix the vec_extract constraints, we still end up with 3
instructions, as the optimizer can't do anything interesting with the
vec_extract RTL.

For a 32-bit SFmode value though, we can just use a subreg instead of a vector
extract.  The ARM port models the vector registers as 32-bit registers, so a
subreg for a 32-bit mode will always be valid.  Using a subreg instead of a
vector extract here, I get 2 instructions:

    vmov.f32 s15, s0
    vadd.f32 s0, s1, s15

The extra vmov is there because the register allocator thinks it needs a temp,
since the inputs and outputs partially overlap.  That is a harder problem to fix.
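
As a rough source-level analogy for the two approaches (this is not the
experimental patch, and not the testcase from the bug report), the sketch
below reads a lane of a two-float vector in two ways: as an element select,
loosely like vec_extract, and as a 32-bit view of part of the vector, loosely
like a subreg.  The type f32x2 and all three function names are hypothetical;
f32x2 is a stand-in for float32x2_t built with the GCC vector_size extension
so that the code compiles without arm_neon.h.  Reading the quoted instructions
as "add lane 0 and lane 1 of a vector passed in d0" is also an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for float32x2_t from arm_neon.h, built with the
   GCC/Clang vector_size extension so the sketch compiles on any target. */
typedef float f32x2 __attribute__((vector_size(8)));

/* Lane read written as an element select (loosely analogous to a
   vec_extract in RTL). */
static float lane_by_extract(f32x2 v, unsigned lane)
{
    assert(lane < 2);
    return v[lane];
}

/* Lane read written as a 32-bit view of part of the 64-bit vector
   (loosely analogous to a subreg in RTL).  Element i of a vector_size
   type is stored at byte offset i * sizeof(float). */
static float lane_by_view(f32x2 v, unsigned lane)
{
    float out;
    assert(lane < 2);
    memcpy(&out, (const char *)&v + lane * sizeof(float), sizeof out);
    return out;
}

/* Assumed shape of the computation behind the quoted instructions:
   read both lanes and add them. */
static float add_lanes(f32x2 v)
{
    return lane_by_view(v, 0) + lane_by_view(v, 1);
}
```

Both lane readers return the same value for a given lane; the difference that
matters in the compiler is only how the access is represented internally,
which this C code cannot show directly.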

Subregs should also work for 64-bit modes.

I have an experimental patch which is mostly untested.  I don't know if this
works for both big-endian and little-endian.  I don't know if this works for
all 32-bit modes and all vector types.  Etc.  All I know is that it seems to
work for this testcase.

