This patch makes the non-predicated vdupq_n MVE intrinsics use
vec_duplicate rather than an unspec. This enables the compiler to
generate better code sequences (for instance using vmov when
possible).
The patch renames the existing mve_vdup<mode> pattern into
@mve_vdupq_n<mode>, and removes the now useless
@mve_<mve_insn>q_n_f<mode> and @mve_<mve_insn>q_n_<supf><mode> ones.
As a side-effect, it needs to update the mve_unpredicated_insn
predicates in @mve_<mve_insn>q_m_n_<supf><mode> and
@mve_<mve_insn>q_m_n_f<mode>.
Using vec_duplicates means the compiler is now able to use vmov in the
tests with an immediate argument in vdupq_n_[su]{8,16,32}.c:
vmov.i8 q0,#0x1
However, this is only possible when the immediate has a suitable value
(MVE encoding constraints, see imm_for_neon_mov_operand predicate).
Provided we adjust the cost computations in arm_rtx_costs_internal(),
when the immediate does not meet the vmov constraints, we now generate:
mov r0, #imm
vdup.xx q0,r0
or
ldr r0, .L4
vdup.32 q0,r0
in the f32 case (with 1.1 as immediate).
Without the cost adjustment, we would generate:
vldr.64 d0, .L4
vldr.64 d1, .L4+8
and an associated literal pool entry.
Regarding the testsuite updates:
--------------------------------
* The signed versions of vdupq_* tests lack a version with an
immediate argument. This patch adds them, similar to what we already
have for vdupq_n_u*.c tests.
* Code generation for different immediate values is checked with the
new tests this patch introduces. Note there's no need for s8/u8 tests
because 8-bit immediates always comply wth imm_for_neon_mov_operand.
* We can remove xfail from vcmp*f tests since we now generate:
movw r3, #15462
vcmp.f16 eq, q0, r3
instead of the previous:
vldr.64 d6, .L5
vldr.64 d7, .L5+8
vcmp.f16 eq, q0, q3
Tested on arm-linux-gnueabihf and arm-none-eabi with no regression.
2024-07-02 Jolen Li <jolen.li@arm.com>
Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (vdupq_impl): New class.
(vdupq): Use new implementation.
* config/arm/arm.cc (arm_rtx_costs_internal): Handle HFmode
for COST_DOUBLE. Update costing for CONST_VECTOR.
* config/arm/arm_mve_builtins.def: Merge vdupq_n_f, vdupq_n_s
and vdupq_n_u into vdupq_n.
* config/arm/mve.md (mve_vdup<mode>): Rename into ...
(@mve_vdup_n<mode>): ... this.
(@mve_<mve_insn>q_n_f<mode>): Delete.
(@mve_<mve_insn>q_n_<supf><mode>): Delete..
(@mve_<mve_insn>q_m_n_<supf><mode>): Update mve_unpredicated_insn
attribute.
(@mve_<mve_insn>q_m_n_f<mode>): Likewise.