arm: [MVE intrinsics] Improve vdupq_n implementation
author     Christophe Lyon <christophe.lyon@arm.com>
           Tue, 25 Jun 2024 13:47:23 +0000 (15:47 +0200)
committer  Christophe Lyon <christophe.lyon@arm.com>
           Wed, 16 Oct 2024 20:02:54 +0000 (22:02 +0200)
commit     74caf97572d84c7c4503d10773e0f8e8544c50d9
tree       4d5fe2235dd29f82f47c02c06b0e80f821dc866e
parent     79dae32843854dacfff22f059a71b5a657d7c96f

This patch makes the non-predicated vdupq_n MVE intrinsics use
vec_duplicate rather than an unspec.  Unlike an unspec, vec_duplicate
exposes the semantics of the operation, which enables the compiler to
generate better code sequences (for instance, using vmov when
possible).
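As a host-side illustration (not part of the patch), the operation that vec_duplicate expresses can be modelled with the GCC/Clang generic vector extension. The u16x8 type and the dup_u16x8 helper below are hypothetical stand-ins for uint16x8_t and vdupq_n_u16, so the sketch builds on any host rather than only on an MVE target:

```c
#include <stdint.h>

/* Hypothetical 8 x 16-bit vector type (GCC/Clang vector extension),
   standing in for MVE's uint16x8_t. */
typedef uint16_t u16x8 __attribute__((vector_size(16)));

/* Sketch of what vdupq_n_u16 (x) computes: every lane holds x.
   In RTL this is (vec_duplicate:V8HI (reg:HI x)); an unspec would
   hide this structure from the optimizers. */
static inline u16x8 dup_u16x8(uint16_t x)
{
    return (u16x8){ x, x, x, x, x, x, x, x };
}
```

Because every lane is visibly a copy of x, the compiler can, for example, fold a constant x into a CONST_VECTOR and try to match it against vmov immediates.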

The patch renames the existing mve_vdup<mode> pattern to
@mve_vdupq_n<mode>, and removes the now-unneeded
@mve_<mve_insn>q_n_f<mode> and @mve_<mve_insn>q_n_<supf><mode> patterns.

As a side effect, it also needs to update the mve_unpredicated_insn
attribute in @mve_<mve_insn>q_m_n_<supf><mode> and
@mve_<mve_insn>q_m_n_f<mode>.

Using vec_duplicate means the compiler can now use vmov in the
vdupq_n_[su]{8,16,32}.c tests that take an immediate argument:
  vmov.i8 q0,#0x1

However, this is only possible when the immediate has a suitable value
(MVE encoding constraints; see the imm_for_neon_mov_operand predicate).
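To make the constraint concrete, here is a simplified C model of which splatted integer values a single VMOV can encode. It is an assumption-laden sketch, not GCC's imm_for_neon_mov_operand: it checks only the non-inverted vmov.i8/i16/i32 forms of the Advanced SIMD modified-immediate encoding (assumed here to apply to MVE as well) and ignores the VMVN, i64 and f32 forms. All helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* 16-bit element: 0x00XY, 0xXY00, or both bytes equal (vmov.i8). */
static bool vmov_i16_ok(uint16_t v)
{
    uint8_t lo = v & 0xff, hi = v >> 8;
    return hi == 0 || lo == 0 || lo == hi;
}

/* 32-bit element: a single non-zero byte (0x000000XY ... 0xXY000000),
   0x0000XYFF, 0x00XYFFFF, or two equal halves that fit a 16-bit form. */
static bool vmov_i32_ok(uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8)
        if ((v & ~(0xffu << shift)) == 0)
            return true;
    if ((v & 0xffff0000u) == 0 && (v & 0xffu) == 0xffu)      /* 0x0000XYFF */
        return true;
    if ((v & 0xff000000u) == 0 && (v & 0xffffu) == 0xffffu)  /* 0x00XYFFFF */
        return true;
    return (v >> 16) == (v & 0xffffu) && vmov_i16_ok((uint16_t)(v & 0xffffu));
}

/* bits is the element width of the vdupq_n variant (8, 16 or 32). */
static bool vmov_imm_ok(unsigned bits, uint32_t v)
{
    switch (bits) {
    case 8:  return true;            /* any 8-bit value fits vmov.i8 */
    case 16: return vmov_i16_ok((uint16_t)v);
    case 32: return vmov_i32_ok(v);
    default: return false;
    }
}
```

Under this model, vmov_imm_ok(16, 0x0123) is false, which is the kind of value for which a vmov cannot be used, while every 8-bit value is accepted.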

When the immediate does not meet the vmov constraints, and provided we
adjust the cost computations in arm_rtx_costs_internal(), we now
generate:
  mov r0, #imm
  vdup.xx q0,r0

or, in the f32 case (with 1.1 as the immediate):
  ldr r0, .L4
  vdup.32 q0,r0
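The f32 case follows from the bit pattern of 1.1f, which is 0x3F8CCCCD. That is wider than 16 bits, hence the literal load with ldr. It is also not a vmov.f32 immediate, assuming vmov.f32 uses the Arm 8-bit floating-point immediate encoding, i.e. +/-(1 + m/16) * 2^e with m in 0..15 and e in -3..4. A sketch of that check, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Raw IEEE-754 single-precision bit pattern of f. */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Assumed 8-bit float immediate rule, expressed on float32 fields:
   biased exponent in 124..131 (unbiased -3..4) and the low 19
   fraction bits clear (only 4 fraction bits are encodable). */
static bool vmov_f32_imm_ok(float f)
{
    uint32_t u = f32_bits(f);
    uint32_t exp = (u >> 23) & 0xffu;
    uint32_t frac = u & 0x7fffffu;
    return exp >= 124 && exp <= 131 && (frac & 0x7ffffu) == 0;
}
```

With this rule 1.0f, -0.125f and 31.0f are encodable, while 1.1f (fraction 0x0CCCCD) and 32.0f (exponent out of range) are not.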

Without the cost adjustment, we would generate:
  vldr.64 d0, .L4
  vldr.64 d1, .L4+8
and an associated literal pool entry.

Regarding the testsuite updates:
--------------------------------
* The signed versions of the vdupq_* tests lacked a variant with an
immediate argument.  This patch adds them, mirroring what we already
have in the vdupq_n_u*.c tests.

* The new tests introduced by this patch check code generation for
various immediate values.  Note there is no need for s8/u8 tests,
because 8-bit immediates always comply with imm_for_neon_mov_operand.

* We can remove the xfail from the vcmp*f tests, since we now generate:
  movw r3, #15462
  vcmp.f16 eq, q0, r3
instead of the previous:
  vldr.64 d6, .L5
  vldr.64 d7, .L5+8
  vcmp.f16 eq, q0, q3
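The movw operand can be decoded by hand: 15462 is 0x3C66, i.e. sign 0, exponent field 15 (unbiased 0) and fraction 0x66 = 102, giving 1 + 102/1024 = 1.099609375, the binary16 value nearest to 1.1 (that the tests compare against 1.1 is an inference; their source is not shown here). Since a half-precision constant always fits in 16 bits, a single movw can materialise it. A minimal decoder sketch for normal binary16 values, with a hypothetical function name:

```c
#include <stdint.h>

/* Decode a normal IEEE-754 binary16 value: 1 sign bit, 5-bit exponent
   (bias 15), 10-bit fraction.  Subnormals, infinities and NaNs are not
   handled by this sketch. */
static double f16_normal_to_double(uint16_t h)
{
    int sign = h >> 15;
    int exp = (h >> 10) & 0x1f;
    int frac = h & 0x3ff;
    double v = 1.0 + frac / 1024.0;
    /* Scale by 2^(exp - 15) using exact multiplications/divisions by 2. */
    for (int e = exp - 15; e > 0; e--)
        v *= 2.0;
    for (int e = exp - 15; e < 0; e++)
        v /= 2.0;
    return sign ? -v : v;
}
```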

Tested on arm-linux-gnueabihf and arm-none-eabi with no regression.

2024-07-02  Jolen Li  <jolen.li@arm.com>
    Christophe Lyon  <christophe.lyon@arm.com>

gcc/
* config/arm/arm-mve-builtins-base.cc (vdupq_impl): New class.
(vdupq): Use new implementation.
* config/arm/arm.cc (arm_rtx_costs_internal): Handle HFmode
for CONST_DOUBLE.  Update costing for CONST_VECTOR.
* config/arm/arm_mve_builtins.def: Merge vdupq_n_f, vdupq_n_s
and vdupq_n_u into vdupq_n.
* config/arm/mve.md (mve_vdup<mode>): Rename into ...
(@mve_vdupq_n<mode>): ... this.
(@mve_<mve_insn>q_n_f<mode>): Delete.
(@mve_<mve_insn>q_n_<supf><mode>): Delete.
(@mve_<mve_insn>q_m_n_<supf><mode>): Update mve_unpredicated_insn
attribute.
(@mve_<mve_insn>q_m_n_f<mode>): Likewise.

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c (foo1): Update
expected code.
* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Add test with
immediate argument.
* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c (foo1): Update
expected code.
* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c (foo1): Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Add test with
immediate argument.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c: New test.
* gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c: New test.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Remove xfail.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: Likewise.
35 files changed:
gcc/config/arm/arm-mve-builtins-base.cc
gcc/config/arm/arm.cc
gcc/config/arm/arm_mve_builtins.def
gcc/config/arm/mve.md
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_s8.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_u8.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c
gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c