gcc.gnu.org Git - gcc.git/commit
AArch64: Optimize right shift rounding narrowing
author Tamar Christina <tamar.christina@arm.com>
Thu, 2 Dec 2021 14:39:22 +0000 (14:39 +0000)
committer Tamar Christina <tamar.christina@arm.com>
Thu, 2 Dec 2021 14:39:43 +0000 (14:39 +0000)
commit 9b8830b6f3920b3ec6b9013230c687dc250bb6e9
tree 1e5af8440fa2c7ff97be56d2b10d7304084f38dc
parent d47393d0b4d0d498795c4ae1353e6c156c1c4500

This optimizes rounding shift right narrow instructions (rshrn, rshrn2) into
rounding add narrow high instructions (raddhn, raddhn2) with a zero vector as
the second operand, when the shift amount is half the element width of the
input type.

For example:

uint32x4_t foo (uint64x2_t a, uint64x2_t b)
{
  return vrshrn_high_n_u64 (vrshrn_n_u64 (a, 32), b, 32);
}

now generates:

foo:
        movi    v3.4s, 0
        raddhn  v0.2s, v0.2d, v3.2d
        raddhn2 v0.4s, v1.2d, v3.2d
        ret

instead of:

foo:
        rshrn   v0.2s, v0.2d, 32
        rshrn2  v0.4s, v1.2d, 32
        ret
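
The two sequences agree because a rounding shift right by 32 keeps bits 63:32
of x + 2^31, which is what a rounding add narrow high produces when the other
addend is 0.  A minimal scalar sketch of one 64-bit lane in portable C; the
function names are hypothetical models, not GCC or ACLE identifiers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one RSHRN lane (64 -> 32 bits): add the
   rounding constant, shift right by n, keep the low 32 bits.  Valid for n in
   1..32.  A carry out of bit 64 would land at bit 64 - n >= 32, outside the
   narrowed result, so 64-bit wraparound does not change the answer.  */
uint32_t rshrn_lane (uint64_t x, unsigned n)
{
  uint64_t round = (uint64_t) 1 << (n - 1);
  return (uint32_t) ((x + round) >> n);
}

/* Hypothetical scalar model of one RADDHN lane: add both inputs plus 2^31 and
   keep the high 32 bits of the 64-bit sum.  */
uint32_t raddhn_lane (uint64_t a, uint64_t b)
{
  uint64_t round = (uint64_t) 1 << 31;
  return (uint32_t) ((a + b + round) >> 32);
}

/* Spot-check: shifting by half the element width equals adding zero.  */
void check_shift_by_half (void)
{
  static const uint64_t samples[] = {
    0, 1, 0x7FFFFFFFull, 0x80000000ull, 0xFFFFFFFFull,
    0x123456789ABCDEF0ull, UINT64_MAX
  };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (rshrn_lane (samples[i], 32) == raddhn_lane (samples[i], 0));
}
```

For any other shift amount the two models differ, which is why the rewrite is
restricted to a shift of exactly half the input element width.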

On Arm cores this is an improvement in both latency and throughput.
Because a vector zero is needed, I added a new function
aarch64_gen_shareable_zero that creates the zero in V4SI mode and then takes a
subreg of that zero in the desired mode.  This allows CSE to share all the zero
constants.
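
A single V4SI zero can stand in for every vector mode because an all-zero
128-bit register has the same bit pattern whatever the lane width.  A portable
C sketch of that idea, using a union as a loose analogue of the subreg; the
type and function names here are hypothetical and are not GCC's:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit vector register seen in three
   different modes.  */
typedef union
{
  uint32_t s[4];  /* V4SI-like view */
  uint64_t d[2];  /* V2DI-like view */
  uint16_t h[8];  /* V8HI-like view */
} vec128;

/* Build the zero once through the 4 x 32-bit view, loosely analogous to a
   single "movi v3.4s, 0".  */
vec128 make_shared_zero (void)
{
  vec128 z;
  for (int i = 0; i < 4; i++)
    z.s[i] = 0;
  return z;
}

/* Reading the same storage through the other views still yields zero in
   every lane, so no second constant is needed.  */
void check_zero_views (const vec128 *z)
{
  for (int i = 0; i < 2; i++)
    assert (z->d[i] == 0);
  for (int i = 0; i < 8; i++)
    assert (z->h[i] == 0);
}
```

Every user of the zero then refers to the same underlying value, which is the
property that lets a pass such as CSE treat the zeros as one.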

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_gen_shareable_zero): New.
* config/aarch64/aarch64-simd.md (aarch64_rshrn<mode>,
aarch64_rshrn2<mode>): Generate rounding add narrow high when appropriate.
* config/aarch64/aarch64.c (aarch64_gen_shareable_zero): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/shrn-1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/shrn-2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/shrn-3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/shrn-4.c: New test.
gcc/config/aarch64/aarch64-protos.h
gcc/config/aarch64/aarch64-simd.md
gcc/config/aarch64/aarch64.c
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/shrn-1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/shrn-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/shrn-3.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/shrn-4.c [new file with mode: 0644]