[PATCH 4/4] aarch64: Use memcpy to copy structures in bfloat vst* intrinsics

Jonathan Wright Jonathan.Wright@arm.com
Thu Aug 5 17:17:39 GMT 2021


Hi,

As per the subject, this patch uses __builtin_memcpy to copy vector
structures in each of the vst[234][q] and vst1[q]_x[234] bfloat Neon
intrinsics in arm_neon.h, instead of using a union or constructing a
new opaque structure one vector at a time.
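
For readers without the attachment to hand, here is a rough, portable
sketch of the three copying styles mentioned above. It is not taken from
the patch: vec128, vec128x2 and opaque_oi are hypothetical stand-ins for
bfloat16x8_t, bfloat16x8x2_t and __builtin_aarch64_simd_oi, and the
helper names are made up for illustration. It assumes GCC or Clang, since
it relies on __builtin_memcpy and __builtin_memcmp.

```c
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real arm_neon.h types. */
typedef struct { uint16_t lane[8]; } vec128;           /* ~ bfloat16x8_t */
typedef struct { vec128 val[2]; } vec128x2;            /* ~ bfloat16x8x2_t */
typedef struct { unsigned char bytes[32]; } opaque_oi; /* ~ __builtin_aarch64_simd_oi */

_Static_assert (sizeof (vec128x2) == sizeof (opaque_oi),
                "structure and opaque type must have the same size");

/* Style 1: build the opaque value one vector at a time.  */
static inline opaque_oi
to_opaque_piecewise (vec128x2 v)
{
  opaque_oi o;
  __builtin_memcpy (&o.bytes[0], &v.val[0], sizeof (vec128));
  __builtin_memcpy (&o.bytes[sizeof (vec128)], &v.val[1], sizeof (vec128));
  return o;
}

/* Style 2: reinterpret the structure through a union.  */
static inline opaque_oi
to_opaque_union (vec128x2 v)
{
  union { vec128x2 s; opaque_oi o; } u;
  u.s = v;
  return u.o;
}

/* Style 3: a single block copy of the whole structure.  */
static inline opaque_oi
to_opaque_memcpy (vec128x2 v)
{
  opaque_oi o;
  __builtin_memcpy (&o, &v, sizeof (o));
  return o;
}

/* Returns 1 if all three styles produce identical bytes for a sample
   input, 0 otherwise.  */
int
copies_agree (void)
{
  vec128x2 v;
  for (int i = 0; i < 8; i++)
    {
      v.val[0].lane[i] = (uint16_t) i;
      v.val[1].lane[i] = (uint16_t) (100 + i);
    }
  opaque_oi a = to_opaque_piecewise (v);
  opaque_oi b = to_opaque_union (v);
  opaque_oi c = to_opaque_memcpy (v);
  return __builtin_memcmp (&a, &c, sizeof (a)) == 0
         && __builtin_memcmp (&b, &c, sizeof (b)) == 0;
}
```

All three produce the same bytes. The sketch only shows that the copies
are equivalent; it says nothing about the code generated for the real
intrinsics.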

It also adds new code generation tests to verify that superfluous move
instructions are not generated for the vst[234]q or vst1q_x[234] bfloat
intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-30  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vst1_bf16_x2): Use
	__builtin_memcpy instead of constructing an additional
	__builtin_aarch64_simd_oi one vector at a time.
	(vst1q_bf16_x2): Likewise.
	(vst1_bf16_x3): Use __builtin_memcpy instead of constructing
	an additional __builtin_aarch64_simd_ci one vector at a time.
	(vst1q_bf16_x3): Likewise.
	(vst1_bf16_x4): Use __builtin_memcpy instead of a union.
	(vst1q_bf16_x4): Likewise.
	(vst2_bf16): Use __builtin_memcpy instead of constructing an
	additional __builtin_aarch64_simd_oi one vector at a time.
	(vst2q_bf16): Likewise.
	(vst3_bf16): Use __builtin_memcpy instead of constructing an
	additional __builtin_aarch64_simd_ci one vector at a time.
	(vst3q_bf16): Likewise.
	(vst4_bf16): Use __builtin_memcpy instead of constructing an
	additional __builtin_aarch64_simd_xi one vector at a time.
	(vst4q_bf16): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rb14731.patch
Type: application/octet-stream
Size: 11231 bytes
Desc: rb14731.patch
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210805/07281d51/attachment.obj>

