[PATCH v2 15/16]Arm: Add MVE RTL patterns for Complex Addition, Multiply and FMA.

Sat Nov 14 15:11:20 GMT 2020

ping

> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of Tamar
> Christina
> Sent: Friday, September 25, 2020 3:32 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>; nd <nd@arm.com>;
> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
> Subject: [PATCH v2 15/16]Arm: Add MVE RTL patterns for Complex Addition,
> Multiply and FMA.
> 
> Hi All,
> 
> This adds an implementation of the optabs for complex operations.  With this
> patch, the following C code:
> 
>   void f90 (int _Complex a[restrict N], int _Complex b[restrict N],
> 	    int _Complex c[restrict N])
>   {
>     for (int i=0; i < N; i++)
>       c[i] = a[i] + (b[i] * I);
>   }
> 
> generates
> 
>   .L3:
> 	  mov     r3, r0
> 	  vldrw.32	q2, [r3]
> 	  mov     r3, r1
> 	  vldrw.32	q1, [r3]
> 	  mov     r3, r2
> 	  vcadd.i32       q3, q2, q1, #90
> 	  adds    r0, r0, #16
> 	  vstrw.32	q3, [r3]
> 	  adds    r1, r1, #16
> 	  adds    r2, r2, #16
> 	  le      lr, .L3
> 	  pop     {r4, r5, r6, r7, r8, pc}
> 
> which is not ideal due to register allocation and addressing mode issues with
> MVE in general.  However, -frename-registers cleans up the register allocation:
> 
>   .L3:
> 	  mov     r5, r0
> 	  mov     r6, r1
> 	  vldrw.32	q2, [r5]
> 	  vldrw.32	q1, [r6]
> 	  mov     r7, r2
> 	  vcadd.i32       q3, q2, q1, #90
> 	  adds    r0, r0, #16
> 	  vstrw.32	q3, [r7]
> 	  adds    r1, r1, #16
> 	  adds    r2, r2, #16
> 	  le      lr, .L3
> 	  pop     {r4, r5, r6, r7, r8, pc}
> 
> but leaves the addressing mode problems.
> 
> Before this patch it generated a scalar loop:
> 
>   .L2:
> 	  ldr     r7, [r0, r3, lsl #2]
> 	  ldr     r5, [r6, r3, lsl #2]
> 	  ldr     r4, [r1, r3, lsl #2]
> 	  subs    r5, r7, r5
> 	  ldr     r7, [lr, r3, lsl #2]
> 	  add     r4, r4, r7
> 	  str     r5, [r2, r3, lsl #2]
> 	  str     r4, [ip, r3, lsl #2]
> 	  adds    r3, r3, #2
> 	  cmp     r3, #200
> 	  bne     .L2
> 	  pop     {r4, r5, r6, r7, pc}
> 
> 
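For readers following along, here is a minimal scalar sketch of what the
rotated complex add computes on interleaved (real, imaginary) integer lanes.
The function names and the n_complex parameter are hypothetical and not part
of the patch.  The 90-degree form matches the subs/add pair in the scalar
loop above, since (br + bi*i) * i = -bi + br*i; the 270-degree form is the
same idea for c = a - b*I.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of c = a + b*I on interleaved (re, im) int32_t pairs.
   Hypothetical helper for illustration only; n_complex counts complex
   elements, so each array holds 2 * n_complex integers.  */
void cadd_rot90_i32 (const int32_t *a, const int32_t *b, int32_t *c,
                     size_t n_complex)
{
  for (size_t i = 0; i < n_complex; i++)
    {
      c[2 * i]     = a[2 * i]     - b[2 * i + 1]; /* re: a.re - b.im */
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];     /* im: a.im + b.re */
    }
}

/* Scalar model of c = a - b*I (rotation by 270 degrees).  */
void cadd_rot270_i32 (const int32_t *a, const int32_t *b, int32_t *c,
                      size_t n_complex)
{
  for (size_t i = 0; i < n_complex; i++)
    {
      c[2 * i]     = a[2 * i]     + b[2 * i + 1]; /* re: a.re + b.im */
      c[2 * i + 1] = a[2 * i + 1] - b[2 * i];     /* im: a.im - b.re */
    }
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40} (two complex elements each),
the 90-degree version yields {-19, 12, -37, 34} and the 270-degree version
yields {21, -8, 43, -26}.
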
> 
> Bootstrapped and regtested on arm-none-linux-gnueabihf with no issues.
> Cross compiled for arm-none-eabi and ran with
> -march=armv8.1-m.main+mve.fp -mfloat-abi=hard -mfpu=auto; the regression run
> is ongoing.
> 
> Unfortunately, MVE does not currently implement auto-vectorization of
> floating-point values, so I cannot test the floating-point patterns directly.
> But since they share 90% of their code with NEON, they should just work
> whenever support is added, so I would still like to commit them.
> 
> To support this I had to refactor the MVE bits somewhat.  The patch now uses
> the same unspecs for both NEON and MVE and removes the separate signed and
> unsigned unspecs, which are unneeded since both map to the signed instruction.
> 
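One plausible reading of why a single instruction can serve both signedness
variants is that wrapping add and subtract produce the same bit patterns
whether the lanes are viewed as signed or unsigned.  The sketch below checks
that for one (re, im) pair; all helper names are hypothetical, and the signed
version assumes inputs that do not overflow, since signed overflow is
undefined in C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 90-degree complex add on one (re, im) pair with
   unsigned lanes.  Unsigned arithmetic wraps modulo 2^32.  */
void pair_rot90_u32 (const uint32_t a[2], const uint32_t b[2], uint32_t c[2])
{
  c[0] = a[0] - b[1];
  c[1] = a[1] + b[0];
}

/* Same operation with signed lanes.  Callers must pick inputs that do
   not overflow, because signed overflow is undefined behaviour in C.  */
void pair_rot90_s32 (const int32_t a[2], const int32_t b[2], int32_t c[2])
{
  c[0] = a[0] - b[1];
  c[1] = a[1] + b[0];
}

/* Returns 1 when both views give bit-identical results for the same
   input bits, 0 otherwise.  */
int rot90_same_bits (const int32_t a[2], const int32_t b[2])
{
  uint32_t ua[2], ub[2], uc[2];
  int32_t sc[2];

  memcpy (ua, a, sizeof ua);   /* reinterpret the signed bits as unsigned */
  memcpy (ub, b, sizeof ub);
  pair_rot90_u32 (ua, ub, uc);
  pair_rot90_s32 (a, b, sc);
  return memcmp (uc, sc, sizeof uc) == 0;
}
```

For example, a = {-5, 7} and b = {3, -9} give {4, 10} in the signed view, and
the unsigned view produces the same 32-bit patterns.
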
> I have tried multiple approaches to cleaning this up but I think this is the
> nicest it can get given the slight ISA differences.
> 
> Ok for master if no issues?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8,
> 	__arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8, __arm_vcaddq_rot90_u16,
> 	__arm_vcaddq_rot270_u16, __arm_vcaddq_rot90_s16,
> 	__arm_vcaddq_rot270_s16, __arm_vcaddq_rot90_u32,
> 	__arm_vcaddq_rot270_u32, __arm_vcaddq_rot90_s32,
> 	__arm_vcaddq_rot270_s32, __arm_vcmulq_rot90_f16,
> 	__arm_vcmulq_rot270_f16, __arm_vcmulq_rot180_f16, __arm_vcmulq_f16,
> 	__arm_vcaddq_rot90_f16, __arm_vcaddq_rot270_f16,
> 	__arm_vcmulq_rot90_f32, __arm_vcmulq_rot270_f32,
> 	__arm_vcmulq_rot180_f32, __arm_vcmulq_f32, __arm_vcaddq_rot90_f32,
> 	__arm_vcaddq_rot270_f32, __arm_vcmlaq_f16, __arm_vcmlaq_rot180_f16,
> 	__arm_vcmlaq_rot270_f16, __arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32,
> 	__arm_vcmlaq_rot180_f32, __arm_vcmlaq_rot270_f32,
> 	__arm_vcmlaq_rot90_f32): Update builtin calls.
> 	* config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u,
> 	vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f,
> 	vcmulq_f, vcmulq_rot90_f, vcmulq_rot180_f, vcmulq_rot270_f,
> 	vcmlaq_f, vcmlaq_rot90_f, vcmlaq_rot180_f, vcmlaq_rot270_f): Removed.
> 	(vcaddq_rot90, vcaddq_rot270, vcmulq, vcmulq_rot90, vcmulq_rot180,
> 	vcmulq_rot270, vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270):
> 	New.
> 	* config/arm/constraints.md (Dz): Include MVE.
> 	* config/arm/iterators.md (mve_rotsplit1, mve_rotsplit2): New.
> 	* config/arm/mve.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S, VCADDQ_ROT270_U,
> 	VCADDQ_ROT90_U, VCADDQ_ROT270_F, VCADDQ_ROT90_F, VCMULQ_F,
> 	VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F, VCMLAQ_F,
> 	VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F, VCADDQ_ROT270_S,
> 	VCADDQ_ROT270, VCADDQ_ROT90): Removed.
> 	(mve_rot, VCMUL): New.
> 	(mve_vcaddq_rot270_<supf><mode>, mve_vcaddq_rot90_<supf><mode>,
> 	mve_vcaddq_rot270_f<mode>, mve_vcaddq_rot90_f<mode>,
> 	mve_vcmulq_f<mode>, mve_vcmulq_rot180_f<mode>,
> 	mve_vcmulq_rot270_f<mode>, mve_vcmulq_rot90_f<mode>,
> 	mve_vcmlaq_f<mode>, mve_vcmlaq_rot180_f<mode>,
> 	mve_vcmlaq_rot270_f<mode>, mve_vcmlaq_rot90_f<mode>): Removed.
> 	(mve_vcmlaq<mve_rot><mode>, mve_vcmulq<mve_rot><mode>,
> 	mve_vcaddq<mve_rot><mode>, cadd<rot><mode>3,
> 	mve_vcaddq<mve_rot><mode>): New.
> 	* config/arm/neon.md (cadd<rot><mode>3, cml<fcmac1><rot_op><mode>4):
> 	Moved.
> 	(cmul<rot_op><mode>3): Exclude MVE types.
> 	* config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270): New.
> 	* config/arm/vec-common.md (cadd<rot><mode>3, cmul<rot_op><mode>3,
> 	arm_vcmla<rot><mode>, cml<fcmac1><rot_op><mode>4): New.
> 
> --