[PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex multiplication and addition


Hi All,

This patch adds NEON intrinsics and tests for the Armv8.3-a complex
multiplication and addition instructions with a rotation in the Argand plane.
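
For reference, the rotations correspond to ordinary complex arithmetic.  Below is
a scalar sketch of the semantics, based on my reading of the Arm ARM pseudocode
(illustration only, not part of the patch):

typedef struct { float re, im; } complex_t;

/* VCADD, #90 computes a + b * i; VCADD, #270 computes a - b * i.  */
static complex_t
scalar_vcadd_rot90 (complex_t a, complex_t b)
{
  return (complex_t) { a.re - b.im, a.im + b.re };
}

/* VCMLA, #0 accumulates a.re * b into r; VCMLA, #90 accumulates
   a.im * b * i; the #180 and #270 forms are their negations.  */
static complex_t
scalar_vcmla_rot90 (complex_t r, complex_t a, complex_t b)
{
  return (complex_t) { r.re - a.im * b.im, r.im + a.im * b.re };
}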

The instructions are documented in the Arm ARM [1] and the intrinsics
specification will be published on the Arm website [2].

The lane versions of these instructions are special in that they always select a pair:
using index 0 means selecting lanes 0 and 1.  Because of this, the range check for the
intrinsics requires special handling.

On Arm, some of the lane intrinsics are implemented by exploiting the structure of the
register file.  The lane variant of these instructions always selects a D register, but
the data itself can be stored in Q registers.  This means that for single-precision
complex numbers you are only allowed to select Dm[0], but using the register file layout
you can get the lane index range 0-1 by selecting between Dn[0] and D(n+1)[0].

The same reasoning applies to half-float complex numbers, except that there the D register
index can be 0 or 1, so you have a total range of 4 elements (for a V8HF).
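
To make the resulting index ranges concrete, here is a sketch of what the range
checks accept, using the intrinsic signatures added further down (illustration
only, not part of the patch):

#include <arm_neon.h>

void
lane_range_examples (float32x2_t r2, float32x2_t a2, float32x2_t b2,
                     float32x4_t b4, float16x4_t rh, float16x4_t ah,
                     float16x4_t bh, float16x8_t bq)
{
  r2 = vcmla_lane_f32 (r2, a2, b2, 0);   /* V2SF: index 0 only.  */
  r2 = vcmla_laneq_f32 (r2, a2, b4, 1);  /* V4SF: 0-1, via Dn[0] or D(n+1)[0].  */
  rh = vcmla_lane_f16 (rh, ah, bh, 1);   /* V4HF: 0-1.  */
  rh = vcmla_laneq_f16 (rh, ah, bq, 3);  /* V8HF: 0-3, spanning Dn and D(n+1).  */
}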


[1] https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
[2] https://developer.arm.com/docs/101028/latest

Bootstrapped and regtested on arm-none-gnueabihf with no issues.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-12-11  Tamar Christina  <tamar.christina@arm.com>

	* config/arm/arm-builtins.c
	(enum arm_type_qualifiers): Add qualifier_lane_pair_index.
	(MAC_LANE_PAIR_QUALIFIERS): New.
	(arm_expand_builtin_args): Use it.
	(arm_expand_builtin_1): Likewise.
	* config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
	* config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
	* config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
	* config/arm/arm_neon.h (vcadd_rot90_f16): New.
	(vcaddq_rot90_f16): New.
	(vcadd_rot270_f16): New.
	(vcaddq_rot270_f16): New.
	(vcmla_f16): New.
	(vcmlaq_f16): New.
	(vcmla_lane_f16): New.
	(vcmla_laneq_f16): New.
	(vcmlaq_lane_f16): New.
	(vcmlaq_laneq_f16): New.
	(vcmla_rot90_f16): New.
	(vcmlaq_rot90_f16): New.
	(vcmla_rot90_lane_f16): New.
	(vcmla_rot90_laneq_f16): New.
	(vcmlaq_rot90_lane_f16): New.
	(vcmlaq_rot90_laneq_f16): New.
	(vcmla_rot180_f16): New.
	(vcmlaq_rot180_f16): New.
	(vcmla_rot180_lane_f16): New.
	(vcmla_rot180_laneq_f16): New.
	(vcmlaq_rot180_lane_f16): New.
	(vcmlaq_rot180_laneq_f16): New.
	(vcmla_rot270_f16): New.
	(vcmlaq_rot270_f16): New.
	(vcmla_rot270_lane_f16): New.
	(vcmla_rot270_laneq_f16): New.
	(vcmlaq_rot270_lane_f16): New.
	(vcmlaq_rot270_laneq_f16): New.
	(vcadd_rot90_f32): New.
	(vcaddq_rot90_f32): New.
	(vcadd_rot270_f32): New.
	(vcaddq_rot270_f32): New.
	(vcmla_f32): New.
	(vcmlaq_f32): New.
	(vcmla_lane_f32): New.
	(vcmla_laneq_f32): New.
	(vcmlaq_lane_f32): New.
	(vcmlaq_laneq_f32): New.
	(vcmla_rot90_f32): New.
	(vcmlaq_rot90_f32): New.
	(vcmla_rot90_lane_f32): New.
	(vcmla_rot90_laneq_f32): New.
	(vcmlaq_rot90_lane_f32): New.
	(vcmlaq_rot90_laneq_f32): New.
	(vcmla_rot180_f32): New.
	(vcmlaq_rot180_f32): New.
	(vcmla_rot180_lane_f32): New.
	(vcmla_rot180_laneq_f32): New.
	(vcmlaq_rot180_lane_f32): New.
	(vcmlaq_rot180_laneq_f32): New.
	(vcmla_rot270_f32): New.
	(vcmlaq_rot270_f32): New.
	(vcmla_rot270_lane_f32): New.
	(vcmla_rot270_laneq_f32): New.
	(vcmlaq_rot270_lane_f32): New.
	(vcmlaq_rot270_laneq_f32): New.
	* config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0, vcmla90,
	vcmla180, vcmla270, vcmla_lane0, vcmla_lane90, vcmla_lane180, vcmla_lane270,
	vcmla_laneq0, vcmla_laneq90, vcmla_laneq180, vcmla_laneq270,
	vcmlaq_lane0, vcmlaq_lane90, vcmlaq_lane180, vcmlaq_lane270): New.
	* config/arm/neon.md (neon_vcmla_lane<rot><mode>,
	neon_vcmla_laneq<rot><mode>, neon_vcmlaq_lane<rot><mode>): New.

gcc/testsuite/ChangeLog:

2018-12-11  Tamar Christina  <tamar.christina@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: Add AArch32 regexps.
	* gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: Likewise.

-- 
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 563ca51dcd0d63046d2bf577ca86d5f70a466bcf..1c7eac4b9eae55b76687b9239a2d71f31cc7b8d9 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -82,7 +82,10 @@ enum arm_type_qualifiers
   /* A void pointer.  */
   qualifier_void_pointer = 0x800,
   /* A const void pointer.  */
-  qualifier_const_void_pointer = 0x802
+  qualifier_const_void_pointer = 0x802,
+  /* Lane indices selected in pairs - must be within range of previous
+     argument, which is a vector.  */
+  qualifier_lane_pair_index = 0x1000
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -144,6 +147,13 @@ arm_mac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_none, qualifier_lane_index };
 #define MAC_LANE_QUALIFIERS (arm_mac_lane_qualifiers)
 
+/* T (T, T, T, lane pair index).  */
+static enum arm_type_qualifiers
+arm_mac_lane_pair_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+      qualifier_none, qualifier_lane_pair_index };
+#define MAC_LANE_PAIR_QUALIFIERS (arm_mac_lane_pair_qualifiers)
+
 /* unsigned T (unsigned T, unsigned T, unsigend T, lane index).  */
 static enum arm_type_qualifiers
 arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -2129,6 +2139,7 @@ typedef enum {
   ARG_BUILTIN_CONSTANT,
   ARG_BUILTIN_LANE_INDEX,
   ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX,
+  ARG_BUILTIN_LANE_PAIR_INDEX,
   ARG_BUILTIN_NEON_MEMORY,
   ARG_BUILTIN_MEMORY,
   ARG_BUILTIN_STOP
@@ -2266,6 +2277,19 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
 		  machine_mode vmode = mode[argc - 1];
 		  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode), exp);
 		}
+	      /* If the lane index isn't a constant then error out.  */
+	      goto constant_arg;
+
+	    case ARG_BUILTIN_LANE_PAIR_INDEX:
+	      /* Previous argument must be a vector, which this indexes. The
+		 indexing will always select i and i+1 out of the vector, which
+		 puts a limit on i.  */
+	      gcc_assert (argc > 0);
+	      if (CONST_INT_P (op[argc]))
+		{
+		  machine_mode vmode = mode[argc - 1];
+		  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp);
+		}
 	      /* If the lane index isn't a constant then the next
 		 case will error.  */
 	      /* Fall through.  */
@@ -2427,6 +2451,8 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
 
       if (d->qualifiers[qualifiers_k] & qualifier_lane_index)
 	args[k] = ARG_BUILTIN_LANE_INDEX;
+      else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index)
+	args[k] = ARG_BUILTIN_LANE_PAIR_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
 	args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_immediate)
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 4471f7914cf282c516a142174f9913e491558b44..89afc65572f3cdc98fff15afb78ef3af602c5b72 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -76,6 +76,7 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_CRC32", TARGET_CRC32);
   def_or_undef_macro (pfile, "__ARM_FEATURE_DOTPROD", TARGET_DOTPROD);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
   cpp_undef (pfile, "__ARM_FEATURE_CMSE");
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cea98669111d318954e9f6102db74172e675304b..f6fec824e68020794a58b94157e064e70b60c456 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -109,6 +109,8 @@ extern int arm_coproc_mem_operand (rtx, bool);
 extern int neon_vector_mem_operand (rtx, int, bool);
 extern int neon_struct_mem_operand (rtx);
 
+extern rtx *neon_vcmla_lane_prepare_operands (machine_mode, rtx *);
+
 extern int tls_mentioned_p (rtx);
 extern int symbol_mentioned_p (rtx);
 extern int label_mentioned_p (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cbcbeeb6e076bb8f632e5c31dd751937af4514f5..f1df6e585c4f8ceac0478d8cb9cd91bdc283f323 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12680,6 +12680,43 @@ neon_struct_mem_operand (rtx op)
   return FALSE;
 }
 
+/* Prepares the operands for the VCMLA by lane instruction such that the right
+   register number is selected.  This instruction is special in that it always
+   requires a D register, however there is a choice to be made between Dn[0],
+   Dn[1], D(n+1)[0], and D(n+1)[1] depending on the mode of the registers and
+   the PATTERNMODE of the insn.
+
+   The VCMLA by lane function always selects two values. For instance given D0
+   and a V2SF, the only valid index is 0 as the values in S0 and S1 will be
+   used by the instruction.  However given V4SF then index 0 and 1 are valid as
+   D0[0] or D1[0] are both valid.
+
+   This function centralizes that information based on OPERANDS; OPERANDS[3]
+   will be changed from a REG into a CONST_INT RTX and OPERANDS[4] will be
+   updated to contain the right index.  */
+
+rtx *
+neon_vcmla_lane_prepare_operands (machine_mode patternmode, rtx *operands)
+{
+  int lane = NEON_ENDIAN_LANE_N (patternmode, INTVAL (operands[4]));
+  machine_mode constmode = SImode;
+  machine_mode mode = GET_MODE (operands[3]);
+  int regno = REGNO (operands[3]);
+  regno = ((regno - FIRST_VFP_REGNUM) >> 1);
+  if (lane > 0 && lane >= GET_MODE_NUNITS (mode) / 4)
+    {
+      operands[3] = gen_int_mode (regno + 1, constmode);
+      operands[4]
+	= gen_int_mode (lane - GET_MODE_NUNITS (mode) / 4, constmode);
+    }
+  else
+    {
+      operands[3] = gen_int_mode (regno, constmode);
+      operands[4] = gen_int_mode (lane, constmode);
+    }
+  return operands;
+}
+
 /* Return true if X is a register that will be eliminated later on.  */
 int
 arm_eliminable_register (rtx x)
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 6213a4aa0dabec756441523eee870e11485bb1c7..bb3acd20ff3ba6782b1be4363047f62fbb1779e8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18307,6 +18307,445 @@ vfmlsl_laneq_high_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b,
 #pragma GCC pop_options
 #endif
 
+/* AdvSIMD Complex numbers intrinsics.  */
+#if __ARM_ARCH >= 8
+#pragma GCC push_options
+#pragma GCC target(("arch=armv8.3-a"))
+
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#pragma GCC push_options
+#pragma GCC target(("+fp16"))
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot90_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcadd90v4hf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot90_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcadd90v8hf (__a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot270_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcadd270v4hf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot270_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcadd270v8hf (__a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla0v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla0v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		const int __index)
+{
+  return __builtin_neon_vcmla_lane0v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmla_laneq0v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmlaq_lane0v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vcmla_lane0v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla90v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla90v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		      const int __index)
+{
+  return __builtin_neon_vcmla_lane90v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_laneq90v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmlaq_lane90v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_lane90v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla180v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla180v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane180v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq180v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane180v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane180v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla270v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla270v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane270v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq270v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane270v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane270v8hf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+#endif
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot90_f32 (float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcadd90v2sf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot90_f32 (float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcadd90v4sf (__a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot270_f32 (float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcadd270v2sf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot270_f32 (float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcadd270v4sf (__a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla0v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla0v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		const int __index)
+{
+  return __builtin_neon_vcmla_lane0v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmla_laneq0v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmlaq_lane0v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vcmla_lane0v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla90v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla90v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		      const int __index)
+{
+  return __builtin_neon_vcmla_lane90v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_laneq90v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmlaq_lane90v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_lane90v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla180v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla180v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane180v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq180v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane180v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane180v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla270v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla270v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane270v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq270v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane270v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 6ec293324fb879d9528ad6cc998d8a893f2cbaab..dcccc84940a9214d6795b4384e84de8150f2273d 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -351,3 +351,25 @@ VAR2 (TERNOP, sdot, v8qi, v16qi)
 VAR2 (UTERNOP, udot, v8qi, v16qi)
 VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
 VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
+
+VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
+VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf)
+VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla90, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla180, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla270, v2sf, v4sf, v4hf, v8hf)
+
+VAR4 (MAC_LANE_PAIR, vcmla_lane0, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane90, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane180, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane270, v2sf, v4hf, v8hf, v4sf)
+
+VAR2 (MAC_LANE_PAIR, vcmla_laneq0, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq90, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq180, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq270, v2sf, v4hf)
+
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index f50075bf5ffb6be6db1975087da0b468ab05a8a2..795d7e0b9f4aca4a9f5eba61b7fce2ceb7f006fb 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3463,6 +3463,51 @@
   [(set_attr "type" "neon_fcmla")]
 )
 
+(define_insn "neon_vcmla_lane<rot><mode>"
+  [(set (match_operand:VF 0 "s_register_operand" "=w")
+	(plus:VF (match_operand:VF 1 "s_register_operand" "0")
+		 (unspec:VF [(match_operand:VF 2 "s_register_operand" "w")
+			     (match_operand:VF 3 "s_register_operand" "<VF_constraint>")
+			     (match_operand:SI 4 "const_int_operand" "n")]
+			     VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
+(define_insn "neon_vcmla_laneq<rot><mode>"
+  [(set (match_operand:VDF 0 "s_register_operand" "=w")
+	(plus:VDF (match_operand:VDF 1 "s_register_operand" "0")
+		  (unspec:VDF [(match_operand:VDF 2 "s_register_operand" "w")
+			      (match_operand:<V_DOUBLE> 3 "s_register_operand" "<VF_constraint>")
+			      (match_operand:SI 4 "const_int_operand" "n")]
+			      VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
+(define_insn "neon_vcmlaq_lane<rot><mode>"
+  [(set (match_operand:VQ_HSF 0 "s_register_operand" "=w")
+	(plus:VQ_HSF (match_operand:VQ_HSF 1 "s_register_operand" "0")
+		 (unspec:VQ_HSF [(match_operand:VQ_HSF 2 "s_register_operand" "w")
+				 (match_operand:<V_HALF> 3 "s_register_operand" "<VF_constraint>")
+				 (match_operand:SI 4 "const_int_operand" "n")]
+				 VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
 ;; The complex mla operations always need to expand to two instructions.
 ;; The first operation does half the computation and the second does the
 ;; remainder.  Because of this, expand early.
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
index c5c6c905284214dfabdf289789e10e5d2ee2a1a9..514a4fe600a2ddb38db8a96e09feecc45a424c01 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
@@ -1,4 +1,3 @@
-/* { dg-skip-if "" { arm-*-* } } */
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
 /* { dg-add-options arm_v8_3a_complex_neon }  */
@@ -257,3 +256,22 @@ test_vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.s\[1\], #270} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.s\[1\], #90} 1 { target { aarch64*-*-* } } } } */
 
+/* { dg-final { scan-assembler-times {vcadd.f32\td[0-9]+, d[0-9]+, d[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #0} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #180} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #270} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #0} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #180} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #270} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
index ab62f03a213f303f2a4427ce7254f05f077c1ab7..ba9a66ed54af7a4ea108fea5fff12fb59f6c5706 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
@@ -1,4 +1,3 @@
-/* { dg-skip-if "" { arm-*-* } } */
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
@@ -308,3 +307,29 @@ test_vcmlaq_rot270_laneq_f16_2 (float16x8_t __r, float16x8_t __a, float16x8_t __
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.h\[3\], #270} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.h\[3\], #90} 1 { target { aarch64*-*-* } } } } */
 
+/* { dg-final { scan-assembler-times {vcadd.f16\td[0-9]+, d[0-9]+, d[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #0} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #180} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #270} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #90} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #0} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #180} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #270} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #90} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 1 { target { arm*-*-* } } } } */

