This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.

From: Matthew Wahab <matthew dot wahab at foss dot arm dot com>
To: Joseph Myers <joseph at codesourcery dot com>
Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Mon, 4 Jul 2016 15:09:13 +0100
Subject: Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
Authentication-results: sourceware.org; auth=none
References: <573B28A3.9030603@foss.arm.com> <573B2CA9.5060703@foss.arm.com> <alpine.DEB.2.20.1605180057010.6325@digraph.polyomino.org.uk>

On 18/05/16 01:58, Joseph Myers wrote:
> On Tue, 17 May 2016, Matthew Wahab wrote:
>
>> As with the VFP FP16 arithmetic instructions, operations on __fp16
>> values are done by conversion to single-precision. Any new optimization
>> supported by the instruction descriptions can only apply to code
>> generated using intrinsics added in this patch series.
>
> As with the scalar instructions, I think it is legitimate in most cases to
> optimize arithmetic via single precision to work direct on __fp16 values
> (and this would be natural for vectorization of __fp16 arithmetic).
>
>> A number of the instructions are modelled as two variants, one using
>> UNSPEC and the other using RTL operations, with the model used decided
>> by the -funsafe-math-optimizations flag. This follows the
>> single-precision instructions and is due to the half-precision
>> operations having the same conditions and restrictions on their use in
>> optimizations (when they are enabled).
>
> (Of course, these restrictions still apply.)

The F16 support generally follows the F32 implementation and, for F32,
direct arithmetic vector operations are only available when
-funsafe-math-optimizations is enabled. I want to check the behaviour of
the F16 operations with unsafe-math enabled first, so I'll defer the
change to use standard names for the vector operations to a follow-up
patch.
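
To illustrate why the direct operations are kept behind the flag for
F32, here is a minimal sketch in portable C. It assumes the concern is
the one given in the neon.md comment, that NEON arithmetic flushes
denormal values to zero; whether F16 behaves the same way is exactly
what still needs checking. The helper names (flush_to_zero, sub_ieee,
sub_ftz) are made up for illustration, and float stands in for __fp16 so
that it builds on any host. It models the behaviour only and is not code
from the patch.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Illustrative model only: replace a subnormal value by a zero of the
   same sign, as a flush-to-zero unit would.  */
static float
flush_to_zero (float x)
{
  return fpclassify (x) == FP_SUBNORMAL ? copysignf (0.0f, x) : x;
}

/* Subtraction with full IEEE 754 semantics (subnormal results kept).  */
static float
sub_ieee (float a, float b)
{
  return a - b;
}

/* Subtraction as modelled for a flush-to-zero unit: subnormal inputs
   and subnormal results are both flushed.  */
static float
sub_ftz (float a, float b)
{
  assert (!isnan (a) && !isnan (b));	/* Keep the model simple.  */
  return flush_to_zero (flush_to_zero (a) - flush_to_zero (b));
}
```

With these, sub_ieee (1.5f * FLT_MIN, FLT_MIN) keeps the subnormal
result 0.5f * FLT_MIN, while sub_ftz returns zero for the same inputs.
That is the kind of difference that stops the plain RTL patterns from
being used unconditionally.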

There are still some changes from the previous patch:

- Two fma/fmsub patterns *fma<VH:mode>4 and *fmsub<VH:mode>4 are
  dropped since they just duplicated *fma<VH:mode>4_intrinsic and
  *fmsub<VH:mode>4_intrinsic.

- Patterns neon_vadd<mode>_unspec and neon_vsub<mode>_unspec are
  dropped; they were redundant.

- <absneg_str><mode>2_fp16 is renamed to <absneg_str><mode>2. This
  implements the abs and neg operations which are always safe to use.

- neon_vsqrte<mode> is renamed to neon_vrsqrte<mode>. This is a
  misspelled intrinsic that wasn't caught in testing because the
  relevant test case is missing. The intrinsic is fixed here and in
  other patches and an advsimd-intrinsics test added later in the
  (updated) series.

- neon_vcvt<sup>_n<mode>: The bounds on the scalar were wrong; the
  correct range for f16 is 0-17.

- Test armv8_2-fp16-arith-1.c is updated to expect f16 arithmetic
  instructions rather than f32 and to use the neon command line options.
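
As background for the neon_vcvt<sup>_n<mode> item: the _n conversions
take an immediate giving the number of fraction bits in the fixed-point
value, which is why the pattern bounds-checks it. A rough C model of
such a conversion follows. It is only a sketch: float is used in place
of __fp16, the names to_fixed_s16 and from_fixed_s16 are hypothetical,
and truncation towards zero with saturation is an assumption of the
sketch rather than something taken from the patch. The assert limit is
likewise a sanity check for the sketch, not the bound used by the
pattern.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical model: convert X to signed 16-bit fixed point with FBITS
   fraction bits.  Truncation and saturation are assumptions.  */
static int16_t
to_fixed_s16 (float x, int fbits)
{
  assert (fbits >= 0 && fbits <= 16);	/* Sketch-only sanity limit.  */
  float scaled = ldexpf (x, fbits);	/* x * 2^fbits.  */
  if (isnan (scaled))
    return 0;
  if (scaled >= (float) INT16_MAX)
    return INT16_MAX;
  if (scaled <= (float) INT16_MIN)
    return INT16_MIN;
  return (int16_t) scaled;		/* Truncates towards zero.  */
}

/* Inverse direction: interpret V as fixed point with FBITS fraction
   bits and return the corresponding floating-point value.  */
static float
from_fixed_s16 (int16_t v, int fbits)
{
  return ldexpf ((float) v, -fbits);
}
```

For example, to_fixed_s16 (1.5f, 8) is 384 and from_fixed_s16 (384, 8)
is 1.5f again; values that do not fit, such as to_fixed_s16 (1000.0f, 8),
saturate in this model.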

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Fix white-space.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add<mode>3_fp16): New.
	(sub<mode>3_fp16): New.
	(mul<mode>3add<mode>_neon): New.
	(fma<VH:mode>4_intrinsic): New.
	(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
	(fmsub<VH:mode>4_intrinsic): New.
	(<absneg_str><mode>2): New.
	(neon_v<absneg_str><mode>): New.
	(neon_v<fp16_rnd_str><mode>): New.
	(neon_vrsqrte<mode>): New.
	(neon_vpaddv4hf): New.
	(neon_vadd<mode>): New.
	(neon_vsub<mode>): New.
	(neon_vmulf<mode>): New.
	(neon_vfma<VH:mode>): New.
	(neon_vfms<VH:mode>): New.
	(neon_vc<cmp_op><mode>): New.
	(neon_vc<cmp_op><mode>_fp16insn): New.
	(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vca<cmp_op><mode>): New.
	(neon_vca<cmp_op><mode>_fp16insn): New.
	(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vc<cmp_op>z<mode>): New.
	(neon_vabd<mode>): New.
	(neon_v<maxmin>f<mode>): New.
	(neon_vp<maxmin>fv4hf): New.
	(neon_<fmaxmin_op><mode>): New.
	(neon_vrecps<mode>): New.
	(neon_vrsqrts<mode>): New.
	(neon_vrecpe<mode>): New (VH variant).
	(neon_vdup_lane<mode>_internal): New.
	(neon_vdup_lane<mode>): New.
	(neon_vcvt<sup><mode>): New (VCVTHI variant).
	(neon_vcvt<sup><mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
	(neon_vcvt<vcvth_op><sup><mode>): New.
	(neon_vmul_lane<mode>): New.
	(neon_vmul_n<mode>): New.
	* config/arm/unspecs.md (UNSPEC_VCALE): New.
	(UNSPEC_VCALT): New.
	(UNSPEC_VFMA_LANE): New.
	(UNSPEC_VFMS_LANE): New.

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
	options.  Add tests for float16x4_t and float16x8_t.

From 4cbebc297f74f0c2e3ddac600d7902083c09c934 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 16:19:57 +0100
Subject: [PATCH 09/17] [PATCH 9/17][ARM] Add NEON FP16 arithmetic
 instructions.

2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Fix white-space.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add<mode>3_fp16): New.
	(sub<mode>3_fp16): New.
	(mul<mode>3add<mode>_neon): New.
	(fma<VH:mode>4_intrinsic): New.
	(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
	(fmsub<VH:mode>4_intrinsic): New.
	(<absneg_str><mode>2): New.
	(neon_v<absneg_str><mode>): New.
	(neon_v<fp16_rnd_str><mode>): New.
	(neon_vrsqrte<mode>): New.
	(neon_vpaddv4hf): New.
	(neon_vadd<mode>): New.
	(neon_vsub<mode>): New.
	(neon_vmulf<mode>): New.
	(neon_vfma<VH:mode>): New.
	(neon_vfms<VH:mode>): New.
	(neon_vc<cmp_op><mode>): New.
	(neon_vc<cmp_op><mode>_fp16insn): New.
	(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vca<cmp_op><mode>): New.
	(neon_vca<cmp_op><mode>_fp16insn): New.
	(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vc<cmp_op>z<mode>): New.
	(neon_vabd<mode>): New.
	(neon_v<maxmin>f<mode>): New.
	(neon_vp<maxmin>fv4hf): New.
	(neon_<fmaxmin_op><mode>): New.
	(neon_vrecps<mode>): New.
	(neon_vrsqrts<mode>): New.
	(neon_vrecpe<mode>): New (VH variant).
	(neon_vdup_lane<mode>_internal): New.
	(neon_vdup_lane<mode>): New.
	(neon_vcvt<sup><mode>): New (VCVTHI variant).
	(neon_vcvt<sup><mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
	(neon_vcvt<vcvth_op><sup><mode>): New.
	(neon_vmul_lane<mode>): New.
	(neon_vmul_n<mode>): New.
	* config/arm/unspecs.md (UNSPEC_VCALE): New.
	(UNSPEC_VCALT): New.
	(UNSPEC_VFMA_LANE): New.
	(UNSPEC_VFMS_LANE): New.

testsuite/
2016-07-04  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
	options.  Add tests for float16x4_t and float16x8_t.
---
 gcc/config/arm/iterators.md                        | 121 ++++--
 gcc/config/arm/neon.md                             | 459 ++++++++++++++++++++-
 gcc/config/arm/unspecs.md                          |   5 +-
 .../gcc.target/arm/armv8_2-fp16-arith-1.c          |  53 ++-
 4 files changed, 579 insertions(+), 59 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 9371b6a..be39e4a 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -145,6 +145,9 @@
 ;; Vector modes form int->float conversions.
 (define_mode_iterator VCVTI [V2SI V4SI])
 
+;; Vector modes for int->half conversions.
+(define_mode_iterator VCVTHI [V4HI V8HI])
+
 ;; Vector modes for doubleword multiply-accumulate, etc. insns.
 (define_mode_iterator VMD [V4HI V2SI V2SF])
 
@@ -267,10 +270,14 @@
 (define_int_iterator VRINT [UNSPEC_VRINTZ UNSPEC_VRINTP UNSPEC_VRINTM
                             UNSPEC_VRINTR UNSPEC_VRINTX UNSPEC_VRINTA])
 
-(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE UNSPEC_VCLT UNSPEC_VCLE])
+(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE
+				UNSPEC_VCLT UNSPEC_VCLE])
 
 (define_int_iterator NEON_VACMP [UNSPEC_VCAGE UNSPEC_VCAGT])
 
+(define_int_iterator NEON_VAGLTE [UNSPEC_VCAGE UNSPEC_VCAGT
+				  UNSPEC_VCALE UNSPEC_VCALT])
+
 (define_int_iterator VCVT [UNSPEC_VRINTP UNSPEC_VRINTM UNSPEC_VRINTA])
 
 (define_int_iterator NEON_VRINT [UNSPEC_NVRINTP UNSPEC_NVRINTZ UNSPEC_NVRINTM
@@ -398,6 +405,8 @@
 
 (define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
 
+(define_int_iterator VFM_LANE_AS [UNSPEC_VFMA_LANE UNSPEC_VFMS_LANE])
+
 ;;----------------------------------------------------------------------------
 ;; Mode attributes
 ;;----------------------------------------------------------------------------
@@ -416,6 +425,10 @@
 (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
                            (V4SI "v4sf") (V4SF "v4si")])
 
+;; (Opposite) mode to convert to/from for vector-half mode conversions.
+(define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
+			    (V8HI "V8HF") (V8HF "V8HI")])
+
 ;; Define element mode for each vector mode.
 (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
 			  (V4HI "HI") (V8HI "HI")
@@ -459,12 +472,13 @@
 
 ;; Register width from element mode
 (define_mode_attr V_reg [(V8QI "P") (V16QI "q")
-                         (V4HI "P") (V8HI  "q")
-                         (V4HF "P") (V8HF  "q")
-                         (V2SI "P") (V4SI  "q")
-                         (V2SF "P") (V4SF  "q")
-                         (DI   "P") (V2DI  "q")
-                         (SF   "")  (DF    "P")])
+			 (V4HI "P") (V8HI  "q")
+			 (V4HF "P") (V8HF  "q")
+			 (V2SI "P") (V4SI  "q")
+			 (V2SF "P") (V4SF  "q")
+			 (DI   "P") (V2DI  "q")
+			 (SF   "")  (DF    "P")
+			 (HF   "")])
 
 ;; Wider modes with the same number of elements.
 (define_mode_attr V_widen [(V8QI "V8HI") (V4HI "V4SI") (V2SI "V2DI")])
@@ -480,7 +494,7 @@
 (define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI")
 			  (V8HF "V4HF") (V4SI  "V2SI")
 			  (V4SF "V2SF") (V2DF "DF")
-                          (V2DI "DI")])
+			  (V2DI "DI") (V4HF "HF")])
 
 ;; Same, but lower-case.
 (define_mode_attr V_half [(V16QI "v8qi") (V8HI "v4hi")
@@ -529,18 +543,22 @@
 ;; Get element type from double-width mode, for operations where we 
 ;; don't care about signedness.
 (define_mode_attr V_if_elem [(V8QI "i8")  (V16QI "i8")
-                 (V4HI "i16") (V8HI  "i16")
-                             (V2SI "i32") (V4SI  "i32")
-                             (DI   "i64") (V2DI  "i64")
-                 (V2SF "f32") (V4SF  "f32")
-                 (SF "f32") (DF "f64")])
+			     (V4HI "i16") (V8HI  "i16")
+			     (V2SI "i32") (V4SI  "i32")
+			     (DI   "i64") (V2DI  "i64")
+			     (V2SF "f32") (V4SF  "f32")
+			     (SF   "f32") (DF    "f64")
+			     (HF   "f16") (V4HF  "f16")
+			     (V8HF "f16")])
 
 ;; Same, but for operations which work on signed values.
 (define_mode_attr V_s_elem [(V8QI "s8")  (V16QI "s8")
-                (V4HI "s16") (V8HI  "s16")
-                            (V2SI "s32") (V4SI  "s32")
-                            (DI   "s64") (V2DI  "s64")
-                (V2SF "f32") (V4SF  "f32")])
+			    (V4HI "s16") (V8HI  "s16")
+			    (V2SI "s32") (V4SI  "s32")
+			    (DI   "s64") (V2DI  "s64")
+			    (V2SF "f32") (V4SF  "f32")
+			    (HF   "f16") (V4HF  "f16")
+			    (V8HF "f16")])
 
 ;; Same, but for operations which work on unsigned values.
 (define_mode_attr V_u_elem [(V8QI "u8")  (V16QI "u8")
@@ -557,17 +575,22 @@
                              (V2SF "32") (V4SF "32")])
 
 (define_mode_attr V_sz_elem [(V8QI "8")  (V16QI "8")
-                 (V4HI "16") (V8HI  "16")
-                             (V2SI "32") (V4SI  "32")
-                             (DI   "64") (V2DI  "64")
+			     (V4HI "16") (V8HI  "16")
+			     (V2SI "32") (V4SI  "32")
+			     (DI   "64") (V2DI  "64")
 			     (V4HF "16") (V8HF "16")
-                 (V2SF "32") (V4SF  "32")])
+			     (V2SF "32") (V4SF  "32")])
 
 (define_mode_attr V_elem_ch [(V8QI "b")  (V16QI "b")
-                             (V4HI "h") (V8HI  "h")
-                             (V2SI "s") (V4SI  "s")
-                             (DI   "d") (V2DI  "d")
-                             (V2SF "s") (V4SF  "s")])
+			     (V4HI "h") (V8HI  "h")
+			     (V2SI "s") (V4SI  "s")
+			     (DI   "d") (V2DI  "d")
+			     (V2SF "s") (V4SF  "s")
+			     (V2SF "s") (V4SF  "s")])
+
+(define_mode_attr VH_elem_ch [(V4HI "s") (V8HI  "s")
+			      (V4HF "s") (V8HF  "s")
+			      (HF "s")])
 
 ;; Element sizes for duplicating ARM registers to all elements of a vector.
 (define_mode_attr VD_dup [(V8QI "8") (V4HI "16") (V2SI "32") (V2SF "32")])
@@ -603,16 +626,17 @@
 ;; This mode attribute is used to obtain the correct register constraints.
 
 (define_mode_attr scalar_mul_constraint [(V4HI "x") (V2SI "t") (V2SF "t")
-                                         (V8HI "x") (V4SI "t") (V4SF "t")])
+					 (V8HI "x") (V4SI "t") (V4SF "t")
+					 (V8HF "x") (V4HF "x")])
 
 ;; Predicates used for setting type for neon instructions
 
 (define_mode_attr Is_float_mode [(V8QI "false") (V16QI "false")
-                 (V4HI "false") (V8HI "false")
-                 (V2SI "false") (V4SI "false")
-                 (V4HF "true") (V8HF "true")
-                 (V2SF "true") (V4SF "true")
-                 (DI "false") (V2DI "false")])
+				 (V4HI "false") (V8HI "false")
+				 (V2SI "false") (V4SI "false")
+				 (V4HF "true") (V8HF "true")
+				 (V2SF "true") (V4SF "true")
+				 (DI "false") (V2DI "false")])
 
 (define_mode_attr Scalar_mul_8_16 [(V8QI "true") (V16QI "true")
 				   (V4HI "true") (V8HI "true")
@@ -621,10 +645,10 @@
 				   (DI "false") (V2DI "false")])
 
 (define_mode_attr Is_d_reg [(V8QI "true") (V16QI "false")
-                            (V4HI "true") (V8HI  "false")
-                            (V2SI "true") (V4SI  "false")
-                            (V2SF "true") (V4SF  "false")
-                            (DI   "true") (V2DI  "false")
+			    (V4HI "true") (V8HI  "false")
+			    (V2SI "true") (V4SI  "false")
+			    (V2SF "true") (V4SF  "false")
+			    (DI   "true") (V2DI  "false")
 			    (V4HF "true") (V8HF  "false")])
 
 (define_mode_attr V_mode_nunits [(V8QI "8") (V16QI "16")
@@ -670,12 +694,14 @@
 
 ;; Mode attribute used to build the "type" attribute.
 (define_mode_attr q [(V8QI "") (V16QI "_q")
-                     (V4HI "") (V8HI "_q")
-                     (V2SI "") (V4SI "_q")
+		     (V4HI "") (V8HI "_q")
+		     (V2SI "") (V4SI "_q")
 		     (V4HF "") (V8HF "_q")
-                     (V2SF "") (V4SF "_q")
-                     (DI "")   (V2DI "_q")
-                     (DF "")   (V2DF "_q")])
+		     (V2SF "") (V4SF "_q")
+		     (V4HF "") (V8HF "_q")
+		     (DI "")   (V2DI "_q")
+		     (DF "")   (V2DF "_q")
+		     (HF "")])
 
 (define_mode_attr pf [(V8QI "p") (V16QI "p") (V2SF "f") (V4SF "f")])
 
@@ -718,6 +744,10 @@
 ;; Conversions.
 (define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
 
+(define_code_attr float_sup [(unsigned_float "u") (float "s")])
+
+(define_code_attr float_SUP [(unsigned_float "U") (float "S")])
+
 ;;----------------------------------------------------------------------------
 ;; Int attributes
 ;;----------------------------------------------------------------------------
@@ -790,9 +820,10 @@
    (UNSPEC_VRNDP "vrintp") (UNSPEC_VRNDX "vrintx")])
 
 (define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
-                              (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
-                              (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
-                              (UNSPEC_VCAGT "gt")])
+			      (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
+			      (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
+			      (UNSPEC_VCAGT "gt") (UNSPEC_VCALE "le")
+			      (UNSPEC_VCALT "lt")])
 
 (define_int_attr r [
   (UNSPEC_VRHADD_S "r") (UNSPEC_VRHADD_U "r")
@@ -908,3 +939,7 @@
 
 ;; Attributes for VQRDMLAH/VQRDMLSH
 (define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
+
+;; Attributes for VFMA_LANE/ VFMS_LANE
+(define_int_attr neon_vfm_lane_as
+ [(UNSPEC_VFMA_LANE "a") (UNSPEC_VFMS_LANE "s")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index c7bb121..0532333 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -505,6 +505,20 @@
                     (const_string "neon_add<q>")))]
 )
 
+(define_insn "add<mode>3_fp16"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (plus:VH
+     (match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set (attr "type")
+   (if_then_else (match_test "<Is_float_mode>")
+    (const_string "neon_fp_addsub_s<q>")
+    (const_string "neon_add<q>")))]
+)
+
 (define_insn "adddi3_neon"
   [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r")
         (plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r")
@@ -543,6 +557,17 @@
                     (const_string "neon_sub<q>")))]
 )
 
+(define_insn "sub<mode>3_fp16"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (minus:VH
+    (match_operand:VH 1 "s_register_operand" "w")
+    (match_operand:VH 2 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_sub<q>")]
+)
+
 (define_insn "subdi3_neon"
   [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r,?w")
         (minus:DI (match_operand:DI 1 "s_register_operand" "w,0,r,0,w")
@@ -591,6 +616,16 @@
 		    (const_string "neon_mla_<V_elem_ch><q>")))]
 )
 
+(define_insn "mul<mode>3add<mode>_neon"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(plus:VH (mult:VH (match_operand:VH 2 "s_register_operand" "w")
+			  (match_operand:VH 3 "s_register_operand" "w"))
+		  (match_operand:VH 1 "s_register_operand" "0")))]
+  "TARGET_NEON_FP16INST && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "vmla.f16\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
+  [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
 (define_insn "mul<mode>3neg<mode>add<mode>_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
         (minus:VDQW (match_operand:VDQW 1 "s_register_operand" "0")
@@ -629,6 +664,19 @@
   [(set_attr "type" "neon_fp_mla_s<q>")]
 )
 
+;; There is limited support for unsafe-math optimizations using the NEON FP16
+;; arithmetic instructions, so only the intrinsic is currently supported.
+(define_insn "fma<VH:mode>4_intrinsic"
+ [(set (match_operand:VH 0 "register_operand" "=w")
+   (fma:VH
+    (match_operand:VH 1 "register_operand" "w")
+    (match_operand:VH 2 "register_operand" "w")
+    (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST"
+ "vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
 (define_insn "*fmsub<VCVTF:mode>4"
   [(set (match_operand:VCVTF 0 "register_operand" "=w")
         (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
@@ -640,13 +688,25 @@
 )
 
 (define_insn "fmsub<VCVTF:mode>4_intrinsic"
-  [(set (match_operand:VCVTF 0 "register_operand" "=w")
-        (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
-		   (match_operand:VCVTF 2 "register_operand" "w")
-		   (match_operand:VCVTF 3 "register_operand" "0")))]
-  "TARGET_NEON && TARGET_FMA"
-  "vfms%?.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
-  [(set_attr "type" "neon_fp_mla_s<q>")]
+ [(set (match_operand:VCVTF 0 "register_operand" "=w")
+   (fma:VCVTF
+    (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
+    (match_operand:VCVTF 2 "register_operand" "w")
+    (match_operand:VCVTF 3 "register_operand" "0")))]
+ "TARGET_NEON && TARGET_FMA"
+ "vfms%?.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
+(define_insn "fmsub<VH:mode>4_intrinsic"
+ [(set (match_operand:VH 0 "register_operand" "=w")
+   (fma:VH
+    (neg:VH (match_operand:VH 1 "register_operand" "w"))
+    (match_operand:VH 2 "register_operand" "w")
+    (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST"
+ "vfms.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
 )
 
 (define_insn "neon_vrint<NEON_VRINT:nvrint_variant><VCVTF:mode>"
@@ -860,6 +920,44 @@
   ""
 )
 
+(define_insn "<absneg_str><mode>2"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (ABSNEG:VH (match_operand:VH 1 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "v<absneg_str>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_abs<q>")]
+)
+
+(define_expand "neon_v<absneg_str><mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand")
+   (ABSNEG:VH (match_operand:VH 1 "s_register_operand")))]
+ "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_<absneg_str><mode>2 (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "neon_v<fp16_rnd_str><mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")]
+     FP16_RND))]
+ "TARGET_NEON_FP16INST"
+ "<fp16_rnd_insn>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_fp_round_s<q>")]
+)
+
+(define_insn "neon_vrsqrte<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")]
+     UNSPEC_VRSQRTE))]
+  "TARGET_NEON_FP16INST"
+  "vrsqrte.f16\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_fp_rsqrte_s<q>")]
+)
+
 (define_insn "*umin<mode>3_neon"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
 	(umin:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
@@ -1601,6 +1699,17 @@
                     (const_string "neon_reduc_add<q>")))]
 )
 
+(define_insn "neon_vpaddv4hf"
+ [(set
+   (match_operand:V4HF 0 "s_register_operand" "=w")
+   (unspec:V4HF [(match_operand:V4HF 1 "s_register_operand" "w")
+		 (match_operand:V4HF 2 "s_register_operand" "w")]
+    UNSPEC_VPADD))]
+ "TARGET_NEON_FP16INST"
+ "vpadd.f16\t%P0, %P1, %P2"
+ [(set_attr "type" "neon_reduc_add")]
+)
+
 (define_insn "neon_vpsmin<mode>"
   [(set (match_operand:VD 0 "s_register_operand" "=w")
 	(unspec:VD [(match_operand:VD 1 "s_register_operand" "w")
@@ -1949,6 +2058,26 @@
   DONE;
 })
 
+(define_expand "neon_vadd<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "neon_vsub<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_sub<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 ; Note that NEON operations don't support the full IEEE 754 standard: in
 ; particular, denormal values are flushed to zero.  This means that GCC cannot
 ; use those instructions for autovectorization, etc. unless
@@ -2040,6 +2169,17 @@
                     (const_string "neon_mul_<V_elem_ch><q>")))]
 )
 
+(define_insn "neon_vmulf<mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (mult:VH
+    (match_operand:VH 1 "s_register_operand" "w")
+    (match_operand:VH 2 "s_register_operand" "w")))]
+  "TARGET_NEON_FP16INST"
+  "vmul.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_mul_<VH_elem_ch><q>")]
+)
+
 (define_expand "neon_vmla<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "=w")
    (match_operand:VDQW 1 "s_register_operand" "0")
@@ -2068,6 +2208,18 @@
   DONE;
 })
 
+(define_expand "neon_vfma<VH:mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")
+   (match_operand:VH 3 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_fma<mode>4_intrinsic (operands[0], operands[2], operands[3],
+				       operands[1]));
+  DONE;
+})
+
 (define_expand "neon_vfms<VCVTF:mode>"
   [(match_operand:VCVTF 0 "s_register_operand")
    (match_operand:VCVTF 1 "s_register_operand")
@@ -2080,6 +2232,18 @@
   DONE;
 })
 
+(define_expand "neon_vfms<VH:mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")
+   (match_operand:VH 3 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_fmsub<mode>4_intrinsic (operands[0], operands[2], operands[3],
+					 operands[1]));
+  DONE;
+})
+
 ; Used for intrinsics when flag_unsafe_math_optimizations is false.
 
 (define_insn "neon_vmla<mode>_unspec"
@@ -2380,6 +2544,72 @@
   [(set_attr "type" "neon_fp_compare_s<q>")]
 )
 
+(define_expand "neon_vc<cmp_op><mode>"
+ [(match_operand:<V_cmp_result> 0 "s_register_operand")
+  (neg:<V_cmp_result>
+   (COMPARISONS:VH
+    (match_operand:VH 1 "s_register_operand")
+    (match_operand:VH 2 "reg_or_zero_operand")))]
+ "TARGET_NEON_FP16INST"
+{
+  /* For FP comparisons use UNSPECS unless -funsafe-math-optimizations
+     are enabled.  */
+  if (GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+      && !flag_unsafe_math_optimizations)
+    emit_insn
+      (gen_neon_vc<cmp_op><mode>_fp16insn_unspec
+       (operands[0], operands[1], operands[2]));
+  else
+    emit_insn
+      (gen_neon_vc<cmp_op><mode>_fp16insn
+       (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "neon_vc<cmp_op><mode>_fp16insn"
+ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
+   (neg:<V_cmp_result>
+    (COMPARISONS:<V_cmp_result>
+     (match_operand:VH 1 "s_register_operand" "w,w")
+     (match_operand:VH 2 "reg_or_zero_operand" "w,Dz"))))]
+ "TARGET_NEON_FP16INST
+  && !(GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+  && !flag_unsafe_math_optimizations)"
+{
+  char pattern[100];
+  sprintf (pattern, "vc<cmp_op>.%s%%#<V_sz_elem>\t%%<V_reg>0,"
+	   " %%<V_reg>1, %s",
+	   GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+	   ? "f" : "<cmp_type>",
+	   which_alternative == 0
+	   ? "%<V_reg>2" : "#0");
+  output_asm_insn (pattern, operands);
+  return "";
+}
+ [(set (attr "type")
+   (if_then_else (match_operand 2 "zero_operand")
+    (const_string "neon_compare_zero<q>")
+    (const_string "neon_compare<q>")))])
+
+(define_insn "neon_vc<cmp_op_unsp><mode>_fp16insn_unspec"
+ [(set
+   (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
+   (unspec:<V_cmp_result>
+    [(match_operand:VH 1 "s_register_operand" "w,w")
+     (match_operand:VH 2 "reg_or_zero_operand" "w,Dz")]
+    NEON_VCMP))]
+ "TARGET_NEON_FP16INST"
+{
+  char pattern[100];
+  sprintf (pattern, "vc<cmp_op_unsp>.f%%#<V_sz_elem>\t%%<V_reg>0,"
+	   " %%<V_reg>1, %s",
+	   which_alternative == 0
+	   ? "%<V_reg>2" : "#0");
+  output_asm_insn (pattern, operands);
+  return "";
+}
+ [(set_attr "type" "neon_fp_compare_s<q>")])
+
 (define_insn "neon_vc<cmp_op>u<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
         (neg:<V_cmp_result>
@@ -2431,6 +2661,60 @@
   [(set_attr "type" "neon_fp_compare_s<q>")]
 )
 
+(define_expand "neon_vca<cmp_op><mode>"
+  [(set
+    (match_operand:<V_cmp_result> 0 "s_register_operand")
+    (neg:<V_cmp_result>
+     (GLTE:<V_cmp_result>
+      (abs:VH (match_operand:VH 1 "s_register_operand"))
+      (abs:VH (match_operand:VH 2 "s_register_operand")))))]
+ "TARGET_NEON_FP16INST"
+{
+  if (flag_unsafe_math_optimizations)
+    emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn
+	       (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn_unspec
+	       (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "neon_vca<cmp_op><mode>_fp16insn"
+  [(set
+    (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
+    (neg:<V_cmp_result>
+     (GLTE:<V_cmp_result>
+      (abs:VH (match_operand:VH 1 "s_register_operand" "w"))
+      (abs:VH (match_operand:VH 2 "s_register_operand" "w")))))]
+ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "vac<cmp_op>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_compare_s<q>")]
+)
+
+(define_insn "neon_vca<cmp_op_unsp><mode>_fp16insn_unspec"
+ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
+   (unspec:<V_cmp_result>
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    NEON_VAGLTE))]
+ "TARGET_NEON"
+ "vac<cmp_op_unsp>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_compare_s<q>")]
+)
+
+(define_expand "neon_vc<cmp_op>z<mode>"
+ [(set
+   (match_operand:<V_cmp_result> 0 "s_register_operand")
+   (COMPARISONS:<V_cmp_result>
+    (match_operand:VH 1 "s_register_operand")
+    (const_int 0)))]
+ "TARGET_NEON_FP16INST"
+ {
+  emit_insn (gen_neon_vc<cmp_op><mode> (operands[0], operands[1],
+					CONST0_RTX (<MODE>mode)));
+  DONE;
+})
+
 (define_insn "neon_vtst<mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
@@ -2451,6 +2735,16 @@
   [(set_attr "type" "neon_abd<q>")]
 )
 
+(define_insn "neon_vabd<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		(match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VABD_F))]
+ "TARGET_NEON_FP16INST"
+ "vabd.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_abd<q>")]
+)
+
 (define_insn "neon_vabdf<mode>"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
         (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -2513,6 +2807,40 @@
   [(set_attr "type" "neon_fp_minmax_s<q>")]
 )
 
+(define_insn "neon_v<maxmin>f<mode>"
+ [(set (match_operand:VH 0 "s_register_operand" "=w")
+   (unspec:VH
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    VMAXMINF))]
+ "TARGET_NEON_FP16INST"
+ "v<maxmin>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_minmax_s<q>")]
+)
+
+(define_insn "neon_vp<maxmin>fv4hf"
+ [(set (match_operand:V4HF 0 "s_register_operand" "=w")
+   (unspec:V4HF
+    [(match_operand:V4HF 1 "s_register_operand" "w")
+     (match_operand:V4HF 2 "s_register_operand" "w")]
+    VPMAXMINF))]
+ "TARGET_NEON_FP16INST"
+ "vp<maxmin>.f16\t%P0, %P1, %P2"
+  [(set_attr "type" "neon_reduc_minmax")]
+)
+
+(define_insn "neon_<fmaxmin_op><mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (unspec:VH
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    VMAXMINFNM))]
+ "TARGET_NEON_FP16INST"
+ "<fmaxmin_op>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_minmax_s<q>")]
+)
+
 ;; Vector forms for the IEEE-754 fmax()/fmin() functions
 (define_insn "<fmaxmin><mode>3"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
@@ -2584,6 +2912,17 @@
   [(set_attr "type" "neon_fp_recps_s<q>")]
 )
 
+(define_insn "neon_vrecps<mode>"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		(match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VRECPS))]
+  "TARGET_NEON_FP16INST"
+  "vrecps.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_fp_recps_s<q>")]
+)
+
 (define_insn "neon_vrsqrts<mode>"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
         (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -2594,6 +2933,17 @@
   [(set_attr "type" "neon_fp_rsqrts_s<q>")]
 )
 
+(define_insn "neon_vrsqrts<mode>"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		 (match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VRSQRTS))]
+ "TARGET_NEON_FP16INST"
+ "vrsqrts.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_rsqrts_s<q>")]
+)
+
 (define_expand "neon_vabs<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "")
    (match_operand:VDQW 1 "s_register_operand" "")]
@@ -2709,6 +3059,15 @@
 })
 
 (define_insn "neon_vrecpe<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")]
+		   UNSPEC_VRECPE))]
+  "TARGET_NEON_FP16INST"
+  "vrecpe.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_recpe_s<q>")]
+)
+
+(define_insn "neon_vrecpe<mode>"
   [(set (match_operand:V32 0 "s_register_operand" "=w")
 	(unspec:V32 [(match_operand:V32 1 "s_register_operand" "w")]
                     UNSPEC_VRECPE))]
@@ -3251,6 +3610,28 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_fp_cvt_narrow_s_q")]
 )
 
+(define_insn "neon_vcvt<sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VCVTHI 1 "s_register_operand" "w")]
+    VCVT_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")]
+    VCVT_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
 (define_insn "neon_vcvt<sup>_n<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -3265,6 +3646,20 @@ if (BYTES_BIG_ENDIAN)
 )
 
 (define_insn "neon_vcvt<sup>_n<mode>"
+ [(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_US_N))]
+  "TARGET_NEON_FP16INST"
+{
+  neon_const_bounds (operands[2], 0, 17);
+  return "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1, %2";
+}
+ [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<sup>_n<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")
 			   (match_operand:SI 2 "immediate_operand" "i")]
@@ -3277,6 +3672,31 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_int_to_fp_<V_elem_ch><q>")]
 )
 
+(define_insn "neon_vcvt<sup>_n<mode>"
+ [(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VCVTHI 1 "s_register_operand" "w")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_US_N))]
+ "TARGET_NEON_FP16INST"
+{
+  neon_const_bounds (operands[2], 0, 17);
+  return "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1, %2";
+}
+ [(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<vcvth_op><sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")]
+    VCVT_HF_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt<vcvth_op>.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
 (define_insn "neon_vmovn<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")]
@@ -3347,6 +3767,18 @@ if (BYTES_BIG_ENDIAN)
                    (const_string "neon_mul_<V_elem_ch>_scalar<q>")))]
 )
 
+(define_insn "neon_vmul_lane<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		    (match_operand:V4HF 2 "s_register_operand"
+		     "<scalar_mul_constraint>")
+		     (match_operand:SI 3 "immediate_operand" "i")]
+		     UNSPEC_VMUL_LANE))]
+  "TARGET_NEON_FP16INST"
+  "vmul.f16\t%<V_reg>0, %<V_reg>1, %P2[%c3]"
+  [(set_attr "type" "neon_fp_mul_s_scalar<q>")]
+)
+
 (define_insn "neon_vmull<sup>_lane<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w")
@@ -3601,6 +4033,19 @@ if (BYTES_BIG_ENDIAN)
   DONE;
 })
 
+(define_expand "neon_vmul_n<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:<V_elem> 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  rtx tmp = gen_reg_rtx (V4HFmode);
+  emit_insn (gen_neon_vset_lanev4hf (tmp, operands[2], tmp, const0_rtx));
+  emit_insn (gen_neon_vmul_lane<mode> (operands[0], operands[1], tmp,
+				       const0_rtx));
+  DONE;
+})
+
 (define_expand "neon_vmulls_n<mode>"
   [(match_operand:<V_widen> 0 "s_register_operand" "")
    (match_operand:VMDI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 57a47ff..bee8795 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -191,6 +191,8 @@
   UNSPEC_VBSL
   UNSPEC_VCAGE
   UNSPEC_VCAGT
+  UNSPEC_VCALE
+  UNSPEC_VCALT
   UNSPEC_VCEQ
   UNSPEC_VCGE
   UNSPEC_VCGEU
@@ -258,6 +260,8 @@
   UNSPEC_VMLSL_S_LANE
   UNSPEC_VMLSL_U_LANE
   UNSPEC_VMLSL_LANE
+  UNSPEC_VFMA_LANE
+  UNSPEC_VFMS_LANE
   UNSPEC_VMOVL_S
   UNSPEC_VMOVL_U
   UNSPEC_VMOVN
@@ -387,4 +391,3 @@
   UNSPEC_VRNDP
   UNSPEC_VRNDX
 ])
-
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
index e7da3fc..b88f43f 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile }  */
-/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok }  */
 /* { dg-options "-O2 -ffast-math" }  */
-/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
 
 /* Test instructions generated for half-precision arithmetic.  */
 
@@ -9,6 +9,9 @@ typedef __fp16 float16_t;
 typedef __simd64_float16_t float16x4_t;
 typedef __simd128_float16_t float16x8_t;
 
+typedef short int16x4_t __attribute__ ((vector_size (8)));
+typedef short int int16x8_t  __attribute__ ((vector_size (16)));
+
 float16_t
 fp16_abs (float16_t a)
 {
@@ -50,15 +53,49 @@ TEST_CMP (greaterthan, >, int, float16_t)
 TEST_CMP (lessthanequal, <=, int, float16_t)
 TEST_CMP (greaterthanqual, >=, int, float16_t)
 
+/* Vectors of size 4.  */
+
+TEST_UNOP (neg, -, float16x4_t)
+
+TEST_BINOP (add, +, float16x4_t)
+TEST_BINOP (sub, -, float16x4_t)
+TEST_BINOP (mult, *, float16x4_t)
+TEST_BINOP (div, /, float16x4_t)
+
+TEST_CMP (equal, ==, int16x4_t, float16x4_t)
+TEST_CMP (unequal, !=, int16x4_t, float16x4_t)
+TEST_CMP (lessthan, <, int16x4_t, float16x4_t)
+TEST_CMP (greaterthan, >, int16x4_t, float16x4_t)
+TEST_CMP (lessthanequal, <=, int16x4_t, float16x4_t)
+TEST_CMP (greaterthanqual, >=, int16x4_t, float16x4_t)
+
+/* Vectors of size 8.  */
+
+TEST_UNOP (neg, -, float16x8_t)
+
+TEST_BINOP (add, +, float16x8_t)
+TEST_BINOP (sub, -, float16x8_t)
+TEST_BINOP (mult, *, float16x8_t)
+TEST_BINOP (div, /, float16x8_t)
+
+TEST_CMP (equal, ==, int16x8_t, float16x8_t)
+TEST_CMP (unequal, !=, int16x8_t, float16x8_t)
+TEST_CMP (lessthan, <, int16x8_t, float16x8_t)
+TEST_CMP (greaterthan, >, int16x8_t, float16x8_t)
+TEST_CMP (lessthanequal, <=, int16x8_t, float16x8_t)
+TEST_CMP (greaterthanqual, >=, int16x8_t, float16x8_t)
+
 /* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
 /* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
 
-/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 2 } }  */
-/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
+/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } }  */
+/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } }  */
 
 /* { dg-final { scan-assembler-not {vadd\.f32} } }  */
 /* { dg-final { scan-assembler-not {vsub\.f32} } }  */
-- 
2.1.4
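
Editorial note on the fixed-point conversion patterns above: both
neon_vcvt<sup>_n<mode> patterns for the half-precision modes call
neon_const_bounds (operands[2], 0, 17) before emitting the instruction.
The sketch below illustrates what that pair of bounds admits, assuming
the check is half-open (the value must be at least the lower bound and
strictly below the upper bound), so that 0 to 16 pass and 17 is
rejected. The function names imm_in_bounds, f16_fbits_accepted and
check_fbits_bounds are hypothetical and exist only for this
illustration; they are not part of GCC or of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the range test applied to the immediate.
   Assumption: the bound pair is half-open, i.e. [low, high).  */
bool
imm_in_bounds (long value, long low, long high)
{
  return value >= low && value < high;
}

/* Hypothetical helper: would a (0, 17) bound pair, as used by the
   half-precision vcvt_n patterns, accept FBITS?  */
bool
f16_fbits_accepted (long fbits)
{
  return imm_in_bounds (fbits, 0, 17);
}

/* Self-check of the boundary cases under the assumption above.  */
void
check_fbits_bounds (void)
{
  assert (f16_fbits_accepted (0));    /* lower bound is inclusive */
  assert (f16_fbits_accepted (16));   /* largest accepted value */
  assert (!f16_fbits_accepted (17));  /* upper bound is exclusive */
  assert (!f16_fbits_accepted (-1));  /* below the lower bound */
}
```

Under that assumption, an upper bound of 17 rather than 16 is what lets
an immediate equal to the 16-bit element width through the check.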

Follow-Ups:
- Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.
  - From: Ramana Radhakrishnan
