This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Add optabs for common types of permutation


...so that we can use them for variable-length vectors.  For now
constant-length vectors continue to use VEC_PERM_EXPR and the
vec_perm_const optab even for cases that the new optabs could
handle.

The existing vector optabs are inconsistent about whether the mode
part of the name is preceded by an underscore; the new names follow
the other lo/hi optabs, which do have one.

Doing this means that some SLP tests are now vectorised using
non-SLP vectorisation (for now) on targets with variable-length
vectors, so the patch needs to add a few XFAILs.  Most of these go
away with later patches.

Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64le-linux-gnu.  OK to install?

Richard


2017-11-09  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (vec_reverse, vec_interleave_lo, vec_interleave_hi)
	(vec_extract_even, vec_extract_odd): Document new optabs.
	* internal-fn.def (VEC_INTERLEAVE_LO, VEC_INTERLEAVE_HI)
	(VEC_EXTRACT_EVEN, VEC_EXTRACT_ODD, VEC_REVERSE): New internal
	functions.
	* optabs.def (vec_interleave_lo_optab, vec_interleave_hi_optab)
	(vec_extract_even_optab, vec_extract_odd_optab, vec_reverse_optab):
	New optabs.
	* tree-vect-data-refs.c: Include internal-fn.h.
	(vect_grouped_store_supported): Try using IFN_VEC_INTERLEAVE_{LO,HI}.
	(vect_permute_store_chain): Use them here too.
	(vect_grouped_load_supported): Try using IFN_VEC_EXTRACT_{EVEN,ODD}.
	(vect_permute_load_chain): Use them here too.
	* tree-vect-stmts.c (can_reverse_vector_p): New function.
	(get_negative_load_store_type): Use it.
	(reverse_vector): New function.
	(vectorizable_store, vectorizable_load): Use it.
	* config/aarch64/iterators.md (perm_optab): New iterator.
	* config/aarch64/aarch64-sve.md (<perm_optab>_<mode>): New expander.
	(vec_reverse_<mode>): Likewise.

gcc/testsuite/
	* gcc.dg/vect/no-vfa-vect-depend-2.c: Remove XFAIL.
	* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
	* gcc.dg/vect/pr33953.c: XFAIL for vect_variable_length.
	* gcc.dg/vect/pr68445.c: Likewise.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-13-big-array.c: Likewise.
	* gcc.dg/vect/slp-13.c: Likewise.
	* gcc.dg/vect/slp-14.c: Likewise.
	* gcc.dg/vect/slp-15.c: Likewise.
	* gcc.dg/vect/slp-42.c: Likewise.
	* gcc.dg/vect/slp-multitypes-2.c: Likewise.
	* gcc.dg/vect/slp-multitypes-4.c: Likewise.
	* gcc.dg/vect/slp-multitypes-5.c: Likewise.
	* gcc.dg/vect/slp-reduc-4.c: Likewise.
	* gcc.dg/vect/slp-reduc-7.c: Likewise.
	* gcc.target/aarch64/sve_vec_perm_2.c: New test.
	* gcc.target/aarch64/sve_vec_perm_2_run.c: Likewise.
	* gcc.target/aarch64/sve_vec_perm_3.c: New test.
	* gcc.target/aarch64/sve_vec_perm_3_run.c: Likewise.
	* gcc.target/aarch64/sve_vec_perm_4.c: New test.
	* gcc.target/aarch64/sve_vec_perm_4_run.c: Likewise.

Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-11-09 13:21:01.989917982 +0000
+++ gcc/doc/md.texi	2017-11-09 13:21:02.323463345 +0000
@@ -5017,6 +5017,46 @@ There is no need for a target to supply
 and @samp{vec_perm_const@var{m}} if the former can trivially implement
 the operation with, say, the vector constant loaded into a register.
 
+@cindex @code{vec_reverse_@var{m}} instruction pattern
+@item @samp{vec_reverse_@var{m}}
+Reverse the order of the elements in vector input operand 1 and store
+the result in vector output operand 0.  Both operands have mode @var{m}.
+
+This pattern is provided mainly for targets with variable-length vectors.
+Targets with fixed-length vectors can instead handle any reverse-specific
+optimizations in @samp{vec_perm_const@var{m}}.
+
+@cindex @code{vec_interleave_lo_@var{m}} instruction pattern
+@item @samp{vec_interleave_lo_@var{m}}
+Take the lowest-indexed halves of vector input operands 1 and 2 and
+interleave the elements, so that element @var{x} of operand 1 is followed by
+element @var{x} of operand 2.  Store the result in vector output operand 0.
+All three operands have mode @var{m}.
+
+This pattern is provided mainly for targets with variable-length
+vectors.  Targets with fixed-length vectors can instead handle any
+interleave-specific optimizations in @samp{vec_perm_const@var{m}}.
+
+@cindex @code{vec_interleave_hi_@var{m}} instruction pattern
+@item @samp{vec_interleave_hi_@var{m}}
+Like @samp{vec_interleave_lo_@var{m}}, but operate on the highest-indexed
+halves instead of the lowest-indexed halves.
+
+@cindex @code{vec_extract_even_@var{m}} instruction pattern
+@item @samp{vec_extract_even_@var{m}}
+Concatenate vector input operands 1 and 2, extract the elements with
+even-numbered indices, and store the result in vector output operand 0.
+All three operands have mode @var{m}.
+
+This pattern is provided mainly for targets with variable-length vectors.
+Targets with fixed-length vectors can instead handle any
+extract-specific optimizations in @samp{vec_perm_const@var{m}}.
+
+@cindex @code{vec_extract_odd_@var{m}} instruction pattern
+@item @samp{vec_extract_odd_@var{m}}
+Like @samp{vec_extract_even_@var{m}}, but extract the elements with
+odd-numbered indices.
+
 @cindex @code{push@var{m}1} instruction pattern
 @item @samp{push@var{m}1}
 Output a push instruction.  Operand 0 is value to push.  Used only when
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2017-11-09 13:21:01.989917982 +0000
+++ gcc/internal-fn.def	2017-11-09 13:21:02.323463345 +0000
@@ -102,6 +102,17 @@ DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_
 DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,
 		       vec_mask_store_lanes, mask_store_lanes)
 
+DEF_INTERNAL_OPTAB_FN (VEC_INTERLEAVE_LO, ECF_CONST | ECF_NOTHROW,
+		       vec_interleave_lo, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_INTERLEAVE_HI, ECF_CONST | ECF_NOTHROW,
+		       vec_interleave_hi, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT_EVEN, ECF_CONST | ECF_NOTHROW,
+		       vec_extract_even, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT_ODD, ECF_CONST | ECF_NOTHROW,
+		       vec_extract_odd, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_REVERSE, ECF_CONST | ECF_NOTHROW,
+		       vec_reverse, unary)
+
 DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
 
 /* Unary math functions.  */
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-11-09 13:21:01.989917982 +0000
+++ gcc/optabs.def	2017-11-09 13:21:02.323463345 +0000
@@ -309,6 +309,11 @@ OPTAB_D (vec_perm_optab, "vec_perm$a")
 OPTAB_D (vec_realign_load_optab, "vec_realign_load_$a")
 OPTAB_D (vec_set_optab, "vec_set$a")
 OPTAB_D (vec_shr_optab, "vec_shr_$a")
+OPTAB_D (vec_interleave_lo_optab, "vec_interleave_lo_$a")
+OPTAB_D (vec_interleave_hi_optab, "vec_interleave_hi_$a")
+OPTAB_D (vec_extract_even_optab, "vec_extract_even_$a")
+OPTAB_D (vec_extract_odd_optab, "vec_extract_odd_$a")
+OPTAB_D (vec_reverse_optab, "vec_reverse_$a")
 OPTAB_D (vec_unpacks_float_hi_optab, "vec_unpacks_float_hi_$a")
 OPTAB_D (vec_unpacks_float_lo_optab, "vec_unpacks_float_lo_$a")
 OPTAB_D (vec_unpacks_hi_optab, "vec_unpacks_hi_$a")
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/tree-vect-data-refs.c	2017-11-09 13:21:02.326167766 +0000
@@ -52,6 +52,7 @@ Software Foundation; either version 3, o
 #include "params.h"
 #include "tree-cfg.h"
 #include "tree-hash-traits.h"
+#include "internal-fn.h"
 
 /* Return true if load- or store-lanes optab OPTAB is implemented for
    COUNT vectors of type VECTYPE.  NAME is the name of OPTAB.  */
@@ -4636,7 +4637,16 @@ vect_grouped_store_supported (tree vecty
       return false;
     }
 
-  /* Check that the permutation is supported.  */
+  /* Powers of 2 use a tree of interleaving operations.  See whether the
+     target supports them directly.  */
+  if (count != 3
+      && direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_LO, vectype,
+					 OPTIMIZE_FOR_SPEED)
+      && direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_HI, vectype,
+					 OPTIMIZE_FOR_SPEED))
+    return true;
+
+  /* Otherwise check for support in the form of general permutations.  */
   unsigned int nelt;
   if (VECTOR_MODE_P (mode) && GET_MODE_NUNITS (mode).is_constant (&nelt))
     {
@@ -4881,50 +4891,78 @@ vect_permute_store_chain (vec<tree> dr_c
       /* If length is not equal to 3 then only power of 2 is supported.  */
       gcc_assert (pow2p_hwi (length));
 
-      /* vect_grouped_store_supported ensures that this is constant.  */
-      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
-      auto_vec_perm_indices sel (nelt);
-      sel.quick_grow (nelt);
-      for (i = 0, n = nelt / 2; i < n; i++)
+      if (direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_LO, vectype,
+					  OPTIMIZE_FOR_SPEED)
+	  && direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_HI, vectype,
+					     OPTIMIZE_FOR_SPEED))
+	{
+	  /* We could support the case where only one of the optabs is
+	     implemented, but that seems unlikely.  */
+	  perm_mask_low = NULL_TREE;
+	  perm_mask_high = NULL_TREE;
+	}
+      else
 	{
-	  sel[i * 2] = i;
-	  sel[i * 2 + 1] = i + nelt;
+	  /* vect_grouped_store_supported ensures that this is constant.  */
+	  unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+	  auto_vec_perm_indices sel (nelt);
+	  sel.quick_grow (nelt);
+	  for (i = 0, n = nelt / 2; i < n; i++)
+	    {
+	      sel[i * 2] = i;
+	      sel[i * 2 + 1] = i + nelt;
+	    }
+	  perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+
+	  for (i = 0; i < nelt; i++)
+	    sel[i] += nelt / 2;
+	  perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
 	}
-	perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
 
-	for (i = 0; i < nelt; i++)
-	  sel[i] += nelt / 2;
-	perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+      for (i = 0, n = log_length; i < n; i++)
+	{
+	  for (j = 0; j < length / 2; j++)
+	    {
+	      vect1 = dr_chain[j];
+	      vect2 = dr_chain[j + length / 2];
 
-	for (i = 0, n = log_length; i < n; i++)
-	  {
-	    for (j = 0; j < length/2; j++)
-	      {
-		vect1 = dr_chain[j];
-		vect2 = dr_chain[j+length/2];
+	      /* Create interleaving stmt:
+		 low = VEC_PERM_EXPR <vect1, vect2,
+				      {0, nelt, 1, nelt + 1, ...}>  */
+	      low = make_temp_ssa_name (vectype, NULL, "vect_inter_low");
+	      if (perm_mask_low)
+		perm_stmt = gimple_build_assign (low, VEC_PERM_EXPR, vect1,
+						 vect2, perm_mask_low);
+	      else
+		{
+		  perm_stmt = gimple_build_call_internal
+		    (IFN_VEC_INTERLEAVE_LO, 2, vect1, vect2);
+		  gimple_set_lhs (perm_stmt, low);
+		}
+	      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
+	      (*result_chain)[2 * j] = low;
 
-		/* Create interleaving stmt:
-		   high = VEC_PERM_EXPR <vect1, vect2, {0, nelt, 1, nelt+1,
-							...}>  */
-		high = make_temp_ssa_name (vectype, NULL, "vect_inter_high");
+	      /* Create interleaving stmt:
+		 high = VEC_PERM_EXPR <vect1, vect2,
+				      {nelt / 2, nelt * 3 / 2,
+				       nelt / 2 + 1, nelt * 3 / 2 + 1,
+				       ...}>  */
+	      high = make_temp_ssa_name (vectype, NULL, "vect_inter_high");
+	      if (perm_mask_high)
 		perm_stmt = gimple_build_assign (high, VEC_PERM_EXPR, vect1,
 						 vect2, perm_mask_high);
-		vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-		(*result_chain)[2*j] = high;
-
-		/* Create interleaving stmt:
-		   low = VEC_PERM_EXPR <vect1, vect2,
-					{nelt/2, nelt*3/2, nelt/2+1, nelt*3/2+1,
-					 ...}>  */
-		low = make_temp_ssa_name (vectype, NULL, "vect_inter_low");
-		perm_stmt = gimple_build_assign (low, VEC_PERM_EXPR, vect1,
-						 vect2, perm_mask_low);
-		vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-		(*result_chain)[2*j+1] = low;
-	      }
-	    memcpy (dr_chain.address (), result_chain->address (),
-		    length * sizeof (tree));
-	  }
+	      else
+		{
+		  perm_stmt = gimple_build_call_internal
+		    (IFN_VEC_INTERLEAVE_HI, 2, vect1, vect2);
+		  gimple_set_lhs (perm_stmt, high);
+		}
+	      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
+	      (*result_chain)[2 * j + 1] = high;
+	    }
+	  memcpy (dr_chain.address (), result_chain->address (),
+		  length * sizeof (tree));
+	}
     }
 }
 
@@ -5235,7 +5273,16 @@ vect_grouped_load_supported (tree vectyp
       return false;
     }
 
-  /* Check that the permutation is supported.  */
+  /* Powers of 2 use a tree of extract operations.  See whether the
+     target supports them directly.  */
+  if (count != 3
+      && direct_internal_fn_supported_p (IFN_VEC_EXTRACT_EVEN, vectype,
+					 OPTIMIZE_FOR_SPEED)
+      && direct_internal_fn_supported_p (IFN_VEC_EXTRACT_ODD, vectype,
+					 OPTIMIZE_FOR_SPEED))
+    return true;
+
+  /* Otherwise check for support in the form of general permutations.  */
   unsigned int nelt;
   if (VECTOR_MODE_P (mode) && GET_MODE_NUNITS (mode).is_constant (&nelt))
     {
@@ -5464,17 +5511,30 @@ vect_permute_load_chain (vec<tree> dr_ch
       /* If length is not equal to 3 then only power of 2 is supported.  */
       gcc_assert (pow2p_hwi (length));
 
-      /* vect_grouped_load_supported ensures that this is constant.  */
-      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
-      auto_vec_perm_indices sel (nelt);
-      sel.quick_grow (nelt);
-      for (i = 0; i < nelt; ++i)
-	sel[i] = i * 2;
-      perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
-
-      for (i = 0; i < nelt; ++i)
-	sel[i] = i * 2 + 1;
-      perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
+      if (direct_internal_fn_supported_p (IFN_VEC_EXTRACT_EVEN, vectype,
+					  OPTIMIZE_FOR_SPEED)
+	  && direct_internal_fn_supported_p (IFN_VEC_EXTRACT_ODD, vectype,
+					     OPTIMIZE_FOR_SPEED))
+	{
+	  /* We could support the case where only one of the optabs is
+	     implemented, but that seems unlikely.  */
+	  perm_mask_even = NULL_TREE;
+	  perm_mask_odd = NULL_TREE;
+	}
+      else
+	{
+	  /* vect_grouped_load_supported ensures that this is constant.  */
+	  unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+	  auto_vec_perm_indices sel (nelt);
+	  sel.quick_grow (nelt);
+	  for (i = 0; i < nelt; ++i)
+	    sel[i] = i * 2;
+	  perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
+
+	  for (i = 0; i < nelt; ++i)
+	    sel[i] = i * 2 + 1;
+	  perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
+	}
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5485,19 +5545,33 @@ vect_permute_load_chain (vec<tree> dr_ch
 
 	      /* data_ref = permute_even (first_data_ref, second_data_ref);  */
 	      data_ref = make_temp_ssa_name (vectype, NULL, "vect_perm_even");
-	      perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR,
-					       first_vect, second_vect,
-					       perm_mask_even);
+	      if (perm_mask_even)
+		perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR,
+						 first_vect, second_vect,
+						 perm_mask_even);
+	      else
+		{
+		  perm_stmt = gimple_build_call_internal
+		    (IFN_VEC_EXTRACT_EVEN, 2, first_vect, second_vect);
+		  gimple_set_lhs (perm_stmt, data_ref);
+		}
 	      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-	      (*result_chain)[j/2] = data_ref;
+	      (*result_chain)[j / 2] = data_ref;
 
 	      /* data_ref = permute_odd (first_data_ref, second_data_ref);  */
 	      data_ref = make_temp_ssa_name (vectype, NULL, "vect_perm_odd");
-	      perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR,
-					       first_vect, second_vect,
-					       perm_mask_odd);
+	      if (perm_mask_odd)
+		perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR,
+						 first_vect, second_vect,
+						 perm_mask_odd);
+	      else
+		{
+		  perm_stmt = gimple_build_call_internal
+		    (IFN_VEC_EXTRACT_ODD, 2, first_vect, second_vect);
+		  gimple_set_lhs (perm_stmt, data_ref);
+		}
 	      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-	      (*result_chain)[j/2+length/2] = data_ref;
+	      (*result_chain)[j / 2 + length / 2] = data_ref;
 	    }
 	  memcpy (dr_chain.address (), result_chain->address (),
 		  length * sizeof (tree));
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/tree-vect-stmts.c	2017-11-09 13:21:02.327069240 +0000
@@ -1796,6 +1796,46 @@ perm_mask_for_reverse (tree vectype)
   return vect_gen_perm_mask_checked (vectype, sel);
 }
 
+/* Return true if the target can reverse the elements in a vector of
+   type VECTOR_TYPE.  */
+
+static bool
+can_reverse_vector_p (tree vector_type)
+{
+  return (direct_internal_fn_supported_p (IFN_VEC_REVERSE, vector_type,
+					  OPTIMIZE_FOR_SPEED)
+	  || perm_mask_for_reverse (vector_type));
+}
+
+/* Generate a statement to reverse the elements in vector INPUT and
+   return the SSA name that holds the result.  GSI is a statement iterator
+   pointing to STMT, which is the scalar statement we're vectorizing.
+   VEC_DEST is the destination variable with which new SSA names
+   should be associated.  */
+
+static tree
+reverse_vector (tree vec_dest, tree input, gimple *stmt,
+		gimple_stmt_iterator *gsi)
+{
+  tree new_temp = make_ssa_name (vec_dest);
+  tree vector_type = TREE_TYPE (input);
+  gimple *perm_stmt;
+  if (direct_internal_fn_supported_p (IFN_VEC_REVERSE, vector_type,
+				      OPTIMIZE_FOR_SPEED))
+    {
+      perm_stmt = gimple_build_call_internal (IFN_VEC_REVERSE, 1, input);
+      gimple_set_lhs (perm_stmt, new_temp);
+    }
+  else
+    {
+      tree perm_mask = perm_mask_for_reverse (vector_type);
+      perm_stmt = gimple_build_assign (new_temp, VEC_PERM_EXPR,
+				       input, input, perm_mask);
+    }
+  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
+  return new_temp;
+}
+
 /* A subroutine of get_load_store_type, with a subset of the same
    arguments.  Handle the case where STMT is part of a grouped load
    or store.
@@ -1999,7 +2039,7 @@ get_negative_load_store_type (gimple *st
       return VMAT_CONTIGUOUS_DOWN;
     }
 
-  if (!perm_mask_for_reverse (vectype))
+  if (!can_reverse_vector_p (vectype))
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6760,20 +6800,10 @@ vectorizable_store (gimple *stmt, gimple
 
 	      if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 		{
-		  tree perm_mask = perm_mask_for_reverse (vectype);
 		  tree perm_dest 
 		    = vect_create_destination_var (gimple_assign_rhs1 (stmt),
 						   vectype);
-		  tree new_temp = make_ssa_name (perm_dest);
-
-		  /* Generate the permute statement.  */
-		  gimple *perm_stmt 
-		    = gimple_build_assign (new_temp, VEC_PERM_EXPR, vec_oprnd,
-					   vec_oprnd, perm_mask);
-		  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-
-		  perm_stmt = SSA_NAME_DEF_STMT (new_temp);
-		  vec_oprnd = new_temp;
+		  vec_oprnd = reverse_vector (perm_dest, vec_oprnd, stmt, gsi);
 		}
 
 	      /* Arguments are ready.  Create the new vector stmt.  */
@@ -7998,9 +8028,7 @@ vectorizable_load (gimple *stmt, gimple_
 
 	      if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 		{
-		  tree perm_mask = perm_mask_for_reverse (vectype);
-		  new_temp = permute_vec_elements (new_temp, new_temp,
-						   perm_mask, stmt, gsi);
+		  new_temp = reverse_vector (vec_dest, new_temp, stmt, gsi);
 		  new_stmt = SSA_NAME_DEF_STMT (new_temp);
 		}
 
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2017-11-09 13:21:01.989917982 +0000
+++ gcc/config/aarch64/iterators.md	2017-11-09 13:21:02.322561871 +0000
@@ -1556,6 +1556,11 @@ (define_int_attr pauth_hint_num_a [(UNSP
 				    (UNSPEC_PACI1716 "8")
 				    (UNSPEC_AUTI1716 "12")])
 
+(define_int_attr perm_optab [(UNSPEC_ZIP1 "vec_interleave_lo")
+			     (UNSPEC_ZIP2 "vec_interleave_hi")
+			     (UNSPEC_UZP1 "vec_extract_even")
+			     (UNSPEC_UZP2 "vec_extract_odd")])
+
 (define_int_attr perm_insn [(UNSPEC_ZIP1 "zip") (UNSPEC_ZIP2 "zip")
 			    (UNSPEC_TRN1 "trn") (UNSPEC_TRN2 "trn")
 			    (UNSPEC_UZP1 "uzp") (UNSPEC_UZP2 "uzp")])
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2017-11-09 13:21:01.989917982 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2017-11-09 13:21:02.320758923 +0000
@@ -630,6 +630,19 @@ (define_expand "vec_perm<mode>"
   }
 )
 
+(define_expand "<perm_optab>_<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand")
+	(unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand")
+			 (match_operand:SVE_ALL 2 "register_operand")]
+			OPTAB_PERMUTE))]
+  "TARGET_SVE && !GET_MODE_NUNITS (<MODE>mode).is_constant ()")
+
+(define_expand "vec_reverse_<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand")
+	(unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand")]
+			UNSPEC_REV))]
+  "TARGET_SVE && !GET_MODE_NUNITS (<MODE>mode).is_constant ()")
+
 (define_insn "*aarch64_sve_tbl<mode>"
   [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
 	(unspec:SVE_ALL
Index: gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c	2017-11-09 13:21:02.323463345 +0000
@@ -51,7 +51,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* Requires reverse for variable-length SVE, which is implemented for
-   by a later patch.  Until then we report it twice, once for SVE and
-   once for 128-bit Advanced SIMD.  */
-/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c	2017-11-09 13:21:02.323463345 +0000
@@ -183,7 +183,4 @@ int main ()
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" {xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* f4 requires reverse for SVE, which is implemented by a later patch.
-   Until then we report it twice, once for SVE and once for 128-bit
-   Advanced SIMD.  */
-/* { dg-final { scan-tree-dump-times "dependence distance negative" 4 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "dependence distance negative" 4 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr33953.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr33953.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/pr33953.c	2017-11-09 13:21:02.323463345 +0000
@@ -29,6 +29,6 @@ void blockmove_NtoN_blend_noremap32 (con
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || vect_variable_length } } } } */
 
 
Index: gcc/testsuite/gcc.dg/vect/pr68445.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr68445.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/pr68445.c	2017-11-09 13:21:02.323463345 +0000
@@ -16,4 +16,4 @@ void IMB_double_fast_x (int *destf, int
     }
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { xfail vect_variable_length } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-12a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-12a.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-12a.c	2017-11-09 13:21:02.323463345 +0000
@@ -75,5 +75,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } xfail vect_variable_length } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-13-big-array.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-13-big-array.c	2017-11-09 13:21:02.324364818 +0000
@@ -134,4 +134,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_pack_trunc } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-13.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-13.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-13.c	2017-11-09 13:21:02.324364818 +0000
@@ -128,4 +128,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_pack_trunc } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-14.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-14.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-14.c	2017-11-09 13:21:02.324364818 +0000
@@ -111,5 +111,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_int_mult } } }  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult xfail vect_variable_length } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-15.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-15.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-15.c	2017-11-09 13:21:02.324364818 +0000
@@ -112,6 +112,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target vect_int_mult } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! { vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult xfail vect_variable_length } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target { ! { vect_int_mult } } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-42.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-42.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-42.c	2017-11-09 13:21:02.324364818 +0000
@@ -15,5 +15,5 @@ void foo (int n)
     }
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { xfail vect_variable_length } } } */
 /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c	2017-11-09 13:21:02.324364818 +0000
@@ -77,5 +77,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail vect_variable_length } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c	2017-11-09 13:21:02.324364818 +0000
@@ -52,5 +52,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_unpack } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_unpack xfail vect_variable_length } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c	2017-11-09 13:21:02.324364818 +0000
@@ -52,5 +52,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-4.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-4.c	2017-11-09 13:21:02.325266292 +0000
@@ -57,5 +57,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_min_max } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_min_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_min_max || vect_variable_length } } } } */
 
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-7.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-7.c	2017-11-09 13:21:02.325266292 +0000
@@ -55,5 +55,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_add } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_add } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_add || vect_variable_length } } } } */
 
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)						\
+void __attribute__ ((noinline, noclone))			\
+vec_reverse_##TYPE (TYPE *restrict a, TYPE *restrict b, int n)	\
+{								\
+  for (int i = 0; i < n; ++i)					\
+    a[i] = b[n - i - 1];					\
+}
+
+#define TEST_ALL(T)				\
+  T (int8_t)					\
+  T (uint8_t)					\
+  T (int16_t)					\
+  T (uint16_t)					\
+  T (int32_t)					\
+  T (uint32_t)					\
+  T (int64_t)					\
+  T (uint64_t)					\
+  T (float)					\
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.b, z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2_run.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2_run.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,29 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_vec_perm_2.c"
+
+#define N 153
+
+#define HARNESS(TYPE)						\
+  {								\
+    TYPE a[N], b[N];						\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	b[i] = i * 2 + i % 5;					\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    vec_reverse_##TYPE (a, b, N);				\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	TYPE expected = (N - i - 1) * 2 + (N - i - 1) % 5;	\
+	if (a[i] != expected)					\
+	  __builtin_abort ();					\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=scalable" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)					\
+void __attribute__ ((noinline, noclone))		\
+vec_zip_##TYPE (TYPE *restrict a, TYPE *restrict b,	\
+		TYPE *restrict c, long n)		\
+{							\
+  for (long i = 0; i < n; ++i)				\
+    {							\
+      a[i * 8] = c[i * 4];				\
+      a[i * 8 + 1] = b[i * 4];				\
+      a[i * 8 + 2] = c[i * 4 + 1];			\
+      a[i * 8 + 3] = b[i * 4 + 1];			\
+      a[i * 8 + 4] = c[i * 4 + 2];			\
+      a[i * 8 + 5] = b[i * 4 + 2];			\
+      a[i * 8 + 6] = c[i * 4 + 3];			\
+      a[i * 8 + 7] = b[i * 4 + 3];			\
+    }							\
+}
+
+#define TEST_ALL(T)				\
+  T (int8_t)					\
+  T (uint8_t)					\
+  T (int16_t)					\
+  T (uint16_t)					\
+  T (int32_t)					\
+  T (uint32_t)					\
+  T (int64_t)					\
+  T (uint64_t)					\
+  T (float)					\
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
+/* Currently we can't use SLP for groups bigger than 128 bits.  */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 36 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 36 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 36 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 36 { xfail *-*-* } } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3_run.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3_run.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,31 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_vec_perm_3.c"
+
+#define N (43 * 8)
+
+#define HARNESS(TYPE)						\
+  {								\
+    TYPE a[N], b[N], c[N];					\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	b[i] = i * 2 + i % 5;					\
+	c[i] = i * 3;						\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    vec_zip_##TYPE (a, b, c, N / 8);				\
+    for (unsigned int i = 0; i < N / 2; ++i)			\
+      {								\
+	TYPE expected1 = i * 3;					\
+	TYPE expected2 = i * 2 + i % 5;				\
+	if (a[i * 2] != expected1 || a[i * 2 + 1] != expected2)	\
+	  __builtin_abort ();					\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=scalable" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)					\
+void __attribute__ ((noinline, noclone))		\
+vec_uzp_##TYPE (TYPE *restrict a, TYPE *restrict b,	\
+		 TYPE *restrict c, long n)		\
+{							\
+  for (long i = 0; i < n; ++i)				\
+    {							\
+      a[i * 4] = c[i * 8];				\
+      b[i * 4] = c[i * 8 + 1];				\
+      a[i * 4 + 1] = c[i * 8 + 2];			\
+      b[i * 4 + 1] = c[i * 8 + 3];			\
+      a[i * 4 + 2] = c[i * 8 + 4];			\
+      b[i * 4 + 2] = c[i * 8 + 5];			\
+      a[i * 4 + 3] = c[i * 8 + 6];			\
+      b[i * 4 + 3] = c[i * 8 + 7];			\
+    }							\
+}
+
+#define TEST_ALL(T)				\
+  T (int8_t)					\
+  T (uint8_t)					\
+  T (int16_t)					\
+  T (uint16_t)					\
+  T (int32_t)					\
+  T (uint32_t)					\
+  T (int64_t)					\
+  T (uint64_t)					\
+  T (float)					\
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* We could use a single uzp1 and uzp2 per function by implementing
+   SLP load permutation for variable width.  XFAIL until then.  */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 { xfail *-*-* } } } */
+/* Delete these if the tests above start passing instead.  */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4_run.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4_run.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,29 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_vec_perm_4.c"
+
+#define N (43 * 8)
+
+#define HARNESS(TYPE)					\
+  {							\
+    TYPE a[N], b[N], c[N];				\
+    for (unsigned int i = 0; i < N; ++i)		\
+      {							\
+	c[i] = i * 2 + i % 5;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    vec_uzp_##TYPE (a, b, c, N / 8);			\
+    for (unsigned int i = 0; i < N; ++i)		\
+      {							\
+	TYPE expected = i * 2 + i % 5;			\
+	if ((i & 1 ? b[i / 2] : a[i / 2]) != expected)	\
+	  __builtin_abort ();				\
+      }							\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}

