Re: [PATCH, rs6000] Add support for vector element-reversal built-ins


Hi,

While looking into documenting the new built-ins, I realized that these
instructions provide correct support for the vec_xl and vec_xst
built-ins required by the vector API.  I've therefore reworked the patch
to provide those as overloaded built-ins, rather than as the separate
per-mode built-ins of the original patch, and then documented them.
(Note that vec_xl and vec_xst were previously incorrectly aliased to
vec_vsx_ld and vec_vsx_st, which do not provide the proper
element-reversal semantics.)
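
For reference, here is a minimal hypothetical use of the overloaded
built-ins (the function name is made up for illustration); the first
argument is a byte offset that is added to the pointer:

  #include <altivec.h>

  /* Hypothetical example: load four ints in array order, negate
     them, and store them back.  With the corrected definitions this
     behaves identically on big- and little-endian targets.  */
  void
  negate4 (signed int *p)
  {
    vector signed int v = vec_xl (0, p);
    v = -v;
    vec_xst (v, 0, p);
  }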

This in turn required support for the RS6000_BTM_P9_VECTOR and
MASK_P9_VECTOR macros.  These currently exist on the ibm/pre-gcc7
branch but not upstream, so I've copied the necessary pieces of that
branch into this patch to avoid future conflicts.  Other than changes
to the test cases, the rest of the patch is much as before.  As a
reminder:

ISA 3.0 adds the lxvh8x, lxvb16x, stxvh8x, and stxvb16x instructions,
which perform vector loads and stores in big-endian element order
regardless of the target endianness.  These join the similar lxvd2x,
lxvw4x, stxvd2x, and stxvw4x instructions introduced in ISA 2.06.
Those existing instructions have been used in several ways, but we
don't yet have built-ins that allow them to be generated specifically
for little-endian.  This patch corrects that, and adds built-ins for
the new ISA 3.0 instructions as well.
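
The new per-mode built-ins are normally reached through vec_xl and
vec_xst, but they can also be called directly.  A hypothetical sketch
using one of the built-ins added below (it requires -mpower9-vector,
per the TARGET_P9_VECTOR guard in the patch; the function name is
made up):

  #include <altivec.h>

  /* Array-order halfword load.  On little-endian this expands to
     lxvh8x (the vec_select in the insn pattern describes its
     element-reversing effect); on big-endian it is a plain VSX
     load.  */
  vector signed short
  load_8hi (signed short *p)
  {
    return __builtin_vsx_ld_elemrev_v8hi (0, p);
  }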

Note that lxvd2x, lxvw4x, lxvh8x, and lxvb16x are indistinguishable
from one another in big-endian mode, and similarly for the stores.  So
for big-endian we can treat these as simple moves that will generate
any applicable load or store (such as lxvx and stxvx for ISA 3.0).
For little-endian, however, we require a separate pattern for each of
these loads and stores to ensure that we get the correct
element-reversal semantics for each vector mode.
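
A hypothetical sanity check of the intended semantics (assuming
altivec.h is included as above): vec_xl always presents elements in
array order, whichever load instruction ends up being generated.

  /* Hypothetical example; returns 10 on both big- and little-endian,
     because the little-endian expansion uses the element-reversing
     pattern rather than a raw big-endian-ordered load.  */
  int
  first_element (void)
  {
    int a[4] = { 10, 20, 30, 40 };
    vector signed int v = vec_xl (0, a);
    return vec_extract (v, 0);
  }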

I've added four new tests to demonstrate correct behavior of the new
built-in functions.  These include variants for big- and little-endian,
and variants for -mcpu=power8 and -mcpu=power9.

Bootstrapped and tested on powerpc64-unknown-linux-gnu and
powerpc64le-unknown-linux-gnu with no regressions.  Is this revised
version ok for trunk?

Thanks!
Bill


[gcc]

2016-04-27  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.h: Change definitions of vec_xl and
	vec_xst.
	* config/rs6000/rs6000-builtin.def (LD_ELEMREV_V2DF): New.
	(LD_ELEMREV_V2DI): New.
	(LD_ELEMREV_V4SF): New.
	(LD_ELEMREV_V4SI): New.
	(LD_ELEMREV_V8HI): New.
	(LD_ELEMREV_V16QI): New.
	(ST_ELEMREV_V2DF): New.
	(ST_ELEMREV_V2DI): New.
	(ST_ELEMREV_V4SF): New.
	(ST_ELEMREV_V4SI): New.
	(ST_ELEMREV_V8HI): New.
	(ST_ELEMREV_V16QI): New.
	(XL): New.
	(XST): New.
	* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
	descriptions for VSX_BUILTIN_VEC_XL and VSX_BUILTIN_VEC_XST.
	* config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Map from
	TARGET_P9_VECTOR to RS6000_BTM_P9_VECTOR.
	(altivec_expand_builtin): Add handling for
	VSX_BUILTIN_ST_ELEMREV_<MODE> and VSX_BUILTIN_LD_ELEMREV_<MODE>.
	(rs6000_invalid_builtin): Add error-checking for
	RS6000_BTM_P9_VECTOR.
	(altivec_init_builtins): Define builtins used to implement vec_xl
	and vec_xst.
	(rs6000_builtin_mask_names): Define power9-vector.
	* config/rs6000/rs6000.h (MASK_P9_VECTOR): Define.
	(RS6000_BTM_P9_VECTOR): Define.
	(RS6000_BTM_COMMON): Include RS6000_BTM_P9_VECTOR.
	* config/rs6000/vsx.md (vsx_ld_elemrev_v2di): New define_insn.
	(vsx_ld_elemrev_v2df): Likewise.
	(vsx_ld_elemrev_v4sf): Likewise.
	(vsx_ld_elemrev_v4si): Likewise.
	(vsx_ld_elemrev_v8hi): Likewise.
	(vsx_ld_elemrev_v16qi): Likewise.
	(vsx_st_elemrev_v2df): Likewise.
	(vsx_st_elemrev_v2di): Likewise.
	(vsx_st_elemrev_v4sf): Likewise.
	(vsx_st_elemrev_v4si): Likewise.
	(vsx_st_elemrev_v8hi): Likewise.
	(vsx_st_elemrev_v16qi): Likewise.
	* doc/extend.texi: Add prototypes for vec_xl and vec_xst.  Correct
	grammar.

[gcc/testsuite]

2016-04-27  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/vsx-elemrev-1.c: New.
	* gcc.target/powerpc/vsx-elemrev-2.c: New.
	* gcc.target/powerpc/vsx-elemrev-3.c: New.
	* gcc.target/powerpc/vsx-elemrev-4.c: New.


diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index ea6af8d..5fc1cce 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -327,8 +327,8 @@
 #define vec_sqrt __builtin_vec_sqrt
 #define vec_vsx_ld __builtin_vec_vsx_ld
 #define vec_vsx_st __builtin_vec_vsx_st
-#define vec_xl __builtin_vec_vsx_ld
-#define vec_xst __builtin_vec_vsx_st
+#define vec_xl __builtin_vec_xl
+#define vec_xst __builtin_vec_xst
 
 /* Note, xxsldi and xxpermdi were added as __builtin_vsx_<xxx> functions
    instead of __builtin_vec_<xxx>  */
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 891d240..6f33278 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1398,6 +1398,18 @@ BU_VSX_X (STXVW4X_V4SF,	      "stxvw4x_v4sf",	MEM)
 BU_VSX_X (STXVW4X_V4SI,	      "stxvw4x_v4si",	MEM)
 BU_VSX_X (STXVW4X_V8HI,	      "stxvw4x_v8hi",	MEM)
 BU_VSX_X (STXVW4X_V16QI,      "stxvw4x_v16qi",	MEM)
+BU_VSX_X (LD_ELEMREV_V2DF,    "ld_elemrev_v2df",  MEM)
+BU_VSX_X (LD_ELEMREV_V2DI,    "ld_elemrev_v2di",  MEM)
+BU_VSX_X (LD_ELEMREV_V4SF,    "ld_elemrev_v4sf",  MEM)
+BU_VSX_X (LD_ELEMREV_V4SI,    "ld_elemrev_v4si",  MEM)
+BU_VSX_X (LD_ELEMREV_V8HI,    "ld_elemrev_v8hi",  MEM)
+BU_VSX_X (LD_ELEMREV_V16QI,   "ld_elemrev_v16qi", MEM)
+BU_VSX_X (ST_ELEMREV_V2DF,    "st_elemrev_v2df",  MEM)
+BU_VSX_X (ST_ELEMREV_V2DI,    "st_elemrev_v2di",  MEM)
+BU_VSX_X (ST_ELEMREV_V4SF,    "st_elemrev_v4sf",  MEM)
+BU_VSX_X (ST_ELEMREV_V4SI,    "st_elemrev_v4si",  MEM)
+BU_VSX_X (ST_ELEMREV_V8HI,    "st_elemrev_v8hi",  MEM)
+BU_VSX_X (ST_ELEMREV_V16QI,   "st_elemrev_v16qi", MEM)
 BU_VSX_X (XSABSDP,	      "xsabsdp",	CONST)
 BU_VSX_X (XSADDDP,	      "xsadddp",	FP)
 BU_VSX_X (XSCMPODP,	      "xscmpodp",	FP)
@@ -1455,6 +1467,8 @@ BU_VSX_OVERLOAD_1 (DOUBLE,   "double")
 /* VSX builtins that are handled as special cases.  */
 BU_VSX_OVERLOAD_X (LD,	     "ld")
 BU_VSX_OVERLOAD_X (ST,	     "st")
+BU_VSX_OVERLOAD_X (XL,	     "xl")
+BU_VSX_OVERLOAD_X (XST,	     "xst")
 
 /* 1 argument VSX instructions added in ISA 2.07.  */
 BU_P8V_VSX_1 (XSCVSPDPN,      "xscvspdpn",	CONST,	vsx_xscvspdpn)
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index ceb80b2..0985bb7 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -2726,6 +2726,49 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_SUMS, ALTIVEC_BUILTIN_VSUMSWS,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_long_long, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V4SI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V8HI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V16QI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
   { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
@@ -3475,6 +3518,55 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI },
   { ALTIVEC_BUILTIN_VEC_STVRXL, ALTIVEC_BUILTIN_STVRXL,
     RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DF,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DF,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_long_long },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V2DI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_long_long },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SF,
+    RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SF,
+    RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+    RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+    RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V4SI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_UINTSI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+    RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+    RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V8HI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_UINTHI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+    RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+    RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V16QI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_UINTQI },
   { VSX_BUILTIN_VEC_XXSLDWI, VSX_BUILTIN_XXSLDWI_16QI,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_NOT_OPAQUE },
   { VSX_BUILTIN_VEC_XXSLDWI, VSX_BUILTIN_XXSLDWI_16QI,
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1d0076c..776fe1b 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3621,6 +3621,7 @@ rs6000_builtin_mask_calculate (void)
 	  | ((TARGET_POPCNTD)		    ? RS6000_BTM_POPCNTD   : 0)
 	  | ((rs6000_cpu == PROCESSOR_CELL) ? RS6000_BTM_CELL      : 0)
 	  | ((TARGET_P8_VECTOR)		    ? RS6000_BTM_P8_VECTOR : 0)
+	  | ((TARGET_P9_VECTOR)		    ? RS6000_BTM_P9_VECTOR : 0)
 	  | ((TARGET_CRYPTO)		    ? RS6000_BTM_CRYPTO	   : 0)
 	  | ((TARGET_HTM)		    ? RS6000_BTM_HTM	   : 0)
 	  | ((TARGET_DFP)		    ? RS6000_BTM_DFP	   : 0)
@@ -14129,6 +14130,47 @@ altivec_expand_builtin (tree exp, rtx target, bool *expandedp)
     case VSX_BUILTIN_STXVW4X_V16QI:
       return altivec_expand_stv_builtin (CODE_FOR_vsx_store_v16qi, exp);
 
+    /* For the following on big endian, it's ok to use any appropriate
+       unaligned-supporting store, so use a generic expander.  For
+       little-endian, the exact element-reversing instruction must
+       be used.  */
+    case VSX_BUILTIN_ST_ELEMREV_V2DF:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v2df
+			       : CODE_FOR_vsx_st_elemrev_v2df);
+	return altivec_expand_stv_builtin (code, exp);
+      }
+    case VSX_BUILTIN_ST_ELEMREV_V2DI:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v2di
+			       : CODE_FOR_vsx_st_elemrev_v2di);
+	return altivec_expand_stv_builtin (code, exp);
+      }
+    case VSX_BUILTIN_ST_ELEMREV_V4SF:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v4sf
+			       : CODE_FOR_vsx_st_elemrev_v4sf);
+	return altivec_expand_stv_builtin (code, exp);
+      }
+    case VSX_BUILTIN_ST_ELEMREV_V4SI:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v4si
+			       : CODE_FOR_vsx_st_elemrev_v4si);
+	return altivec_expand_stv_builtin (code, exp);
+      }
+    case VSX_BUILTIN_ST_ELEMREV_V8HI:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v8hi
+			       : CODE_FOR_vsx_st_elemrev_v8hi);
+	return altivec_expand_stv_builtin (code, exp);
+      }
+    case VSX_BUILTIN_ST_ELEMREV_V16QI:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v16qi
+			       : CODE_FOR_vsx_st_elemrev_v16qi);
+	return altivec_expand_stv_builtin (code, exp);
+      }
+
     case ALTIVEC_BUILTIN_MFVSCR:
       icode = CODE_FOR_altivec_mfvscr;
       tmode = insn_data[icode].operand[0].mode;
@@ -14323,6 +14365,46 @@ altivec_expand_builtin (tree exp, rtx target, bool *expandedp)
     case VSX_BUILTIN_LXVW4X_V16QI:
       return altivec_expand_lv_builtin (CODE_FOR_vsx_load_v16qi,
 					exp, target, false);
+    /* For the following on big endian, it's ok to use any appropriate
+       unaligned-supporting load, so use a generic expander.  For
+       little-endian, the exact element-reversing instruction must
+       be used.  */
+    case VSX_BUILTIN_LD_ELEMREV_V2DF:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v2df
+			       : CODE_FOR_vsx_ld_elemrev_v2df);
+	return altivec_expand_lv_builtin (code, exp, target, false);
+      }
+    case VSX_BUILTIN_LD_ELEMREV_V2DI:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v2di
+			       : CODE_FOR_vsx_ld_elemrev_v2di);
+	return altivec_expand_lv_builtin (code, exp, target, false);
+      }
+    case VSX_BUILTIN_LD_ELEMREV_V4SF:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v4sf
+			       : CODE_FOR_vsx_ld_elemrev_v4sf);
+	return altivec_expand_lv_builtin (code, exp, target, false);
+      }
+    case VSX_BUILTIN_LD_ELEMREV_V4SI:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v4si
+			       : CODE_FOR_vsx_ld_elemrev_v4si);
+	return altivec_expand_lv_builtin (code, exp, target, false);
+      }
+    case VSX_BUILTIN_LD_ELEMREV_V8HI:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v8hi
+			       : CODE_FOR_vsx_ld_elemrev_v8hi);
+	return altivec_expand_lv_builtin (code, exp, target, false);
+      }
+    case VSX_BUILTIN_LD_ELEMREV_V16QI:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v16qi
+			       : CODE_FOR_vsx_ld_elemrev_v16qi);
+	return altivec_expand_lv_builtin (code, exp, target, false);
+      }
       break;
     default:
       break;
@@ -14792,6 +14874,8 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
     error ("Builtin function %s requires the -mhard-dfp option", name);
   else if ((fnmask & RS6000_BTM_P8_VECTOR) != 0)
     error ("Builtin function %s requires the -mpower8-vector option", name);
+  else if ((fnmask & RS6000_BTM_P9_VECTOR) != 0)
+    error ("Builtin function %s requires the -mpower9-vector option", name);
   else if ((fnmask & (RS6000_BTM_HARD_FLOAT | RS6000_BTM_LDBL128))
 	   == (RS6000_BTM_HARD_FLOAT | RS6000_BTM_LDBL128))
     error ("Builtin function %s requires the -mhard-float and"
@@ -15816,10 +15900,44 @@ altivec_init_builtins (void)
 	       VSX_BUILTIN_STXVW4X_V8HI);
   def_builtin ("__builtin_vsx_stxvw4x_v16qi", void_ftype_v16qi_long_pvoid,
 	       VSX_BUILTIN_STXVW4X_V16QI);
+
+  def_builtin ("__builtin_vsx_ld_elemrev_v2df", v2df_ftype_long_pcvoid,
+	       VSX_BUILTIN_LD_ELEMREV_V2DF);
+  def_builtin ("__builtin_vsx_ld_elemrev_v2di", v2di_ftype_long_pcvoid,
+	       VSX_BUILTIN_LD_ELEMREV_V2DI);
+  def_builtin ("__builtin_vsx_ld_elemrev_v4sf", v4sf_ftype_long_pcvoid,
+	       VSX_BUILTIN_LD_ELEMREV_V4SF);
+  def_builtin ("__builtin_vsx_ld_elemrev_v4si", v4si_ftype_long_pcvoid,
+	       VSX_BUILTIN_LD_ELEMREV_V4SI);
+  def_builtin ("__builtin_vsx_st_elemrev_v2df", void_ftype_v2df_long_pvoid,
+	       VSX_BUILTIN_ST_ELEMREV_V2DF);
+  def_builtin ("__builtin_vsx_st_elemrev_v2di", void_ftype_v2di_long_pvoid,
+	       VSX_BUILTIN_ST_ELEMREV_V2DI);
+  def_builtin ("__builtin_vsx_st_elemrev_v4sf", void_ftype_v4sf_long_pvoid,
+	       VSX_BUILTIN_ST_ELEMREV_V4SF);
+  def_builtin ("__builtin_vsx_st_elemrev_v4si", void_ftype_v4si_long_pvoid,
+	       VSX_BUILTIN_ST_ELEMREV_V4SI);
+
+  if (TARGET_P9_VECTOR)
+    {
+      def_builtin ("__builtin_vsx_ld_elemrev_v8hi", v8hi_ftype_long_pcvoid,
+		   VSX_BUILTIN_LD_ELEMREV_V8HI);
+      def_builtin ("__builtin_vsx_ld_elemrev_v16qi", v16qi_ftype_long_pcvoid,
+		   VSX_BUILTIN_LD_ELEMREV_V16QI);
+      def_builtin ("__builtin_vsx_st_elemrev_v8hi",
+		   void_ftype_v8hi_long_pvoid, VSX_BUILTIN_ST_ELEMREV_V8HI);
+      def_builtin ("__builtin_vsx_st_elemrev_v16qi",
+		   void_ftype_v16qi_long_pvoid, VSX_BUILTIN_ST_ELEMREV_V16QI);
+    }
+
   def_builtin ("__builtin_vec_vsx_ld", opaque_ftype_long_pcvoid,
 	       VSX_BUILTIN_VEC_LD);
   def_builtin ("__builtin_vec_vsx_st", void_ftype_opaque_long_pvoid,
 	       VSX_BUILTIN_VEC_ST);
+  def_builtin ("__builtin_vec_xl", opaque_ftype_long_pcvoid,
+	       VSX_BUILTIN_VEC_XL);
+  def_builtin ("__builtin_vec_xst", void_ftype_opaque_long_pvoid,
+	       VSX_BUILTIN_VEC_XST);
 
   def_builtin ("__builtin_vec_step", int_ftype_opaque, ALTIVEC_BUILTIN_VEC_STEP);
   def_builtin ("__builtin_vec_splats", opaque_ftype_opaque, ALTIVEC_BUILTIN_VEC_SPLATS);
@@ -34474,6 +34592,7 @@ static struct rs6000_opt_mask const rs6000_builtin_mask_names[] =
   { "popcntd",		 RS6000_BTM_POPCNTD,	false, false },
   { "cell",		 RS6000_BTM_CELL,	false, false },
   { "power8-vector",	 RS6000_BTM_P8_VECTOR,	false, false },
+  { "power9-vector",	 RS6000_BTM_P9_VECTOR,	false, false },
   { "crypto",		 RS6000_BTM_CRYPTO,	false, false },
   { "htm",		 RS6000_BTM_HTM,	false, false },
   { "hard-dfp",		 RS6000_BTM_DFP,	false, false },
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 8c6bd07..20a765a 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -615,6 +615,7 @@ extern int rs6000_vector_align[];
 #define MASK_MULTIPLE			OPTION_MASK_MULTIPLE
 #define MASK_NO_UPDATE			OPTION_MASK_NO_UPDATE
 #define MASK_P8_VECTOR			OPTION_MASK_P8_VECTOR
+#define MASK_P9_VECTOR			OPTION_MASK_P9_VECTOR
 #define MASK_POPCNTB			OPTION_MASK_POPCNTB
 #define MASK_POPCNTD			OPTION_MASK_POPCNTD
 #define MASK_PPC_GFXOPT			OPTION_MASK_PPC_GFXOPT
@@ -2660,6 +2661,7 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_ALTIVEC	MASK_ALTIVEC	/* VMX/altivec vectors.  */
 #define RS6000_BTM_VSX		MASK_VSX	/* VSX (vector/scalar).  */
 #define RS6000_BTM_P8_VECTOR	MASK_P8_VECTOR	/* ISA 2.07 vector.  */
+#define RS6000_BTM_P9_VECTOR	MASK_P9_VECTOR	/* ISA 3.00 vector.  */
 #define RS6000_BTM_CRYPTO	MASK_CRYPTO	/* crypto funcs.  */
 #define RS6000_BTM_HTM		MASK_HTM	/* hardware TM funcs.  */
 #define RS6000_BTM_SPE		MASK_STRING	/* E500 */
@@ -2677,6 +2679,7 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_COMMON	(RS6000_BTM_ALTIVEC			\
 				 | RS6000_BTM_VSX			\
 				 | RS6000_BTM_P8_VECTOR			\
+				 | RS6000_BTM_P9_VECTOR			\
 				 | RS6000_BTM_CRYPTO			\
 				 | RS6000_BTM_FRE			\
 				 | RS6000_BTM_FRES			\
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 45af233..508eeac 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -887,6 +887,140 @@
   "VECTOR_MEM_VSX_P (<MODE>mode)"
   "")
 
+;; Explicit load/store expanders for the builtin functions for lxvd2x, etc.,
+;; when you really want their element-reversing behavior.
+(define_insn "vsx_ld_elemrev_v2di"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
+        (vec_select:V2DI
+	  (match_operand:V2DI 1 "memory_operand" "Z")
+	  (parallel [(const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V2DImode) && !BYTES_BIG_ENDIAN"
+  "lxvd2x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "vsx_ld_elemrev_v2df"
+  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
+        (vec_select:V2DF
+	  (match_operand:V2DF 1 "memory_operand" "Z")
+	  (parallel [(const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V2DFmode) && !BYTES_BIG_ENDIAN"
+  "lxvd2x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "vsx_ld_elemrev_v4si"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa")
+        (vec_select:V4SI
+	  (match_operand:V4SI 1 "memory_operand" "Z")
+	  (parallel [(const_int 3) (const_int 2)
+	             (const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V4SImode) && !BYTES_BIG_ENDIAN"
+  "lxvw4x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "vsx_ld_elemrev_v4sf"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+        (vec_select:V4SF
+	  (match_operand:V4SF 1 "memory_operand" "Z")
+	  (parallel [(const_int 3) (const_int 2)
+	             (const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && !BYTES_BIG_ENDIAN"
+  "lxvw4x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "vsx_ld_elemrev_v8hi"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+        (vec_select:V8HI
+	  (match_operand:V8HI 1 "memory_operand" "Z")
+	  (parallel [(const_int 7) (const_int 6)
+	             (const_int 5) (const_int 4)
+		     (const_int 3) (const_int 2)
+	             (const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V8HImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
+  "lxvh8x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "vsx_ld_elemrev_v16qi"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+        (vec_select:V16QI
+	  (match_operand:V16QI 1 "memory_operand" "Z")
+	  (parallel [(const_int 15) (const_int 14)
+	             (const_int 13) (const_int 12)
+		     (const_int 11) (const_int 10)
+		     (const_int  9) (const_int  8)
+		     (const_int  7) (const_int  6)
+	             (const_int  5) (const_int  4)
+		     (const_int  3) (const_int  2)
+	             (const_int  1) (const_int  0)])))]
+  "VECTOR_MEM_VSX_P (V16QImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
+  "lxvb16x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "vsx_st_elemrev_v2df"
+  [(set (match_operand:V2DF 0 "memory_operand" "=Z")
+        (vec_select:V2DF
+	  (match_operand:V2DF 1 "vsx_register_operand" "wa")
+	  (parallel [(const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V2DFmode) && !BYTES_BIG_ENDIAN"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "vsx_st_elemrev_v2di"
+  [(set (match_operand:V2DI 0 "memory_operand" "=Z")
+        (vec_select:V2DI
+	  (match_operand:V2DI 1 "vsx_register_operand" "wa")
+	  (parallel [(const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V2DImode) && !BYTES_BIG_ENDIAN"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "vsx_st_elemrev_v4sf"
+  [(set (match_operand:V4SF 0 "memory_operand" "=Z")
+        (vec_select:V4SF
+	  (match_operand:V4SF 1 "vsx_register_operand" "wa")
+	  (parallel [(const_int 3) (const_int 2)
+	             (const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && !BYTES_BIG_ENDIAN"
+  "stxvw4x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "vsx_st_elemrev_v4si"
+  [(set (match_operand:V4SI 0 "memory_operand" "=Z")
+        (vec_select:V4SI
+	  (match_operand:V4SI 1 "vsx_register_operand" "wa")
+	  (parallel [(const_int 3) (const_int 2)
+	             (const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V4SImode) && !BYTES_BIG_ENDIAN"
+  "stxvw4x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "vsx_st_elemrev_v8hi"
+  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
+        (vec_select:V8HI
+	  (match_operand:V8HI 1 "vsx_register_operand" "wa")
+	  (parallel [(const_int 7) (const_int 6)
+	             (const_int 5) (const_int 4)
+		     (const_int 3) (const_int 2)
+	             (const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V8HImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
+  "stxvh8x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "vsx_st_elemrev_v16qi"
+  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
+        (vec_select:V16QI
+	  (match_operand:V16QI 1 "vsx_register_operand" "wa")
+	  (parallel [(const_int 15) (const_int 14)
+	             (const_int 13) (const_int 12)
+		     (const_int 11) (const_int 10)
+		     (const_int  9) (const_int  8)
+	             (const_int  7) (const_int  6)
+	             (const_int  5) (const_int  4)
+		     (const_int  3) (const_int  2)
+	             (const_int  1) (const_int  0)])))]
+  "VECTOR_MEM_VSX_P (V16QImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
+  "stxvb16x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
 
 ;; VSX vector floating point arithmetic instructions.  The VSX scalar
 ;; instructions are now combined with the insn for the traditional floating
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index a5a8b23..dc2570f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -15932,6 +15932,18 @@ void vec_st (vector double, int, vector double *);
 void vec_st (vector double, int, double *);
 vector double vec_sub (vector double, vector double);
 vector double vec_trunc (vector double);
+vector double vec_xl (int, vector double *);
+vector double vec_xl (int, double *);
+vector long long vec_xl (int, vector long long *);
+vector long long vec_xl (int, long long *);
+vector unsigned long long vec_xl (int, vector unsigned long long *);
+vector unsigned long long vec_xl (int, unsigned long long *);
+vector float vec_xl (int, vector float *);
+vector float vec_xl (int, float *);
+vector int vec_xl (int, vector int *);
+vector int vec_xl (int, int *);
+vector unsigned int vec_xl (int, vector unsigned int *);
+vector unsigned int vec_xl (int, unsigned int *);
 vector double vec_xor (vector double, vector double);
 vector double vec_xor (vector double, vector bool long);
 vector double vec_xor (vector bool long, vector double);
@@ -15941,6 +15953,18 @@ vector long vec_xor (vector bool long, vector long);
 vector unsigned long vec_xor (vector unsigned long, vector unsigned long);
 vector unsigned long vec_xor (vector unsigned long, vector bool long);
 vector unsigned long vec_xor (vector bool long, vector unsigned long);
+void vec_xst (vector double, int, vector double *);
+void vec_xst (vector double, int, double *);
+void vec_xst (vector long long, int, vector long long *);
+void vec_xst (vector long long, int, long long *);
+void vec_xst (vector unsigned long long, int, vector unsigned long long *);
+void vec_xst (vector unsigned long long, int, unsigned long long *);
+void vec_xst (vector float, int, vector float *);
+void vec_xst (vector float, int, float *);
+void vec_xst (vector int, int, vector int *);
+void vec_xst (vector int, int, int *);
+void vec_xst (vector unsigned int, int, vector unsigned int *);
+void vec_xst (vector unsigned int, int, unsigned int *);
 int vec_all_eq (vector double, vector double);
 int vec_all_ge (vector double, vector double);
 int vec_all_gt (vector double, vector double);
@@ -16055,7 +16079,7 @@ if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
 @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
 If the ISA 2.07 additions to the vector/scalar (power8-vector)
-instruction set is available, the following additional functions are
+instruction set are available, the following additional functions are
 available for both 32-bit and 64-bit targets.  For 64-bit targets, you
 can use @var{vector long} instead of @var{vector long long},
 @var{vector bool long} instead of @var{vector bool long long}, and
@@ -16368,7 +16392,7 @@ vector unsigned long long vec_vupklsw (vector int);
 @end smallexample
 
 If the ISA 2.07 additions to the vector/scalar (power8-vector)
-instruction set is available, the following additional functions are
+instruction set are available, the following additional functions are
 available for 64-bit targets.  New vector types
 (@var{vector __int128_t} and @var{vector __uint128_t}) are available
 to hold the @var{__int128_t} and @var{__uint128_t} types to use these
@@ -16483,6 +16507,28 @@ The second argument to the @var{__builtin_crypto_vshasigmad} and
 integer that is 0 or 1.  The third argument to these builtin functions
 must be a constant integer in the range of 0 to 15.
 
+If the ISA 3.00 additions to the vector/scalar (power9-vector)
+instruction set are available, the following additional functions are
+available for both 32-bit and 64-bit targets.
+
+vector short vec_xl (int, vector short *);
+vector short vec_xl (int, short *);
+vector unsigned short vec_xl (int, vector unsigned short *);
+vector unsigned short vec_xl (int, unsigned short *);
+vector char vec_xl (int, vector char *);
+vector char vec_xl (int, char *);
+vector unsigned char vec_xl (int, vector unsigned char *);
+vector unsigned char vec_xl (int, unsigned char *);
+
+void vec_xst (vector short, int, vector short *);
+void vec_xst (vector short, int, short *);
+void vec_xst (vector unsigned short, int, vector unsigned short *);
+void vec_xst (vector unsigned short, int, unsigned short *);
+void vec_xst (vector char, int, vector char *);
+void vec_xst (vector char, int, char *);
+void vec_xst (vector unsigned char, int, vector unsigned char *);
+void vec_xst (vector unsigned char, int, unsigned char *);
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-1.c b/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-1.c
new file mode 100644
index 0000000..7ab6d44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-1.c
@@ -0,0 +1,143 @@
+/* { dg-do compile { target { powerpc64le*-*-* } } } */
+/* { dg-skip-if "do not override mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O0" } */
+/* { dg-final { scan-assembler-times "lxvd2x" 18 } } */
+/* { dg-final { scan-assembler-times "lxvw4x" 6 } } */
+/* { dg-final { scan-assembler-times "stxvd2x" 18 } } */
+/* { dg-final { scan-assembler-times "stxvw4x" 6 } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 24 } } */
+
+#include <altivec.h>
+
+extern vector double vd, *vdp;
+extern vector signed long long vsll, *vsllp;
+extern vector unsigned long long vull, *vullp;
+extern vector float vf, *vfp;
+extern vector signed int vsi, *vsip;
+extern vector unsigned int vui, *vuip;
+extern double *dp;
+extern signed long long *sllp;
+extern unsigned long long *ullp;
+extern float *fp;
+extern signed int *sip;
+extern unsigned int *uip;
+
+void foo0 (void)
+{
+  vd = vec_xl (0, vdp);
+}
+
+void foo1 (void)
+{
+  vsll = vec_xl (0, vsllp);
+}
+
+void foo2 (void)
+{
+  vull = vec_xl (0, vullp);
+}
+
+void foo3 (void)
+{
+  vf = vec_xl (0, vfp);
+}
+
+void foo4 (void)
+{
+  vsi = vec_xl (0, vsip);
+}
+
+void foo5 (void)
+{
+  vui = vec_xl (0, vuip);
+}
+
+void foo6 (void)
+{
+  vec_xst (vd, 0, vdp);
+}
+
+void foo7 (void)
+{
+  vec_xst (vsll, 0, vsllp);
+}
+
+void foo8 (void)
+{
+  vec_xst (vull, 0, vullp);
+}
+
+void foo9 (void)
+{
+  vec_xst (vf, 0, vfp);
+}
+
+void foo10 (void)
+{
+  vec_xst (vsi, 0, vsip);
+}
+
+void foo11 (void)
+{
+  vec_xst (vui, 0, vuip);
+}
+
+void foo20 (void)
+{
+  vd = vec_xl (0, dp);
+}
+
+void foo21 (void)
+{
+  vsll = vec_xl (0, sllp);
+}
+
+void foo22 (void)
+{
+  vull = vec_xl (0, ullp);
+}
+
+void foo23 (void)
+{
+  vf = vec_xl (0, fp);
+}
+
+void foo24 (void)
+{
+  vsi = vec_xl (0, sip);
+}
+
+void foo25 (void)
+{
+  vui = vec_xl (0, uip);
+}
+
+void foo26 (void)
+{
+  vec_xst (vd, 0, dp);
+}
+
+void foo27 (void)
+{
+  vec_xst (vsll, 0, sllp);
+}
+
+void foo28 (void)
+{
+  vec_xst (vull, 0, ullp);
+}
+
+void foo29 (void)
+{
+  vec_xst (vf, 0, fp);
+}
+
+void foo30 (void)
+{
+  vec_xst (vsi, 0, sip);
+}
+
+void foo31 (void)
+{
+  vec_xst (vui, 0, uip);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-2.c b/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-2.c
new file mode 100644
index 0000000..f1c4403
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-2.c
@@ -0,0 +1,234 @@
+/* { dg-do compile { target { powerpc64le*-*-* } } } */
+/* { dg-skip-if "do not override mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O0" } */
+/* { dg-final { scan-assembler-times "lxvd2x" 6 } } */
+/* { dg-final { scan-assembler-times "lxvw4x" 6 } } */
+/* { dg-final { scan-assembler-times "lxvh8x" 4 } } */
+/* { dg-final { scan-assembler-times "lxvb16x" 4 } } */
+/* { dg-final { scan-assembler-times "stxvd2x" 6 } } */
+/* { dg-final { scan-assembler-times "stxvw4x" 6 } } */
+/* { dg-final { scan-assembler-times "stxvh8x" 4 } } */
+/* { dg-final { scan-assembler-times "stxvb16x" 4 } } */
+
+#include <altivec.h>
+
+extern vector double vd, *vdp;
+extern vector signed long long vsll, *vsllp;
+extern vector unsigned long long vull, *vullp;
+extern vector float vf, *vfp;
+extern vector signed int vsi, *vsip;
+extern vector unsigned int vui, *vuip;
+extern vector signed short vss, *vssp;
+extern vector unsigned short vus, *vusp;
+extern vector signed char vsc, *vscp;
+extern vector unsigned char vuc, *vucp;
+extern double *dp;
+extern signed long long *sllp;
+extern unsigned long long *ullp;
+extern float *fp;
+extern signed int *sip;
+extern unsigned int *uip;
+extern signed short *ssp;
+extern unsigned short *usp;
+extern signed char *scp;
+extern unsigned char *ucp;
+
+void foo0 (void)
+{
+  vd = vec_xl (0, vdp);
+}
+
+void foo1 (void)
+{
+  vsll = vec_xl (0, vsllp);
+}
+
+void foo2 (void)
+{
+  vull = vec_xl (0, vullp);
+}
+
+void foo3 (void)
+{
+  vf = vec_xl (0, vfp);
+}
+
+void foo4 (void)
+{
+  vsi = vec_xl (0, vsip);
+}
+
+void foo5 (void)
+{
+  vui = vec_xl (0, vuip);
+}
+
+void foo6 (void)
+{
+  vss = vec_xl (0, vssp);
+}
+
+void foo7 (void)
+{
+  vus = vec_xl (0, vusp);
+}
+
+void foo8 (void)
+{
+  vsc = vec_xl (0, vscp);
+}
+
+void foo9 (void)
+{
+  vuc = vec_xl (0, vucp);
+}
+
+void foo10 (void)
+{
+  vec_xst (vd, 0, vdp);
+}
+
+void foo11 (void)
+{
+  vec_xst (vsll, 0, vsllp);
+}
+
+void foo12 (void)
+{
+  vec_xst (vull, 0, vullp);
+}
+
+void foo13 (void)
+{
+  vec_xst (vf, 0, vfp);
+}
+
+void foo14 (void)
+{
+  vec_xst (vsi, 0, vsip);
+}
+
+void foo15 (void)
+{
+  vec_xst (vui, 0, vuip);
+}
+
+void foo16 (void)
+{
+  vec_xst (vss, 0, vssp);
+}
+
+void foo17 (void)
+{
+  vec_xst (vus, 0, vusp);
+}
+
+void foo18 (void)
+{
+  vec_xst (vsc, 0, vscp);
+}
+
+void foo19 (void)
+{
+  vec_xst (vuc, 0, vucp);
+}
+
+void foo20 (void)
+{
+  vd = vec_xl (0, dp);
+}
+
+void foo21 (void)
+{
+  vsll = vec_xl (0, sllp);
+}
+
+void foo22 (void)
+{
+  vull = vec_xl (0, ullp);
+}
+
+void foo23 (void)
+{
+  vf = vec_xl (0, fp);
+}
+
+void foo24 (void)
+{
+  vsi = vec_xl (0, sip);
+}
+
+void foo25 (void)
+{
+  vui = vec_xl (0, uip);
+}
+
+void foo26 (void)
+{
+  vss = vec_xl (0, ssp);
+}
+
+void foo27 (void)
+{
+  vus = vec_xl (0, usp);
+}
+
+void foo28 (void)
+{
+  vsc = vec_xl (0, scp);
+}
+
+void foo29 (void)
+{
+  vuc = vec_xl (0, ucp);
+}
+
+void foo30 (void)
+{
+  vec_xst (vd, 0, dp);
+}
+
+void foo31 (void)
+{
+  vec_xst (vsll, 0, sllp);
+}
+
+void foo32 (void)
+{
+  vec_xst (vull, 0, ullp);
+}
+
+void foo33 (void)
+{
+  vec_xst (vf, 0, fp);
+}
+
+void foo34 (void)
+{
+  vec_xst (vsi, 0, sip);
+}
+
+void foo35 (void)
+{
+  vec_xst (vui, 0, uip);
+}
+
+void foo36 (void)
+{
+  vec_xst (vss, 0, ssp);
+}
+
+void foo37 (void)
+{
+  vec_xst (vus, 0, usp);
+}
+
+void foo38 (void)
+{
+  vec_xst (vsc, 0, scp);
+}
+
+void foo39 (void)
+{
+  vec_xst (vuc, 0, ucp);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-3.c
new file mode 100644
index 0000000..2888c17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-3.c
@@ -0,0 +1,142 @@
+/* { dg-do compile { target { powerpc64-*-* } } } */
+/* { dg-skip-if "do not override mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O0" } */
+/* { dg-final { scan-assembler-times "lxvd2x" 16 } } */
+/* { dg-final { scan-assembler-times "lxvw4x" 8 } } */
+/* { dg-final { scan-assembler-times "stxvd2x" 16 } } */
+/* { dg-final { scan-assembler-times "stxvw4x" 8 } } */
+
+#include <altivec.h>
+
+extern vector double vd, *vdp;
+extern vector signed long long vsll, *vsllp;
+extern vector unsigned long long vull, *vullp;
+extern vector float vf, *vfp;
+extern vector signed int vsi, *vsip;
+extern vector unsigned int vui, *vuip;
+extern double *dp;
+extern signed long long *sllp;
+extern unsigned long long *ullp;
+extern float *fp;
+extern signed int *sip;
+extern unsigned int *uip;
+
+void foo0 (void)
+{
+  vd = vec_xl (0, vdp);
+}
+
+void foo1 (void)
+{
+  vsll = vec_xl (0, vsllp);
+}
+
+void foo2 (void)
+{
+  vull = vec_xl (0, vullp);
+}
+
+void foo3 (void)
+{
+  vf = vec_xl (0, vfp);
+}
+
+void foo4 (void)
+{
+  vsi = vec_xl (0, vsip);
+}
+
+void foo5 (void)
+{
+  vui = vec_xl (0, vuip);
+}
+
+void foo6 (void)
+{
+  vec_xst (vd, 0, vdp);
+}
+
+void foo7 (void)
+{
+  vec_xst (vsll, 0, vsllp);
+}
+
+void foo8 (void)
+{
+  vec_xst (vull, 0, vullp);
+}
+
+void foo9 (void)
+{
+  vec_xst (vf, 0, vfp);
+}
+
+void foo10 (void)
+{
+  vec_xst (vsi, 0, vsip);
+}
+
+void foo11 (void)
+{
+  vec_xst (vui, 0, vuip);
+}
+
+void foo20 (void)
+{
+  vd = vec_xl (0, dp);
+}
+
+void foo21 (void)
+{
+  vsll = vec_xl (0, sllp);
+}
+
+void foo22 (void)
+{
+  vull = vec_xl (0, ullp);
+}
+
+void foo23 (void)
+{
+  vf = vec_xl (0, fp);
+}
+
+void foo24 (void)
+{
+  vsi = vec_xl (0, sip);
+}
+
+void foo25 (void)
+{
+  vui = vec_xl (0, uip);
+}
+
+void foo26 (void)
+{
+  vec_xst (vd, 0, dp);
+}
+
+void foo27 (void)
+{
+  vec_xst (vsll, 0, sllp);
+}
+
+void foo28 (void)
+{
+  vec_xst (vull, 0, ullp);
+}
+
+void foo29 (void)
+{
+  vec_xst (vf, 0, fp);
+}
+
+void foo30 (void)
+{
+  vec_xst (vsi, 0, sip);
+}
+
+void foo31 (void)
+{
+  vec_xst (vui, 0, uip);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-4.c b/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-4.c
new file mode 100644
index 0000000..ef84581
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-elemrev-4.c
@@ -0,0 +1,228 @@
+/* { dg-do compile { target { powerpc64-*-* } } } */
+/* { dg-skip-if "do not override mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O0" } */
+/* { dg-final { scan-assembler-times "lxvx" 40 } } */
+/* { dg-final { scan-assembler-times "stxvx" 40 } } */
+
+#include <altivec.h>
+
+extern vector double vd, *vdp;
+extern vector signed long long vsll, *vsllp;
+extern vector unsigned long long vull, *vullp;
+extern vector float vf, *vfp;
+extern vector signed int vsi, *vsip;
+extern vector unsigned int vui, *vuip;
+extern vector signed short vss, *vssp;
+extern vector unsigned short vus, *vusp;
+extern vector signed char vsc, *vscp;
+extern vector unsigned char vuc, *vucp;
+extern double *dp;
+extern signed long long *sllp;
+extern unsigned long long *ullp;
+extern float *fp;
+extern signed int *sip;
+extern unsigned int *uip;
+extern signed short *ssp;
+extern unsigned short *usp;
+extern signed char *scp;
+extern unsigned char *ucp;
+
+void foo0 (void)
+{
+  vd = vec_xl (0, vdp);
+}
+
+void foo1 (void)
+{
+  vsll = vec_xl (0, vsllp);
+}
+
+void foo2 (void)
+{
+  vull = vec_xl (0, vullp);
+}
+
+void foo3 (void)
+{
+  vf = vec_xl (0, vfp);
+}
+
+void foo4 (void)
+{
+  vsi = vec_xl (0, vsip);
+}
+
+void foo5 (void)
+{
+  vui = vec_xl (0, vuip);
+}
+
+void foo6 (void)
+{
+  vss = vec_xl (0, vssp);
+}
+
+void foo7 (void)
+{
+  vus = vec_xl (0, vusp);
+}
+
+void foo8 (void)
+{
+  vsc = vec_xl (0, vscp);
+}
+
+void foo9 (void)
+{
+  vuc = vec_xl (0, vucp);
+}
+
+void foo10 (void)
+{
+  vec_xst (vd, 0, vdp);
+}
+
+void foo11 (void)
+{
+  vec_xst (vsll, 0, vsllp);
+}
+
+void foo12 (void)
+{
+  vec_xst (vull, 0, vullp);
+}
+
+void foo13 (void)
+{
+  vec_xst (vf, 0, vfp);
+}
+
+void foo14 (void)
+{
+  vec_xst (vsi, 0, vsip);
+}
+
+void foo15 (void)
+{
+  vec_xst (vui, 0, vuip);
+}
+
+void foo16 (void)
+{
+  vec_xst (vss, 0, vssp);
+}
+
+void foo17 (void)
+{
+  vec_xst (vus, 0, vusp);
+}
+
+void foo18 (void)
+{
+  vec_xst (vsc, 0, vscp);
+}
+
+void foo19 (void)
+{
+  vec_xst (vuc, 0, vucp);
+}
+
+void foo20 (void)
+{
+  vd = vec_xl (0, dp);
+}
+
+void foo21 (void)
+{
+  vsll = vec_xl (0, sllp);
+}
+
+void foo22 (void)
+{
+  vull = vec_xl (0, ullp);
+}
+
+void foo23 (void)
+{
+  vf = vec_xl (0, fp);
+}
+
+void foo24 (void)
+{
+  vsi = vec_xl (0, sip);
+}
+
+void foo25 (void)
+{
+  vui = vec_xl (0, uip);
+}
+
+void foo26 (void)
+{
+  vss = vec_xl (0, ssp);
+}
+
+void foo27 (void)
+{
+  vus = vec_xl (0, usp);
+}
+
+void foo28 (void)
+{
+  vsc = vec_xl (0, scp);
+}
+
+void foo29 (void)
+{
+  vuc = vec_xl (0, ucp);
+}
+
+void foo30 (void)
+{
+  vec_xst (vd, 0, dp);
+}
+
+void foo31 (void)
+{
+  vec_xst (vsll, 0, sllp);
+}
+
+void foo32 (void)
+{
+  vec_xst (vull, 0, ullp);
+}
+
+void foo33 (void)
+{
+  vec_xst (vf, 0, fp);
+}
+
+void foo34 (void)
+{
+  vec_xst (vsi, 0, sip);
+}
+
+void foo35 (void)
+{
+  vec_xst (vui, 0, uip);
+}
+
+void foo36 (void)
+{
+  vec_xst (vss, 0, ssp);
+}
+
+void foo37 (void)
+{
+  vec_xst (vus, 0, usp);
+}
+
+void foo38 (void)
+{
+  vec_xst (vsc, 0, scp);
+}
+
+void foo39 (void)
+{
+  vec_xst (vuc, 0, ucp);
+}


