This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, rs6000] Repair vec_xl, vec_xst, vec_xl_be, vec_xst_be built-in functions
- From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: Segher Boessenkool <segher at kernel dot crashing dot org>, David Edelsohn <dje dot gcc at gmail dot com>, cel at linux dot vnet dot ibm dot com
- Date: Tue, 14 Nov 2017 14:24:13 -0600
- Subject: Re: [PATCH, rs6000] Repair vec_xl, vec_xst, vec_xl_be, vec_xst_be built-in functions
- Authentication-results: sourceware.org; auth=none
- References: <49ed24c9-7edb-428a-a960-fb7dc443d82a@linux.vnet.ibm.com> <1470EC8C-700D-445E-9DEA-10CF5F2CAF0A@linux.vnet.ibm.com>
Hi,
I had a pasto in the function prototype for vec_xst_be. Fixed patch is below.
Thanks,
Bill
On 11/14/17 8:11 AM, Bill Schmidt wrote:
> Please hold review on this until I investigate something that Carl brought up. Thanks,
> and sorry for the noise!
>
> -- Bill
>> On Nov 13, 2017, at 10:30 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
>>
>> Hi,
>>
>> Some previous patches to add support for vec_xl_be and fill in gaps in testing
>> for vec_xl and vec_xst resulted in incorrect semantics for these built-in
>> functions. My fault for not reviewing them in detail. Essentially the vec_xl
>> and vec_xl_be semantics were reversed, and vec_xst received semantics that
>> should be associated with vec_xst_be. Also, vec_xst_be has not yet been
>> implemented. Also, my attempt to add P8-specific code for the element-
>> reversing loads and stores had the same problem with wrong semantics. This
>> patch addresses these issues in a uniform manner.
>>
>> Most of the work is done by adjusting the rs6000-c.c mapping between overloaded
>> function names and type-specific implementations, and adding missing interfaces
>> there. I've also removed the previous implementation function
>> altivec_expand_xl_be_builtin, and moved the byte-reversal into define_expands
>> for the code gen patterns with some simplification. These generate single
>> instructions for P9 (lxvh8x, lxvq16x, etc.) and lvxw4x + permute for P8.
>>
>> I've verified the code generation by hand, but have not included test cases
>> in this patch because Carl Love has been independently working on adding a
>> full set of test cases for these built-ins. The code does pass those relevant
>> tests that are already in the test suite. I hope it's okay to install this
>> patch while waiting for the full tests; I'll of course give high priority to
>> fixing any possible fallout from those tests.
>>
>> Bootstrapped and tested on powerpc64le-linux-gnu (POWER8 and POWER9) with no
>> regressions. Is this okay for trunk?
>>
>> Thanks,
>> Bill
>>
[gcc]
2017-11-14 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* config/rs6000/altivec.h (vec_xst_be): New #define.
* config/rs6000/altivec.md (altivec_vperm_<mode>_direct): Rename
and externalize from *altivec_vperm_<mode>_internal.
* config/rs6000/rs6000-builtin.def (XL_BE_V16QI): Remove macro
instantiation.
(XL_BE_V8HI): Likewise.
(XL_BE_V4SI): Likewise.
(XL_BE_V4SI): Likewise.
(XL_BE_V2DI): Likewise.
(XL_BE_V4SF): Likewise.
(XL_BE_V2DF): Likewise.
(XST_BE): Add BU_VSX_OVERLOAD_X macro instantiation.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Correct
all array entries with these keys: VSX_BUILTIN_VEC_XL,
VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_VEC_XST. Add entries for key
VSX_BUILTIN_VEC_XST_BE.
* config/rs6000/rs6000.c (altivec_expand_xl_be_builtin): Remove.
(altivec_expand_builtin): Remove handling for VSX_BUILTIN_XL_BE_*
built-ins.
(altivec_init_builtins): Replace conditional calls to def_builtin
for __builtin_vsx_ld_elemrev_{v8hi,v16qi} and
__builtin_vsx_st_elemrev_{v8hi,v16qi} based on TARGET_P9_VECTOR
with unconditional calls. Remove calls to def_builtin for
__builtin_vsx_le_be_<mode>. Add a call to def_builtin for
__builtin_vec_xst_be.
* config/rs6000/vsx.md (vsx_ld_elemrev_v8hi): Convert define_insn
to define_expand, and add alternate RTL generation for P8.
(*vsx_ld_elemrev_v8hi_internal): New define_insn based on
vsx_ld_elemrev_v8hi.
(vsx_ld_elemrev_v16qi): Convert define_insn to define_expand, and
add alternate RTL generation for P8.
(*vsx_ld_elemrev_v16qi_internal): New define_insn based on
vsx_ld_elemrev_v16qi.
(vsx_st_elemrev_v8hi): Convert define_insn
to define_expand, and add alternate RTL generation for P8.
(*vsx_st_elemrev_v8hi_internal): New define_insn based on
vsx_st_elemrev_v8hi.
(vsx_st_elemrev_v16qi): Convert define_insn to define_expand, and
add alternate RTL generation for P8.
(*vsx_st_elemrev_v16qi_internal): New define_insn based on
vsx_st_elemrev_v16qi.
[gcc/testsuite]
2017-11-14 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* gcc.target/powerpc/swaps-p8-26.c: Modify expected code
generation.
Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h (revision 254629)
+++ gcc/config/rs6000/altivec.h (working copy)
@@ -357,6 +357,7 @@
#define vec_xl __builtin_vec_vsx_ld
#define vec_xl_be __builtin_vec_xl_be
#define vec_xst __builtin_vec_vsx_st
+#define vec_xst_be __builtin_vec_xst_be
/* Note, xxsldi and xxpermdi were added as __builtin_vsx_<xxx> functions
instead of __builtin_vec_<xxx> */
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md (revision 254629)
+++ gcc/config/rs6000/altivec.md (working copy)
@@ -2130,7 +2130,7 @@
})
;; Slightly prefer vperm, since the target does not overlap the source
-(define_insn "*altivec_vperm_<mode>_internal"
+(define_insn "altivec_vperm_<mode>_direct"
[(set (match_operand:VM 0 "register_operand" "=v,?wo")
(unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
(match_operand:VM 2 "register_operand" "v,0")
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def (revision 254629)
+++ gcc/config/rs6000/rs6000-builtin.def (working copy)
@@ -1728,14 +1728,6 @@ BU_VSX_X (LXVW4X_V4SF, "lxvw4x_v4sf", MEM)
BU_VSX_X (LXVW4X_V4SI, "lxvw4x_v4si", MEM)
BU_VSX_X (LXVW4X_V8HI, "lxvw4x_v8hi", MEM)
BU_VSX_X (LXVW4X_V16QI, "lxvw4x_v16qi", MEM)
-
-BU_VSX_X (XL_BE_V16QI, "xl_be_v16qi", MEM)
-BU_VSX_X (XL_BE_V8HI, "xl_be_v8hi", MEM)
-BU_VSX_X (XL_BE_V4SI, "xl_be_v4si", MEM)
-BU_VSX_X (XL_BE_V2DI, "xl_be_v2di", MEM)
-BU_VSX_X (XL_BE_V4SF, "xl_be_v4sf", MEM)
-BU_VSX_X (XL_BE_V2DF, "xl_be_v2df", MEM)
-
BU_VSX_X (STXSDX, "stxsdx", MEM)
BU_VSX_X (STXVD2X_V1TI, "stxvd2x_v1ti", MEM)
BU_VSX_X (STXVD2X_V2DF, "stxvd2x_v2df", MEM)
@@ -1838,6 +1830,7 @@ BU_VSX_OVERLOAD_X (ST, "st")
BU_VSX_OVERLOAD_X (XL, "xl")
BU_VSX_OVERLOAD_X (XL_BE, "xl_be")
BU_VSX_OVERLOAD_X (XST, "xst")
+BU_VSX_OVERLOAD_X (XST_BE, "xst_be")
/* 1 argument builtins pre ISA 2.04. */
BU_FP_MISC_1 (FCTID, "fctid", CONST, lrintdfdi2)
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c (revision 254629)
+++ gcc/config/rs6000/rs6000-c.c (working copy)
@@ -3055,69 +3055,94 @@ const struct altivec_builtin_types altivec_overloa
RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
{ ALTIVEC_BUILTIN_VEC_SUMS, ALTIVEC_BUILTIN_VSUMSWS,
RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DF,
+
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DF,
RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DF,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DF,
RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
~RS6000_BTI_unsigned_V2DI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
~RS6000_BTI_unsigned_long_long, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SF,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SF,
RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SF,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SF,
RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SI,
RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SI,
RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SI,
RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V4SI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SI,
RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V8HI,
RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V8HI,
RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V8HI,
RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V8HI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V8HI,
RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V16QI,
RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V16QI,
RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V16QI,
RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
~RS6000_BTI_unsigned_V16QI, 0 },
- { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+ { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V16QI,
RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
- { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V16QI,
- RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
- { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V16QI,
- RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
- { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V8HI,
+
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DF,
+ RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DF,
+ RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
+ RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
+ RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
+ RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+ ~RS6000_BTI_unsigned_V2DI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
+ RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+ ~RS6000_BTI_unsigned_long_long, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SF,
+ RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SF,
+ RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SI,
+ RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SI,
+ RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SI,
+ RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V4SI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SI,
+ RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V8HI,
+ RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V8HI,
RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI, 0 },
- { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V8HI,
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V8HI,
+ RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V8HI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V8HI,
RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI, 0 },
- { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V4SI,
- RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },
- { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V4SI,
- RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI, 0 },
- { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V2DI,
- RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
- { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V2DI,
- RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_long_long, 0 },
- { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V4SF,
- RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
- { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V2DF,
- RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V16QI,
+ RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V16QI,
+ RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V16QI,
+ RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+ ~RS6000_BTI_unsigned_V16QI, 0 },
+ { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V16QI,
+ RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
{ ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
{ ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
@@ -3893,53 +3918,111 @@ const struct altivec_builtin_types altivec_overloa
RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI },
{ ALTIVEC_BUILTIN_VEC_STVRXL, ALTIVEC_BUILTIN_STVRXL,
RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DF,
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVD2X_V2DF,
RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DF,
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVD2X_V2DI,
+ RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVD2X_V2DI,
+ RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+ ~RS6000_BTI_unsigned_V2DI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVD2X_V2DI,
+ RS6000_BTI_void, RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI,
+ ~RS6000_BTI_bool_V2DI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SF,
+ RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SF,
+ RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+ RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+ RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+ RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V4SI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+ RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+ RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V4SI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+ RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+ RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+ RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+ RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+ RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V8HI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+ RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+ RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V8HI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+ RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+ RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+ RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+ RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+ RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+ RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+ RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V16QI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+ RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+ RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
+ { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+ RS6000_BTI_void, RS6000_BTI_pixel_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_pixel_V8HI },
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DF,
+ RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DF,
RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI,
~RS6000_BTI_long_long },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
~RS6000_BTI_unsigned_V2DI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
~RS6000_BTI_unsigned_long_long },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SF,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SF,
RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SF,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SF,
RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SI,
RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SI,
RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SI,
RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI,
~RS6000_BTI_unsigned_V4SI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SI,
RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI,
~RS6000_BTI_UINTSI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V8HI,
RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V8HI,
RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V8HI,
RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI,
~RS6000_BTI_unsigned_V8HI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V8HI,
RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI,
~RS6000_BTI_UINTHI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V16QI,
RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V16QI,
RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V16QI,
RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
~RS6000_BTI_unsigned_V16QI },
- { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+ { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V16QI,
RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
~RS6000_BTI_UINTQI },
{ VSX_BUILTIN_VEC_XXSLDWI, VSX_BUILTIN_XXSLDWI_16QI,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 254629)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -14510,58 +14510,6 @@ altivec_expand_lv_builtin (enum insn_code icode, t
}
static rtx
-altivec_expand_xl_be_builtin (enum insn_code icode, tree exp, rtx target, bool blk)
-{
- rtx pat, addr;
- tree arg0 = CALL_EXPR_ARG (exp, 0);
- tree arg1 = CALL_EXPR_ARG (exp, 1);
- machine_mode tmode = insn_data[icode].operand[0].mode;
- machine_mode mode0 = Pmode;
- machine_mode mode1 = Pmode;
- rtx op0 = expand_normal (arg0);
- rtx op1 = expand_normal (arg1);
-
- if (icode == CODE_FOR_nothing)
- /* Builtin not supported on this processor. */
- return 0;
-
- /* If we got invalid arguments bail out before generating bad rtl. */
- if (arg0 == error_mark_node || arg1 == error_mark_node)
- return const0_rtx;
-
- if (target == 0
- || GET_MODE (target) != tmode
- || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
- target = gen_reg_rtx (tmode);
-
- op1 = copy_to_mode_reg (mode1, op1);
-
- if (op0 == const0_rtx)
- addr = gen_rtx_MEM (blk ? BLKmode : tmode, op1);
- else
- {
- op0 = copy_to_mode_reg (mode0, op0);
- addr = gen_rtx_MEM (blk ? BLKmode : tmode,
- gen_rtx_PLUS (Pmode, op1, op0));
- }
-
- pat = GEN_FCN (icode) (target, addr);
- if (!pat)
- return 0;
-
- emit_insn (pat);
- /* Reverse element order of elements if in LE mode */
- if (!VECTOR_ELT_ORDER_BIG)
- {
- rtx sel = swap_selector_for_mode (tmode);
- rtx vperm = gen_rtx_UNSPEC (tmode, gen_rtvec (3, target, target, sel),
- UNSPEC_VPERM);
- emit_insn (gen_rtx_SET (target, vperm));
- }
- return target;
-}
-
-static rtx
paired_expand_stv_builtin (enum insn_code icode, tree exp)
{
tree arg0 = CALL_EXPR_ARG (exp, 0);
@@ -15957,50 +15905,6 @@ altivec_expand_builtin (tree exp, rtx target, bool
/* Fall through. */
}
- /* XL_BE We initialized them to always load in big endian order. */
- switch (fcode)
- {
- case VSX_BUILTIN_XL_BE_V2DI:
- {
- enum insn_code code = CODE_FOR_vsx_load_v2di;
- return altivec_expand_xl_be_builtin (code, exp, target, false);
- }
- break;
- case VSX_BUILTIN_XL_BE_V4SI:
- {
- enum insn_code code = CODE_FOR_vsx_load_v4si;
- return altivec_expand_xl_be_builtin (code, exp, target, false);
- }
- break;
- case VSX_BUILTIN_XL_BE_V8HI:
- {
- enum insn_code code = CODE_FOR_vsx_load_v8hi;
- return altivec_expand_xl_be_builtin (code, exp, target, false);
- }
- break;
- case VSX_BUILTIN_XL_BE_V16QI:
- {
- enum insn_code code = CODE_FOR_vsx_load_v16qi;
- return altivec_expand_xl_be_builtin (code, exp, target, false);
- }
- break;
- case VSX_BUILTIN_XL_BE_V2DF:
- {
- enum insn_code code = CODE_FOR_vsx_load_v2df;
- return altivec_expand_xl_be_builtin (code, exp, target, false);
- }
- break;
- case VSX_BUILTIN_XL_BE_V4SF:
- {
- enum insn_code code = CODE_FOR_vsx_load_v4sf;
- return altivec_expand_xl_be_builtin (code, exp, target, false);
- }
- break;
- default:
- break;
- /* Fall through. */
- }
-
*expandedp = false;
return NULL_RTX;
}
@@ -17618,6 +17522,10 @@ altivec_init_builtins (void)
VSX_BUILTIN_LD_ELEMREV_V4SF);
def_builtin ("__builtin_vsx_ld_elemrev_v4si", v4si_ftype_long_pcvoid,
VSX_BUILTIN_LD_ELEMREV_V4SI);
+ def_builtin ("__builtin_vsx_ld_elemrev_v8hi", v8hi_ftype_long_pcvoid,
+ VSX_BUILTIN_LD_ELEMREV_V8HI);
+ def_builtin ("__builtin_vsx_ld_elemrev_v16qi", v16qi_ftype_long_pcvoid,
+ VSX_BUILTIN_LD_ELEMREV_V16QI);
def_builtin ("__builtin_vsx_st_elemrev_v2df", void_ftype_v2df_long_pvoid,
VSX_BUILTIN_ST_ELEMREV_V2DF);
def_builtin ("__builtin_vsx_st_elemrev_v2di", void_ftype_v2di_long_pvoid,
@@ -17626,43 +17534,11 @@ altivec_init_builtins (void)
VSX_BUILTIN_ST_ELEMREV_V4SF);
def_builtin ("__builtin_vsx_st_elemrev_v4si", void_ftype_v4si_long_pvoid,
VSX_BUILTIN_ST_ELEMREV_V4SI);
+ def_builtin ("__builtin_vsx_st_elemrev_v8hi", void_ftype_v8hi_long_pvoid,
+ VSX_BUILTIN_ST_ELEMREV_V8HI);
+ def_builtin ("__builtin_vsx_st_elemrev_v16qi", void_ftype_v16qi_long_pvoid,
+ VSX_BUILTIN_ST_ELEMREV_V16QI);
- def_builtin ("__builtin_vsx_le_be_v8hi", v8hi_ftype_long_pcvoid,
- VSX_BUILTIN_XL_BE_V8HI);
- def_builtin ("__builtin_vsx_le_be_v4si", v4si_ftype_long_pcvoid,
- VSX_BUILTIN_XL_BE_V4SI);
- def_builtin ("__builtin_vsx_le_be_v2di", v2di_ftype_long_pcvoid,
- VSX_BUILTIN_XL_BE_V2DI);
- def_builtin ("__builtin_vsx_le_be_v4sf", v4sf_ftype_long_pcvoid,
- VSX_BUILTIN_XL_BE_V4SF);
- def_builtin ("__builtin_vsx_le_be_v2df", v2df_ftype_long_pcvoid,
- VSX_BUILTIN_XL_BE_V2DF);
- def_builtin ("__builtin_vsx_le_be_v16qi", v16qi_ftype_long_pcvoid,
- VSX_BUILTIN_XL_BE_V16QI);
-
- if (TARGET_P9_VECTOR)
- {
- def_builtin ("__builtin_vsx_ld_elemrev_v8hi", v8hi_ftype_long_pcvoid,
- VSX_BUILTIN_LD_ELEMREV_V8HI);
- def_builtin ("__builtin_vsx_ld_elemrev_v16qi", v16qi_ftype_long_pcvoid,
- VSX_BUILTIN_LD_ELEMREV_V16QI);
- def_builtin ("__builtin_vsx_st_elemrev_v8hi",
- void_ftype_v8hi_long_pvoid, VSX_BUILTIN_ST_ELEMREV_V8HI);
- def_builtin ("__builtin_vsx_st_elemrev_v16qi",
- void_ftype_v16qi_long_pvoid, VSX_BUILTIN_ST_ELEMREV_V16QI);
- }
- else
- {
- rs6000_builtin_decls[(int) VSX_BUILTIN_LD_ELEMREV_V8HI]
- = rs6000_builtin_decls[(int) VSX_BUILTIN_LXVW4X_V8HI];
- rs6000_builtin_decls[(int) VSX_BUILTIN_LD_ELEMREV_V16QI]
- = rs6000_builtin_decls[(int) VSX_BUILTIN_LXVW4X_V16QI];
- rs6000_builtin_decls[(int) VSX_BUILTIN_ST_ELEMREV_V8HI]
- = rs6000_builtin_decls[(int) VSX_BUILTIN_STXVW4X_V8HI];
- rs6000_builtin_decls[(int) VSX_BUILTIN_ST_ELEMREV_V16QI]
- = rs6000_builtin_decls[(int) VSX_BUILTIN_STXVW4X_V16QI];
- }
-
def_builtin ("__builtin_vec_vsx_ld", opaque_ftype_long_pcvoid,
VSX_BUILTIN_VEC_LD);
def_builtin ("__builtin_vec_vsx_st", void_ftype_opaque_long_pvoid,
@@ -17673,6 +17549,8 @@ altivec_init_builtins (void)
VSX_BUILTIN_VEC_XL_BE);
def_builtin ("__builtin_vec_xst", void_ftype_opaque_long_pvoid,
VSX_BUILTIN_VEC_XST);
+ def_builtin ("__builtin_vec_xst_be", void_ftype_opaque_long_pvoid,
+ VSX_BUILTIN_VEC_XST_BE);
def_builtin ("__builtin_vec_step", int_ftype_opaque, ALTIVEC_BUILTIN_VEC_STEP);
def_builtin ("__builtin_vec_splats", opaque_ftype_opaque, ALTIVEC_BUILTIN_VEC_SPLATS);
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (revision 254629)
+++ gcc/config/rs6000/vsx.md (working copy)
@@ -1118,7 +1118,7 @@
"lxvw4x %x0,%y1"
[(set_attr "type" "vecload")])
-(define_insn "vsx_ld_elemrev_v8hi"
+(define_expand "vsx_ld_elemrev_v8hi"
[(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
(vec_select:V8HI
(match_operand:V8HI 1 "memory_operand" "Z")
@@ -1126,11 +1126,45 @@
(const_int 5) (const_int 4)
(const_int 3) (const_int 2)
(const_int 1) (const_int 0)])))]
+ "VECTOR_MEM_VSX_P (V8HImode) && !BYTES_BIG_ENDIAN"
+{
+ if (!TARGET_P9_VECTOR)
+ {
+ rtx tmp = gen_reg_rtx (V4SImode);
+ rtx subreg, subreg2, perm[16], pcv;
+ /* 2 is leftmost element in register */
+ unsigned int reorder[16] = {13,12,15,14,9,8,11,10,5,4,7,6,1,0,3,2};
+ int i;
+
+ subreg = simplify_gen_subreg (V4SImode, operands[1], V8HImode, 0);
+ emit_insn (gen_vsx_ld_elemrev_v4si (tmp, subreg));
+ subreg2 = simplify_gen_subreg (V8HImode, tmp, V4SImode, 0);
+
+ for (i = 0; i < 16; ++i)
+ perm[i] = GEN_INT (reorder[i]);
+
+ pcv = force_reg (V16QImode,
+ gen_rtx_CONST_VECTOR (V16QImode,
+ gen_rtvec_v (16, perm)));
+ emit_insn (gen_altivec_vperm_v8hi_direct (operands[0], subreg2,
+ subreg2, pcv));
+ DONE;
+ }
+})
+
+(define_insn "*vsx_ld_elemrev_v8hi_internal"
+ [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+ (vec_select:V8HI
+ (match_operand:V8HI 1 "memory_operand" "Z")
+ (parallel [(const_int 7) (const_int 6)
+ (const_int 5) (const_int 4)
+ (const_int 3) (const_int 2)
+ (const_int 1) (const_int 0)])))]
"VECTOR_MEM_VSX_P (V8HImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
"lxvh8x %x0,%y1"
[(set_attr "type" "vecload")])
-(define_insn "vsx_ld_elemrev_v16qi"
+(define_expand "vsx_ld_elemrev_v16qi"
[(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
(vec_select:V16QI
(match_operand:V16QI 1 "memory_operand" "Z")
@@ -1142,6 +1176,44 @@
(const_int 5) (const_int 4)
(const_int 3) (const_int 2)
(const_int 1) (const_int 0)])))]
+ "VECTOR_MEM_VSX_P (V16QImode) && !BYTES_BIG_ENDIAN"
+{
+ if (!TARGET_P9_VECTOR)
+ {
+ rtx tmp = gen_reg_rtx (V4SImode);
+ rtx subreg, subreg2, perm[16], pcv;
+ /* 3 is leftmost element in register */
+ unsigned int reorder[16] = {12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3};
+ int i;
+
+ subreg = simplify_gen_subreg (V4SImode, operands[1], V16QImode, 0);
+ emit_insn (gen_vsx_ld_elemrev_v4si (tmp, subreg));
+ subreg2 = simplify_gen_subreg (V16QImode, tmp, V4SImode, 0);
+
+ for (i = 0; i < 16; ++i)
+ perm[i] = GEN_INT (reorder[i]);
+
+ pcv = force_reg (V16QImode,
+ gen_rtx_CONST_VECTOR (V16QImode,
+ gen_rtvec_v (16, perm)));
+ emit_insn (gen_altivec_vperm_v16qi_direct (operands[0], subreg2,
+ subreg2, pcv));
+ DONE;
+ }
+})
+
+(define_insn "*vsx_ld_elemrev_v16qi_internal"
+ [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+ (vec_select:V16QI
+ (match_operand:V16QI 1 "memory_operand" "Z")
+ (parallel [(const_int 15) (const_int 14)
+ (const_int 13) (const_int 12)
+ (const_int 11) (const_int 10)
+ (const_int 9) (const_int 8)
+ (const_int 7) (const_int 6)
+ (const_int 5) (const_int 4)
+ (const_int 3) (const_int 2)
+ (const_int 1) (const_int 0)])))]
"VECTOR_MEM_VSX_P (V16QImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
"lxvb16x %x0,%y1"
[(set_attr "type" "vecload")])
@@ -1184,7 +1256,7 @@
"stxvw4x %x1,%y0"
[(set_attr "type" "vecstore")])
-(define_insn "vsx_st_elemrev_v8hi"
+(define_expand "vsx_st_elemrev_v8hi"
[(set (match_operand:V8HI 0 "memory_operand" "=Z")
(vec_select:V8HI
(match_operand:V8HI 1 "vsx_register_operand" "wa")
@@ -1192,11 +1264,43 @@
(const_int 5) (const_int 4)
(const_int 3) (const_int 2)
(const_int 1) (const_int 0)])))]
+ "VECTOR_MEM_VSX_P (V8HImode) && !BYTES_BIG_ENDIAN"
+{
+ if (!TARGET_P9_VECTOR)
+ {
+ rtx subreg, perm[16], pcv;
+ rtx tmp = gen_reg_rtx (V8HImode);
+ /* 2 is leftmost element in register */
+ unsigned int reorder[16] = {13,12,15,14,9,8,11,10,5,4,7,6,1,0,3,2};
+ int i;
+
+ for (i = 0; i < 16; ++i)
+ perm[i] = GEN_INT (reorder[i]);
+
+ pcv = force_reg (V16QImode,
+ gen_rtx_CONST_VECTOR (V16QImode,
+ gen_rtvec_v (16, perm)));
+ emit_insn (gen_altivec_vperm_v8hi_direct (tmp, operands[1],
+ operands[1], pcv));
+ subreg = simplify_gen_subreg (V4SImode, tmp, V8HImode, 0);
+ emit_insn (gen_vsx_st_elemrev_v4si (subreg, operands[0]));
+ DONE;
+ }
+})
+
+(define_insn "*vsx_st_elemrev_v8hi_internal"
+ [(set (match_operand:V8HI 0 "memory_operand" "=Z")
+ (vec_select:V8HI
+ (match_operand:V8HI 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 7) (const_int 6)
+ (const_int 5) (const_int 4)
+ (const_int 3) (const_int 2)
+ (const_int 1) (const_int 0)])))]
"VECTOR_MEM_VSX_P (V8HImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
"stxvh8x %x1,%y0"
[(set_attr "type" "vecstore")])
-(define_insn "vsx_st_elemrev_v16qi"
+(define_expand "vsx_st_elemrev_v16qi"
[(set (match_operand:V16QI 0 "memory_operand" "=Z")
(vec_select:V16QI
(match_operand:V16QI 1 "vsx_register_operand" "wa")
@@ -1208,6 +1312,42 @@
(const_int 5) (const_int 4)
(const_int 3) (const_int 2)
(const_int 1) (const_int 0)])))]
+ "VECTOR_MEM_VSX_P (V16QImode) && !BYTES_BIG_ENDIAN"
+{
+ if (!TARGET_P9_VECTOR)
+ {
+ rtx subreg, perm[16], pcv;
+ rtx tmp = gen_reg_rtx (V16QImode);
+ /* 3 is leftmost element in register */
+ unsigned int reorder[16] = {12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3};
+ int i;
+
+ for (i = 0; i < 16; ++i)
+ perm[i] = GEN_INT (reorder[i]);
+
+ pcv = force_reg (V16QImode,
+ gen_rtx_CONST_VECTOR (V16QImode,
+ gen_rtvec_v (16, perm)));
+ emit_insn (gen_altivec_vperm_v16qi_direct (tmp, operands[1],
+ operands[1], pcv));
+ subreg = simplify_gen_subreg (V4SImode, tmp, V16QImode, 0);
+ emit_insn (gen_vsx_st_elemrev_v4si (subreg, operands[0]));
+ DONE;
+ }
+})
+
+(define_insn "*vsx_st_elemrev_v16qi_internal"
+ [(set (match_operand:V16QI 0 "memory_operand" "=Z")
+ (vec_select:V16QI
+ (match_operand:V16QI 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 15) (const_int 14)
+ (const_int 13) (const_int 12)
+ (const_int 11) (const_int 10)
+ (const_int 9) (const_int 8)
+ (const_int 7) (const_int 6)
+ (const_int 5) (const_int 4)
+ (const_int 3) (const_int 2)
+ (const_int 1) (const_int 0)])))]
"VECTOR_MEM_VSX_P (V16QImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
"stxvb16x %x1,%y0"
[(set_attr "type" "vecstore")])
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-26.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-26.c (revision 254629)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-26.c (working copy)
@@ -1,11 +1,11 @@
/* { dg-do compile { target { powerpc64le-*-* } } } */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
/* { dg-options "-mcpu=power8 -O3 " } */
-/* { dg-final { scan-assembler-times "lxvw4x" 2 } } */
-/* { dg-final { scan-assembler "stxvw4x" } } */
+/* { dg-final { scan-assembler-times "lxvd2x" 2 } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */
-/* Verify that swap optimization does not interfere with element-reversing
+/* Verify that swap optimization does not interfere with unaligned
loads and stores. */
/* Test case to resolve PR79044. */