[4.8, PATCH 7/26] Backport Power8 and LE support: Vector LE
Richard Biener
rguenther@suse.de
Mon Mar 24 10:17:00 GMT 2014
On Wed, 19 Mar 2014, Bill Schmidt wrote:
> Hi,
>
> This patch (diff-le-vector) backports the changes to support vector
> infrastructure on powerpc64le. Copying Richard and Jakub for the libcpp
> bits.
The libcpp bits are fine.
Thanks,
Richard.
> Thanks,
> Bill
>
>
> [gcc]
>
> 2014-03-29 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> Backport from mainline r205333
> 2013-11-24 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.c (rs6000_expand_vec_perm_const_1): Correct
> for little endian.
>
> Backport from mainline r205241
> 	2013-11-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/vector.md (vec_pack_trunc_v2df): Revert previous
> little endian change.
> (vec_pack_sfix_trunc_v2df): Likewise.
> (vec_pack_ufix_trunc_v2df): Likewise.
> * config/rs6000/rs6000.c (rs6000_expand_interleave): Correct
> double checking of endianness.
>
> Backport from mainline r205146
> 2013-11-20 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/vsx.md (vsx_set_<mode>): Adjust for little endian.
> (vsx_extract_<mode>): Likewise.
> 	(*vsx_extract_<mode>_one_le): New LE variant of
> *vsx_extract_<mode>_zero.
> (vsx_extract_v4sf): Adjust for little endian.
>
> Backport from mainline r205080
> 2013-11-19 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Adjust
> V16QI vector splat case for little endian.
>
> Backport from mainline r205045:
>
> 2013-11-19 Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
>
> * config/rs6000/vector.md ("mov<mode>"): Do not call
> rs6000_emit_le_vsx_move to move into or out of GPRs.
> * config/rs6000/rs6000.c (rs6000_emit_le_vsx_move): Assert
> source and destination are not GPR hard regs.
>
> Backport from mainline r204920
> 	2013-11-17  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.c (rs6000_frame_related): Add split_reg
> parameter and use it in REG_FRAME_RELATED_EXPR note.
> (emit_frame_save): Call rs6000_frame_related with extra NULL_RTX
> parameter.
> (rs6000_emit_prologue): Likewise, but for little endian VSX
> stores, pass the source register of the store instead.
>
> Backport from mainline r204862
> 2013-11-15 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/altivec.md (UNSPEC_VPERM_X, UNSPEC_VPERM_UNS_X):
> Remove.
> (altivec_vperm_<mode>): Revert earlier little endian change.
> (*altivec_vperm_<mode>_internal): Remove.
> (altivec_vperm_<mode>_uns): Revert earlier little endian change.
> (*altivec_vperm_<mode>_uns_internal): Remove.
> * config/rs6000/vector.md (vec_realign_load_<mode>): Revise
> commentary.
>
> Backport from mainline r204441
> 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.c (rs6000_option_override_internal):
> Remove restriction against use of VSX instructions when generating
> code for little endian mode.
>
> Backport from mainline r204440
> 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/altivec.md (mulv4si3): Ensure we generate vmulouh
> for both big and little endian.
> (mulv8hi3): Swap input operands for merge high and merge low
> instructions for little endian.
>
> Backport from mainline r204439
> 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/altivec.md (vec_widen_umult_even_v16qi): Change
> define_insn to define_expand that uses even patterns for big
> endian and odd patterns for little endian.
> (vec_widen_smult_even_v16qi): Likewise.
> (vec_widen_umult_even_v8hi): Likewise.
> (vec_widen_smult_even_v8hi): Likewise.
> (vec_widen_umult_odd_v16qi): Likewise.
> (vec_widen_smult_odd_v16qi): Likewise.
> (vec_widen_umult_odd_v8hi): Likewise.
> (vec_widen_smult_odd_v8hi): Likewise.
> (altivec_vmuleub): New define_insn.
> (altivec_vmuloub): Likewise.
> (altivec_vmulesb): Likewise.
> (altivec_vmulosb): Likewise.
> (altivec_vmuleuh): Likewise.
> (altivec_vmulouh): Likewise.
> (altivec_vmulesh): Likewise.
> (altivec_vmulosh): Likewise.
>
> Backport from mainline r204395
> 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/vector.md (vec_pack_sfix_trunc_v2df): Adjust for
> little endian.
> (vec_pack_ufix_trunc_v2df): Likewise.
>
> Backport from mainline r204363
> 2013-11-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap
> arguments to merge instruction for little endian.
> (vec_widen_umult_lo_v16qi): Likewise.
> (vec_widen_smult_hi_v16qi): Likewise.
> (vec_widen_smult_lo_v16qi): Likewise.
> (vec_widen_umult_hi_v8hi): Likewise.
> (vec_widen_umult_lo_v8hi): Likewise.
> (vec_widen_smult_hi_v8hi): Likewise.
> (vec_widen_smult_lo_v8hi): Likewise.
>
> Backport from mainline r204350
> 2013-11-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/vsx.md (*vsx_le_perm_store_<mode> for VSX_D):
> Replace the define_insn_and_split with a define_insn and two
> define_splits, with the split after reload re-permuting the source
> register to its original value.
> (*vsx_le_perm_store_<mode> for VSX_W): Likewise.
> (*vsx_le_perm_store_v8hi): Likewise.
> (*vsx_le_perm_store_v16qi): Likewise.
>
> Backport from mainline r204321
> 2013-11-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/vector.md (vec_pack_trunc_v2df): Adjust for
> little endian.
>
> Backport from mainline r204321
> 	2013-11-02  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.c (rs6000_expand_vector_set): Adjust for
> little endian.
>
> Backport from mainline r203980
> 2013-10-23 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/altivec.md (mulv8hi3): Adjust for little endian.
>
> Backport from mainline r203930
> 	2013-10-22  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
> meaning of merge-high and merge-low masks for little endian; avoid
> use of vector-pack masks for little endian for mismatched modes.
>
> Backport from mainline r203877
> 2013-10-20 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Adjust for
> little endian.
> (vec_unpacku_hi_v8hi): Likewise.
> (vec_unpacku_lo_v16qi): Likewise.
> (vec_unpacku_lo_v8hi): Likewise.
>
> Backport from mainline r203863
> 2013-10-19 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.c (vspltis_constant): Make sure we check
> all elements for both endian flavors.
>
> Backport from mainline r203714
> 2013-10-16 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> 	* config/rs6000/vector.md (vec_unpacks_hi_v4sf): Correct for
> endianness.
> (vec_unpacks_lo_v4sf): Likewise.
> (vec_unpacks_float_hi_v4si): Likewise.
> (vec_unpacks_float_lo_v4si): Likewise.
> (vec_unpacku_float_hi_v4si): Likewise.
> (vec_unpacku_float_lo_v4si): Likewise.
>
> Backport from mainline r203713
> 2013-10-16 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/vsx.md (vsx_concat_<mode>): Adjust output for LE.
> (vsx_concat_v2sf): Likewise.
>
> Backport from mainline r203458
> 2013-10-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): Generalize to
> handle vector float as well.
> (*vsx_le_perm_load_v4si): Likewise.
> (*vsx_le_perm_store_v2di): Likewise.
> (*vsx_le_perm_store_v4si): Likewise.
>
> Backport from mainline r203457
> 2013-10-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/vector.md (vec_realign_load<mode>): Generate vperm
> directly to circumvent subtract from splat{31} workaround.
> * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
> prototype.
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
> * config/rs6000/altivec.md (define_c_enum "unspec"): Add
> UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
> (altivec_vperm_<mode>): Convert to define_insn_and_split to
> separate big and little endian logic.
> (*altivec_vperm_<mode>_internal): New define_insn.
> (altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
> separate big and little endian logic.
> (*altivec_vperm_<mode>_uns_internal): New define_insn.
> (vec_permv16qi): Add little endian logic.
>
> Backport from mainline r203247
> 2013-10-07 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
> (altivec_expand_vec_perm_const): Call it.
>
> Backport from mainline r203246
> 2013-10-07 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * config/rs6000/vector.md (mov<mode>): Emit permuted move
> sequences for LE VSX loads and stores at expand time.
> * config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_move): New
> prototype.
> * config/rs6000/rs6000.c (rs6000_const_vec): New.
> (rs6000_gen_le_vsx_permute): New.
> (rs6000_gen_le_vsx_load): New.
> (rs6000_gen_le_vsx_store): New.
> (rs6000_gen_le_vsx_move): New.
> * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): New.
> (*vsx_le_perm_load_v4si): New.
> (*vsx_le_perm_load_v8hi): New.
> (*vsx_le_perm_load_v16qi): New.
> (*vsx_le_perm_store_v2di): New.
> (*vsx_le_perm_store_v4si): New.
> (*vsx_le_perm_store_v8hi): New.
> (*vsx_le_perm_store_v16qi): New.
> (*vsx_xxpermdi2_le_<mode>): New.
> (*vsx_xxpermdi4_le_<mode>): New.
> (*vsx_xxpermdi8_le_V8HI): New.
> (*vsx_xxpermdi16_le_V16QI): New.
> (*vsx_lxvd2x2_le_<mode>): New.
> (*vsx_lxvd2x4_le_<mode>): New.
> (*vsx_lxvd2x8_le_V8HI): New.
> (*vsx_lxvd2x16_le_V16QI): New.
> (*vsx_stxvd2x2_le_<mode>): New.
> (*vsx_stxvd2x4_le_<mode>): New.
> (*vsx_stxvd2x8_le_V8HI): New.
> (*vsx_stxvd2x16_le_V16QI): New.
>
> Backport from mainline r201235
> 2013-07-24 Bill Schmidt <wschmidt@linux.ibm.com>
> Anton Blanchard <anton@au1.ibm.com>
>
> * config/rs6000/altivec.md (altivec_vpkpx): Handle little endian.
> (altivec_vpks<VI_char>ss): Likewise.
> (altivec_vpks<VI_char>us): Likewise.
> (altivec_vpku<VI_char>us): Likewise.
> (altivec_vpku<VI_char>um): Likewise.
>
> Backport from mainline r201208
> 	2013-07-24  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> Anton Blanchard <anton@au1.ibm.com>
>
> * config/rs6000/vector.md (vec_realign_load_<mode>): Reorder input
> operands to vperm for little endian.
> * config/rs6000/rs6000.c (rs6000_expand_builtin): Use lvsr instead
> of lvsl to create the control mask for a vperm for little endian.
>
> Backport from mainline r201195
> 2013-07-23 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
> Anton Blanchard <anton@au1.ibm.com>
>
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
> two operands for little-endian.
>
> Backport from mainline r201193
> 2013-07-23 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
> Anton Blanchard <anton@au1.ibm.com>
>
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Correct
> selection of field for vector splat in little endian mode.
>
> Backport from mainline r201149
> 	2013-07-22  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> Anton Blanchard <anton@au1.ibm.com>
>
> * config/rs6000/rs6000.c (rs6000_expand_vector_init): Fix
> endianness when selecting field to splat.
>
> [gcc/testsuite]
>
> 2014-03-29 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> Backport from mainline r205638
> 2013-12-03 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Skip for little
> endian.
>
> Backport from mainline r205146
> 2013-11-20 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * gcc.target/powerpc/pr48258-1.c: Skip for little endian.
>
> Backport from mainline r204862
> 2013-11-15 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * gcc.dg/vmx/3b-15.c: Revise for little endian.
>
> Backport from mainline r204321
> 	2013-11-02  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
> * gcc.dg/vmx/vec-set.c: New.
>
> Backport from mainline r204138
> 2013-10-28 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * gcc.dg/vmx/gcc-bug-i.c: Add little endian variant.
> * gcc.dg/vmx/eg-5.c: Likewise.
>
> Backport from mainline r203930
> 	2013-10-22  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
> * gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack
> tests into...
> * gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is
> restricted to big-endian targets.
>
> Backport from mainline r203246
> 2013-10-07 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian.
> * gcc.target/powerpc/fusion.c: Likewise.
>
> [libcpp]
>
> 2014-03-29 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> Backport from mainline
> 2013-11-18 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> * lex.c (search_line_fast): Correct for little endian.
>
>
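The recurring little-endian vperm adjustment in these backports (swap the two input operands and subtract each selector element from 31) can be sanity-checked with a small host-side simulation. The code below is illustrative only, not part of the patch: it models vperm's endian-agnostic hardware semantics and assumes that in LE mode memory byte k of a vector register occupies hardware byte lane 15 - k.

```c
#include <assert.h>

/* Endian-agnostic hardware vperm: result lane i takes byte
   (ctl[i] & 31) from the 32-lane concatenation of VA (lanes 0-15,
   "left to right") and VB (lanes 16-31).  */
static void
hw_vperm (unsigned char out[16], const unsigned char va[16],
          const unsigned char vb[16], const unsigned char ctl[16])
{
  for (int i = 0; i < 16; i++)
    {
      int j = ctl[i] & 31;
      out[i] = j < 16 ? va[j] : vb[j - 16];
    }
}

/* Assumed LE register image: memory byte k sits in hardware lane 15-k.  */
static void
le_load (unsigned char lanes[16], const unsigned char mem[16])
{
  for (int k = 0; k < 16; k++)
    lanes[15 - k] = mem[k];
}

/* Perform a memory-order byte permutation SEL over the memory-order
   concatenation of IN0 and IN1, the way the patch does it on LE:
   swap the operands and use 31 - sel as the control vector.
   The result is produced in memory order.  */
static void
le_vperm (unsigned char out_mem[16], const unsigned char in0_mem[16],
          const unsigned char in1_mem[16], const unsigned char sel_mem[16])
{
  unsigned char va[16], vb[16], ctl_mem[16], ctl[16], out_lanes[16];
  for (int k = 0; k < 16; k++)
    ctl_mem[k] = (unsigned char) (31 - (sel_mem[k] & 31));
  le_load (va, in1_mem);   /* operands swapped */
  le_load (vb, in0_mem);
  le_load (ctl, ctl_mem);
  hw_vperm (out_lanes, va, vb, ctl);
  for (int k = 0; k < 16; k++)
    out_mem[k] = out_lanes[15 - k];
}
```

Working through the index arithmetic shows out_mem[k] always equals concat(in0, in1)[sel[k]] in memory order, which is exactly the correction the comments in the patch describe.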
> Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c
> +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c
> @@ -3216,11 +3216,6 @@ rs6000_option_override_internal (bool gl
> }
> else if (TARGET_PAIRED_FLOAT)
> msg = N_("-mvsx and -mpaired are incompatible");
> - /* The hardware will allow VSX and little endian, but until we make sure
> - things like vector select, etc. work don't allow VSX on little endian
> - systems at this point. */
> - else if (!BYTES_BIG_ENDIAN)
> - msg = N_("-mvsx used with little endian code");
> else if (TARGET_AVOID_XFORM > 0)
> msg = N_("-mvsx needs indexed addressing");
> else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit
> @@ -4991,15 +4986,16 @@ vspltis_constant (rtx op, unsigned step,
>
> /* Check if VAL is present in every STEP-th element, and the
> other elements are filled with its most significant bit. */
> - for (i = 0; i < nunits - 1; ++i)
> + for (i = 1; i < nunits; ++i)
> {
> HOST_WIDE_INT desired_val;
> - if (((BYTES_BIG_ENDIAN ? i + 1 : i) & (step - 1)) == 0)
> + unsigned elt = BYTES_BIG_ENDIAN ? nunits - 1 - i : i;
> + if ((i & (step - 1)) == 0)
> desired_val = val;
> else
> desired_val = msb_val;
>
> - if (desired_val != const_vector_elt_as_int (op, i))
> + if (desired_val != const_vector_elt_as_int (op, elt))
> return false;
> }
>
> @@ -5446,6 +5442,7 @@ rs6000_expand_vector_init (rtx target, r
> of 64-bit items is not supported on Altivec. */
> if (all_same && GET_MODE_SIZE (inner_mode) <= 4)
> {
> + rtx field;
> mem = assign_stack_temp (mode, GET_MODE_SIZE (inner_mode));
> emit_move_insn (adjust_address_nv (mem, inner_mode, 0),
> XVECEXP (vals, 0, 0));
> @@ -5456,9 +5453,11 @@ rs6000_expand_vector_init (rtx target, r
> gen_rtx_SET (VOIDmode,
> target, mem),
> x)));
> + field = (BYTES_BIG_ENDIAN ? const0_rtx
> + : GEN_INT (GET_MODE_NUNITS (mode) - 1));
> x = gen_rtx_VEC_SELECT (inner_mode, target,
> gen_rtx_PARALLEL (VOIDmode,
> - gen_rtvec (1, const0_rtx)));
> + gen_rtvec (1, field)));
> emit_insn (gen_rtx_SET (VOIDmode, target,
> gen_rtx_VEC_DUPLICATE (mode, x)));
> return;
> @@ -5531,10 +5530,27 @@ rs6000_expand_vector_set (rtx target, rt
> XVECEXP (mask, 0, elt*width + i)
> = GEN_INT (i + 0x10);
> x = gen_rtx_CONST_VECTOR (V16QImode, XVEC (mask, 0));
> - x = gen_rtx_UNSPEC (mode,
> - gen_rtvec (3, target, reg,
> - force_reg (V16QImode, x)),
> - UNSPEC_VPERM);
> +
> + if (BYTES_BIG_ENDIAN)
> + x = gen_rtx_UNSPEC (mode,
> + gen_rtvec (3, target, reg,
> + force_reg (V16QImode, x)),
> + UNSPEC_VPERM);
> + else
> + {
> + /* Invert selector. */
> + rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
> + gen_rtx_CONST_INT (QImode, -1));
> + rtx tmp = gen_reg_rtx (V16QImode);
> + emit_move_insn (tmp, splat);
> + x = gen_rtx_MINUS (V16QImode, tmp, force_reg (V16QImode, x));
> + emit_move_insn (tmp, x);
> +
> + /* Permute with operands reversed and adjusted selector. */
> + x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
> + UNSPEC_VPERM);
> + }
> +
> emit_insn (gen_rtx_SET (VOIDmode, target, x));
> }
>
> @@ -7830,6 +7846,107 @@ rs6000_eliminate_indexed_memrefs (rtx op
> copy_addr_to_reg (XEXP (operands[1], 0)));
> }
>
> +/* Generate a vector of constants to permute MODE for a little-endian
> + storage operation by swapping the two halves of a vector. */
> +static rtvec
> +rs6000_const_vec (enum machine_mode mode)
> +{
> + int i, subparts;
> + rtvec v;
> +
> + switch (mode)
> + {
> + case V2DFmode:
> + case V2DImode:
> + subparts = 2;
> + break;
> + case V4SFmode:
> + case V4SImode:
> + subparts = 4;
> + break;
> + case V8HImode:
> + subparts = 8;
> + break;
> + case V16QImode:
> + subparts = 16;
> + break;
> + default:
> + gcc_unreachable();
> + }
> +
> + v = rtvec_alloc (subparts);
> +
> + for (i = 0; i < subparts / 2; ++i)
> + RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i + subparts / 2);
> + for (i = subparts / 2; i < subparts; ++i)
> + RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i - subparts / 2);
> +
> + return v;
> +}
> +
> +/* Generate a permute rtx that represents an lxvd2x, stxvd2x, or xxpermdi
> + for a VSX load or store operation. */
> +rtx
> +rs6000_gen_le_vsx_permute (rtx source, enum machine_mode mode)
> +{
> + rtx par = gen_rtx_PARALLEL (VOIDmode, rs6000_const_vec (mode));
> + return gen_rtx_VEC_SELECT (mode, source, par);
> +}
> +
> +/* Emit a little-endian load from vector memory location SOURCE to VSX
> + register DEST in mode MODE. The load is done with two permuting
> + insns that represent an lxvd2x and xxpermdi. */
> +void
> +rs6000_emit_le_vsx_load (rtx dest, rtx source, enum machine_mode mode)
> +{
> + rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (dest) : dest;
> + rtx permute_mem = rs6000_gen_le_vsx_permute (source, mode);
> + rtx permute_reg = rs6000_gen_le_vsx_permute (tmp, mode);
> + emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_mem));
> + emit_insn (gen_rtx_SET (VOIDmode, dest, permute_reg));
> +}
> +
> +/* Emit a little-endian store to vector memory location DEST from VSX
> + register SOURCE in mode MODE. The store is done with two permuting
> + insns that represent an xxpermdi and an stxvd2x. */
> +void
> +rs6000_emit_le_vsx_store (rtx dest, rtx source, enum machine_mode mode)
> +{
> + rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (source) : source;
> + rtx permute_src = rs6000_gen_le_vsx_permute (source, mode);
> + rtx permute_tmp = rs6000_gen_le_vsx_permute (tmp, mode);
> + emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_src));
> + emit_insn (gen_rtx_SET (VOIDmode, dest, permute_tmp));
> +}
> +
> +/* Emit a sequence representing a little-endian VSX load or store,
> + moving data from SOURCE to DEST in mode MODE. This is done
> + separately from rs6000_emit_move to ensure it is called only
> + during expand. LE VSX loads and stores introduced later are
> + handled with a split. The expand-time RTL generation allows
> + us to optimize away redundant pairs of register-permutes. */
> +void
> +rs6000_emit_le_vsx_move (rtx dest, rtx source, enum machine_mode mode)
> +{
> + gcc_assert (!BYTES_BIG_ENDIAN
> + && VECTOR_MEM_VSX_P (mode)
> + && mode != TImode
> + && !gpr_or_gpr_p (dest, source)
> + && (MEM_P (source) ^ MEM_P (dest)));
> +
> + if (MEM_P (source))
> + {
> + gcc_assert (REG_P (dest));
> + rs6000_emit_le_vsx_load (dest, source, mode);
> + }
> + else
> + {
> + if (!REG_P (source))
> + source = force_reg (mode, source);
> + rs6000_emit_le_vsx_store (dest, source, mode);
> + }
> +}
> +
> /* Emit a move from SOURCE to DEST in mode MODE. */
> void
> rs6000_emit_move (rtx dest, rtx source, enum machine_mode mode)
> @@ -12589,7 +12706,8 @@ rs6000_expand_builtin (tree exp, rtx tar
> case ALTIVEC_BUILTIN_MASK_FOR_LOAD:
> case ALTIVEC_BUILTIN_MASK_FOR_STORE:
> {
> - int icode = (int) CODE_FOR_altivec_lvsr;
> + int icode = (BYTES_BIG_ENDIAN ? (int) CODE_FOR_altivec_lvsr
> + : (int) CODE_FOR_altivec_lvsl);
> enum machine_mode tmode = insn_data[icode].operand[0].mode;
> enum machine_mode mode = insn_data[icode].operand[1].mode;
> tree arg;
> @@ -20880,7 +20998,7 @@ output_probe_stack_range (rtx reg1, rtx
>
> static rtx
> rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val,
> - rtx reg2, rtx rreg)
> + rtx reg2, rtx rreg, rtx split_reg)
> {
> rtx real, temp;
>
> @@ -20971,6 +21089,11 @@ rs6000_frame_related (rtx insn, rtx reg,
> }
> }
>
> + /* If a store insn has been split into multiple insns, the
> + true source register is given by split_reg. */
> + if (split_reg != NULL_RTX)
> + real = gen_rtx_SET (VOIDmode, SET_DEST (real), split_reg);
> +
> RTX_FRAME_RELATED_P (insn) = 1;
> add_reg_note (insn, REG_FRAME_RELATED_EXPR, real);
>
> @@ -21078,7 +21201,7 @@ emit_frame_save (rtx frame_reg, enum mac
> reg = gen_rtx_REG (mode, regno);
> insn = emit_insn (gen_frame_store (reg, frame_reg, offset));
> return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp,
> - NULL_RTX, NULL_RTX);
> + NULL_RTX, NULL_RTX, NULL_RTX);
> }
>
> /* Emit an offset memory reference suitable for a frame store, while
> @@ -21599,7 +21722,7 @@ rs6000_emit_prologue (void)
>
> insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
> rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> - treg, GEN_INT (-info->total_size));
> + treg, GEN_INT (-info->total_size), NULL_RTX);
> sp_off = frame_off = info->total_size;
> }
>
> @@ -21684,7 +21807,7 @@ rs6000_emit_prologue (void)
>
> insn = emit_move_insn (mem, reg);
> rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> - NULL_RTX, NULL_RTX);
> + NULL_RTX, NULL_RTX, NULL_RTX);
> END_USE (0);
> }
> }
> @@ -21752,7 +21875,7 @@ rs6000_emit_prologue (void)
> info->lr_save_offset,
> DFmode, sel);
> rs6000_frame_related (insn, ptr_reg, sp_off,
> - NULL_RTX, NULL_RTX);
> + NULL_RTX, NULL_RTX, NULL_RTX);
> if (lr)
> END_USE (0);
> }
> @@ -21831,7 +21954,7 @@ rs6000_emit_prologue (void)
> SAVRES_SAVE | SAVRES_GPR);
>
> rs6000_frame_related (insn, spe_save_area_ptr, sp_off - save_off,
> - NULL_RTX, NULL_RTX);
> + NULL_RTX, NULL_RTX, NULL_RTX);
> }
>
> /* Move the static chain pointer back. */
> @@ -21881,7 +22004,7 @@ rs6000_emit_prologue (void)
> info->lr_save_offset + ptr_off,
> reg_mode, sel);
> rs6000_frame_related (insn, ptr_reg, sp_off - ptr_off,
> - NULL_RTX, NULL_RTX);
> + NULL_RTX, NULL_RTX, NULL_RTX);
> if (lr)
> END_USE (0);
> }
> @@ -21897,7 +22020,7 @@ rs6000_emit_prologue (void)
> info->gp_save_offset + frame_off + reg_size * i);
> insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
> rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> - NULL_RTX, NULL_RTX);
> + NULL_RTX, NULL_RTX, NULL_RTX);
> }
> else if (!WORLD_SAVE_P (info))
> {
> @@ -22124,7 +22247,7 @@ rs6000_emit_prologue (void)
> info->altivec_save_offset + ptr_off,
> 0, V4SImode, SAVRES_SAVE | SAVRES_VR);
> rs6000_frame_related (insn, scratch_reg, sp_off - ptr_off,
> - NULL_RTX, NULL_RTX);
> + NULL_RTX, NULL_RTX, NULL_RTX);
> if (REGNO (frame_reg_rtx) == REGNO (scratch_reg))
> {
> /* The oddity mentioned above clobbered our frame reg. */
> @@ -22140,7 +22263,7 @@ rs6000_emit_prologue (void)
> for (i = info->first_altivec_reg_save; i <= LAST_ALTIVEC_REGNO; ++i)
> if (info->vrsave_mask & ALTIVEC_REG_BIT (i))
> {
> - rtx areg, savereg, mem;
> + rtx areg, savereg, mem, split_reg;
> int offset;
>
> offset = (info->altivec_save_offset + frame_off
> @@ -22158,8 +22281,18 @@ rs6000_emit_prologue (void)
>
> insn = emit_move_insn (mem, savereg);
>
> + /* When we split a VSX store into two insns, we need to make
> + sure the DWARF info knows which register we are storing.
> + Pass it in to be used on the appropriate note. */
> + if (!BYTES_BIG_ENDIAN
> + && GET_CODE (PATTERN (insn)) == SET
> + && GET_CODE (SET_SRC (PATTERN (insn))) == VEC_SELECT)
> + split_reg = savereg;
> + else
> + split_reg = NULL_RTX;
> +
> rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> - areg, GEN_INT (offset));
> + areg, GEN_INT (offset), split_reg);
> }
> }
>
> @@ -28813,6 +28946,136 @@ rs6000_emit_parity (rtx dst, rtx src)
> }
> }
>
> +/* Expand an Altivec constant permutation for little endian mode.
> + There are two issues: First, the two input operands must be
> + swapped so that together they form a double-wide array in LE
> + order. Second, the vperm instruction has surprising behavior
> + in LE mode: it interprets the elements of the source vectors
> + in BE mode ("left to right") and interprets the elements of
> + the destination vector in LE mode ("right to left"). To
> + correct for this, we must subtract each element of the permute
> + control vector from 31.
> +
> + For example, suppose we want to concatenate vr10 = {0, 1, 2, 3}
> + with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm.
> + We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to
> + serve as the permute control vector. Then, in BE mode,
> +
> + vperm 9,10,11,12
> +
> + places the desired result in vr9. However, in LE mode the
> + vector contents will be
> +
> + vr10 = 00000003 00000002 00000001 00000000
> + vr11 = 00000007 00000006 00000005 00000004
> +
> + The result of the vperm using the same permute control vector is
> +
> + vr9 = 05000000 07000000 01000000 03000000
> +
> + That is, the leftmost 4 bytes of vr10 are interpreted as the
> + source for the rightmost 4 bytes of vr9, and so on.
> +
> + If we change the permute control vector to
> +
> + vr12 = {31,30,29,28,23,22,21,20,15,14,13,12,7,6,5,4}
> +
> + and issue
> +
> + vperm 9,11,10,12
> +
> + we get the desired
> +
> + vr9 = 00000006 00000004 00000002 00000000. */
> +
> +void
> +altivec_expand_vec_perm_const_le (rtx operands[4])
> +{
> + unsigned int i;
> + rtx perm[16];
> + rtx constv, unspec;
> + rtx target = operands[0];
> + rtx op0 = operands[1];
> + rtx op1 = operands[2];
> + rtx sel = operands[3];
> +
> + /* Unpack and adjust the constant selector. */
> + for (i = 0; i < 16; ++i)
> + {
> + rtx e = XVECEXP (sel, 0, i);
> + unsigned int elt = 31 - (INTVAL (e) & 31);
> + perm[i] = GEN_INT (elt);
> + }
> +
> + /* Expand to a permute, swapping the inputs and using the
> + adjusted selector. */
> + if (!REG_P (op0))
> + op0 = force_reg (V16QImode, op0);
> + if (!REG_P (op1))
> + op1 = force_reg (V16QImode, op1);
> +
> + constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> + constv = force_reg (V16QImode, constv);
> + unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv),
> + UNSPEC_VPERM);
> + if (!REG_P (target))
> + {
> + rtx tmp = gen_reg_rtx (V16QImode);
> + emit_move_insn (tmp, unspec);
> + unspec = tmp;
> + }
> +
> + emit_move_insn (target, unspec);
> +}
> +
> +/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the
> + permute control vector. But here it's not a constant, so we must
> + generate a vector splat/subtract to do the adjustment. */
> +
> +void
> +altivec_expand_vec_perm_le (rtx operands[4])
> +{
> + rtx splat, unspec;
> + rtx target = operands[0];
> + rtx op0 = operands[1];
> + rtx op1 = operands[2];
> + rtx sel = operands[3];
> + rtx tmp = target;
> +
> + /* Get everything in regs so the pattern matches. */
> + if (!REG_P (op0))
> + op0 = force_reg (V16QImode, op0);
> + if (!REG_P (op1))
> + op1 = force_reg (V16QImode, op1);
> + if (!REG_P (sel))
> + sel = force_reg (V16QImode, sel);
> + if (!REG_P (target))
> + tmp = gen_reg_rtx (V16QImode);
> +
> + /* SEL = splat(31) - SEL. */
> + /* We want to subtract from 31, but we can't vspltisb 31 since
> + it's out of range. -1 works as well because only the low-order
> + five bits of the permute control vector elements are used. */
> + splat = gen_rtx_VEC_DUPLICATE (V16QImode,
> + gen_rtx_CONST_INT (QImode, -1));
> + emit_move_insn (tmp, splat);
> + sel = gen_rtx_MINUS (V16QImode, tmp, sel);
> + emit_move_insn (tmp, sel);
> +
> + /* Permute with operands reversed and adjusted selector. */
> + unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, tmp),
> + UNSPEC_VPERM);
> +
> + /* Copy into target, possibly by way of a register. */
> + if (!REG_P (target))
> + {
> + emit_move_insn (tmp, unspec);
> + unspec = tmp;
> + }
> +
> + emit_move_insn (target, unspec);
> +}
> +
> /* Expand an Altivec constant permutation. Return true if we match
> an efficient implementation; false to fall back to VPERM. */
>
> @@ -28829,17 +29092,23 @@ altivec_expand_vec_perm_const (rtx opera
> { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } },
> { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum,
> { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } },
> - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb,
> + { OPTION_MASK_ALTIVEC,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb,
> { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } },
> - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh,
> + { OPTION_MASK_ALTIVEC,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh,
> { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } },
> - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw,
> + { OPTION_MASK_ALTIVEC,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw,
> { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } },
> - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb,
> + { OPTION_MASK_ALTIVEC,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb,
> { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } },
> - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh,
> + { OPTION_MASK_ALTIVEC,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh,
> { 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } },
> - { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw,
> + { OPTION_MASK_ALTIVEC,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw,
> { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
> { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
> { 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } },
> @@ -28901,6 +29170,8 @@ altivec_expand_vec_perm_const (rtx opera
> break;
> if (i == 16)
> {
> + if (!BYTES_BIG_ENDIAN)
> + elt = 15 - elt;
> emit_insn (gen_altivec_vspltb (target, op0, GEN_INT (elt)));
> return true;
> }
> @@ -28912,9 +29183,10 @@ altivec_expand_vec_perm_const (rtx opera
> break;
> if (i == 16)
> {
> + int field = BYTES_BIG_ENDIAN ? elt / 2 : 7 - elt / 2;
> x = gen_reg_rtx (V8HImode);
> emit_insn (gen_altivec_vsplth (x, gen_lowpart (V8HImode, op0),
> - GEN_INT (elt / 2)));
> + GEN_INT (field)));
> emit_move_insn (target, gen_lowpart (V16QImode, x));
> return true;
> }
> @@ -28930,9 +29202,10 @@ altivec_expand_vec_perm_const (rtx opera
> break;
> if (i == 16)
> {
> + int field = BYTES_BIG_ENDIAN ? elt / 4 : 3 - elt / 4;
> x = gen_reg_rtx (V4SImode);
> emit_insn (gen_altivec_vspltw (x, gen_lowpart (V4SImode, op0),
> - GEN_INT (elt / 4)));
> + GEN_INT (field)));
> emit_move_insn (target, gen_lowpart (V16QImode, x));
> return true;
> }
> @@ -28970,7 +29243,30 @@ altivec_expand_vec_perm_const (rtx opera
> enum machine_mode omode = insn_data[icode].operand[0].mode;
> enum machine_mode imode = insn_data[icode].operand[1].mode;
>
> - if (swapped)
> + /* For little-endian, don't use vpkuwum and vpkuhum if the
> + underlying vector type is not V4SI and V8HI, respectively.
> + For example, using vpkuwum with a V8HI picks up the even
> + halfwords (BE numbering) when the even halfwords (LE
> + numbering) are what we need. */
> + if (!BYTES_BIG_ENDIAN
> + && icode == CODE_FOR_altivec_vpkuwum
> + && ((GET_CODE (op0) == REG
> + && GET_MODE (op0) != V4SImode)
> + || (GET_CODE (op0) == SUBREG
> + && GET_MODE (XEXP (op0, 0)) != V4SImode)))
> + continue;
> + if (!BYTES_BIG_ENDIAN
> + && icode == CODE_FOR_altivec_vpkuhum
> + && ((GET_CODE (op0) == REG
> + && GET_MODE (op0) != V8HImode)
> + || (GET_CODE (op0) == SUBREG
> + && GET_MODE (XEXP (op0, 0)) != V8HImode)))
> + continue;
> +
> + /* For little-endian, the two input operands must be swapped
> + (or swapped back) to ensure proper right-to-left numbering
> + from 0 to 2N-1. */
> + if (swapped ^ !BYTES_BIG_ENDIAN)
> x = op0, op0 = op1, op1 = x;
> if (imode != V16QImode)
> {
> @@ -28988,6 +29284,12 @@ altivec_expand_vec_perm_const (rtx opera
> }
> }
>
> + if (!BYTES_BIG_ENDIAN)
> + {
> + altivec_expand_vec_perm_const_le (operands);
> + return true;
> + }
> +
> return false;
> }
>
> @@ -29037,6 +29339,21 @@ rs6000_expand_vec_perm_const_1 (rtx targ
> gcc_assert (GET_MODE_NUNITS (vmode) == 2);
> dmode = mode_for_vector (GET_MODE_INNER (vmode), 4);
>
> + /* For little endian, swap operands and invert/swap selectors
> + to get the correct xxpermdi. The operand swap sets up the
> + inputs as a little endian array. The selectors are swapped
> + because they are defined to use big endian ordering. The
> + selectors are inverted to get the correct doublewords for
> + little endian ordering. */
> + if (!BYTES_BIG_ENDIAN)
> + {
> + int n;
> + perm0 = 3 - perm0;
> + perm1 = 3 - perm1;
> + n = perm0, perm0 = perm1, perm1 = n;
> + x = op0, op0 = op1, op1 = x;
> + }
> +
> x = gen_rtx_VEC_CONCAT (dmode, op0, op1);
> v = gen_rtvec (2, GEN_INT (perm0), GEN_INT (perm1));
> x = gen_rtx_VEC_SELECT (vmode, x, gen_rtx_PARALLEL (VOIDmode, v));
> @@ -29132,7 +29449,7 @@ rs6000_expand_interleave (rtx target, rt
> unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
> rtx perm[16];
>
> - high = (highp == BYTES_BIG_ENDIAN ? 0 : nelt / 2);
> + high = (highp ? 0 : nelt / 2);
> for (i = 0; i < nelt / 2; i++)
> {
> perm[i * 2] = GEN_INT (i + high);
> Index: gcc-4_8-test/gcc/config/rs6000/vector.md
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/vector.md
> +++ gcc-4_8-test/gcc/config/rs6000/vector.md
> @@ -88,7 +88,8 @@
> (smax "smax")])
>
>
> -;; Vector move instructions.
> +;; Vector move instructions. Little-endian VSX loads and stores require
> +;; special handling to circumvent "element endianness."
> (define_expand "mov<mode>"
> [(set (match_operand:VEC_M 0 "nonimmediate_operand" "")
> (match_operand:VEC_M 1 "any_operand" ""))]
> @@ -104,6 +105,16 @@
> && !vlogical_operand (operands[1], <MODE>mode))
> operands[1] = force_reg (<MODE>mode, operands[1]);
> }
> + if (!BYTES_BIG_ENDIAN
> + && VECTOR_MEM_VSX_P (<MODE>mode)
> + && <MODE>mode != TImode
> + && !gpr_or_gpr_p (operands[0], operands[1])
> + && (memory_operand (operands[0], <MODE>mode)
> + ^ memory_operand (operands[1], <MODE>mode)))
> + {
> + rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
> + DONE;
> + }
> })
>
> ;; Generic vector floating point load/store instructions. These will match
> @@ -862,7 +873,7 @@
> {
> rtx reg = gen_reg_rtx (V4SFmode);
>
> - rs6000_expand_interleave (reg, operands[1], operands[1], true);
> + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
> emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
> DONE;
> })
> @@ -874,7 +885,7 @@
> {
> rtx reg = gen_reg_rtx (V4SFmode);
>
> - rs6000_expand_interleave (reg, operands[1], operands[1], false);
> + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
> emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
> DONE;
> })
> @@ -886,7 +897,7 @@
> {
> rtx reg = gen_reg_rtx (V4SImode);
>
> - rs6000_expand_interleave (reg, operands[1], operands[1], true);
> + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
> emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
> DONE;
> })
> @@ -898,7 +909,7 @@
> {
> rtx reg = gen_reg_rtx (V4SImode);
>
> - rs6000_expand_interleave (reg, operands[1], operands[1], false);
> + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
> emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
> DONE;
> })
> @@ -910,7 +921,7 @@
> {
> rtx reg = gen_reg_rtx (V4SImode);
>
> - rs6000_expand_interleave (reg, operands[1], operands[1], true);
> + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
> emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
> DONE;
> })
> @@ -922,7 +933,7 @@
> {
> rtx reg = gen_reg_rtx (V4SImode);
>
> - rs6000_expand_interleave (reg, operands[1], operands[1], false);
> + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
> emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
> DONE;
> })
> @@ -936,8 +947,19 @@
> (match_operand:V16QI 3 "vlogical_operand" "")]
> "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
> {
> - emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1], operands[2],
> - operands[3]));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
> + operands[2], operands[3]));
> + else
> + {
> + /* We have changed lvsr to lvsl, so to complete the transformation
> + of vperm for LE, we must swap the inputs. */
> + rtx unspec = gen_rtx_UNSPEC (<MODE>mode,
> + gen_rtvec (3, operands[2],
> + operands[1], operands[3]),
> + UNSPEC_VPERM);
> + emit_move_insn (operands[0], unspec);
> + }
> DONE;
> })
>
> Index: gcc-4_8-test/gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/altivec.md
> +++ gcc-4_8-test/gcc/config/rs6000/altivec.md
> @@ -649,7 +649,7 @@
> convert_move (small_swap, swap, 0);
>
> low_product = gen_reg_rtx (V4SImode);
> - emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
> + emit_insn (gen_altivec_vmulouh (low_product, one, two));
>
> high_product = gen_reg_rtx (V4SImode);
> emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero));
> @@ -676,10 +676,18 @@
> emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2]));
> emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2]));
>
> - emit_insn (gen_altivec_vmrghw (high, even, odd));
> - emit_insn (gen_altivec_vmrglw (low, even, odd));
> -
> - emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
> + if (BYTES_BIG_ENDIAN)
> + {
> + emit_insn (gen_altivec_vmrghw (high, even, odd));
> + emit_insn (gen_altivec_vmrglw (low, even, odd));
> + emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
> + }
> + else
> + {
> + emit_insn (gen_altivec_vmrghw (high, odd, even));
> + emit_insn (gen_altivec_vmrglw (low, odd, even));
> + emit_insn (gen_altivec_vpkuwum (operands[0], low, high));
> + }
>
> DONE;
> }")
> @@ -967,7 +975,111 @@
> "vmrgow %0,%1,%2"
> [(set_attr "type" "vecperm")])
>
> -(define_insn "vec_widen_umult_even_v16qi"
> +(define_expand "vec_widen_umult_even_v16qi"
> + [(use (match_operand:V8HI 0 "register_operand" ""))
> + (use (match_operand:V16QI 1 "register_operand" ""))
> + (use (match_operand:V16QI 2 "register_operand" ""))]
> + "TARGET_ALTIVEC"
> +{
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
> +(define_expand "vec_widen_smult_even_v16qi"
> + [(use (match_operand:V8HI 0 "register_operand" ""))
> + (use (match_operand:V16QI 1 "register_operand" ""))
> + (use (match_operand:V16QI 2 "register_operand" ""))]
> + "TARGET_ALTIVEC"
> +{
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
> +(define_expand "vec_widen_umult_even_v8hi"
> + [(use (match_operand:V4SI 0 "register_operand" ""))
> + (use (match_operand:V8HI 1 "register_operand" ""))
> + (use (match_operand:V8HI 2 "register_operand" ""))]
> + "TARGET_ALTIVEC"
> +{
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
> +(define_expand "vec_widen_smult_even_v8hi"
> + [(use (match_operand:V4SI 0 "register_operand" ""))
> + (use (match_operand:V8HI 1 "register_operand" ""))
> + (use (match_operand:V8HI 2 "register_operand" ""))]
> + "TARGET_ALTIVEC"
> +{
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
> +(define_expand "vec_widen_umult_odd_v16qi"
> + [(use (match_operand:V8HI 0 "register_operand" ""))
> + (use (match_operand:V16QI 1 "register_operand" ""))
> + (use (match_operand:V16QI 2 "register_operand" ""))]
> + "TARGET_ALTIVEC"
> +{
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
> +(define_expand "vec_widen_smult_odd_v16qi"
> + [(use (match_operand:V8HI 0 "register_operand" ""))
> + (use (match_operand:V16QI 1 "register_operand" ""))
> + (use (match_operand:V16QI 2 "register_operand" ""))]
> + "TARGET_ALTIVEC"
> +{
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
> +(define_expand "vec_widen_umult_odd_v8hi"
> + [(use (match_operand:V4SI 0 "register_operand" ""))
> + (use (match_operand:V8HI 1 "register_operand" ""))
> + (use (match_operand:V8HI 2 "register_operand" ""))]
> + "TARGET_ALTIVEC"
> +{
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
> +(define_expand "vec_widen_smult_odd_v8hi"
> + [(use (match_operand:V4SI 0 "register_operand" ""))
> + (use (match_operand:V8HI 1 "register_operand" ""))
> + (use (match_operand:V8HI 2 "register_operand" ""))]
> + "TARGET_ALTIVEC"
> +{
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
> + DONE;
> +})
> +
> +(define_insn "altivec_vmuleub"
> [(set (match_operand:V8HI 0 "register_operand" "=v")
> (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
> (match_operand:V16QI 2 "register_operand" "v")]
> @@ -976,43 +1088,25 @@
> "vmuleub %0,%1,%2"
> [(set_attr "type" "veccomplex")])
>
> -(define_insn "vec_widen_smult_even_v16qi"
> +(define_insn "altivec_vmuloub"
> [(set (match_operand:V8HI 0 "register_operand" "=v")
> (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
> (match_operand:V16QI 2 "register_operand" "v")]
> - UNSPEC_VMULESB))]
> - "TARGET_ALTIVEC"
> - "vmulesb %0,%1,%2"
> - [(set_attr "type" "veccomplex")])
> -
> -(define_insn "vec_widen_umult_even_v8hi"
> - [(set (match_operand:V4SI 0 "register_operand" "=v")
> - (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> - (match_operand:V8HI 2 "register_operand" "v")]
> - UNSPEC_VMULEUH))]
> - "TARGET_ALTIVEC"
> - "vmuleuh %0,%1,%2"
> - [(set_attr "type" "veccomplex")])
> -
> -(define_insn "vec_widen_smult_even_v8hi"
> - [(set (match_operand:V4SI 0 "register_operand" "=v")
> - (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> - (match_operand:V8HI 2 "register_operand" "v")]
> - UNSPEC_VMULESH))]
> + UNSPEC_VMULOUB))]
> "TARGET_ALTIVEC"
> - "vmulesh %0,%1,%2"
> + "vmuloub %0,%1,%2"
> [(set_attr "type" "veccomplex")])
>
> -(define_insn "vec_widen_umult_odd_v16qi"
> +(define_insn "altivec_vmulesb"
> [(set (match_operand:V8HI 0 "register_operand" "=v")
> (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
> (match_operand:V16QI 2 "register_operand" "v")]
> - UNSPEC_VMULOUB))]
> + UNSPEC_VMULESB))]
> "TARGET_ALTIVEC"
> - "vmuloub %0,%1,%2"
> + "vmulesb %0,%1,%2"
> [(set_attr "type" "veccomplex")])
>
> -(define_insn "vec_widen_smult_odd_v16qi"
> +(define_insn "altivec_vmulosb"
> [(set (match_operand:V8HI 0 "register_operand" "=v")
> (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
> (match_operand:V16QI 2 "register_operand" "v")]
> @@ -1021,7 +1115,16 @@
> "vmulosb %0,%1,%2"
> [(set_attr "type" "veccomplex")])
>
> -(define_insn "vec_widen_umult_odd_v8hi"
> +(define_insn "altivec_vmuleuh"
> + [(set (match_operand:V4SI 0 "register_operand" "=v")
> + (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> + (match_operand:V8HI 2 "register_operand" "v")]
> + UNSPEC_VMULEUH))]
> + "TARGET_ALTIVEC"
> + "vmuleuh %0,%1,%2"
> + [(set_attr "type" "veccomplex")])
> +
> +(define_insn "altivec_vmulouh"
> [(set (match_operand:V4SI 0 "register_operand" "=v")
> (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> (match_operand:V8HI 2 "register_operand" "v")]
> @@ -1030,7 +1133,16 @@
> "vmulouh %0,%1,%2"
> [(set_attr "type" "veccomplex")])
>
> -(define_insn "vec_widen_smult_odd_v8hi"
> +(define_insn "altivec_vmulesh"
> + [(set (match_operand:V4SI 0 "register_operand" "=v")
> + (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> + (match_operand:V8HI 2 "register_operand" "v")]
> + UNSPEC_VMULESH))]
> + "TARGET_ALTIVEC"
> + "vmulesh %0,%1,%2"
> + [(set_attr "type" "veccomplex")])
> +
> +(define_insn "altivec_vmulosh"
> [(set (match_operand:V4SI 0 "register_operand" "=v")
> (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> (match_operand:V8HI 2 "register_operand" "v")]
> @@ -1047,7 +1159,13 @@
> (match_operand:V4SI 2 "register_operand" "v")]
> UNSPEC_VPKPX))]
> "TARGET_ALTIVEC"
> - "vpkpx %0,%1,%2"
> + "*
> + {
> + if (BYTES_BIG_ENDIAN)
> + return \"vpkpx %0,%1,%2\";
> + else
> + return \"vpkpx %0,%2,%1\";
> + }"
> [(set_attr "type" "vecperm")])
>
> (define_insn "altivec_vpks<VI_char>ss"
> @@ -1056,7 +1174,13 @@
> (match_operand:VP 2 "register_operand" "v")]
> UNSPEC_VPACK_SIGN_SIGN_SAT))]
> "<VI_unit>"
> - "vpks<VI_char>ss %0,%1,%2"
> + "*
> + {
> + if (BYTES_BIG_ENDIAN)
> + return \"vpks<VI_char>ss %0,%1,%2\";
> + else
> + return \"vpks<VI_char>ss %0,%2,%1\";
> + }"
> [(set_attr "type" "vecperm")])
>
> (define_insn "altivec_vpks<VI_char>us"
> @@ -1065,7 +1189,13 @@
> (match_operand:VP 2 "register_operand" "v")]
> UNSPEC_VPACK_SIGN_UNS_SAT))]
> "<VI_unit>"
> - "vpks<VI_char>us %0,%1,%2"
> + "*
> + {
> + if (BYTES_BIG_ENDIAN)
> + return \"vpks<VI_char>us %0,%1,%2\";
> + else
> + return \"vpks<VI_char>us %0,%2,%1\";
> + }"
> [(set_attr "type" "vecperm")])
>
> (define_insn "altivec_vpku<VI_char>us"
> @@ -1074,7 +1204,13 @@
> (match_operand:VP 2 "register_operand" "v")]
> UNSPEC_VPACK_UNS_UNS_SAT))]
> "<VI_unit>"
> - "vpku<VI_char>us %0,%1,%2"
> + "*
> + {
> + if (BYTES_BIG_ENDIAN)
> + return \"vpku<VI_char>us %0,%1,%2\";
> + else
> + return \"vpku<VI_char>us %0,%2,%1\";
> + }"
> [(set_attr "type" "vecperm")])
>
> (define_insn "altivec_vpku<VI_char>um"
> @@ -1083,7 +1219,13 @@
> (match_operand:VP 2 "register_operand" "v")]
> UNSPEC_VPACK_UNS_UNS_MOD))]
> "<VI_unit>"
> - "vpku<VI_char>um %0,%1,%2"
> + "*
> + {
> + if (BYTES_BIG_ENDIAN)
> + return \"vpku<VI_char>um %0,%1,%2\";
> + else
> + return \"vpku<VI_char>um %0,%2,%1\";
> + }"
> [(set_attr "type" "vecperm")])
>
> (define_insn "*altivec_vrl<VI_char>"
> @@ -1276,7 +1418,12 @@
> (match_operand:V16QI 3 "register_operand" "")]
> UNSPEC_VPERM))]
> "TARGET_ALTIVEC"
> - "")
> +{
> + if (!BYTES_BIG_ENDIAN) {
> + altivec_expand_vec_perm_le (operands);
> + DONE;
> + }
> +})
>
> (define_expand "vec_perm_constv16qi"
> [(match_operand:V16QI 0 "register_operand" "")
> @@ -1928,25 +2075,26 @@
> rtx vzero = gen_reg_rtx (V8HImode);
> rtx mask = gen_reg_rtx (V16QImode);
> rtvec v = rtvec_alloc (16);
> + bool be = BYTES_BIG_ENDIAN;
>
> emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
>
> - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 0);
> - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
> - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 2);
> - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
> - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 4);
> - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
> - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 6);
> - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
> + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7);
> + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 0 : 16);
> + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 6);
> + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16);
> + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5);
> + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 2 : 16);
> + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 4);
> + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16);
> + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3);
> + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 4 : 16);
> + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 2);
> + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16);
> + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1);
> + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 6 : 16);
> + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 0);
> + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16);
>
> emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> @@ -1963,25 +2111,26 @@
> rtx vzero = gen_reg_rtx (V4SImode);
> rtx mask = gen_reg_rtx (V16QImode);
> rtvec v = rtvec_alloc (16);
> + bool be = BYTES_BIG_ENDIAN;
>
> emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
>
> - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
> - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 0);
> - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
> - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
> - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 2);
> - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
> - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
> - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 4);
> - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
> - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
> - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 6);
> - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
> + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7);
> + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 6);
> + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 0 : 17);
> + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16);
> + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5);
> + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 4);
> + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 2 : 17);
> + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16);
> + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3);
> + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 2);
> + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 4 : 17);
> + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16);
> + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1);
> + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 0);
> + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 6 : 17);
> + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16);
>
> emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
> @@ -1998,25 +2147,26 @@
> rtx vzero = gen_reg_rtx (V8HImode);
> rtx mask = gen_reg_rtx (V16QImode);
> rtvec v = rtvec_alloc (16);
> + bool be = BYTES_BIG_ENDIAN;
>
> emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
>
> - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 8);
> - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
> - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 10);
> - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
> - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 12);
> - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
> - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 14);
> - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
> + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
> + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 8 : 16);
> + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14);
> + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16);
> + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
> + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16);
> + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12);
> + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
> + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
> + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16);
> + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10);
> + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
> + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9);
> + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16);
> + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 8);
> + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>
> emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> @@ -2033,25 +2183,26 @@
> rtx vzero = gen_reg_rtx (V4SImode);
> rtx mask = gen_reg_rtx (V16QImode);
> rtvec v = rtvec_alloc (16);
> + bool be = BYTES_BIG_ENDIAN;
>
> emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
>
> - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
> - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 8);
> - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
> - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
> - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 10);
> - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
> - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
> - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 12);
> - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
> - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
> - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 14);
> - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
> + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
> + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14);
> + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 8 : 17);
> + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16);
> + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
> + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12);
> + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17);
> + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
> + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
> + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10);
> + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17);
> + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
> + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9);
> + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 8);
> + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
> + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>
> emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
> @@ -2071,7 +2222,10 @@
>
> emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
> emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> + else
> + emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
> DONE;
> }")
>
> @@ -2088,7 +2242,10 @@
>
> emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
> emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> + else
> + emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
> DONE;
> }")
>
> @@ -2105,7 +2262,10 @@
>
> emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
> emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> + else
> + emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
> DONE;
> }")
>
> @@ -2122,7 +2282,10 @@
>
> emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
> emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> + else
> + emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
> DONE;
> }")
>
> @@ -2139,7 +2302,10 @@
>
> emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
> emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> + else
> + emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
> DONE;
> }")
>
> @@ -2156,7 +2322,10 @@
>
> emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
> emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> + else
> + emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
> DONE;
> }")
>
> @@ -2173,7 +2342,10 @@
>
> emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
> emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> + else
> + emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
> DONE;
> }")
>
> @@ -2190,7 +2362,10 @@
>
> emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
> emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> + else
> + emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
> DONE;
> }")
>
> Index: gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-protos.h
> +++ gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h
> @@ -56,6 +56,7 @@ extern void paired_expand_vector_init (r
> extern void rs6000_expand_vector_set (rtx, rtx, int);
> extern void rs6000_expand_vector_extract (rtx, rtx, int);
> extern bool altivec_expand_vec_perm_const (rtx op[4]);
> +extern void altivec_expand_vec_perm_le (rtx op[4]);
> extern bool rs6000_expand_vec_perm_const (rtx op[4]);
> extern void rs6000_expand_extract_even (rtx, rtx, rtx);
> extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
> @@ -122,6 +123,7 @@ extern rtx rs6000_longcall_ref (rtx);
> extern void rs6000_fatal_bad_address (rtx);
> extern rtx create_TOC_reference (rtx, rtx);
> extern void rs6000_split_multireg_move (rtx, rtx);
> +extern void rs6000_emit_le_vsx_move (rtx, rtx, enum machine_mode);
> extern void rs6000_emit_move (rtx, rtx, enum machine_mode);
> extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
> extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
> Index: gcc-4_8-test/gcc/config/rs6000/vsx.md
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/vsx.md
> +++ gcc-4_8-test/gcc/config/rs6000/vsx.md
> @@ -216,6 +216,359 @@
> ])
>
> ;; VSX moves
> +
> +;; The patterns for LE permuted loads and stores come before the general
> +;; VSX moves so they match first.
> +(define_insn_and_split "*vsx_le_perm_load_<mode>"
> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> + (match_operand:VSX_D 1 "memory_operand" "Z"))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + "#"
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + [(set (match_dup 2)
> + (vec_select:<MODE>
> + (match_dup 1)
> + (parallel [(const_int 1) (const_int 0)])))
> + (set (match_dup 0)
> + (vec_select:<MODE>
> + (match_dup 2)
> + (parallel [(const_int 1) (const_int 0)])))]
> + "
> +{
> + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> + : operands[0];
> +}
> + "
> + [(set_attr "type" "vecload")
> + (set_attr "length" "8")])
> +
> +(define_insn_and_split "*vsx_le_perm_load_<mode>"
> + [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> + (match_operand:VSX_W 1 "memory_operand" "Z"))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + "#"
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + [(set (match_dup 2)
> + (vec_select:<MODE>
> + (match_dup 1)
> + (parallel [(const_int 2) (const_int 3)
> + (const_int 0) (const_int 1)])))
> + (set (match_dup 0)
> + (vec_select:<MODE>
> + (match_dup 2)
> + (parallel [(const_int 2) (const_int 3)
> + (const_int 0) (const_int 1)])))]
> + "
> +{
> + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> + : operands[0];
> +}
> + "
> + [(set_attr "type" "vecload")
> + (set_attr "length" "8")])
> +
> +(define_insn_and_split "*vsx_le_perm_load_v8hi"
> + [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
> + (match_operand:V8HI 1 "memory_operand" "Z"))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + "#"
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + [(set (match_dup 2)
> + (vec_select:V8HI
> + (match_dup 1)
> + (parallel [(const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)])))
> + (set (match_dup 0)
> + (vec_select:V8HI
> + (match_dup 2)
> + (parallel [(const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)])))]
> + "
> +{
> + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> + : operands[0];
> +}
> + "
> + [(set_attr "type" "vecload")
> + (set_attr "length" "8")])
> +
> +(define_insn_and_split "*vsx_le_perm_load_v16qi"
> + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> + (match_operand:V16QI 1 "memory_operand" "Z"))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + "#"
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + [(set (match_dup 2)
> + (vec_select:V16QI
> + (match_dup 1)
> + (parallel [(const_int 8) (const_int 9)
> + (const_int 10) (const_int 11)
> + (const_int 12) (const_int 13)
> + (const_int 14) (const_int 15)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)
> + (const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)])))
> + (set (match_dup 0)
> + (vec_select:V16QI
> + (match_dup 2)
> + (parallel [(const_int 8) (const_int 9)
> + (const_int 10) (const_int 11)
> + (const_int 12) (const_int 13)
> + (const_int 14) (const_int 15)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)
> + (const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)])))]
> + "
> +{
> + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> + : operands[0];
> +}
> + "
> + [(set_attr "type" "vecload")
> + (set_attr "length" "8")])
> +
> +(define_insn "*vsx_le_perm_store_<mode>"
> + [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
> + (match_operand:VSX_D 1 "vsx_register_operand" "+wa"))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + "#"
> + [(set_attr "type" "vecstore")
> + (set_attr "length" "12")])
> +
> +(define_split
> + [(set (match_operand:VSX_D 0 "memory_operand" "")
> + (match_operand:VSX_D 1 "vsx_register_operand" ""))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> + [(set (match_dup 2)
> + (vec_select:<MODE>
> + (match_dup 1)
> + (parallel [(const_int 1) (const_int 0)])))
> + (set (match_dup 0)
> + (vec_select:<MODE>
> + (match_dup 2)
> + (parallel [(const_int 1) (const_int 0)])))]
> +{
> + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
> + : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> + [(set (match_operand:VSX_D 0 "memory_operand" "")
> + (match_operand:VSX_D 1 "vsx_register_operand" ""))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> + [(set (match_dup 1)
> + (vec_select:<MODE>
> + (match_dup 1)
> + (parallel [(const_int 1) (const_int 0)])))
> + (set (match_dup 0)
> + (vec_select:<MODE>
> + (match_dup 1)
> + (parallel [(const_int 1) (const_int 0)])))
> + (set (match_dup 1)
> + (vec_select:<MODE>
> + (match_dup 1)
> + (parallel [(const_int 1) (const_int 0)])))]
> + "")
> +
> +(define_insn "*vsx_le_perm_store_<mode>"
> + [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
> + (match_operand:VSX_W 1 "vsx_register_operand" "+wa"))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + "#"
> + [(set_attr "type" "vecstore")
> + (set_attr "length" "12")])
> +
> +(define_split
> + [(set (match_operand:VSX_W 0 "memory_operand" "")
> + (match_operand:VSX_W 1 "vsx_register_operand" ""))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> + [(set (match_dup 2)
> + (vec_select:<MODE>
> + (match_dup 1)
> + (parallel [(const_int 2) (const_int 3)
> + (const_int 0) (const_int 1)])))
> + (set (match_dup 0)
> + (vec_select:<MODE>
> + (match_dup 2)
> + (parallel [(const_int 2) (const_int 3)
> + (const_int 0) (const_int 1)])))]
> +{
> + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
> + : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> + [(set (match_operand:VSX_W 0 "memory_operand" "")
> + (match_operand:VSX_W 1 "vsx_register_operand" ""))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> + [(set (match_dup 1)
> + (vec_select:<MODE>
> + (match_dup 1)
> + (parallel [(const_int 2) (const_int 3)
> + (const_int 0) (const_int 1)])))
> + (set (match_dup 0)
> + (vec_select:<MODE>
> + (match_dup 1)
> + (parallel [(const_int 2) (const_int 3)
> + (const_int 0) (const_int 1)])))
> + (set (match_dup 1)
> + (vec_select:<MODE>
> + (match_dup 1)
> + (parallel [(const_int 2) (const_int 3)
> + (const_int 0) (const_int 1)])))]
> + "")
> +
> +(define_insn "*vsx_le_perm_store_v8hi"
> + [(set (match_operand:V8HI 0 "memory_operand" "=Z")
> + (match_operand:V8HI 1 "vsx_register_operand" "+wa"))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + "#"
> + [(set_attr "type" "vecstore")
> + (set_attr "length" "12")])
> +
> +(define_split
> + [(set (match_operand:V8HI 0 "memory_operand" "")
> + (match_operand:V8HI 1 "vsx_register_operand" ""))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> + [(set (match_dup 2)
> + (vec_select:V8HI
> + (match_dup 1)
> + (parallel [(const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)])))
> + (set (match_dup 0)
> + (vec_select:V8HI
> + (match_dup 2)
> + (parallel [(const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)])))]
> +{
> + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
> + : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> + [(set (match_operand:V8HI 0 "memory_operand" "")
> + (match_operand:V8HI 1 "vsx_register_operand" ""))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> + [(set (match_dup 1)
> + (vec_select:V8HI
> + (match_dup 1)
> + (parallel [(const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)])))
> + (set (match_dup 0)
> + (vec_select:V8HI
> + (match_dup 1)
> + (parallel [(const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)])))
> + (set (match_dup 1)
> + (vec_select:V8HI
> + (match_dup 1)
> + (parallel [(const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)])))]
> + "")
> +
> +(define_insn "*vsx_le_perm_store_v16qi"
> + [(set (match_operand:V16QI 0 "memory_operand" "=Z")
> + (match_operand:V16QI 1 "vsx_register_operand" "+wa"))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX"
> + "#"
> + [(set_attr "type" "vecstore")
> + (set_attr "length" "12")])
> +
> +(define_split
> + [(set (match_operand:V16QI 0 "memory_operand" "")
> + (match_operand:V16QI 1 "vsx_register_operand" ""))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> + [(set (match_dup 2)
> + (vec_select:V16QI
> + (match_dup 1)
> + (parallel [(const_int 8) (const_int 9)
> + (const_int 10) (const_int 11)
> + (const_int 12) (const_int 13)
> + (const_int 14) (const_int 15)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)
> + (const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)])))
> + (set (match_dup 0)
> + (vec_select:V16QI
> + (match_dup 2)
> + (parallel [(const_int 8) (const_int 9)
> + (const_int 10) (const_int 11)
> + (const_int 12) (const_int 13)
> + (const_int 14) (const_int 15)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)
> + (const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)])))]
> +{
> + operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
> + : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> + [(set (match_operand:V16QI 0 "memory_operand" "")
> + (match_operand:V16QI 1 "vsx_register_operand" ""))]
> + "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> + [(set (match_dup 1)
> + (vec_select:V16QI
> + (match_dup 1)
> + (parallel [(const_int 8) (const_int 9)
> + (const_int 10) (const_int 11)
> + (const_int 12) (const_int 13)
> + (const_int 14) (const_int 15)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)
> + (const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)])))
> + (set (match_dup 0)
> + (vec_select:V16QI
> + (match_dup 1)
> + (parallel [(const_int 8) (const_int 9)
> + (const_int 10) (const_int 11)
> + (const_int 12) (const_int 13)
> + (const_int 14) (const_int 15)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)
> + (const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)])))
> + (set (match_dup 1)
> + (vec_select:V16QI
> + (match_dup 1)
> + (parallel [(const_int 8) (const_int 9)
> + (const_int 10) (const_int 11)
> + (const_int 12) (const_int 13)
> + (const_int 14) (const_int 15)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)
> + (const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)])))]
> + "")
> +
> +
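For readers unfamiliar with the LE VSX scheme: lxvd2x and stxvd2x transfer the two 64-bit doublewords in big-endian doubleword order, so each little-endian load/store above splits into the raw access plus an xxpermdi that swaps the doublewords back, and the two element reversals compose to the identity. A minimal C model of that composition (the helper name is invented; this illustrates only the index arithmetic, not the instructions themselves):

```c
#include <assert.h>
#include <string.h>

/* Model of the doubleword swap performed by xxpermdi %x0,%x1,%x1,2
   (and implicitly by lxvd2x/stxvd2x on a little-endian target).  */
static void
swap_doublewords (unsigned char v[16])
{
  unsigned char tmp[8];
  memcpy (tmp, v, 8);
  memmove (v, v + 8, 8);
  memcpy (v + 8, tmp, 8);
}
```

Applying swap_doublewords twice restores the original lane order, which is why the splits above can use the same PARALLEL for both vec_selects.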
> (define_insn "*vsx_mov<mode>"
> [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v")
> (match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
> @@ -962,7 +1315,12 @@
> (match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,wa")
> (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,wa")))]
> "VECTOR_MEM_VSX_P (<MODE>mode)"
> - "xxpermdi %x0,%x1,%x2,0"
> +{
> + if (BYTES_BIG_ENDIAN)
> + return "xxpermdi %x0,%x1,%x2,0";
> + else
> + return "xxpermdi %x0,%x2,%x1,0";
> +}
> [(set_attr "type" "vecperm")])
>
> ;; Special purpose concat using xxpermdi to glue two single precision values
> @@ -975,9 +1333,161 @@
> (match_operand:SF 2 "vsx_register_operand" "f,f")]
> UNSPEC_VSX_CONCAT))]
> "VECTOR_MEM_VSX_P (V2DFmode)"
> - "xxpermdi %x0,%x1,%x2,0"
> +{
> + if (BYTES_BIG_ENDIAN)
> + return "xxpermdi %x0,%x1,%x2,0";
> + else
> + return "xxpermdi %x0,%x2,%x1,0";
> +}
> + [(set_attr "type" "vecperm")])
> +
> +;; xxpermdi for little endian loads and stores. We need several of
> +;; these since the form of the PARALLEL differs by mode.
> +(define_insn "*vsx_xxpermdi2_le_<mode>"
> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> + (vec_select:VSX_D
> + (match_operand:VSX_D 1 "vsx_register_operand" "wa")
> + (parallel [(const_int 1) (const_int 0)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> + "xxpermdi %x0,%x1,%x1,2"
> + [(set_attr "type" "vecperm")])
> +
> +(define_insn "*vsx_xxpermdi4_le_<mode>"
> + [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> + (vec_select:VSX_W
> + (match_operand:VSX_W 1 "vsx_register_operand" "wa")
> + (parallel [(const_int 2) (const_int 3)
> + (const_int 0) (const_int 1)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> + "xxpermdi %x0,%x1,%x1,2"
> + [(set_attr "type" "vecperm")])
> +
> +(define_insn "*vsx_xxpermdi8_le_V8HI"
> + [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
> + (vec_select:V8HI
> + (match_operand:V8HI 1 "vsx_register_operand" "wa")
> + (parallel [(const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
> + "xxpermdi %x0,%x1,%x1,2"
> + [(set_attr "type" "vecperm")])
> +
> +(define_insn "*vsx_xxpermdi16_le_V16QI"
> + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> + (vec_select:V16QI
> + (match_operand:V16QI 1 "vsx_register_operand" "wa")
> + (parallel [(const_int 8) (const_int 9)
> + (const_int 10) (const_int 11)
> + (const_int 12) (const_int 13)
> + (const_int 14) (const_int 15)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)
> + (const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
> + "xxpermdi %x0,%x1,%x1,2"
> [(set_attr "type" "vecperm")])
>
> +;; lxvd2x for little endian loads. We need several of
> +;; these since the form of the PARALLEL differs by mode.
> +(define_insn "*vsx_lxvd2x2_le_<mode>"
> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> + (vec_select:VSX_D
> + (match_operand:VSX_D 1 "memory_operand" "Z")
> + (parallel [(const_int 1) (const_int 0)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> + "lxvd2x %x0,%y1"
> + [(set_attr "type" "vecload")])
> +
> +(define_insn "*vsx_lxvd2x4_le_<mode>"
> + [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> + (vec_select:VSX_W
> + (match_operand:VSX_W 1 "memory_operand" "Z")
> + (parallel [(const_int 2) (const_int 3)
> + (const_int 0) (const_int 1)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> + "lxvd2x %x0,%y1"
> + [(set_attr "type" "vecload")])
> +
> +(define_insn "*vsx_lxvd2x8_le_V8HI"
> + [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
> + (vec_select:V8HI
> + (match_operand:V8HI 1 "memory_operand" "Z")
> + (parallel [(const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
> + "lxvd2x %x0,%y1"
> + [(set_attr "type" "vecload")])
> +
> +(define_insn "*vsx_lxvd2x16_le_V16QI"
> + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> + (vec_select:V16QI
> + (match_operand:V16QI 1 "memory_operand" "Z")
> + (parallel [(const_int 8) (const_int 9)
> + (const_int 10) (const_int 11)
> + (const_int 12) (const_int 13)
> + (const_int 14) (const_int 15)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)
> + (const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
> + "lxvd2x %x0,%y1"
> + [(set_attr "type" "vecload")])
> +
> +;; stxvd2x for little endian stores. We need several of
> +;; these since the form of the PARALLEL differs by mode.
> +(define_insn "*vsx_stxvd2x2_le_<mode>"
> + [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
> + (vec_select:VSX_D
> + (match_operand:VSX_D 1 "vsx_register_operand" "wa")
> + (parallel [(const_int 1) (const_int 0)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> + "stxvd2x %x1,%y0"
> + [(set_attr "type" "vecstore")])
> +
> +(define_insn "*vsx_stxvd2x4_le_<mode>"
> + [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
> + (vec_select:VSX_W
> + (match_operand:VSX_W 1 "vsx_register_operand" "wa")
> + (parallel [(const_int 2) (const_int 3)
> + (const_int 0) (const_int 1)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> + "stxvd2x %x1,%y0"
> + [(set_attr "type" "vecstore")])
> +
> +(define_insn "*vsx_stxvd2x8_le_V8HI"
> + [(set (match_operand:V8HI 0 "memory_operand" "=Z")
> + (vec_select:V8HI
> + (match_operand:V8HI 1 "vsx_register_operand" "wa")
> + (parallel [(const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
> + "stxvd2x %x1,%y0"
> + [(set_attr "type" "vecstore")])
> +
> +(define_insn "*vsx_stxvd2x16_le_V16QI"
> + [(set (match_operand:V16QI 0 "memory_operand" "=Z")
> + (vec_select:V16QI
> + (match_operand:V16QI 1 "vsx_register_operand" "wa")
> + (parallel [(const_int 8) (const_int 9)
> + (const_int 10) (const_int 11)
> + (const_int 12) (const_int 13)
> + (const_int 14) (const_int 15)
> + (const_int 0) (const_int 1)
> + (const_int 2) (const_int 3)
> + (const_int 4) (const_int 5)
> + (const_int 6) (const_int 7)])))]
> + "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
> + "stxvd2x %x1,%y0"
> + [(set_attr "type" "vecstore")])
> +
> ;; Set the element of a V2DI/VD2F mode
> (define_insn "vsx_set_<mode>"
> [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?wa")
> @@ -987,9 +1497,10 @@
> UNSPEC_VSX_SET))]
> "VECTOR_MEM_VSX_P (<MODE>mode)"
> {
> - if (INTVAL (operands[3]) == 0)
> + int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
> + if (INTVAL (operands[3]) == idx_first)
> return \"xxpermdi %x0,%x2,%x1,1\";
> - else if (INTVAL (operands[3]) == 1)
> + else if (INTVAL (operands[3]) == 1 - idx_first)
> return \"xxpermdi %x0,%x1,%x2,0\";
> else
> gcc_unreachable ();
> @@ -1004,8 +1515,12 @@
> [(match_operand:QI 2 "u5bit_cint_operand" "i,i,i")])))]
> "VECTOR_MEM_VSX_P (<MODE>mode)"
> {
> + int fldDM;
> gcc_assert (UINTVAL (operands[2]) <= 1);
> - operands[3] = GEN_INT (INTVAL (operands[2]) << 1);
> + fldDM = INTVAL (operands[2]) << 1;
> + if (!BYTES_BIG_ENDIAN)
> + fldDM = 3 - fldDM;
> + operands[3] = GEN_INT (fldDM);
> return \"xxpermdi %x0,%x1,%x1,%3\";
> }
> [(set_attr "type" "vecperm")])
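The new DM computation in vsx_extract_<mode> can be sanity-checked in isolation: the high bit of the 2-bit xxpermdi DM field selects which source doubleword lands in result doubleword 0, and under LE the requested element lives in the opposite doubleword from its BE position. A sketch of the arithmetic (big_endian is a parameter here only so both cases can be exercised on one host):

```c
#include <assert.h>

/* Mirror of the fldDM computation in vsx_extract_<mode>.  */
static int
extract_dm (int elem, int big_endian)
{
  int fldDM = elem << 1;
  if (!big_endian)
    fldDM = 3 - fldDM;
  return fldDM;
}

/* Which source doubleword the DM field places in result doubleword 0.  */
static int
dm_selected_dw (int dm)
{
  return (dm >> 1) & 1;
}
```

BE maps elements 0/1 to DM values 0/2; LE maps them to 3/1. In every case the selected doubleword is the one actually holding the element: elem on BE, 1 - elem on LE.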
> @@ -1025,6 +1540,21 @@
> (const_string "fpload")))
> (set_attr "length" "4")])
>
> +;; Optimize extracting element 1 from memory for little endian
> +(define_insn "*vsx_extract_<mode>_one_le"
> + [(set (match_operand:<VS_scalar> 0 "vsx_register_operand" "=ws,d,?wa")
> + (vec_select:<VS_scalar>
> + (match_operand:VSX_D 1 "indexed_or_indirect_operand" "Z,Z,Z")
> + (parallel [(const_int 1)])))]
> + "VECTOR_MEM_VSX_P (<MODE>mode) && !WORDS_BIG_ENDIAN"
> + "lxsd%U1x %x0,%y1"
> + [(set (attr "type")
> + (if_then_else
> + (match_test "update_indexed_address_mem (operands[1], VOIDmode)")
> + (const_string "fpload_ux")
> + (const_string "fpload")))
> + (set_attr "length" "4")])
> +
> ;; Extract a SF element from V4SF
> (define_insn_and_split "vsx_extract_v4sf"
> [(set (match_operand:SF 0 "vsx_register_operand" "=f,f")
> @@ -1045,7 +1575,7 @@
> rtx op2 = operands[2];
> rtx op3 = operands[3];
> rtx tmp;
> - HOST_WIDE_INT ele = INTVAL (op2);
> + HOST_WIDE_INT ele = BYTES_BIG_ENDIAN ? INTVAL (op2) : 3 - INTVAL (op2);
>
> if (ele == 0)
> tmp = op1;
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/fusion.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c
> @@ -1,5 +1,6 @@
> /* { dg-do compile { target { powerpc*-*-* } } } */
> /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
> /* { dg-require-effective-target powerpc_p8vector_ok } */
> /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
>
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr43154.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c
> @@ -1,5 +1,6 @@
> /* { dg-do compile { target { powerpc*-*-* } } } */
> /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
> /* { dg-require-effective-target powerpc_vsx_ok } */
> /* { dg-options "-O2 -mcpu=power7" } */
>
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
> @@ -19,19 +19,6 @@ V b4(V x)
> return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, });
> }
>
> -V p2(V x, V y)
> -{
> - return __builtin_shuffle(x, y,
> - (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
> -
> -}
> -
> -V p4(V x, V y)
> -{
> - return __builtin_shuffle(x, y,
> - (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
> -}
> -
> V h1(V x, V y)
> {
> return __builtin_shuffle(x, y,
> @@ -72,5 +59,3 @@ V l4(V x, V y)
> /* { dg-final { scan-assembler "vspltb" } } */
> /* { dg-final { scan-assembler "vsplth" } } */
> /* { dg-final { scan-assembler "vspltw" } } */
> -/* { dg-final { scan-assembler "vpkuhum" } } */
> -/* { dg-final { scan-assembler "vpkuwum" } } */
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c
> ===================================================================
> --- /dev/null
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
> +/* { dg-options "-O -maltivec -mno-vsx" } */
> +
> +typedef unsigned char V __attribute__((vector_size(16)));
> +
> +V p2(V x, V y)
> +{
> + return __builtin_shuffle(x, y,
> + (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
> +
> +}
> +
> +V p4(V x, V y)
> +{
> + return __builtin_shuffle(x, y,
> + (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
> +}
> +
> +/* { dg-final { scan-assembler-not "vperm" } } */
> +/* { dg-final { scan-assembler "vpkuhum" } } */
> +/* { dg-final { scan-assembler "vpkuwum" } } */
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/eg-5.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c
> @@ -7,10 +7,17 @@ matvecmul4 (vector float c0, vector floa
> /* Set result to a vector of f32 0's */
> vector float result = ((vector float){0.,0.,0.,0.});
>
> +#ifdef __LITTLE_ENDIAN__
> + result = vec_madd (c0, vec_splat (v, 3), result);
> + result = vec_madd (c1, vec_splat (v, 2), result);
> + result = vec_madd (c2, vec_splat (v, 1), result);
> + result = vec_madd (c3, vec_splat (v, 0), result);
> +#else
> result = vec_madd (c0, vec_splat (v, 0), result);
> result = vec_madd (c1, vec_splat (v, 1), result);
> result = vec_madd (c2, vec_splat (v, 2), result);
> result = vec_madd (c3, vec_splat (v, 3), result);
> +#endif
>
> return result;
> }
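The eg-5.c change reflects that a vec_splat lane number addresses elements in the target's native order, so lane i in the BE version corresponds to lane 3 - i on LE for a 4-element vector. The reindexing is small enough to state as a rule (a sketch of the index mapping only, not VMX API code):

```c
#include <assert.h>

/* Lane that vec_splat must use on LE to select the element the BE code
   addressed with lane `be_lane', for a vector of `nelts' elements.  */
static int
le_splat_lane (int be_lane, int nelts)
{
  return nelts - 1 - be_lane;
}
```

This matches the patch: c0 pairs with lane 3, c1 with lane 2, and so on.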
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
> @@ -13,12 +13,27 @@
> #define DO_INLINE __attribute__ ((always_inline))
> #define DONT_INLINE __attribute__ ((noinline))
>
> +#ifdef __LITTLE_ENDIAN__
> +static inline DO_INLINE int inline_me(vector signed short data)
> +{
> + union {vector signed short v; signed short s[8];} u;
> + signed short x;
> + unsigned char x1, x2;
> +
> + u.v = data;
> + x = u.s[7];
> + x1 = (x >> 8) & 0xff;
> + x2 = x & 0xff;
> + return ((x2 << 8) | x1);
> +}
> +#else
> static inline DO_INLINE int inline_me(vector signed short data)
> {
> union {vector signed short v; signed short s[8];} u;
> u.v = data;
> return u.s[7];
> }
> +#endif
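The x1/x2 manipulation in the LE variant of inline_me above is a hand-rolled 16-bit byte swap of the extracted lane, apparently so the test observes the same value the BE version does. Factored out, the swap looks like this (a sketch; the function name is invented):

```c
#include <assert.h>

/* The x1/x2 shuffling from the LE inline_me: swap the two bytes of a
   16-bit value, returning the result zero-extended in an int.  */
static int
swap16 (signed short x)
{
  unsigned char hi = (x >> 8) & 0xff;
  unsigned char lo = x & 0xff;
  return (lo << 8) | hi;
}
```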
>
> static DONT_INLINE int foo(vector signed short data)
> {
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c
> ===================================================================
> --- /dev/null
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c
> @@ -0,0 +1,14 @@
> +#include "harness.h"
> +
> +vector short
> +vec_set (short m)
> +{
> + return (vector short){m, 0, 0, 0, 0, 0, 0, 0};
> +}
> +
> +static void test()
> +{
> + check (vec_all_eq (vec_set (7),
> + ((vector short){7, 0, 0, 0, 0, 0, 0, 0})),
> + "vec_set");
> +}
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/3b-15.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c
> @@ -3,7 +3,11 @@
> vector unsigned char
> f (vector unsigned char a, vector unsigned char b, vector unsigned char c)
> {
> +#ifdef __BIG_ENDIAN__
> return vec_perm(a,b,c);
> +#else
> + return vec_perm(b,a,c);
> +#endif
> }
>
> static void test()
> @@ -12,8 +16,13 @@ static void test()
> 8,9,10,11,12,13,14,15}),
> ((vector unsigned char){70,71,72,73,74,75,76,77,
> 78,79,80,81,82,83,84,85}),
> +#ifdef __BIG_ENDIAN__
> ((vector unsigned char){0x1,0x14,0x18,0x10,0x16,0x15,0x19,0x1a,
> 0x1c,0x1c,0x1c,0x12,0x8,0x1d,0x1b,0xe})),
> +#else
> + ((vector unsigned char){0x1e,0xb,0x7,0xf,0x9,0xa,0x6,0x5,
> + 0x3,0x3,0x3,0xd,0x17,0x2,0x4,0x11})),
> +#endif
> ((vector unsigned char){1,74,78,70,76,75,79,80,82,82,82,72,8,83,81,14})),
> "f");
> }
> Index: gcc-4_8-test/libcpp/lex.c
> ===================================================================
> --- gcc-4_8-test.orig/libcpp/lex.c
> +++ gcc-4_8-test/libcpp/lex.c
> @@ -559,8 +559,13 @@ search_line_fast (const uchar *s, const
> beginning with all ones and shifting in zeros according to the
> mis-alignment. The LVSR instruction pulls the exact shift we
> want from the address. */
> +#ifdef __BIG_ENDIAN__
> mask = __builtin_vec_lvsr(0, s);
> mask = __builtin_vec_perm(zero, ones, mask);
> +#else
> + mask = __builtin_vec_lvsl(0, s);
> + mask = __builtin_vec_perm(ones, zero, mask);
> +#endif
> data &= mask;
>
> /* While altivec loads mask addresses, we still need to align S so
> @@ -624,7 +629,11 @@ search_line_fast (const uchar *s, const
> /* L now contains 0xff in bytes for which we matched one of the
> relevant characters. We can find the byte index by finding
> its bit index and dividing by 8. */
> +#ifdef __BIG_ENDIAN__
> l = __builtin_clzl(l) >> 3;
> +#else
> + l = __builtin_ctzl(l) >> 3;
> +#endif
> return s + l;
>
> #undef N
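The clzl/ctzl change in search_line_fast is also easy to check in isolation: with the 0xff match bytes collected into an unsigned long, the first match in memory order occupies the high-order byte lanes on BE but the low-order lanes on LE, so counting trailing rather than leading zeros recovers the byte index. A minimal model (assuming a 64-bit unsigned long, as on powerpc64le):

```c
#include <assert.h>

/* LE mirror of the search_line_fast index computation: byte index of
   the lowest set byte lane in a 64-bit match word.  */
static unsigned int
first_match_index_le (unsigned long l)
{
  return __builtin_ctzl (l) >> 3;
}
```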
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
> @@ -1,5 +1,6 @@
> /* { dg-do compile } */
> /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
> /* { dg-require-effective-target powerpc_vsx_ok } */
> /* { dg-options "-O3 -mcpu=power7 -mabi=altivec -ffast-math -fno-unroll-loops" } */
> /* { dg-final { scan-assembler-times "xvaddsp" 3 } } */
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> @@ -1,4 +1,5 @@
> /* { dg-require-effective-target vect_int } */
> +/* { dg-skip-if "cost too high" { powerpc*le-*-* } { "*" } { "" } } */
>
> #include <stdarg.h>
> #include "../../tree-vect.h"
>
>
--
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer