[4.8, PATCH 7/26] Backport Power8 and LE support: Vector LE

Richard Biener <rguenther@suse.de>
Mon Mar 24 10:17:00 GMT 2014


On Wed, 19 Mar 2014, Bill Schmidt wrote:

> Hi,
> 
> This patch (diff-le-vector) backports the changes to support vector
> infrastructure on powerpc64le.  Copying Richard and Jakub for the libcpp
> bits.

The libcpp bits are fine.

Thanks,
Richard.

> Thanks,
> Bill
> 
> 
> [gcc]
> 
> 2014-03-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	Backport from mainline r205333
> 	2013-11-24  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (rs6000_expand_vec_perm_const_1): Correct
> 	for little endian.
> 
> 	Backport from mainline r205241
> 	2013-11-21  Bill Schmidt  <wschmidt@vnet.ibm.com>
> 
> 	* config/rs6000/vector.md (vec_pack_trunc_v2df): Revert previous
> 	little endian change.
> 	(vec_pack_sfix_trunc_v2df): Likewise.
> 	(vec_pack_ufix_trunc_v2df): Likewise.
> 	* config/rs6000/rs6000.c (rs6000_expand_interleave): Correct
> 	double checking of endianness.
> 
> 	Backport from mainline r205146
> 	2013-11-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/vsx.md (vsx_set_<mode>): Adjust for little endian.
> 	(vsx_extract_<mode>): Likewise.
> 	(*vsx_extract_<mode>_one_le): New LE variant on
> 	*vsx_extract_<mode>_zero.
> 	(vsx_extract_v4sf): Adjust for little endian.
> 
> 	Backport from mainline r205080
> 	2013-11-19  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Adjust
> 	V16QI vector splat case for little endian.
> 
> 	Backport from mainline r205045:
> 
> 	2013-11-19  Ulrich Weigand  <Ulrich.Weigand@de.ibm.com>
> 
> 	* config/rs6000/vector.md ("mov<mode>"): Do not call
> 	rs6000_emit_le_vsx_move to move into or out of GPRs.
> 	* config/rs6000/rs6000.c (rs6000_emit_le_vsx_move): Assert
> 	source and destination are not GPR hard regs.
> 
> 	Backport from mainline r204920
> 	2013-11-17  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (rs6000_frame_related): Add split_reg
> 	parameter and use it in REG_FRAME_RELATED_EXPR note.
> 	(emit_frame_save): Call rs6000_frame_related with extra NULL_RTX
> 	parameter.
> 	(rs6000_emit_prologue): Likewise, but for little endian VSX
> 	stores, pass the source register of the store instead.
> 
> 	Backport from mainline r204862
> 	2013-11-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/altivec.md (UNSPEC_VPERM_X, UNSPEC_VPERM_UNS_X):
> 	Remove.
> 	(altivec_vperm_<mode>): Revert earlier little endian change.
> 	(*altivec_vperm_<mode>_internal): Remove.
> 	(altivec_vperm_<mode>_uns): Revert earlier little endian change.
> 	(*altivec_vperm_<mode>_uns_internal): Remove.
> 	* config/rs6000/vector.md (vec_realign_load_<mode>): Revise
> 	commentary.
> 
> 	Backport from mainline r204441
> 	2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (rs6000_option_override_internal):
> 	Remove restriction against use of VSX instructions when generating
> 	code for little endian mode.
> 
> 	Backport from mainline r204440
> 	2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/altivec.md (mulv4si3): Ensure we generate vmulouh
> 	for both big and little endian.
> 	(mulv8hi3): Swap input operands for merge high and merge low
> 	instructions for little endian.
> 
> 	Backport from mainline r204439
> 	2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/altivec.md (vec_widen_umult_even_v16qi): Change
> 	define_insn to define_expand that uses even patterns for big
> 	endian and odd patterns for little endian.
> 	(vec_widen_smult_even_v16qi): Likewise.
> 	(vec_widen_umult_even_v8hi): Likewise.
> 	(vec_widen_smult_even_v8hi): Likewise.
> 	(vec_widen_umult_odd_v16qi): Likewise.
> 	(vec_widen_smult_odd_v16qi): Likewise.
> 	(vec_widen_umult_odd_v8hi): Likewise.
> 	(vec_widen_smult_odd_v8hi): Likewise.
> 	(altivec_vmuleub): New define_insn.
> 	(altivec_vmuloub): Likewise.
> 	(altivec_vmulesb): Likewise.
> 	(altivec_vmulosb): Likewise.
> 	(altivec_vmuleuh): Likewise.
> 	(altivec_vmulouh): Likewise.
> 	(altivec_vmulesh): Likewise.
> 	(altivec_vmulosh): Likewise.
> 
> 	Backport from mainline r204395
> 	2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/vector.md (vec_pack_sfix_trunc_v2df): Adjust for
> 	little endian.
> 	(vec_pack_ufix_trunc_v2df): Likewise.
> 
> 	Backport from mainline r204363
> 	2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap
> 	arguments to merge instruction for little endian.
> 	(vec_widen_umult_lo_v16qi): Likewise.
> 	(vec_widen_smult_hi_v16qi): Likewise.
> 	(vec_widen_smult_lo_v16qi): Likewise.
> 	(vec_widen_umult_hi_v8hi): Likewise.
> 	(vec_widen_umult_lo_v8hi): Likewise.
> 	(vec_widen_smult_hi_v8hi): Likewise.
> 	(vec_widen_smult_lo_v8hi): Likewise.
> 
> 	Backport from mainline r204350
> 	2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/vsx.md (*vsx_le_perm_store_<mode> for VSX_D):
> 	Replace the define_insn_and_split with a define_insn and two
> 	define_splits, with the split after reload re-permuting the source
> 	register to its original value.
> 	(*vsx_le_perm_store_<mode> for VSX_W): Likewise.
> 	(*vsx_le_perm_store_v8hi): Likewise.
> 	(*vsx_le_perm_store_v16qi): Likewise.
> 
> 	Backport from mainline r204321
> 	2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/vector.md (vec_pack_trunc_v2df):  Adjust for
> 	little endian.
> 
> 	Backport from mainline r204321
> 	2013-11-02  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
> 
> 	* config/rs6000/rs6000.c (rs6000_expand_vector_set): Adjust for
> 	little endian.
> 
> 	Backport from mainline r203980
> 	2013-10-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/altivec.md (mulv8hi3): Adjust for little endian.
> 
> 	Backport from mainline r203930
> 	2013-10-22  Bill Schmidt  <wschmidt@vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
> 	meaning of merge-high and merge-low masks for little endian; avoid
> 	use of vector-pack masks for little endian for mismatched modes.
> 
> 	Backport from mainline r203877
> 	2013-10-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Adjust for
> 	little endian.
> 	(vec_unpacku_hi_v8hi): Likewise.
> 	(vec_unpacku_lo_v16qi): Likewise.
> 	(vec_unpacku_lo_v8hi): Likewise.
> 
> 	Backport from mainline r203863
> 	2013-10-19  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (vspltis_constant): Make sure we check
> 	all elements for both endian flavors.
> 
> 	Backport from mainline r203714
> 	2013-10-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/vector.md (vec_unpacks_hi_v4sf): Correct for
> 	endianness.
> 	(vec_unpacks_lo_v4sf): Likewise.
> 	(vec_unpacks_float_hi_v4si): Likewise.
> 	(vec_unpacks_float_lo_v4si): Likewise.
> 	(vec_unpacku_float_hi_v4si): Likewise.
> 	(vec_unpacku_float_lo_v4si): Likewise.
> 
> 	Backport from mainline r203713
> 	2013-10-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/vsx.md (vsx_concat_<mode>): Adjust output for LE.
> 	(vsx_concat_v2sf): Likewise.
> 
> 	Backport from mainline r203458
> 	2013-10-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/vsx.md (*vsx_le_perm_load_v2di): Generalize to
> 	handle vector float as well.
> 	(*vsx_le_perm_load_v4si): Likewise.
> 	(*vsx_le_perm_store_v2di): Likewise.
> 	(*vsx_le_perm_store_v4si): Likewise.
> 
> 	Backport from mainline r203457
> 	2013-10-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/vector.md (vec_realign_load<mode>): Generate vperm
> 	directly to circumvent subtract from splat{31} workaround.
> 	* config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
> 	prototype.
> 	* config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
> 	* config/rs6000/altivec.md (define_c_enum "unspec"): Add
> 	UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
> 	(altivec_vperm_<mode>): Convert to define_insn_and_split to
> 	separate big and little endian logic.
> 	(*altivec_vperm_<mode>_internal): New define_insn.
> 	(altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
> 	separate big and little endian logic.
> 	(*altivec_vperm_<mode>_uns_internal): New define_insn.
> 	(vec_permv16qi): Add little endian logic.
> 
> 	Backport from mainline r203247
> 	2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
> 	(altivec_expand_vec_perm_const): Call it.
> 
> 	Backport from mainline r203246
> 	2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/vector.md (mov<mode>): Emit permuted move
> 	sequences for LE VSX loads and stores at expand time.
> 	* config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_move): New
> 	prototype.
> 	* config/rs6000/rs6000.c (rs6000_const_vec): New.
> 	(rs6000_gen_le_vsx_permute): New.
> 	(rs6000_gen_le_vsx_load): New.
> 	(rs6000_gen_le_vsx_store): New.
> 	(rs6000_gen_le_vsx_move): New.
> 	* config/rs6000/vsx.md (*vsx_le_perm_load_v2di): New.
> 	(*vsx_le_perm_load_v4si): New.
> 	(*vsx_le_perm_load_v8hi): New.
> 	(*vsx_le_perm_load_v16qi): New.
> 	(*vsx_le_perm_store_v2di): New.
> 	(*vsx_le_perm_store_v4si): New.
> 	(*vsx_le_perm_store_v8hi): New.
> 	(*vsx_le_perm_store_v16qi): New.
> 	(*vsx_xxpermdi2_le_<mode>): New.
> 	(*vsx_xxpermdi4_le_<mode>): New.
> 	(*vsx_xxpermdi8_le_V8HI): New.
> 	(*vsx_xxpermdi16_le_V16QI): New.
> 	(*vsx_lxvd2x2_le_<mode>): New.
> 	(*vsx_lxvd2x4_le_<mode>): New.
> 	(*vsx_lxvd2x8_le_V8HI): New.
> 	(*vsx_lxvd2x16_le_V16QI): New.
> 	(*vsx_stxvd2x2_le_<mode>): New.
> 	(*vsx_stxvd2x4_le_<mode>): New.
> 	(*vsx_stxvd2x8_le_V8HI): New.
> 	(*vsx_stxvd2x16_le_V16QI): New.
> 
> 	Backport from mainline r201235
> 	2013-07-24  Bill Schmidt  <wschmidt@linux.ibm.com>
> 	            Anton Blanchard <anton@au1.ibm.com>
> 
> 	* config/rs6000/altivec.md (altivec_vpkpx): Handle little endian.
> 	(altivec_vpks<VI_char>ss): Likewise.
> 	(altivec_vpks<VI_char>us): Likewise.
> 	(altivec_vpku<VI_char>us): Likewise.
> 	(altivec_vpku<VI_char>um): Likewise.
> 
> 	Backport from mainline r201208
> 	2013-07-24  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
> 	            Anton Blanchard <anton@au1.ibm.com>
> 
> 	* config/rs6000/vector.md (vec_realign_load_<mode>): Reorder input
> 	operands to vperm for little endian.
> 	* config/rs6000/rs6000.c (rs6000_expand_builtin): Use lvsr instead
> 	of lvsl to create the control mask for a vperm for little endian.
> 
> 	Backport from mainline r201195
> 	2013-07-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 	            Anton Blanchard <anton@au1.ibm.com>
> 
> 	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
> 	two operands for little-endian.
> 
> 	Backport from mainline r201193
> 	2013-07-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 	            Anton Blanchard <anton@au1.ibm.com>
> 
> 	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Correct
> 	selection of field for vector splat in little endian mode.
> 
> 	Backport from mainline r201149
> 	2013-07-22  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
> 	            Anton Blanchard <anton@au1.ibm.com>
> 
> 	* config/rs6000/rs6000.c (rs6000_expand_vector_init): Fix
> 	endianness when selecting field to splat.
> 
> [gcc/testsuite]
> 
> 2014-03-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	Backport from mainline r205638
> 	2013-12-03  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Skip for little
> 	endian.
> 
> 	Backport from mainline r205146
> 	2013-11-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* gcc.target/powerpc/pr48258-1.c: Skip for little endian.
> 
> 	Backport from mainline r204862
> 	2013-11-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* gcc.dg/vmx/3b-15.c: Revise for little endian.
> 
> 	Backport from mainline r204321
> 	2013-11-02  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
> 
> 	* gcc.dg/vmx/vec-set.c: New.
> 
> 	Backport from mainline r204138
> 	2013-10-28  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* gcc.dg/vmx/gcc-bug-i.c: Add little endian variant.
> 	* gcc.dg/vmx/eg-5.c: Likewise.
> 
> 	Backport from mainline r203930
> 	2013-10-22  Bill Schmidt  <wschmidt@vnet.ibm.com>
> 
> 	* gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack
> 	tests into...
> 	* gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is
> 	restricted to big-endian targets.
> 
> 	Backport from mainline r203246
> 	2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian.
> 	* gcc.target/powerpc/fusion.c: Likewise.
> 
> [libcpp]
> 
> 2014-03-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	Backport from mainline
> 	2013-11-18  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* lex.c (search_line_fast): Correct for little endian.
> 
> 
> Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c
> +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c
> @@ -3216,11 +3216,6 @@ rs6000_option_override_internal (bool gl
>  	}
>        else if (TARGET_PAIRED_FLOAT)
>  	msg = N_("-mvsx and -mpaired are incompatible");
> -      /* The hardware will allow VSX and little endian, but until we make sure
> -	 things like vector select, etc. work don't allow VSX on little endian
> -	 systems at this point.  */
> -      else if (!BYTES_BIG_ENDIAN)
> -	msg = N_("-mvsx used with little endian code");
>        else if (TARGET_AVOID_XFORM > 0)
>  	msg = N_("-mvsx needs indexed addressing");
>        else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit
> @@ -4991,15 +4986,16 @@ vspltis_constant (rtx op, unsigned step,
>  
>    /* Check if VAL is present in every STEP-th element, and the
>       other elements are filled with its most significant bit.  */
> -  for (i = 0; i < nunits - 1; ++i)
> +  for (i = 1; i < nunits; ++i)
>      {
>        HOST_WIDE_INT desired_val;
> -      if (((BYTES_BIG_ENDIAN ? i + 1 : i) & (step - 1)) == 0)
> +      unsigned elt = BYTES_BIG_ENDIAN ? nunits - 1 - i : i;
> +      if ((i & (step - 1)) == 0)
>  	desired_val = val;
>        else
>  	desired_val = msb_val;
>  
> -      if (desired_val != const_vector_elt_as_int (op, i))
> +      if (desired_val != const_vector_elt_as_int (op, elt))
>  	return false;
>      }
>  
> @@ -5446,6 +5442,7 @@ rs6000_expand_vector_init (rtx target, r
>       of 64-bit items is not supported on Altivec.  */
>    if (all_same && GET_MODE_SIZE (inner_mode) <= 4)
>      {
> +      rtx field;
>        mem = assign_stack_temp (mode, GET_MODE_SIZE (inner_mode));
>        emit_move_insn (adjust_address_nv (mem, inner_mode, 0),
>  		      XVECEXP (vals, 0, 0));
> @@ -5456,9 +5453,11 @@ rs6000_expand_vector_init (rtx target, r
>  					      gen_rtx_SET (VOIDmode,
>  							   target, mem),
>  					      x)));
> +      field = (BYTES_BIG_ENDIAN ? const0_rtx
> +	       : GEN_INT (GET_MODE_NUNITS (mode) - 1));
>        x = gen_rtx_VEC_SELECT (inner_mode, target,
>  			      gen_rtx_PARALLEL (VOIDmode,
> -						gen_rtvec (1, const0_rtx)));
> +						gen_rtvec (1, field)));
>        emit_insn (gen_rtx_SET (VOIDmode, target,
>  			      gen_rtx_VEC_DUPLICATE (mode, x)));
>        return;
> @@ -5531,10 +5530,27 @@ rs6000_expand_vector_set (rtx target, rt
>      XVECEXP (mask, 0, elt*width + i)
>        = GEN_INT (i + 0x10);
>    x = gen_rtx_CONST_VECTOR (V16QImode, XVEC (mask, 0));
> -  x = gen_rtx_UNSPEC (mode,
> -		      gen_rtvec (3, target, reg,
> -				 force_reg (V16QImode, x)),
> -		      UNSPEC_VPERM);
> +
> +  if (BYTES_BIG_ENDIAN)
> +    x = gen_rtx_UNSPEC (mode,
> +			gen_rtvec (3, target, reg,
> +				   force_reg (V16QImode, x)),
> +			UNSPEC_VPERM);
> +  else 
> +    {
> +      /* Invert selector.  */
> +      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
> +					 gen_rtx_CONST_INT (QImode, -1));
> +      rtx tmp = gen_reg_rtx (V16QImode);
> +      emit_move_insn (tmp, splat);
> +      x = gen_rtx_MINUS (V16QImode, tmp, force_reg (V16QImode, x));
> +      emit_move_insn (tmp, x);
> +
> +      /* Permute with operands reversed and adjusted selector.  */
> +      x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
> +			  UNSPEC_VPERM);
> +    }
> +
>    emit_insn (gen_rtx_SET (VOIDmode, target, x));
>  }
>  
> @@ -7830,6 +7846,107 @@ rs6000_eliminate_indexed_memrefs (rtx op
>  			       copy_addr_to_reg (XEXP (operands[1], 0)));
>  }
>  
> +/* Generate a vector of constants to permute MODE for a little-endian
> +   storage operation by swapping the two halves of a vector.  */
> +static rtvec
> +rs6000_const_vec (enum machine_mode mode)
> +{
> +  int i, subparts;
> +  rtvec v;
> +
> +  switch (mode)
> +    {
> +    case V2DFmode:
> +    case V2DImode:
> +      subparts = 2;
> +      break;
> +    case V4SFmode:
> +    case V4SImode:
> +      subparts = 4;
> +      break;
> +    case V8HImode:
> +      subparts = 8;
> +      break;
> +    case V16QImode:
> +      subparts = 16;
> +      break;
> +    default:
> +      gcc_unreachable();
> +    }
> +
> +  v = rtvec_alloc (subparts);
> +
> +  for (i = 0; i < subparts / 2; ++i)
> +    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i + subparts / 2);
> +  for (i = subparts / 2; i < subparts; ++i)
> +    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i - subparts / 2);
> +
> +  return v;
> +}
> +
> +/* Generate a permute rtx that represents an lxvd2x, stxvd2x, or xxpermdi
> +   for a VSX load or store operation.  */
> +rtx
> +rs6000_gen_le_vsx_permute (rtx source, enum machine_mode mode)
> +{
> +  rtx par = gen_rtx_PARALLEL (VOIDmode, rs6000_const_vec (mode));
> +  return gen_rtx_VEC_SELECT (mode, source, par);
> +}
> +
> +/* Emit a little-endian load from vector memory location SOURCE to VSX
> +   register DEST in mode MODE.  The load is done with two permuting
> +   insns that represent an lxvd2x and xxpermdi.  */
> +void
> +rs6000_emit_le_vsx_load (rtx dest, rtx source, enum machine_mode mode)
> +{
> +  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (dest) : dest;
> +  rtx permute_mem = rs6000_gen_le_vsx_permute (source, mode);
> +  rtx permute_reg = rs6000_gen_le_vsx_permute (tmp, mode);
> +  emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_mem));
> +  emit_insn (gen_rtx_SET (VOIDmode, dest, permute_reg));
> +}
> +
> +/* Emit a little-endian store to vector memory location DEST from VSX
> +   register SOURCE in mode MODE.  The store is done with two permuting
> +   insns that represent an xxpermdi and an stxvd2x.  */
> +void
> +rs6000_emit_le_vsx_store (rtx dest, rtx source, enum machine_mode mode)
> +{
> +  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (source) : source;
> +  rtx permute_src = rs6000_gen_le_vsx_permute (source, mode);
> +  rtx permute_tmp = rs6000_gen_le_vsx_permute (tmp, mode);
> +  emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_src));
> +  emit_insn (gen_rtx_SET (VOIDmode, dest, permute_tmp));
> +}
> +
> +/* Emit a sequence representing a little-endian VSX load or store,
> +   moving data from SOURCE to DEST in mode MODE.  This is done
> +   separately from rs6000_emit_move to ensure it is called only
> +   during expand.  LE VSX loads and stores introduced later are
> +   handled with a split.  The expand-time RTL generation allows
> +   us to optimize away redundant pairs of register-permutes.  */
> +void
> +rs6000_emit_le_vsx_move (rtx dest, rtx source, enum machine_mode mode)
> +{
> +  gcc_assert (!BYTES_BIG_ENDIAN
> +	      && VECTOR_MEM_VSX_P (mode)
> +	      && mode != TImode
> +	      && !gpr_or_gpr_p (dest, source)
> +	      && (MEM_P (source) ^ MEM_P (dest)));
> +
> +  if (MEM_P (source))
> +    {
> +      gcc_assert (REG_P (dest));
> +      rs6000_emit_le_vsx_load (dest, source, mode);
> +    }
> +  else
> +    {
> +      if (!REG_P (source))
> +	source = force_reg (mode, source);
> +      rs6000_emit_le_vsx_store (dest, source, mode);
> +    }
> +}
> +
>  /* Emit a move from SOURCE to DEST in mode MODE.  */
>  void
>  rs6000_emit_move (rtx dest, rtx source, enum machine_mode mode)
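
[Illustrative aside, not part of the patch: the LE load/store sequences above hinge on the index vector rs6000_const_vec builds, which simply swaps the two halves of the vector.  Applying that selection twice is the identity, so modelling both lxvd2x/stxvd2x and xxpermdi as the same permute lets redundant register-permute pairs be cleaned up after expand, as the comment on rs6000_emit_le_vsx_move notes.  A minimal standalone sketch, with invented names:

#include <stdio.h>

/* Build the half-swapping indices, in the spirit of rs6000_const_vec.  */
static void
half_swap_indices (int subparts, int *idx)
{
  int i;
  for (i = 0; i < subparts / 2; i++)
    idx[i] = i + subparts / 2;
  for (i = subparts / 2; i < subparts; i++)
    idx[i] = i - subparts / 2;
}

int
main (void)
{
  int idx[4], v[4] = { 10, 11, 12, 13 }, once[4], twice[4];
  int i;

  half_swap_indices (4, idx);   /* V4SI-sized example: selector is 2 3 0 1 */
  printf ("selector: %d %d %d %d\n", idx[0], idx[1], idx[2], idx[3]);

  /* Apply the same selection twice; the original order comes back,
     which is why a back-to-back pair of these permutes is redundant.  */
  for (i = 0; i < 4; i++)
    once[i] = v[idx[i]];
  for (i = 0; i < 4; i++)
    twice[i] = once[idx[i]];
  printf ("after two permutes: %d %d %d %d\n",
          twice[0], twice[1], twice[2], twice[3]);
  return 0;
}]
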
> @@ -12589,7 +12706,8 @@ rs6000_expand_builtin (tree exp, rtx tar
>      case ALTIVEC_BUILTIN_MASK_FOR_LOAD:
>      case ALTIVEC_BUILTIN_MASK_FOR_STORE:
>        {
> -	int icode = (int) CODE_FOR_altivec_lvsr;
> +	int icode = (BYTES_BIG_ENDIAN ? (int) CODE_FOR_altivec_lvsr
> +		     : (int) CODE_FOR_altivec_lvsl);
>  	enum machine_mode tmode = insn_data[icode].operand[0].mode;
>  	enum machine_mode mode = insn_data[icode].operand[1].mode;
>  	tree arg;
> @@ -20880,7 +20998,7 @@ output_probe_stack_range (rtx reg1, rtx
>  
>  static rtx
>  rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val,
> -		      rtx reg2, rtx rreg)
> +		      rtx reg2, rtx rreg, rtx split_reg)
>  {
>    rtx real, temp;
>  
> @@ -20971,6 +21089,11 @@ rs6000_frame_related (rtx insn, rtx reg,
>  	  }
>      }
>  
> +  /* If a store insn has been split into multiple insns, the
> +     true source register is given by split_reg.  */
> +  if (split_reg != NULL_RTX)
> +    real = gen_rtx_SET (VOIDmode, SET_DEST (real), split_reg);
> +
>    RTX_FRAME_RELATED_P (insn) = 1;
>    add_reg_note (insn, REG_FRAME_RELATED_EXPR, real);
>  
> @@ -21078,7 +21201,7 @@ emit_frame_save (rtx frame_reg, enum mac
>    reg = gen_rtx_REG (mode, regno);
>    insn = emit_insn (gen_frame_store (reg, frame_reg, offset));
>    return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp,
> -			       NULL_RTX, NULL_RTX);
> +			       NULL_RTX, NULL_RTX, NULL_RTX);
>  }
>  
>  /* Emit an offset memory reference suitable for a frame store, while
> @@ -21599,7 +21722,7 @@ rs6000_emit_prologue (void)
>  
>        insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
>        rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> -			    treg, GEN_INT (-info->total_size));
> +			    treg, GEN_INT (-info->total_size), NULL_RTX);
>        sp_off = frame_off = info->total_size;
>      }
>  
> @@ -21684,7 +21807,7 @@ rs6000_emit_prologue (void)
>  
>  	  insn = emit_move_insn (mem, reg);
>  	  rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> -				NULL_RTX, NULL_RTX);
> +				NULL_RTX, NULL_RTX, NULL_RTX);
>  	  END_USE (0);
>  	}
>      }
> @@ -21752,7 +21875,7 @@ rs6000_emit_prologue (void)
>  				     info->lr_save_offset,
>  				     DFmode, sel);
>        rs6000_frame_related (insn, ptr_reg, sp_off,
> -			    NULL_RTX, NULL_RTX);
> +			    NULL_RTX, NULL_RTX, NULL_RTX);
>        if (lr)
>  	END_USE (0);
>      }
> @@ -21831,7 +21954,7 @@ rs6000_emit_prologue (void)
>  					 SAVRES_SAVE | SAVRES_GPR);
>  
>  	  rs6000_frame_related (insn, spe_save_area_ptr, sp_off - save_off,
> -				NULL_RTX, NULL_RTX);
> +				NULL_RTX, NULL_RTX, NULL_RTX);
>  	}
>  
>        /* Move the static chain pointer back.  */
> @@ -21881,7 +22004,7 @@ rs6000_emit_prologue (void)
>  				     info->lr_save_offset + ptr_off,
>  				     reg_mode, sel);
>        rs6000_frame_related (insn, ptr_reg, sp_off - ptr_off,
> -			    NULL_RTX, NULL_RTX);
> +			    NULL_RTX, NULL_RTX, NULL_RTX);
>        if (lr)
>  	END_USE (0);
>      }
> @@ -21897,7 +22020,7 @@ rs6000_emit_prologue (void)
>  			     info->gp_save_offset + frame_off + reg_size * i);
>        insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
>        rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> -			    NULL_RTX, NULL_RTX);
> +			    NULL_RTX, NULL_RTX, NULL_RTX);
>      }
>    else if (!WORLD_SAVE_P (info))
>      {
> @@ -22124,7 +22247,7 @@ rs6000_emit_prologue (void)
>  				     info->altivec_save_offset + ptr_off,
>  				     0, V4SImode, SAVRES_SAVE | SAVRES_VR);
>        rs6000_frame_related (insn, scratch_reg, sp_off - ptr_off,
> -			    NULL_RTX, NULL_RTX);
> +			    NULL_RTX, NULL_RTX, NULL_RTX);
>        if (REGNO (frame_reg_rtx) == REGNO (scratch_reg))
>  	{
>  	  /* The oddity mentioned above clobbered our frame reg.  */
> @@ -22140,7 +22263,7 @@ rs6000_emit_prologue (void)
>        for (i = info->first_altivec_reg_save; i <= LAST_ALTIVEC_REGNO; ++i)
>  	if (info->vrsave_mask & ALTIVEC_REG_BIT (i))
>  	  {
> -	    rtx areg, savereg, mem;
> +	    rtx areg, savereg, mem, split_reg;
>  	    int offset;
>  
>  	    offset = (info->altivec_save_offset + frame_off
> @@ -22158,8 +22281,18 @@ rs6000_emit_prologue (void)
>  
>  	    insn = emit_move_insn (mem, savereg);
>  
> +	    /* When we split a VSX store into two insns, we need to make
> +	       sure the DWARF info knows which register we are storing.
> +	       Pass it in to be used on the appropriate note.  */
> +	    if (!BYTES_BIG_ENDIAN
> +		&& GET_CODE (PATTERN (insn)) == SET
> +		&& GET_CODE (SET_SRC (PATTERN (insn))) == VEC_SELECT)
> +	      split_reg = savereg;
> +	    else
> +	      split_reg = NULL_RTX;
> +
>  	    rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> -				  areg, GEN_INT (offset));
> +				  areg, GEN_INT (offset), split_reg);
>  	  }
>      }
>  
> @@ -28813,6 +28946,136 @@ rs6000_emit_parity (rtx dst, rtx src)
>      }
>  }
>  
> +/* Expand an Altivec constant permutation for little endian mode.
> +   There are two issues: First, the two input operands must be
> +   swapped so that together they form a double-wide array in LE
> +   order.  Second, the vperm instruction has surprising behavior
> +   in LE mode:  it interprets the elements of the source vectors
> +   in BE mode ("left to right") and interprets the elements of
> +   the destination vector in LE mode ("right to left").  To
> +   correct for this, we must subtract each element of the permute
> +   control vector from 31.
> +
> +   For example, suppose we want to concatenate vr10 = {0, 1, 2, 3}
> +   with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm.
> +   We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to
> +   serve as the permute control vector.  Then, in BE mode,
> +
> +     vperm 9,10,11,12
> +
> +   places the desired result in vr9.  However, in LE mode the 
> +   vector contents will be
> +
> +     vr10 = 00000003 00000002 00000001 00000000
> +     vr11 = 00000007 00000006 00000005 00000004
> +
> +   The result of the vperm using the same permute control vector is
> +
> +     vr9  = 05000000 07000000 01000000 03000000
> +
> +   That is, the leftmost 4 bytes of vr10 are interpreted as the
> +   source for the rightmost 4 bytes of vr9, and so on.
> +
> +   If we change the permute control vector to
> +
> +     vr12 = {31,30,29,28,23,22,21,20,15,14,13,12,7,6,5,4}
> +
> +   and issue
> +
> +     vperm 9,11,10,12
> +
> +   we get the desired
> +
> +   vr9  = 00000006 00000004 00000002 00000000.  */
> +
> +void
> +altivec_expand_vec_perm_const_le (rtx operands[4])
> +{
> +  unsigned int i;
> +  rtx perm[16];
> +  rtx constv, unspec;
> +  rtx target = operands[0];
> +  rtx op0 = operands[1];
> +  rtx op1 = operands[2];
> +  rtx sel = operands[3];
> +
> +  /* Unpack and adjust the constant selector.  */
> +  for (i = 0; i < 16; ++i)
> +    {
> +      rtx e = XVECEXP (sel, 0, i);
> +      unsigned int elt = 31 - (INTVAL (e) & 31);
> +      perm[i] = GEN_INT (elt);
> +    }
> +
> +  /* Expand to a permute, swapping the inputs and using the
> +     adjusted selector.  */
> +  if (!REG_P (op0))
> +    op0 = force_reg (V16QImode, op0);
> +  if (!REG_P (op1))
> +    op1 = force_reg (V16QImode, op1);
> +
> +  constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> +  constv = force_reg (V16QImode, constv);
> +  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv),
> +			   UNSPEC_VPERM);
> +  if (!REG_P (target))
> +    {
> +      rtx tmp = gen_reg_rtx (V16QImode);
> +      emit_move_insn (tmp, unspec);
> +      unspec = tmp;
> +    }
> +
> +  emit_move_insn (target, unspec);
> +}
> +
> +/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the
> +   permute control vector.  But here it's not a constant, so we must
> +   generate a vector splat/subtract to do the adjustment.  */
> +
> +void
> +altivec_expand_vec_perm_le (rtx operands[4])
> +{
> +  rtx splat, unspec;
> +  rtx target = operands[0];
> +  rtx op0 = operands[1];
> +  rtx op1 = operands[2];
> +  rtx sel = operands[3];
> +  rtx tmp = target;
> +
> +  /* Get everything in regs so the pattern matches.  */
> +  if (!REG_P (op0))
> +    op0 = force_reg (V16QImode, op0);
> +  if (!REG_P (op1))
> +    op1 = force_reg (V16QImode, op1);
> +  if (!REG_P (sel))
> +    sel = force_reg (V16QImode, sel);
> +  if (!REG_P (target))
> +    tmp = gen_reg_rtx (V16QImode);
> +
> +  /* SEL = splat(31) - SEL.  */
> +  /* We want to subtract from 31, but we can't vspltisb 31 since
> +     it's out of range.  -1 works as well because only the low-order
> +     five bits of the permute control vector elements are used.  */
> +  splat = gen_rtx_VEC_DUPLICATE (V16QImode,
> +				 gen_rtx_CONST_INT (QImode, -1));
> +  emit_move_insn (tmp, splat);
> +  sel = gen_rtx_MINUS (V16QImode, tmp, sel);
> +  emit_move_insn (tmp, sel);
> +
> +  /* Permute with operands reversed and adjusted selector.  */
> +  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, tmp),
> +			   UNSPEC_VPERM);
> +
> +  /* Copy into target, possibly by way of a register.  */
> +  if (!REG_P (target))
> +    {
> +      emit_move_insn (tmp, unspec);
> +      unspec = tmp;
> +    }
> +
> +  emit_move_insn (target, unspec);
> +}
> +
>  /* Expand an Altivec constant permutation.  Return true if we match
>     an efficient implementation; false to fall back to VPERM.  */
>  
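
[Illustrative aside, not part of the patch: the constant-selector case above reduces to rewriting each control-vector byte as 31 - (byte & 31) and swapping the two vperm inputs; the non-constant case in altivec_expand_vec_perm_le gets the same effect at run time by splatting -1 and subtracting, which agrees in the five bits vperm actually reads.  A minimal standalone sketch of both points, with invented names:

#include <stdio.h>

/* Adjust a big-endian vperm control vector for little endian,
   as altivec_expand_vec_perm_const_le does element by element.  */
static void
adjust_vperm_selector_le (const unsigned char *be_sel, unsigned char *le_sel)
{
  int i;
  for (i = 0; i < 16; i++)
    le_sel[i] = 31 - (be_sel[i] & 31);
}

int
main (void)
{
  /* The selector from the worked example in the comment above.  */
  unsigned char be_sel[16] = { 0, 1, 2, 3, 8, 9, 10, 11,
                               16, 17, 18, 19, 24, 25, 26, 27 };
  unsigned char le_sel[16];
  int i, s;

  adjust_vperm_selector_le (be_sel, le_sel);
  for (i = 0; i < 16; i++)
    printf ("%d ", le_sel[i]);  /* 31 30 29 28 23 22 21 20 15 14 13 12 7 6 5 4 */
  printf ("\n");

  /* Splatting -1 instead of 31 is safe: the two agree in the low
     five bits, which are all that vperm uses.  */
  for (s = 0; s < 32; s++)
    if ((((unsigned char) (-1 - s)) & 31) != 31 - s)
      printf ("mismatch at %d\n", s);   /* never printed */
  return 0;
}]
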
> @@ -28829,17 +29092,23 @@ altivec_expand_vec_perm_const (rtx opera
>        {  1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } },
>      { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum,
>        {  2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb,
> +    { OPTION_MASK_ALTIVEC, 
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb,
>        {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh,
> +    { OPTION_MASK_ALTIVEC,
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh,
>        {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 23 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw,
> +    { OPTION_MASK_ALTIVEC,
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw,
>        {  0,  1,  2,  3, 16, 17, 18, 19,  4,  5,  6,  7, 20, 21, 22, 23 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb,
> +    { OPTION_MASK_ALTIVEC,
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb,
>        {  8, 24,  9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh,
> +    { OPTION_MASK_ALTIVEC,
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh,
>        {  8,  9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw,
> +    { OPTION_MASK_ALTIVEC,
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw,
>        {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
>      { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
>        {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
> @@ -28901,6 +29170,8 @@ altivec_expand_vec_perm_const (rtx opera
>  	  break;
>        if (i == 16)
>  	{
> +          if (!BYTES_BIG_ENDIAN)
> +            elt = 15 - elt;
>  	  emit_insn (gen_altivec_vspltb (target, op0, GEN_INT (elt)));
>  	  return true;
>  	}
> @@ -28912,9 +29183,10 @@ altivec_expand_vec_perm_const (rtx opera
>  	      break;
>  	  if (i == 16)
>  	    {
> +	      int field = BYTES_BIG_ENDIAN ? elt / 2 : 7 - elt / 2;
>  	      x = gen_reg_rtx (V8HImode);
>  	      emit_insn (gen_altivec_vsplth (x, gen_lowpart (V8HImode, op0),
> -					     GEN_INT (elt / 2)));
> +					     GEN_INT (field)));
>  	      emit_move_insn (target, gen_lowpart (V16QImode, x));
>  	      return true;
>  	    }
> @@ -28930,9 +29202,10 @@ altivec_expand_vec_perm_const (rtx opera
>  	      break;
>  	  if (i == 16)
>  	    {
> +	      int field = BYTES_BIG_ENDIAN ? elt / 4 : 3 - elt / 4;
>  	      x = gen_reg_rtx (V4SImode);
>  	      emit_insn (gen_altivec_vspltw (x, gen_lowpart (V4SImode, op0),
> -					     GEN_INT (elt / 4)));
> +					     GEN_INT (field)));
>  	      emit_move_insn (target, gen_lowpart (V16QImode, x));
>  	      return true;
>  	    }
> @@ -28970,7 +29243,30 @@ altivec_expand_vec_perm_const (rtx opera
>  	  enum machine_mode omode = insn_data[icode].operand[0].mode;
>  	  enum machine_mode imode = insn_data[icode].operand[1].mode;
>  
> -	  if (swapped)
> +	  /* For little-endian, don't use vpkuwum and vpkuhum if the
> +	     underlying vector type is not V4SI and V8HI, respectively.
> +	     For example, using vpkuwum with a V8HI picks up the even
> +	     halfwords (BE numbering) when the even halfwords (LE
> +	     numbering) are what we need.  */
> +	  if (!BYTES_BIG_ENDIAN
> +	      && icode == CODE_FOR_altivec_vpkuwum
> +	      && ((GET_CODE (op0) == REG
> +		   && GET_MODE (op0) != V4SImode)
> +		  || (GET_CODE (op0) == SUBREG
> +		      && GET_MODE (XEXP (op0, 0)) != V4SImode)))
> +	    continue;
> +	  if (!BYTES_BIG_ENDIAN
> +	      && icode == CODE_FOR_altivec_vpkuhum
> +	      && ((GET_CODE (op0) == REG
> +		   && GET_MODE (op0) != V8HImode)
> +		  || (GET_CODE (op0) == SUBREG
> +		      && GET_MODE (XEXP (op0, 0)) != V8HImode)))
> +	    continue;
> +
> +          /* For little-endian, the two input operands must be swapped
> +             (or swapped back) to ensure proper right-to-left numbering
> +             from 0 to 2N-1.  */
> +	  if (swapped ^ !BYTES_BIG_ENDIAN)
>  	    x = op0, op0 = op1, op1 = x;
>  	  if (imode != V16QImode)
>  	    {
> @@ -28988,6 +29284,12 @@ altivec_expand_vec_perm_const (rtx opera
>  	}
>      }
>  
> +  if (!BYTES_BIG_ENDIAN)
> +    {
> +      altivec_expand_vec_perm_const_le (operands);
> +      return true;
> +    }
> +
>    return false;
>  }
>  
> @@ -29037,6 +29339,21 @@ rs6000_expand_vec_perm_const_1 (rtx targ
>        gcc_assert (GET_MODE_NUNITS (vmode) == 2);
>        dmode = mode_for_vector (GET_MODE_INNER (vmode), 4);
>  
> +      /* For little endian, swap operands and invert/swap selectors
> +	 to get the correct xxpermdi.  The operand swap sets up the
> +	 inputs as a little endian array.  The selectors are swapped
> +	 because they are defined to use big endian ordering.  The
> +	 selectors are inverted to get the correct doublewords for
> +	 little endian ordering.  */
> +      if (!BYTES_BIG_ENDIAN)
> +	{
> +	  int n;
> +	  perm0 = 3 - perm0;
> +	  perm1 = 3 - perm1;
> +	  n = perm0, perm0 = perm1, perm1 = n;
> +	  x = op0, op0 = op1, op1 = x;
> +	}
> +
>        x = gen_rtx_VEC_CONCAT (dmode, op0, op1);
>        v = gen_rtvec (2, GEN_INT (perm0), GEN_INT (perm1));
>        x = gen_rtx_VEC_SELECT (vmode, x, gen_rtx_PARALLEL (VOIDmode, v));
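
[Illustrative aside, not part of the patch: for the V2DF/V2DI xxpermdi case above, the invert-and-swap steps compose to new perm0 = 3 - old perm1 and new perm1 = 3 - old perm0, with the two inputs exchanged.  A tiny standalone sketch, with invented names:

#include <stdio.h>

struct dword_perm { int perm0, perm1, swap_operands; };

/* Combine the invert-then-swap steps from the code above into one rewrite.
   perm0/perm1 index the four doublewords of the op0/op1 concatenation.  */
static struct dword_perm
xxpermdi_selectors_le (int perm0, int perm1)
{
  struct dword_perm d;
  d.perm0 = 3 - perm1;
  d.perm1 = 3 - perm0;
  d.swap_operands = 1;
  return d;
}

int
main (void)
{
  /* BE selectors {0, 2} pick doubleword 0 of each input.  */
  struct dword_perm d = xxpermdi_selectors_le (0, 2);
  printf ("LE: perm0=%d perm1=%d swap=%d\n", d.perm0, d.perm1, d.swap_operands);
  return 0;
}]
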
> @@ -29132,7 +29449,7 @@ rs6000_expand_interleave (rtx target, rt
>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
>    rtx perm[16];
>  
> -  high = (highp == BYTES_BIG_ENDIAN ? 0 : nelt / 2);
> +  high = (highp ? 0 : nelt / 2);
>    for (i = 0; i < nelt / 2; i++)
>      {
>        perm[i * 2] = GEN_INT (i + high);
> Index: gcc-4_8-test/gcc/config/rs6000/vector.md
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/vector.md
> +++ gcc-4_8-test/gcc/config/rs6000/vector.md
> @@ -88,7 +88,8 @@
>  				 (smax "smax")])
>  
>  
> -;; Vector move instructions.
> +;; Vector move instructions.  Little-endian VSX loads and stores require
> +;; special handling to circumvent "element endianness."
>  (define_expand "mov<mode>"
>    [(set (match_operand:VEC_M 0 "nonimmediate_operand" "")
>  	(match_operand:VEC_M 1 "any_operand" ""))]
> @@ -104,6 +105,16 @@
>  	       && !vlogical_operand (operands[1], <MODE>mode))
>  	operands[1] = force_reg (<MODE>mode, operands[1]);
>      }
> +  if (!BYTES_BIG_ENDIAN
> +      && VECTOR_MEM_VSX_P (<MODE>mode)
> +      && <MODE>mode != TImode
> +      && !gpr_or_gpr_p (operands[0], operands[1])
> +      && (memory_operand (operands[0], <MODE>mode)
> +          ^ memory_operand (operands[1], <MODE>mode)))
> +    {
> +      rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
> +      DONE;
> +    }
>  })
>  
>  ;; Generic vector floating point load/store instructions.  These will match
> @@ -862,7 +873,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SFmode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], true);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
>    DONE;
>  })
> @@ -874,7 +885,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SFmode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], false);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
>    DONE;
>  })
> @@ -886,7 +897,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SImode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], true);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
>    DONE;
>  })
> @@ -898,7 +909,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SImode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], false);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
>    DONE;
>  })
> @@ -910,7 +921,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SImode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], true);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
>    DONE;
>  })
> @@ -922,7 +933,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SImode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], false);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
>    DONE;
>  })
> @@ -936,8 +947,19 @@
>     (match_operand:V16QI 3 "vlogical_operand" "")]
>    "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
>  {
> -  emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1], operands[2],
> -				       operands[3]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
> +    	      				 operands[2], operands[3]));
> +  else
> +    {
> +      /* We have changed lvsr to lvsl, so to complete the transformation
> +         of vperm for LE, we must swap the inputs.  */
> +      rtx unspec = gen_rtx_UNSPEC (<MODE>mode,
> +                                   gen_rtvec (3, operands[2],
> +                                              operands[1], operands[3]),
> +                                   UNSPEC_VPERM);
> +      emit_move_insn (operands[0], unspec);
> +    }
>    DONE;
>  })
>  
> Index: gcc-4_8-test/gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/altivec.md
> +++ gcc-4_8-test/gcc/config/rs6000/altivec.md
> @@ -649,7 +649,7 @@
>     convert_move (small_swap, swap, 0);
>   
>     low_product = gen_reg_rtx (V4SImode);
> -   emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
> +   emit_insn (gen_altivec_vmulouh (low_product, one, two));
>   
>     high_product = gen_reg_rtx (V4SImode);
>     emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero));
> @@ -676,10 +676,18 @@
>     emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2]));
>     emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2]));
>  
> -   emit_insn (gen_altivec_vmrghw (high, even, odd));
> -   emit_insn (gen_altivec_vmrglw (low, even, odd));
> -
> -   emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
> +   if (BYTES_BIG_ENDIAN)
> +     {
> +       emit_insn (gen_altivec_vmrghw (high, even, odd));
> +       emit_insn (gen_altivec_vmrglw (low, even, odd));
> +       emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
> +     }
> +   else
> +     {
> +       emit_insn (gen_altivec_vmrghw (high, odd, even));
> +       emit_insn (gen_altivec_vmrglw (low, odd, even));
> +       emit_insn (gen_altivec_vpkuwum (operands[0], low, high));
> +     } 
>  
>     DONE;
>  }")
> @@ -967,7 +975,111 @@
>    "vmrgow %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>  
> -(define_insn "vec_widen_umult_even_v16qi"
> +(define_expand "vec_widen_umult_even_v16qi"
> +  [(use (match_operand:V8HI 0 "register_operand" ""))
> +   (use (match_operand:V16QI 1 "register_operand" ""))
> +   (use (match_operand:V16QI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_smult_even_v16qi"
> +  [(use (match_operand:V8HI 0 "register_operand" ""))
> +   (use (match_operand:V16QI 1 "register_operand" ""))
> +   (use (match_operand:V16QI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_umult_even_v8hi"
> +  [(use (match_operand:V4SI 0 "register_operand" ""))
> +   (use (match_operand:V8HI 1 "register_operand" ""))
> +   (use (match_operand:V8HI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_smult_even_v8hi"
> +  [(use (match_operand:V4SI 0 "register_operand" ""))
> +   (use (match_operand:V8HI 1 "register_operand" ""))
> +   (use (match_operand:V8HI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_umult_odd_v16qi"
> +  [(use (match_operand:V8HI 0 "register_operand" ""))
> +   (use (match_operand:V16QI 1 "register_operand" ""))
> +   (use (match_operand:V16QI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_smult_odd_v16qi"
> +  [(use (match_operand:V8HI 0 "register_operand" ""))
> +   (use (match_operand:V16QI 1 "register_operand" ""))
> +   (use (match_operand:V16QI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_umult_odd_v8hi"
> +  [(use (match_operand:V4SI 0 "register_operand" ""))
> +   (use (match_operand:V8HI 1 "register_operand" ""))
> +   (use (match_operand:V8HI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_smult_odd_v8hi"
> +  [(use (match_operand:V4SI 0 "register_operand" ""))
> +   (use (match_operand:V8HI 1 "register_operand" ""))
> +   (use (match_operand:V8HI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_insn "altivec_vmuleub"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
>                        (match_operand:V16QI 2 "register_operand" "v")]
> @@ -976,43 +1088,25 @@
>    "vmuleub %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
>  
> -(define_insn "vec_widen_smult_even_v16qi"
> +(define_insn "altivec_vmuloub"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
>                        (match_operand:V16QI 2 "register_operand" "v")]
> -		     UNSPEC_VMULESB))]
> -  "TARGET_ALTIVEC"
> -  "vmulesb %0,%1,%2"
> -  [(set_attr "type" "veccomplex")])
> -
> -(define_insn "vec_widen_umult_even_v8hi"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v")
> -        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> -                      (match_operand:V8HI 2 "register_operand" "v")]
> -		     UNSPEC_VMULEUH))]
> -  "TARGET_ALTIVEC"
> -  "vmuleuh %0,%1,%2"
> -  [(set_attr "type" "veccomplex")])
> -
> -(define_insn "vec_widen_smult_even_v8hi"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v")
> -        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> -                      (match_operand:V8HI 2 "register_operand" "v")]
> -		     UNSPEC_VMULESH))]
> +		     UNSPEC_VMULOUB))]
>    "TARGET_ALTIVEC"
> -  "vmulesh %0,%1,%2"
> +  "vmuloub %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
>  
> -(define_insn "vec_widen_umult_odd_v16qi"
> +(define_insn "altivec_vmulesb"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
>                        (match_operand:V16QI 2 "register_operand" "v")]
> -		     UNSPEC_VMULOUB))]
> +		     UNSPEC_VMULESB))]
>    "TARGET_ALTIVEC"
> -  "vmuloub %0,%1,%2"
> +  "vmulesb %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
>  
> -(define_insn "vec_widen_smult_odd_v16qi"
> +(define_insn "altivec_vmulosb"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
>                        (match_operand:V16QI 2 "register_operand" "v")]
> @@ -1021,7 +1115,16 @@
>    "vmulosb %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
>  
> -(define_insn "vec_widen_umult_odd_v8hi"
> +(define_insn "altivec_vmuleuh"
> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> +        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> +                      (match_operand:V8HI 2 "register_operand" "v")]
> +		     UNSPEC_VMULEUH))]
> +  "TARGET_ALTIVEC"
> +  "vmuleuh %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +(define_insn "altivec_vmulouh"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
>                        (match_operand:V8HI 2 "register_operand" "v")]
> @@ -1030,7 +1133,16 @@
>    "vmulouh %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
>  
> -(define_insn "vec_widen_smult_odd_v8hi"
> +(define_insn "altivec_vmulesh"
> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> +        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> +                      (match_operand:V8HI 2 "register_operand" "v")]
> +		     UNSPEC_VMULESH))]
> +  "TARGET_ALTIVEC"
> +  "vmulesh %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +(define_insn "altivec_vmulosh"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
>                        (match_operand:V8HI 2 "register_operand" "v")]
> @@ -1047,7 +1159,13 @@
>                        (match_operand:V4SI 2 "register_operand" "v")]
>  		     UNSPEC_VPKPX))]
>    "TARGET_ALTIVEC"
> -  "vpkpx %0,%1,%2"
> +  "*
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return \"vpkpx %0,%1,%2\";
> +    else
> +      return \"vpkpx %0,%2,%1\";
> +  }"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "altivec_vpks<VI_char>ss"
> @@ -1056,7 +1174,13 @@
>  			    (match_operand:VP 2 "register_operand" "v")]
>  			   UNSPEC_VPACK_SIGN_SIGN_SAT))]
>    "<VI_unit>"
> -  "vpks<VI_char>ss %0,%1,%2"
> +  "*
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return \"vpks<VI_char>ss %0,%1,%2\";
> +    else
> +      return \"vpks<VI_char>ss %0,%2,%1\";
> +  }"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "altivec_vpks<VI_char>us"
> @@ -1065,7 +1189,13 @@
>  			    (match_operand:VP 2 "register_operand" "v")]
>  			   UNSPEC_VPACK_SIGN_UNS_SAT))]
>    "<VI_unit>"
> -  "vpks<VI_char>us %0,%1,%2"
> +  "*
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return \"vpks<VI_char>us %0,%1,%2\";
> +    else
> +      return \"vpks<VI_char>us %0,%2,%1\";
> +  }"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "altivec_vpku<VI_char>us"
> @@ -1074,7 +1204,13 @@
>  			    (match_operand:VP 2 "register_operand" "v")]
>  			   UNSPEC_VPACK_UNS_UNS_SAT))]
>    "<VI_unit>"
> -  "vpku<VI_char>us %0,%1,%2"
> +  "*
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return \"vpku<VI_char>us %0,%1,%2\";
> +    else
> +      return \"vpku<VI_char>us %0,%2,%1\";
> +  }"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "altivec_vpku<VI_char>um"
> @@ -1083,7 +1219,13 @@
>  			    (match_operand:VP 2 "register_operand" "v")]
>  			   UNSPEC_VPACK_UNS_UNS_MOD))]
>    "<VI_unit>"
> -  "vpku<VI_char>um %0,%1,%2"
> +  "*
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return \"vpku<VI_char>um %0,%1,%2\";
> +    else
> +      return \"vpku<VI_char>um %0,%2,%1\";
> +  }"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "*altivec_vrl<VI_char>"
> @@ -1276,7 +1418,12 @@
>  		       (match_operand:V16QI 3 "register_operand" "")]
>  		      UNSPEC_VPERM))]
>    "TARGET_ALTIVEC"
> -  "")
> +{
> +  if (!BYTES_BIG_ENDIAN) {
> +    altivec_expand_vec_perm_le (operands);
> +    DONE;
> +  }
> +})
>  
>  (define_expand "vec_perm_constv16qi"
>    [(match_operand:V16QI 0 "register_operand" "")
> @@ -1928,25 +2075,26 @@
>    rtx vzero = gen_reg_rtx (V8HImode);
>    rtx mask = gen_reg_rtx (V16QImode);
>    rtvec v = rtvec_alloc (16);
> +  bool be = BYTES_BIG_ENDIAN;
>     
>    emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
>     
> -  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 0);
> -  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
> -  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 2);
> -  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
> -  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 4);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 6);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
> +  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
> +  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
> +  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
> +  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
> +  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
> +  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
> +  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
> +  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
> +  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
> +  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
> +  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
> +  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
> +  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
> +  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
> +  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
> +  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
>    emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> @@ -1963,25 +2111,26 @@
>    rtx vzero = gen_reg_rtx (V4SImode);
>    rtx mask = gen_reg_rtx (V16QImode);
>    rtvec v = rtvec_alloc (16);
> +  bool be = BYTES_BIG_ENDIAN;
>  
>    emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
>   
> -  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 0);
> -  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
> -  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 2);
> -  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
> -  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 4);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 6);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
> +  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
> +  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 :  6);
> +  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  0 : 17);
> +  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
> +  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
> +  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 :  4);
> +  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ?  2 : 17);
> +  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
> +  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
> +  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 :  2);
> +  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ?  4 : 17);
> +  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
> +  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
> +  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  0);
> +  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
> +  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
>    emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
> @@ -1998,25 +2147,26 @@
>    rtx vzero = gen_reg_rtx (V8HImode);
>    rtx mask = gen_reg_rtx (V16QImode);
>    rtvec v = rtvec_alloc (16);
> +  bool be = BYTES_BIG_ENDIAN;
>  
>    emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
>  
> -  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 8);
> -  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
> -  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 10);
> -  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
> -  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 12);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 14);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
> +  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
> +  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  8 : 16);
> +  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14);
> +  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
> +  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
> +  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16);
> +  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12);
> +  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
> +  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
> +  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16);
> +  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10);
> +  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
> +  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
> +  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16);
> +  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
> +  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
>    emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> @@ -2033,25 +2183,26 @@
>    rtx vzero = gen_reg_rtx (V4SImode);
>    rtx mask = gen_reg_rtx (V16QImode);
>    rtvec v = rtvec_alloc (16);
> +  bool be = BYTES_BIG_ENDIAN;
>  
>    emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
>   
> -  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 8);
> -  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
> -  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 10);
> -  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
> -  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 12);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 14);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
> +  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
> +  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14);
> +  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  8 : 17);
> +  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
> +  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
> +  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12);
> +  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17);
> +  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
> +  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
> +  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10);
> +  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17);
> +  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
> +  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
> +  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  8);
> +  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
> +  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
>    emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
> @@ -2071,7 +2222,10 @@
>    
>    emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2088,7 +2242,10 @@
>    
>    emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2105,7 +2262,10 @@
>    
>    emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2122,7 +2282,10 @@
>    
>    emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2139,7 +2302,10 @@
>    
>    emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2156,7 +2322,10 @@
>    
>    emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2173,7 +2342,10 @@
>    
>    emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2190,7 +2362,10 @@
>    
>    emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
>    DONE;
>  }")
>  
> Index: gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-protos.h
> +++ gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h
> @@ -56,6 +56,7 @@ extern void paired_expand_vector_init (r
>  extern void rs6000_expand_vector_set (rtx, rtx, int);
>  extern void rs6000_expand_vector_extract (rtx, rtx, int);
>  extern bool altivec_expand_vec_perm_const (rtx op[4]);
> +extern void altivec_expand_vec_perm_le (rtx op[4]);
>  extern bool rs6000_expand_vec_perm_const (rtx op[4]);
>  extern void rs6000_expand_extract_even (rtx, rtx, rtx);
>  extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
> @@ -122,6 +123,7 @@ extern rtx rs6000_longcall_ref (rtx);
>  extern void rs6000_fatal_bad_address (rtx);
>  extern rtx create_TOC_reference (rtx, rtx);
>  extern void rs6000_split_multireg_move (rtx, rtx);
> +extern void rs6000_emit_le_vsx_move (rtx, rtx, enum machine_mode);
>  extern void rs6000_emit_move (rtx, rtx, enum machine_mode);
>  extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
>  extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
> Index: gcc-4_8-test/gcc/config/rs6000/vsx.md
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/vsx.md
> +++ gcc-4_8-test/gcc/config/rs6000/vsx.md
> @@ -216,6 +216,359 @@
>    ])
>  
>  ;; VSX moves
> +
> +;; The patterns for LE permuted loads and stores come before the general
> +;; VSX moves so they match first.
> +(define_insn_and_split "*vsx_le_perm_load_<mode>"
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> +        (match_operand:VSX_D 1 "memory_operand" "Z"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  [(set (match_dup 2)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 1) (const_int 0)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 2)
> +          (parallel [(const_int 1) (const_int 0)])))]
> +  "
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> +                                       : operands[0];
> +}
> +  "
> +  [(set_attr "type" "vecload")
> +   (set_attr "length" "8")])
> +
> +(define_insn_and_split "*vsx_le_perm_load_<mode>"
> +  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> +        (match_operand:VSX_W 1 "memory_operand" "Z"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  [(set (match_dup 2)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 2)
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))]
> +  "
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> +                                       : operands[0];
> +}
> +  "
> +  [(set_attr "type" "vecload")
> +   (set_attr "length" "8")])
> +
> +(define_insn_and_split "*vsx_le_perm_load_v8hi"
> +  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
> +        (match_operand:V8HI 1 "memory_operand" "Z"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  [(set (match_dup 2)
> +        (vec_select:V8HI
> +          (match_dup 1)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))
> +   (set (match_dup 0)
> +        (vec_select:V8HI
> +          (match_dup 2)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +  "
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> +                                       : operands[0];
> +}
> +  "
> +  [(set_attr "type" "vecload")
> +   (set_attr "length" "8")])
> +
> +(define_insn_and_split "*vsx_le_perm_load_v16qi"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> +        (match_operand:V16QI 1 "memory_operand" "Z"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  [(set (match_dup 2)
> +        (vec_select:V16QI
> +          (match_dup 1)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))
> +   (set (match_dup 0)
> +        (vec_select:V16QI
> +          (match_dup 2)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +  "
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> +                                       : operands[0];
> +}
> +  "
> +  [(set_attr "type" "vecload")
> +   (set_attr "length" "8")])
> +
> +(define_insn "*vsx_le_perm_store_<mode>"
> +  [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
> +        (match_operand:VSX_D 1 "vsx_register_operand" "+wa"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "12")])
> +
> +(define_split
> +  [(set (match_operand:VSX_D 0 "memory_operand" "")
> +        (match_operand:VSX_D 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> +  [(set (match_dup 2)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 1) (const_int 0)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 2)
> +          (parallel [(const_int 1) (const_int 0)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
> +                                       : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> +  [(set (match_operand:VSX_D 0 "memory_operand" "")
> +        (match_operand:VSX_D 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> +  [(set (match_dup 1)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 1) (const_int 0)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 1) (const_int 0)])))
> +   (set (match_dup 1)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 1) (const_int 0)])))]
> +  "")
> +
> +(define_insn "*vsx_le_perm_store_<mode>"
> +  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
> +        (match_operand:VSX_W 1 "vsx_register_operand" "+wa"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "12")])
> +
> +(define_split
> +  [(set (match_operand:VSX_W 0 "memory_operand" "")
> +        (match_operand:VSX_W 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> +  [(set (match_dup 2)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 2) (const_int 3)
> +	             (const_int 0) (const_int 1)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 2)
> +          (parallel [(const_int 2) (const_int 3)
> +	             (const_int 0) (const_int 1)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
> +                                       : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> +  [(set (match_operand:VSX_W 0 "memory_operand" "")
> +        (match_operand:VSX_W 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> +  [(set (match_dup 1)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 2) (const_int 3)
> +	             (const_int 0) (const_int 1)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 2) (const_int 3)
> +	             (const_int 0) (const_int 1)])))
> +   (set (match_dup 1)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 2) (const_int 3)
> +	             (const_int 0) (const_int 1)])))]
> +  "")
> +
> +(define_insn "*vsx_le_perm_store_v8hi"
> +  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
> +        (match_operand:V8HI 1 "vsx_register_operand" "+wa"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "12")])
> +
> +(define_split
> +  [(set (match_operand:V8HI 0 "memory_operand" "")
> +        (match_operand:V8HI 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> +  [(set (match_dup 2)
> +        (vec_select:V8HI
> +          (match_dup 1)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))
> +   (set (match_dup 0)
> +        (vec_select:V8HI
> +          (match_dup 2)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
> +                                       : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> +  [(set (match_operand:V8HI 0 "memory_operand" "")
> +        (match_operand:V8HI 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> +  [(set (match_dup 1)
> +        (vec_select:V8HI
> +          (match_dup 1)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))
> +   (set (match_dup 0)
> +        (vec_select:V8HI
> +          (match_dup 1)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))
> +   (set (match_dup 1)
> +        (vec_select:V8HI
> +          (match_dup 1)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +  "")
> +
> +(define_insn "*vsx_le_perm_store_v16qi"
> +  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
> +        (match_operand:V16QI 1 "vsx_register_operand" "+wa"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "12")])
> +
> +(define_split
> +  [(set (match_operand:V16QI 0 "memory_operand" "")
> +        (match_operand:V16QI 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> +  [(set (match_dup 2)
> +        (vec_select:V16QI
> +          (match_dup 1)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))
> +   (set (match_dup 0)
> +        (vec_select:V16QI
> +          (match_dup 2)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
> +                                       : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> +  [(set (match_operand:V16QI 0 "memory_operand" "")
> +        (match_operand:V16QI 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> +  [(set (match_dup 1)
> +        (vec_select:V16QI
> +          (match_dup 1)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))
> +   (set (match_dup 0)
> +        (vec_select:V16QI
> +          (match_dup 1)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))
> +   (set (match_dup 1)
> +        (vec_select:V16QI
> +          (match_dup 1)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +  "")
> +
> +
>  (define_insn "*vsx_mov<mode>"
>    [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v")
>  	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
> @@ -962,7 +1315,12 @@
>  	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,wa")
>  	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,wa")))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
> -  "xxpermdi %x0,%x1,%x2,0"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    return "xxpermdi %x0,%x1,%x2,0";
> +  else
> +    return "xxpermdi %x0,%x2,%x1,0";
> +}
>    [(set_attr "type" "vecperm")])
>  
>  ;; Special purpose concat using xxpermdi to glue two single precision values
> @@ -975,9 +1333,161 @@
>  	  (match_operand:SF 2 "vsx_register_operand" "f,f")]
>  	 UNSPEC_VSX_CONCAT))]
>    "VECTOR_MEM_VSX_P (V2DFmode)"
> -  "xxpermdi %x0,%x1,%x2,0"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    return "xxpermdi %x0,%x1,%x2,0";
> +  else
> +    return "xxpermdi %x0,%x2,%x1,0";
> +}
> +  [(set_attr "type" "vecperm")])
> +
> +;; xxpermdi for little endian loads and stores.  We need several of
> +;; these since the form of the PARALLEL differs by mode.
> +(define_insn "*vsx_xxpermdi2_le_<mode>"
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> +        (vec_select:VSX_D
> +          (match_operand:VSX_D 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 1) (const_int 0)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "xxpermdi %x0,%x1,%x1,2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "*vsx_xxpermdi4_le_<mode>"
> +  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> +        (vec_select:VSX_W
> +          (match_operand:VSX_W 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "xxpermdi %x0,%x1,%x1,2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "*vsx_xxpermdi8_le_V8HI"
> +  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
> +        (vec_select:V8HI
> +          (match_operand:V8HI 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
> +  "xxpermdi %x0,%x1,%x1,2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "*vsx_xxpermdi16_le_V16QI"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> +        (vec_select:V16QI
> +          (match_operand:V16QI 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
> +  "xxpermdi %x0,%x1,%x1,2"
>    [(set_attr "type" "vecperm")])
>  
> +;; lxvd2x for little endian loads.  We need several of
> +;; these since the form of the PARALLEL differs by mode.
> +(define_insn "*vsx_lxvd2x2_le_<mode>"
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> +        (vec_select:VSX_D
> +          (match_operand:VSX_D 1 "memory_operand" "Z")
> +          (parallel [(const_int 1) (const_int 0)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "lxvd2x %x0,%y1"
> +  [(set_attr "type" "vecload")])
> +
> +(define_insn "*vsx_lxvd2x4_le_<mode>"
> +  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> +        (vec_select:VSX_W
> +          (match_operand:VSX_W 1 "memory_operand" "Z")
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "lxvd2x %x0,%y1"
> +  [(set_attr "type" "vecload")])
> +
> +(define_insn "*vsx_lxvd2x8_le_V8HI"
> +  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
> +        (vec_select:V8HI
> +          (match_operand:V8HI 1 "memory_operand" "Z")
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
> +  "lxvd2x %x0,%y1"
> +  [(set_attr "type" "vecload")])
> +
> +(define_insn "*vsx_lxvd2x16_le_V16QI"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> +        (vec_select:V16QI
> +          (match_operand:V16QI 1 "memory_operand" "Z")
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
> +  "lxvd2x %x0,%y1"
> +  [(set_attr "type" "vecload")])
> +
> +;; stxvd2x for little endian stores.  We need several of
> +;; these since the form of the PARALLEL differs by mode.
> +(define_insn "*vsx_stxvd2x2_le_<mode>"
> +  [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
> +        (vec_select:VSX_D
> +          (match_operand:VSX_D 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 1) (const_int 0)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "stxvd2x %x1,%y0"
> +  [(set_attr "type" "vecstore")])
> +
> +(define_insn "*vsx_stxvd2x4_le_<mode>"
> +  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
> +        (vec_select:VSX_W
> +          (match_operand:VSX_W 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "stxvd2x %x1,%y0"
> +  [(set_attr "type" "vecstore")])
> +
> +(define_insn "*vsx_stxvd2x8_le_V8HI"
> +  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
> +        (vec_select:V8HI
> +          (match_operand:V8HI 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
> +  "stxvd2x %x1,%y0"
> +  [(set_attr "type" "vecstore")])
> +
> +(define_insn "*vsx_stxvd2x16_le_V16QI"
> +  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
> +        (vec_select:V16QI
> +          (match_operand:V16QI 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
> +  "stxvd2x %x1,%y0"
> +  [(set_attr "type" "vecstore")])
> +
>  ;; Set the element of a V2DI/VD2F mode
>  (define_insn "vsx_set_<mode>"
>    [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?wa")
> @@ -987,9 +1497,10 @@
>  		      UNSPEC_VSX_SET))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> -  if (INTVAL (operands[3]) == 0)
> +  int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
> +  if (INTVAL (operands[3]) == idx_first)
>      return \"xxpermdi %x0,%x2,%x1,1\";
> -  else if (INTVAL (operands[3]) == 1)
> +  else if (INTVAL (operands[3]) == 1 - idx_first)
>      return \"xxpermdi %x0,%x1,%x2,0\";
>    else
>      gcc_unreachable ();
> @@ -1004,8 +1515,12 @@
>  			[(match_operand:QI 2 "u5bit_cint_operand" "i,i,i")])))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> +  int fldDM;
>    gcc_assert (UINTVAL (operands[2]) <= 1);
> -  operands[3] = GEN_INT (INTVAL (operands[2]) << 1);
> +  fldDM = INTVAL (operands[2]) << 1;
> +  if (!BYTES_BIG_ENDIAN)
> +    fldDM = 3 - fldDM;
> +  operands[3] = GEN_INT (fldDM);
>    return \"xxpermdi %x0,%x1,%x1,%3\";
>  }
>    [(set_attr "type" "vecperm")])
> @@ -1025,6 +1540,21 @@
>  	(const_string "fpload")))
>     (set_attr "length" "4")])  
>  
> +;; Optimize extracting element 1 from memory for little endian
> +(define_insn "*vsx_extract_<mode>_one_le"
> +  [(set (match_operand:<VS_scalar> 0 "vsx_register_operand" "=ws,d,?wa")
> +	(vec_select:<VS_scalar>
> +	 (match_operand:VSX_D 1 "indexed_or_indirect_operand" "Z,Z,Z")
> +	 (parallel [(const_int 1)])))]
> +  "VECTOR_MEM_VSX_P (<MODE>mode) && !WORDS_BIG_ENDIAN"
> +  "lxsd%U1x %x0,%y1"
> +  [(set (attr "type")
> +      (if_then_else
> +	(match_test "update_indexed_address_mem (operands[1], VOIDmode)")
> +	(const_string "fpload_ux")
> +	(const_string "fpload")))
> +   (set_attr "length" "4")])  
> +
>  ;; Extract a SF element from V4SF
>  (define_insn_and_split "vsx_extract_v4sf"
>    [(set (match_operand:SF 0 "vsx_register_operand" "=f,f")
> @@ -1045,7 +1575,7 @@
>    rtx op2 = operands[2];
>    rtx op3 = operands[3];
>    rtx tmp;
> -  HOST_WIDE_INT ele = INTVAL (op2);
> +  HOST_WIDE_INT ele = BYTES_BIG_ENDIAN ? INTVAL (op2) : 3 - INTVAL (op2);
>  
>    if (ele == 0)
>      tmp = op1;
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/fusion.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
>  /* { dg-require-effective-target powerpc_p8vector_ok } */
>  /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
>  
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr43154.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
>  /* { dg-options "-O2 -mcpu=power7" } */
>  
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
> @@ -19,19 +19,6 @@ V b4(V x)
>    return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, });
>  }
>  
> -V p2(V x, V y)
> -{
> -  return __builtin_shuffle(x, y,
> -	(V){ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
> -
> -}
> -
> -V p4(V x, V y)
> -{
> -  return __builtin_shuffle(x, y,
> -	(V){ 2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
> -}
> -
>  V h1(V x, V y)
>  {
>    return __builtin_shuffle(x, y,
> @@ -72,5 +59,3 @@ V l4(V x, V y)
>  /* { dg-final { scan-assembler "vspltb" } } */
>  /* { dg-final { scan-assembler "vsplth" } } */
>  /* { dg-final { scan-assembler "vspltw" } } */
> -/* { dg-final { scan-assembler "vpkuhum" } } */
> -/* { dg-final { scan-assembler "vpkuwum" } } */
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c
> ===================================================================
> --- /dev/null
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
> +/* { dg-options "-O -maltivec -mno-vsx" } */
> +
> +typedef unsigned char V __attribute__((vector_size(16)));
> +
> +V p2(V x, V y)
> +{
> +  return __builtin_shuffle(x, y,
> +	(V){ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
> +
> +}
> +
> +V p4(V x, V y)
> +{
> +  return __builtin_shuffle(x, y,
> +	(V){ 2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
> +}
> +
> +/* { dg-final { scan-assembler-not "vperm" } } */
> +/* { dg-final { scan-assembler "vpkuhum" } } */
> +/* { dg-final { scan-assembler "vpkuwum" } } */
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/eg-5.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c
> @@ -7,10 +7,17 @@ matvecmul4 (vector float c0, vector floa
>    /* Set result to a vector of f32 0's */
>    vector float result = ((vector float){0.,0.,0.,0.});
>  
> +#ifdef __LITTLE_ENDIAN__
> +  result  = vec_madd (c0, vec_splat (v, 3), result);
> +  result  = vec_madd (c1, vec_splat (v, 2), result);
> +  result  = vec_madd (c2, vec_splat (v, 1), result);
> +  result  = vec_madd (c3, vec_splat (v, 0), result);
> +#else
>    result  = vec_madd (c0, vec_splat (v, 0), result);
>    result  = vec_madd (c1, vec_splat (v, 1), result);
>    result  = vec_madd (c2, vec_splat (v, 2), result);
>    result  = vec_madd (c3, vec_splat (v, 3), result);
> +#endif
>  
>    return result;
>  }
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
> @@ -13,12 +13,27 @@
>  #define DO_INLINE __attribute__ ((always_inline))
>  #define DONT_INLINE __attribute__ ((noinline))
>  
> +#ifdef __LITTLE_ENDIAN__
> +static inline DO_INLINE int inline_me(vector signed short data)
> +{
> +  union {vector signed short v; signed short s[8];} u;
> +  signed short x;
> +  unsigned char x1, x2;
> +
> +  u.v = data;
> +  x = u.s[7];
> +  x1 = (x >> 8) & 0xff;
> +  x2 = x & 0xff;
> +  return ((x2 << 8) | x1);
> +}
> +#else
>  static inline DO_INLINE int inline_me(vector signed short data) 
>  {
>    union {vector signed short v; signed short s[8];} u;
>    u.v = data;
>    return u.s[7];
>  }
> +#endif
>  
>  static DONT_INLINE int foo(vector signed short data)
>  {
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c
> ===================================================================
> --- /dev/null
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c
> @@ -0,0 +1,14 @@
> +#include "harness.h"
> +
> +vector short
> +vec_set (short m)
> +{
> +  return (vector short){m, 0, 0, 0, 0, 0, 0, 0};
> +}
> +
> +static void test()
> +{
> +  check (vec_all_eq (vec_set (7),
> +		     ((vector short){7, 0, 0, 0, 0, 0, 0, 0})),
> +	 "vec_set");
> +}
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/3b-15.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c
> @@ -3,7 +3,11 @@
>  vector unsigned char
>  f (vector unsigned char a, vector unsigned char b, vector unsigned char c)
>  {
> +#ifdef __BIG_ENDIAN__
>    return vec_perm(a,b,c); 
> +#else
> +  return vec_perm(b,a,c);
> +#endif
>  }
>  
>  static void test()
> @@ -12,8 +16,13 @@ static void test()
>  					    8,9,10,11,12,13,14,15}),
>  		     ((vector unsigned char){70,71,72,73,74,75,76,77,
>  					    78,79,80,81,82,83,84,85}),
> +#ifdef __BIG_ENDIAN__
>  		     ((vector unsigned char){0x1,0x14,0x18,0x10,0x16,0x15,0x19,0x1a,
>  					    0x1c,0x1c,0x1c,0x12,0x8,0x1d,0x1b,0xe})),
> +#else
> +                     ((vector unsigned char){0x1e,0xb,0x7,0xf,0x9,0xa,0x6,0x5,
> +                                            0x3,0x3,0x3,0xd,0x17,0x2,0x4,0x11})),
> +#endif
>  		   ((vector unsigned char){1,74,78,70,76,75,79,80,82,82,82,72,8,83,81,14})),
>  	"f");
>  }
> Index: gcc-4_8-test/libcpp/lex.c
> ===================================================================
> --- gcc-4_8-test.orig/libcpp/lex.c
> +++ gcc-4_8-test/libcpp/lex.c
> @@ -559,8 +559,13 @@ search_line_fast (const uchar *s, const
>       beginning with all ones and shifting in zeros according to the
>       mis-alignment.  The LVSR instruction pulls the exact shift we
>       want from the address.  */
> +#ifdef __BIG_ENDIAN__
>    mask = __builtin_vec_lvsr(0, s);
>    mask = __builtin_vec_perm(zero, ones, mask);
> +#else
> +  mask = __builtin_vec_lvsl(0, s);
> +  mask = __builtin_vec_perm(ones, zero, mask);
> +#endif
>    data &= mask;
>  
>    /* While altivec loads mask addresses, we still need to align S so
> @@ -624,7 +629,11 @@ search_line_fast (const uchar *s, const
>      /* L now contains 0xff in bytes for which we matched one of the
>         relevant characters.  We can find the byte index by finding
>         its bit index and dividing by 8.  */
> +#ifdef __BIG_ENDIAN__
>      l = __builtin_clzl(l) >> 3;
> +#else
> +    l = __builtin_ctzl(l) >> 3;
> +#endif
>      return s + l;
>  
>  #undef N
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
>  /* { dg-options "-O3 -mcpu=power7 -mabi=altivec -ffast-math -fno-unroll-loops" } */
>  /* { dg-final { scan-assembler-times "xvaddsp" 3 } } */
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target vect_int } */
> +/* { dg-skip-if "cost too high" { powerpc*le-*-* } { "*" } { "" } } */
>  
>  #include <stdarg.h>
>  #include "../../tree-vect.h"
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer


