This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [MIPS][LS2][2/5] Vector intrinsics


Richard Sandiford wrote:
The patch generally looks good.

Hi Richard,


Sorry for the delay; I needed time to refactor the patch to fix the issues you pointed out. I've mostly changed the parts of the patch touching mips.md and mips.c; I think they look much cleaner now. The new changes are at the beginning of the patch.

The patch was regression-tested on the gcc, g++ and libstdc++ testsuites for the n32, o32 and n64 ABIs, with and without -march=loongson2?.

The only new FAIL was gcc.dg/tree-ssa/gen-vect-11c.c when -march=loongson2? is used. This testcase should be fixed either to XFAIL when compiled for Loongson or to disable Loongson vector instructions (a switch we don't currently have).
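
The XFAIL route could reuse the mips_loongson effective target added below. A rough sketch only, e.g. an { xfail mips_loongson } selector on whichever dg-final scan fails, or skipping the test outright:

  /* { dg-skip-if "Loongson vectorizes this loop differently" { mips_loongson } { "*" } { "" } } */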

My main concern is the FPR move handling.
It looks like you use MOV.D for 64-bit integer moves, but this is usually
incorrect.  In the standard ISA spec, MOV.D is unpredictable unless
(a) the source is uninterpreted or (b) it has been interpreted as a
double-precision floating-point value.

So: does Loongson specifically exempt itself from this restriction?
Or does it have special MOV.FOO instructions for the new modes?

Either way, the patch is inconsistent.  mips_mode_ok_for_mov_fmt_p
should return true for any mode that can/will be handled by MOV.FMT.

I don't understand why you need FPR<->FPR DImode moves for 32-bit
targets but not 64-bit targets.  (movdi_64bit doesn't have FPR<->FPR
moves either.)

Loongson behaves like generic MIPS III with respect to moves to and from FP registers. So, to handle the new modes, I added them to the MOVE64 and SPLITF mode iterators and adjusted mips_split_doubleword_move() accordingly.


To handle the new vector types I had to change mips_builtin_vector_type() to distinguish between signed and unsigned element types, since the new modes are used for both signed and unsigned variants.
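
For example (using the intrinsic declarations this patch adds to loongson.h), V4HImode now backs two distinct source-level types, so caching the vector type by mode alone would make them collide:

  uint16x4_t a = paddh_u (x, y);  /* V4HImode, unsigned element type */
  int16x4_t  b = paddh_s (p, q);  /* V4HImode, signed element type */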


Maxim Kuvyrkov <maxim@codesourcery.com> writes:
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp (revision 62)
+++ gcc/testsuite/lib/target-supports.exp (working copy)
@@ -1252,6 +1252,17 @@ proc check_effective_target_arm_neon_hw
     } "-mfpu=neon -mfloat-abi=softfp"]
 }
+# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
+# the Loongson vector modes.
+
+proc check_effective_target_mips_loongson { } {
+    return [check_no_compiler_messages loongson assembly {
+	#if !defined(_MIPS_LOONGSON_VECTOR_MODES)
+	#error FOO
+	#endif
+    }]
+}
+

I think this is a poor choice of name for a user-visible macro. "modes" are an internal gcc concept, and your .h-based API shields the user from the "__attribute__"s needed to construct the types.

__mips_loongson_vector_rev it is.
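
That also gives users a tidy way to guard the header, along the lines of:

  #ifdef __mips_loongson_vector_rev
  #include <loongson.h>
  #endif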



+;; Move patterns.
+
+;; Expander to legitimize moves involving values of vector modes.
+(define_expand "mov<mode>"
+  [(set (match_operand:VWHB 0 "nonimmediate_operand")
+	(match_operand:VWHB 1 "move_operand"))]
+  ""
+{
+  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
+    DONE;
+})

Hmm. This is probably going to cause problems if other ASEs use the same modes in future, but I guess this is OK until then.

Local style is not to have predicates for move expanders.
The predicates aren't checked, and I think it's confusing
to have an expander with "move_operand" as its predicate,
and to then call a function (mips_legitimize_move) that
deals with non-move_operands.  So:

  [(set (match_operand:VWHB 0)
	(match_operand:VWHB 1))]

Fixed.



+;; Handle legitimized moves between values of vector modes.
+(define_insn "mov<mode>_internal"
+  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,r,f,r,m,f")
+	(match_operand:VWHB 1 "move_operand" "f,m,f,f,r,r,YG,YG"))]

"d" rather than "r".

Fixed.



+  "HAVE_LOONGSON_VECTOR_MODES"
+{
+  return mips_output_move (operands[0], operands[1]);
+}

Local style is to format single-line C blocks as:


  "HAVE_LOONGSON_VECTOR_MODES"
  { return mips_output_move (operands[0], operands[1]); }

+ [(set_attr "type" "fpstore,fpload,*,mfc,mtc,*,fpstore,mtc")

"type" shouldn't be "*", but you fixed this in patch 4. Please include this fix, and the other type attributes, in the original loongson.md patch.

Fixed.



+ (set_attr "mode" "<MODE>")])

"mode" set to "DI".


+
+;; Initialization of a vector.
+
+(define_expand "vec_init<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+	(match_operand 1 "" ""))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  {
+    mips_expand_vector_init (operands[0], operands[1]);
+    DONE;
+  }
+)

Expanders shouldn't have constraints. Also, the formatting is inconsistent with the previous patterns (which followed local style):

(define_expand "vec_init<mode>"
  [(set (match_operand:VWHB 0 "register_operand")
	(match_operand 1))]
  "HAVE_LOONGSON_VECTOR_MODES"
{
  mips_expand_vector_init (operands[0], operands[1]);
  DONE;
})

Fixed.



+;; Instruction patterns for SIMD instructions.
+
+;; Pack with signed saturation.
+(define_insn "vec_pack_ssat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	  (ss_truncate:<V_squash> (match_operand:VWH 1 "register_operand" "f"))
+          (ss_truncate:<V_squash> (match_operand:VWH 2 "register_operand" "f")))
+	
+   )]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "packss<V_squash_double_suffix>\t%0,%1,%2"
+)

Inconsistent indentation (tabs vs. spaces by the looks of things). Inconsistent position for closing ")" (which you fixed in patch 4).

In general, local style is to put ")" and "]" on the same line as the
thing they're closing, even if it means breaking a line.  So:

(define_insn "vec_pack_ssat_<mode>"
  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
	(vec_concat:<V_squash_double>
	  (ss_truncate:<V_squash>
	    (match_operand:VWH 1 "register_operand" "f"))
	  (ss_truncate:<V_squash>
	    (match_operand:VWH 2 "register_operand" "f"))))]
  "HAVE_LOONGSON_VECTOR_MODES"
  "packss<V_squash_double_suffix>\t%0,%1,%2")

The same goes for the other instances.

Fixed.



@@ -494,7 +516,10 @@
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
-  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
+  [(DI "!TARGET_64BIT") (DF "!TARGET_64BIT")
+   (V2SF "!TARGET_64BIT && TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "HAVE_LOONGSON_VECTOR_MODES") (V4HI "HAVE_LOONGSON_VECTOR_MODES")
+   (V8QI "HAVE_LOONGSON_VECTOR_MODES")])

Since we need more than one line, put V2SF and each new entry on its own line. The changes to the existing modes aren't right; they aren't consistent with the comment.

Fixed. Turned out the changes to existing modes weren't necessary at all.



Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c (revision 62)
+++ gcc/config/mips/mips.c (working copy)
@@ -3518,6 +3518,23 @@ mips_output_move (rtx dest, rtx src)
   if (dbl_p && mips_split_64bit_move_p (dest, src))
     return "#";
 
+  /* Handle cases where the source is a constant zero vector on
+     Loongson targets.  */
+  if (HAVE_LOONGSON_VECTOR_MODES && src_code == CONST_VECTOR)
+    {
+      if (dest_code == REG)
+	{
+	  /* Move constant zero vector to floating-point register.  */
+	  gcc_assert (FP_REG_P (REGNO (dest)));
+	  return "dmtc1\t$0,%0";
+	}
+      else if (dest_code == MEM)
+	/* Move constant zero vector to memory.  */
+	return "sd\t$0,%0";
+      else
+	gcc_unreachable ();
+    }
+

Why doesn't the normal zero handling work?

Don't know. I removed this piece and everything worked.



+/* Initialize vector TARGET to VALS.  */
+
+void
+mips_expand_vector_init (rtx target, rtx vals)
+{
+  enum machine_mode mode = GET_MODE (target);
+  enum machine_mode inner = GET_MODE_INNER (mode);
+  unsigned int i, n_elts = GET_MODE_NUNITS (mode);
+  rtx mem;
+
+  gcc_assert (VECTOR_MODE_P (mode));
+
+  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
+  for (i = 0; i < n_elts; i++)
+    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
+                    XVECEXP (vals, 0, i));
+
+  emit_move_insn (target, mem);
+}

Please keep initialisation and code separate.

Fixed.



Do we really want to create a new stack slot for every initialisation? It seems on the face of it that some sort of reuse would be nice.

I didn't address this last issue. What I don't understand is how we can reuse stack slots, given that accesses to different variables could easily step on each other's toes.
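
For reference, the case in question is an initializer like the one below; as far as I can tell, each one goes through the vec_init<mode> expander and hence mips_expand_vector_init, so each gets its own fresh slot:

  int16x4_t v = { 1, 2, 3, 4 };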



-- Maxim

2008-05-22  Mark Shinwell  <shinwell@codesourcery.com>
	    Nathan Sidwell  <nathan@codesourcery.com>
	    Maxim Kuvyrkov  <maxim@codesourcery.com>
	
	* config/mips/mips-modes.def: Add V8QI, V4HI and V2SI modes.
	* config/mips/mips-protos.h (mips_expand_vector_init): New.
	* config/mips/mips-ftypes.def: Add function types for Loongson-2E/2F
	builtins.
	* config/mips/mips.c (mips_split_doubleword_move): Handle new modes.
	(mips_hard_regno_mode_ok_p): Allow 64-bit vector modes for Loongson.
	(mips_vector_mode_supported_p): Add V2SImode, V4HImode and
	V8QImode cases.
	(LOONGSON_BUILTIN): New.
	(mips_loongson_2ef_bdesc): New.
	(mips_bdesc_arrays): Add mips_loongson_2ef_bdesc.
	(mips_builtin_vector_type): Handle unsigned versions of vector modes.
	Add new parameter for that.
	(MIPS_ATYPE_UQI, MIPS_ATYPE_UDI, MIPS_ATYPE_V2SI, MIPS_ATYPE_UV2SI)
	(MIPS_ATYPE_V4HI, MIPS_ATYPE_UV4HI, MIPS_ATYPE_V8QI, MIPS_ATYPE_UV8QI):
	New.
	(mips_init_builtins): Initialize Loongson builtins if
	appropriate.
	(mips_expand_vector_init): New.
	* config/mips/mips.h (HAVE_LOONGSON_VECTOR_MODES): New.
	(TARGET_CPU_CPP_BUILTINS): Define __mips_loongson_vector_rev
	if appropriate.
	* config/mips/mips.md: Add unspec numbers for Loongson
	builtins.  Include loongson.md.
	(MOVE64): Include Loongson vector modes.
	(SPLITF): Include Loongson vector modes.
	(HALFMODE): Handle Loongson vector modes.
	* config/mips/loongson.md: New.
	* config/mips/loongson.h: New.
	* config.gcc: Add loongson.h header for mips*-*-* targets.
	* doc/extend.texi (MIPS Loongson Built-in Functions): New.

2008-05-22  Mark Shinwell  <shinwell@codesourcery.com>

	* lib/target-supports.exp (check_effective_target_mips_loongson): New.
	* gcc.target/mips/loongson-simd.c: New.
--- config/mips/mips.md	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips.md	(/local/gcc-2/gcc)	(revision 373)
@@ -213,6 +213,28 @@
    (UNSPEC_DPAQX_SA_W_PH	446)
    (UNSPEC_DPSQX_S_W_PH		447)
    (UNSPEC_DPSQX_SA_W_PH	448)
+
+   ;; ST Microelectronics Loongson-2E/2F.
+   (UNSPEC_LOONGSON_AVERAGE		500)
+   (UNSPEC_LOONGSON_EQ			501)
+   (UNSPEC_LOONGSON_GT			502)
+   (UNSPEC_LOONGSON_EXTRACT_HALFWORD	503)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_0	504)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_1	505)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_2	506)
+   (UNSPEC_LOONGSON_INSERT_HALFWORD_3	507)
+   (UNSPEC_LOONGSON_MULT_ADD		508)
+   (UNSPEC_LOONGSON_MOVE_BYTE_MASK	509)
+   (UNSPEC_LOONGSON_UMUL_HIGHPART	510)
+   (UNSPEC_LOONGSON_SMUL_HIGHPART	511)
+   (UNSPEC_LOONGSON_SMUL_LOWPART	512)
+   (UNSPEC_LOONGSON_UMUL_WORD		513)
+   (UNSPEC_LOONGSON_PASUBUB             514)
+   (UNSPEC_LOONGSON_BIADD		515)
+   (UNSPEC_LOONGSON_PSADBH		516)
+   (UNSPEC_LOONGSON_PSHUFH		517)
+   (UNSPEC_LOONGSON_UNPACK_HIGH		518)
+   (UNSPEC_LOONGSON_UNPACK_LOW		519)
   ]
 )
 
@@ -494,7 +516,11 @@
 
 ;; 64-bit modes for which we provide move patterns.
 (define_mode_iterator MOVE64
-  [DI DF (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")])
+  [DI DF
+   (V2SF "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "HAVE_LOONGSON_VECTOR_MODES")
+   (V4HI "HAVE_LOONGSON_VECTOR_MODES")
+   (V8QI "HAVE_LOONGSON_VECTOR_MODES")])
 
 ;; 128-bit modes for which we provide move patterns on 64-bit targets.
 (define_mode_iterator MOVE128 [TF])
@@ -521,6 +547,9 @@
   [(DF "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
    (V2SF "!TARGET_64BIT && TARGET_PAIRED_SINGLE_FLOAT")
+   (V2SI "!TARGET_64BIT && HAVE_LOONGSON_VECTOR_MODES")
+   (V4HI "!TARGET_64BIT && HAVE_LOONGSON_VECTOR_MODES")
+   (V8QI "!TARGET_64BIT && HAVE_LOONGSON_VECTOR_MODES")
    (TF "TARGET_64BIT && TARGET_FLOAT64")])
 
 ;; In GPR templates, a string like "<d>subu" will expand to "subu" in the
@@ -573,7 +602,9 @@
 
 ;; This attribute gives the integer mode that has half the size of
 ;; the controlling mode.
-(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI") (TF "DI")])
+(define_mode_attr HALFMODE [(DF "SI") (DI "SI") (V2SF "SI")
+                            (V2SI "SI") (V4HI "SI") (V8QI "SI")
+			    (TF "DI")])
 
 ;; This attribute works around the early SB-1 rev2 core "F2" erratum:
 ;;
@@ -6406,3 +6437,6 @@
 
 ; MIPS fixed-point instructions.
 (include "mips-fixed.md")
+
+; ST-Microelectronics Loongson-2E/2F-specific patterns.
+(include "loongson.md")
--- config/mips/mips.c	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips.c	(/local/gcc-2/gcc)	(revision 373)
@@ -3531,6 +3531,12 @@ mips_split_doubleword_move (rtx dest, rt
 	emit_insn (gen_move_doubleword_fprdf (dest, src));
       else if (!TARGET_64BIT && GET_MODE (dest) == V2SFmode)
 	emit_insn (gen_move_doubleword_fprv2sf (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V2SImode)
+	emit_insn (gen_move_doubleword_fprv2si (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V4HImode)
+	emit_insn (gen_move_doubleword_fprv4hi (dest, src));
+      else if (!TARGET_64BIT && GET_MODE (dest) == V8QImode)
+	emit_insn (gen_move_doubleword_fprv8qi (dest, src));
       else if (TARGET_64BIT && GET_MODE (dest) == TFmode)
 	emit_insn (gen_move_doubleword_fprtf (dest, src));
       else
@@ -8922,6 +8928,14 @@ mips_hard_regno_mode_ok_p (unsigned int 
       if (mode == TFmode && ISA_HAS_8CC)
 	return true;
 
+      /* Allow 64-bit vector modes for Loongson-2E/2F.  */
+      if (HAVE_LOONGSON_VECTOR_MODES
+	  && (mode == V2SImode
+	      || mode == V4HImode
+	      || mode == V8QImode
+	      || mode == DImode))
+	return true;
+
       if (class == MODE_FLOAT
 	  || class == MODE_COMPLEX_FLOAT
 	  || class == MODE_VECTOR_FLOAT)
@@ -9268,6 +9282,11 @@ mips_vector_mode_supported_p (enum machi
     case V4UQQmode:
       return TARGET_DSP;
 
+    case V2SImode:
+    case V4HImode:
+    case V8QImode:
+      return HAVE_LOONGSON_VECTOR_MODES;
+
     default:
       return false;
     }
@@ -10388,6 +10407,213 @@ static const struct mips_builtin_descrip
   DIRECT_BUILTIN (dpsqx_sa_w_ph, MIPS_DI_FTYPE_DI_V2HI_V2HI, MASK_DSPR2)
 };
 
+/* Define a Loongson MIPS_BUILTIN_DIRECT function for instruction
+   CODE_FOR_mips_<INSN>.  FUNCTION_TYPE and TARGET_FLAGS are
+   builtin_description fields.  */
+#define LOONGSON_BUILTIN(FN_NAME, INSN, FUNCTION_TYPE)		\
+  { CODE_FOR_ ## INSN, 0, "__builtin_loongson_" #FN_NAME,	\
+    MIPS_BUILTIN_DIRECT, FUNCTION_TYPE, 0 }
+
+/* Builtin functions for ST Microelectronics Loongson-2E/2F cores.  */
+static const struct mips_builtin_description mips_loongson_2ef_bdesc [] =
+{
+  /* Pack with signed saturation.  */
+  LOONGSON_BUILTIN (packsswh, vec_pack_ssat_v2si,
+                    MIPS_V4HI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (packsshb, vec_pack_ssat_v4hi,
+                    MIPS_V8QI_FTYPE_V4HI_V4HI),
+  /* Pack with unsigned saturation.  */
+  LOONGSON_BUILTIN (packushb, vec_pack_usat_v4hi,
+                    MIPS_UV8QI_FTYPE_UV4HI_UV4HI),
+  /* Vector addition, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (paddw_u, addv2si3, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (paddh_u, addv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddb_u, addv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (paddw_s, addv2si3, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (paddh_s, addv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddb_s, addv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Addition of doubleword integers, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (paddd_u, paddd, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (paddd_s, paddd, MIPS_DI_FTYPE_DI_DI),
+  /* Vector addition, treating overflow by signed saturation.  */
+  LOONGSON_BUILTIN (paddsh, ssaddv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (paddsb, ssaddv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Vector addition, treating overflow by unsigned saturation.  */
+  LOONGSON_BUILTIN (paddush, usaddv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (paddusb, usaddv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Logical AND NOT.  */
+  LOONGSON_BUILTIN (pandn_ud, loongson_and_not_di, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (pandn_uw, loongson_and_not_v2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pandn_uh, loongson_and_not_v4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pandn_ub, loongson_and_not_v8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pandn_sd, loongson_and_not_di, MIPS_DI_FTYPE_DI_DI),
+  LOONGSON_BUILTIN (pandn_sw, loongson_and_not_v2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pandn_sh, loongson_and_not_v4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pandn_sb, loongson_and_not_v8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Average.  */
+  LOONGSON_BUILTIN (pavgh, loongson_average_v4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pavgb, loongson_average_v8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Equality test.  */
+  LOONGSON_BUILTIN (pcmpeqw_u, loongson_eq_v2si, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pcmpeqh_u, loongson_eq_v4hi, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pcmpeqb_u, loongson_eq_v8qi, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pcmpeqw_s, loongson_eq_v2si, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pcmpeqh_s, loongson_eq_v4hi, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pcmpeqb_s, loongson_eq_v8qi, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Greater-than test.  */
+  LOONGSON_BUILTIN (pcmpgtw_u, loongson_gt_v2si, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (pcmpgth_u, loongson_gt_v4hi, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pcmpgtb_u, loongson_gt_v8qi, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (pcmpgtw_s, loongson_gt_v2si, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (pcmpgth_s, loongson_gt_v4hi, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pcmpgtb_s, loongson_gt_v8qi, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Extract halfword.  */
+  LOONGSON_BUILTIN (pextrh_u, loongson_extract_halfword,
+  		    MIPS_UV4HI_FTYPE_UV4HI_USI),
+  LOONGSON_BUILTIN (pextrh_s, loongson_extract_halfword,
+  		    MIPS_V4HI_FTYPE_V4HI_USI),
+  /* Insert halfword.  */
+  LOONGSON_BUILTIN (pinsrh_0_u, loongson_insert_halfword_0,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_1_u, loongson_insert_halfword_1,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_2_u, loongson_insert_halfword_2,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_3_u, loongson_insert_halfword_3,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (pinsrh_0_s, loongson_insert_halfword_0,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_1_s, loongson_insert_halfword_1,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_2_s, loongson_insert_halfword_2,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (pinsrh_3_s, loongson_insert_halfword_3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply and add.  */
+  LOONGSON_BUILTIN (pmaddhw, loongson_mult_add,
+  		    MIPS_V2SI_FTYPE_V4HI_V4HI),
+  /* Maximum of signed halfwords.  */
+  LOONGSON_BUILTIN (pmaxsh, smaxv4hi3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Maximum of unsigned bytes.  */
+  LOONGSON_BUILTIN (pmaxub, umaxv8qi3,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Minimum of signed halfwords.  */
+  LOONGSON_BUILTIN (pminsh, sminv4hi3,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Minimum of unsigned bytes.  */
+  LOONGSON_BUILTIN (pminub, uminv8qi3,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Move byte mask.  */
+  LOONGSON_BUILTIN (pmovmskb_u, loongson_move_byte_mask,
+  		    MIPS_UV8QI_FTYPE_UV8QI),
+  LOONGSON_BUILTIN (pmovmskb_s, loongson_move_byte_mask,
+  		    MIPS_V8QI_FTYPE_V8QI),
+  /* Multiply unsigned integers and store high result.  */
+  LOONGSON_BUILTIN (pmulhuh, umulv4hi3_highpart,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  /* Multiply signed integers and store high result.  */
+  LOONGSON_BUILTIN (pmulhh, smulv4hi3_highpart,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply signed integers and store low result.  */
+  LOONGSON_BUILTIN (pmullh, loongson_smul_lowpart,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  /* Multiply unsigned word integers.  */
+  LOONGSON_BUILTIN (pmuluw, loongson_umul_word,
+  		    MIPS_UDI_FTYPE_UV2SI_UV2SI),
+  /* Absolute difference.  */
+  LOONGSON_BUILTIN (pasubub, loongson_pasubub,
+		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Sum of unsigned byte integers.  */
+  LOONGSON_BUILTIN (biadd, reduc_uplus_v8qi,
+		    MIPS_UV4HI_FTYPE_UV8QI),
+  /* Sum of absolute differences.  */
+  LOONGSON_BUILTIN (psadbh, loongson_psadbh,
+  		    MIPS_UV4HI_FTYPE_UV8QI_UV8QI),
+  /* Shuffle halfwords.  */
+  LOONGSON_BUILTIN (pshufh_u, loongson_pshufh,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI_UQI),
+  LOONGSON_BUILTIN (pshufh_s, loongson_pshufh,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI_UQI),
+  /* Shift left logical.  */
+  LOONGSON_BUILTIN (psllh_u, loongson_psllv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psllh_s, loongson_psllv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psllw_u, loongson_psllv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psllw_s, loongson_psllv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Shift right arithmetic.  */
+  LOONGSON_BUILTIN (psrah_u, loongson_psrav4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psrah_s, loongson_psrav4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psraw_u, loongson_psrav2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psraw_s, loongson_psrav2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Shift right logical.  */
+  LOONGSON_BUILTIN (psrlh_u, loongson_psrlv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UQI),
+  LOONGSON_BUILTIN (psrlh_s, loongson_psrlv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_UQI),
+  LOONGSON_BUILTIN (psrlw_u, loongson_psrlv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UQI),
+  LOONGSON_BUILTIN (psrlw_s, loongson_psrlv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_UQI),
+  /* Vector subtraction, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (psubw_u, subv2si3, MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (psubh_u, subv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubb_u, subv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (psubw_s, subv2si3, MIPS_V2SI_FTYPE_V2SI_V2SI),
+  LOONGSON_BUILTIN (psubh_s, subv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubb_s, subv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Subtraction of doubleword integers, treating overflow by wraparound.  */
+  LOONGSON_BUILTIN (psubd_u, psubd, MIPS_UDI_FTYPE_UDI_UDI),
+  LOONGSON_BUILTIN (psubd_s, psubd, MIPS_DI_FTYPE_DI_DI),
+  /* Vector subtraction, treating overflow by signed saturation.  */
+  LOONGSON_BUILTIN (psubsh, sssubv4hi3, MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (psubsb, sssubv8qi3, MIPS_V8QI_FTYPE_V8QI_V8QI),
+  /* Vector subtraction, treating overflow by unsigned saturation.  */
+  LOONGSON_BUILTIN (psubush, ussubv4hi3, MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (psubusb, ussubv8qi3, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  /* Unpack high data.  */
+  LOONGSON_BUILTIN (punpckhbh_u, vec_interleave_highv8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (punpckhhw_u, vec_interleave_highv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (punpckhwd_u, vec_interleave_highv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (punpckhbh_s, vec_interleave_highv8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (punpckhhw_s, vec_interleave_highv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (punpckhwd_s, vec_interleave_highv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI),
+  /* Unpack low data.  */
+  LOONGSON_BUILTIN (punpcklbh_u, vec_interleave_lowv8qi,
+  		    MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
+  LOONGSON_BUILTIN (punpcklhw_u, vec_interleave_lowv4hi,
+  		    MIPS_UV4HI_FTYPE_UV4HI_UV4HI),
+  LOONGSON_BUILTIN (punpcklwd_u, vec_interleave_lowv2si,
+  		    MIPS_UV2SI_FTYPE_UV2SI_UV2SI),
+  LOONGSON_BUILTIN (punpcklbh_s, vec_interleave_lowv8qi,
+  		    MIPS_V8QI_FTYPE_V8QI_V8QI),
+  LOONGSON_BUILTIN (punpcklhw_s, vec_interleave_lowv4hi,
+  		    MIPS_V4HI_FTYPE_V4HI_V4HI),
+  LOONGSON_BUILTIN (punpcklwd_s, vec_interleave_lowv2si,
+  		    MIPS_V2SI_FTYPE_V2SI_V2SI)
+};
+
 /* This structure describes an array of mips_builtin_description entries.  */
 struct mips_bdesc_map {
   /* The array that this entry describes.  */
@@ -10411,20 +10637,29 @@ static const struct mips_bdesc_map mips_
   { mips_sb1_bdesc, ARRAY_SIZE (mips_sb1_bdesc), PROCESSOR_SB1, 0 },
   { mips_dsp_bdesc, ARRAY_SIZE (mips_dsp_bdesc), PROCESSOR_MAX, 0 },
   { mips_dsp_32only_bdesc, ARRAY_SIZE (mips_dsp_32only_bdesc),
-    PROCESSOR_MAX, MASK_64BIT }
+    PROCESSOR_MAX, MASK_64BIT },
+  { mips_loongson_2ef_bdesc, ARRAY_SIZE (mips_loongson_2ef_bdesc),
+    PROCESSOR_MAX, 0 }
 };
 
-/* MODE is a vector mode whose elements have type TYPE.  Return the type
-   of the vector itself.  */
+/* MODE is a vector mode whose elements have type TYPE.
+   TYPE is signed or unsigned depending on UNSIGNED_P.
+   Return the type of the vector itself.  */
 
 static tree
-mips_builtin_vector_type (tree type, enum machine_mode mode)
+mips_builtin_vector_type (tree type, enum machine_mode mode, bool unsigned_p)
 {
-  static tree types[(int) MAX_MACHINE_MODE];
+  static tree types[2 * (int) MAX_MACHINE_MODE];
+  int mode_index;
+
+  mode_index = (int) mode;
 
-  if (types[(int) mode] == NULL_TREE)
-    types[(int) mode] = build_vector_type_for_mode (type, mode);
-  return types[(int) mode];
+  if (unsigned_p)
+    mode_index += MAX_MACHINE_MODE;
+
+  if (types[mode_index] == NULL_TREE)
+    types[mode_index] = build_vector_type_for_mode (type, mode);
+  return types[mode_index];
 }
 
 /* Source-level argument types.  */
@@ -10433,16 +10668,33 @@ mips_builtin_vector_type (tree type, enu
 #define MIPS_ATYPE_POINTER ptr_type_node
 
 /* Standard mode-based argument types.  */
+#define MIPS_ATYPE_UQI unsigned_intQI_type_node
 #define MIPS_ATYPE_SI intSI_type_node
 #define MIPS_ATYPE_USI unsigned_intSI_type_node
 #define MIPS_ATYPE_DI intDI_type_node
+#define MIPS_ATYPE_UDI unsigned_intDI_type_node
 #define MIPS_ATYPE_SF float_type_node
 #define MIPS_ATYPE_DF double_type_node
 
 /* Vector argument types.  */
-#define MIPS_ATYPE_V2SF mips_builtin_vector_type (float_type_node, V2SFmode)
-#define MIPS_ATYPE_V2HI mips_builtin_vector_type (intHI_type_node, V2HImode)
-#define MIPS_ATYPE_V4QI mips_builtin_vector_type (intQI_type_node, V4QImode)
+#define MIPS_ATYPE_V2SF						\
+  mips_builtin_vector_type (float_type_node, V2SFmode, false)
+#define MIPS_ATYPE_V2HI						\
+  mips_builtin_vector_type (intHI_type_node, V2HImode, false)
+#define MIPS_ATYPE_V2SI						\
+  mips_builtin_vector_type (intSI_type_node, V2SImode, false)
+#define MIPS_ATYPE_V4QI						\
+  mips_builtin_vector_type (intQI_type_node, V4QImode, false)
+#define MIPS_ATYPE_V4HI						\
+  mips_builtin_vector_type (intHI_type_node, V4HImode, false)
+#define MIPS_ATYPE_V8QI						\
+  mips_builtin_vector_type (intQI_type_node, V8QImode, false)
+#define MIPS_ATYPE_UV2SI						\
+  mips_builtin_vector_type (unsigned_intSI_type_node, V2SImode, true)
+#define MIPS_ATYPE_UV4HI						\
+  mips_builtin_vector_type (unsigned_intHI_type_node, V4HImode, true)
+#define MIPS_ATYPE_UV8QI						\
+  mips_builtin_vector_type (unsigned_intQI_type_node, V8QImode, true)
 
 /* MIPS_FTYPE_ATYPESN takes N MIPS_FTYPES-like type codes and lists
    their associated MIPS_ATYPEs.  */
@@ -10500,10 +10752,14 @@ mips_init_builtins (void)
        m < &mips_bdesc_arrays[ARRAY_SIZE (mips_bdesc_arrays)];
        m++)
     {
+      bool loongson_p = (m->bdesc == mips_loongson_2ef_bdesc);
+
       if ((m->proc == PROCESSOR_MAX || m->proc == mips_arch)
-	  && (m->unsupported_target_flags & target_flags) == 0)
+ 	  && (m->unsupported_target_flags & target_flags) == 0
+ 	  && (!loongson_p || HAVE_LOONGSON_VECTOR_MODES))
 	for (d = m->bdesc; d < &m->bdesc[m->size]; d++)
-	  if ((d->target_flags & target_flags) == d->target_flags)
+ 	  if (((d->target_flags & target_flags) == d->target_flags)
+ 	      || loongson_p)
 	    add_builtin_function (d->name,
 				  mips_build_function_type (d->function_type),
 				  d - m->bdesc + offset,
@@ -12603,6 +12859,30 @@ mips_order_regs_for_local_alloc (void)
       reg_alloc_order[24] = 0;
     }
 }
+
+/* Initialize vector TARGET to VALS.  */
+
+void
+mips_expand_vector_init (rtx target, rtx vals)
+{
+  enum machine_mode mode;
+  enum machine_mode inner;
+  unsigned int i, n_elts;
+  rtx mem;
+
+  mode = GET_MODE (target);
+  inner = GET_MODE_INNER (mode);
+  n_elts = GET_MODE_NUNITS (mode);
+
+  gcc_assert (VECTOR_MODE_P (mode));
+
+  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
+  for (i = 0; i < n_elts; i++)
+    emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)),
+                    XVECEXP (vals, 0, i));
+
+  emit_move_insn (target, mem);
+}
 
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
--- doc/extend.texi	(/local/gcc-trunk/gcc)	(revision 373)
+++ doc/extend.texi	(/local/gcc-2/gcc)	(revision 373)
@@ -6788,6 +6788,7 @@ instructions, but allow the compiler to 
 * X86 Built-in Functions::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
+* MIPS Loongson Built-in Functions::
 * PowerPC AltiVec Built-in Functions::
 * SPARC VIS Built-in Functions::
 * SPU Built-in Functions::
@@ -8667,6 +8668,150 @@ value is the upper one.  The opposite or
 For example, the code above will set the lower half of @code{a} to
 @code{1.5} on little-endian targets and @code{9.1} on big-endian targets.
 
+@node MIPS Loongson Built-in Functions
+@subsection MIPS Loongson Built-in Functions
+
+GCC provides intrinsics to access the SIMD instructions provided by the
+ST Microelectronics Loongson-2E and -2F processors.  These intrinsics,
+available after inclusion of the @code{loongson.h} header file,
+operate on the following 64-bit vector types:
+
+@itemize
+@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers;
+@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers;
+@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers;
+@item @code{int8x8_t}, a vector of eight signed 8-bit integers;
+@item @code{int16x4_t}, a vector of four signed 16-bit integers;
+@item @code{int32x2_t}, a vector of two signed 32-bit integers.
+@end itemize
+
+The intrinsics provided are listed below; each is named after the
+machine instruction to which it corresponds, with suffixes added as
+appropriate to distinguish intrinsics that expand to the same machine
+instruction yet have different argument types.  Refer to the architecture
+documentation for a description of the functionality of each
+instruction.
+
+@smallexample
+int16x4_t packsswh (int32x2_t s, int32x2_t t);
+int8x8_t packsshb (int16x4_t s, int16x4_t t);
+uint8x8_t packushb (uint16x4_t s, uint16x4_t t);
+uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t paddw_s (int32x2_t s, int32x2_t t);
+int16x4_t paddh_s (int16x4_t s, int16x4_t t);
+int8x8_t paddb_s (int8x8_t s, int8x8_t t);
+uint64_t paddd_u (uint64_t s, uint64_t t);
+int64_t paddd_s (int64_t s, int64_t t);
+int16x4_t paddsh (int16x4_t s, int16x4_t t);
+int8x8_t paddsb (int8x8_t s, int8x8_t t);
+uint16x4_t paddush (uint16x4_t s, uint16x4_t t);
+uint8x8_t paddusb (uint8x8_t s, uint8x8_t t);
+uint64_t pandn_ud (uint64_t s, uint64_t t);
+uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t);
+uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t);
+int64_t pandn_sd (int64_t s, int64_t t);
+int32x2_t pandn_sw (int32x2_t s, int32x2_t t);
+int16x4_t pandn_sh (int16x4_t s, int16x4_t t);
+int8x8_t pandn_sb (int8x8_t s, int8x8_t t);
+uint16x4_t pavgh (uint16x4_t s, uint16x4_t t);
+uint8x8_t pavgb (uint8x8_t s, uint8x8_t t);
+uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t);
+uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t);
+int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t);
+int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t);
+uint16x4_t pextrh_u (uint16x4_t s, int field);
+int16x4_t pextrh_s (int16x4_t s, int field);
+uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t);
+uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t);
+int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t);
+int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t);
+int32x2_t pmaddhw (int16x4_t s, int16x4_t t);
+int16x4_t pmaxsh (int16x4_t s, int16x4_t t);
+uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t);
+int16x4_t pminsh (int16x4_t s, int16x4_t t);
+uint8x8_t pminub (uint8x8_t s, uint8x8_t t);
+uint8x8_t pmovmskb_u (uint8x8_t s);
+int8x8_t pmovmskb_s (int8x8_t s);
+uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t);
+int16x4_t pmulhh (int16x4_t s, int16x4_t t);
+int16x4_t pmullh (int16x4_t s, int16x4_t t);
+int64_t pmuluw (uint32x2_t s, uint32x2_t t);
+uint8x8_t pasubub (uint8x8_t s, uint8x8_t t);
+uint16x4_t biadd (uint8x8_t s);
+uint16x4_t psadbh (uint8x8_t s, uint8x8_t t);
+uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order);
+int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order);
+uint16x4_t psllh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psllh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psllw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psllw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrlh_s (int16x4_t s, uint8_t amount);
+uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psrlw_s (int32x2_t s, uint8_t amount);
+uint16x4_t psrah_u (uint16x4_t s, uint8_t amount);
+int16x4_t psrah_s (int16x4_t s, uint8_t amount);
+uint32x2_t psraw_u (uint32x2_t s, uint8_t amount);
+int32x2_t psraw_s (int32x2_t s, uint8_t amount);
+uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t);
+int32x2_t psubw_s (int32x2_t s, int32x2_t t);
+int16x4_t psubh_s (int16x4_t s, int16x4_t t);
+int8x8_t psubb_s (int8x8_t s, int8x8_t t);
+uint64_t psubd_u (uint64_t s, uint64_t t);
+int64_t psubd_s (int64_t s, int64_t t);
+int16x4_t psubsh (int16x4_t s, int16x4_t t);
+int8x8_t psubsb (int8x8_t s, int8x8_t t);
+uint16x4_t psubush (uint16x4_t s, uint16x4_t t);
+uint8x8_t psubusb (uint8x8_t s, uint8x8_t t);
+uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t);
+uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t);
+uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t);
+uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t);
+int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t);
+int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t);
+int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t);
+@end smallexample
+
+Also provided are helper functions for loading and storing values of the
+above 64-bit vector types to and from memory:
+
+@smallexample
+uint32x2_t vec_load_uw (uint32x2_t *src);
+uint16x4_t vec_load_uh (uint16x4_t *src);
+uint8x8_t vec_load_ub (uint8x8_t *src);
+int32x2_t vec_load_sw (int32x2_t *src);
+int16x4_t vec_load_sh (int16x4_t *src);
+int8x8_t vec_load_sb (int8x8_t *src);
+void vec_store_uw (uint32x2_t v, uint32x2_t *dest);
+void vec_store_uh (uint16x4_t v, uint16x4_t *dest);
+void vec_store_ub (uint8x8_t v, uint8x8_t *dest);
+void vec_store_sw (int32x2_t v, int32x2_t *dest);
+void vec_store_sh (int16x4_t v, int16x4_t *dest);
+void vec_store_sb (int8x8_t v, int8x8_t *dest);
+@end smallexample
+
 @menu
 * Paired-Single Arithmetic::
 * Paired-Single Built-in Functions::
--- testsuite/gcc.target/mips/loongson-simd.c	(/local/gcc-trunk/gcc)	(revision 373)
+++ testsuite/gcc.target/mips/loongson-simd.c	(/local/gcc-2/gcc)	(revision 373)
@@ -0,0 +1,2380 @@
+/* Test cases for ST Microelectronics Loongson-2E/2F SIMD intrinsics.
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target mips_loongson } */
+
+#include "loongson.h"
+#include <stdio.h>
+#include <stdint.h>
+#include <assert.h>
+#include <limits.h>
+
+typedef union { int32x2_t v; int32_t a[2]; } int32x2_encap_t;
+typedef union { int16x4_t v; int16_t a[4]; } int16x4_encap_t;
+typedef union { int8x8_t v; int8_t a[8]; } int8x8_encap_t;
+typedef union { uint32x2_t v; uint32_t a[2]; } uint32x2_encap_t;
+typedef union { uint16x4_t v; uint16_t a[4]; } uint16x4_encap_t;
+typedef union { uint8x8_t v; uint8_t a[8]; } uint8x8_encap_t;
+
+#define UINT16x4_MAX USHRT_MAX
+#define UINT8x8_MAX UCHAR_MAX
+#define INT8x8_MAX SCHAR_MAX
+#define INT16x4_MAX SHRT_MAX
+#define INT32x2_MAX INT_MAX
+
+static void test_packsswh (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = INT16x4_MAX - 2;
+  s.a[1] = INT16x4_MAX - 1;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX + 1;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = packsswh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX - 2);
+  assert (r.a[1] == INT16x4_MAX - 1);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_packsshb (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = INT8x8_MAX - 6;
+  s.a[1] = INT8x8_MAX - 5;
+  s.a[2] = INT8x8_MAX - 4;
+  s.a[3] = INT8x8_MAX - 3;
+  t.a[0] = INT8x8_MAX - 2;
+  t.a[1] = INT8x8_MAX - 1;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX + 1;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = packsshb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_packushb (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = UINT8x8_MAX - 6;
+  s.a[1] = UINT8x8_MAX - 5;
+  s.a[2] = UINT8x8_MAX - 4;
+  s.a[3] = UINT8x8_MAX - 3;
+  t.a[0] = UINT8x8_MAX - 2;
+  t.a[1] = UINT8x8_MAX - 1;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX + 1;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = packushb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == UINT8x8_MAX - 6);
+  assert (r.a[1] == UINT8x8_MAX - 5);
+  assert (r.a[2] == UINT8x8_MAX - 4);
+  assert (r.a[3] == UINT8x8_MAX - 3);
+  assert (r.a[4] == UINT8x8_MAX - 2);
+  assert (r.a[5] == UINT8x8_MAX - 1);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_paddw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = paddw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 6);
+}
+
+static void test_paddw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = 4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = paddw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_paddh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = paddh_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 6);
+  assert (r.a[1] == 8);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 12);
+}
+
+static void test_paddh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = paddh_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+}
+
+static void test_paddb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 5;
+  s.a[5] = 6;
+  s.a[6] = 7;
+  s.a[7] = 8;
+  t.a[0] = 9;
+  t.a[1] = 10;
+  t.a[2] = 11;
+  t.a[3] = 12;
+  t.a[4] = 13;
+  t.a[5] = 14;
+  t.a[6] = 15;
+  t.a[7] = 16;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = paddb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 12);
+  assert (r.a[2] == 14);
+  assert (r.a[3] == 16);
+  assert (r.a[4] == 18);
+  assert (r.a[5] == 20);
+  assert (r.a[6] == 22);
+  assert (r.a[7] == 24);
+}
+
+static void test_paddb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = paddb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == -18);
+  assert (r.a[2] == -27);
+  assert (r.a[3] == -36);
+  assert (r.a[4] == -45);
+  assert (r.a[5] == -54);
+  assert (r.a[6] == -63);
+  assert (r.a[7] == -72);
+}
+
+static void test_paddd_u (void)
+{
+  uint64_t d = 123456;
+  uint64_t e = 789012;
+  uint64_t r;
+  r = paddd_u (d, e);
+  assert (r == 912468);
+}
+
+static void test_paddd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = paddd_s (d, e);
+  assert (r == -665556);
+}
+
+static void test_paddsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = INT16x4_MAX;
+  t.a[1] = INT16x4_MAX;
+  t.a[2] = INT16x4_MAX;
+  t.a[3] = INT16x4_MAX;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = paddsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_paddsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = INT8x8_MAX;
+  t.a[1] = INT8x8_MAX;
+  t.a[2] = INT8x8_MAX;
+  t.a[3] = INT8x8_MAX;
+  t.a[4] = INT8x8_MAX;
+  t.a[5] = INT8x8_MAX;
+  t.a[6] = INT8x8_MAX;
+  t.a[7] = INT8x8_MAX;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = paddsb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_paddush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  t.a[0] = UINT16x4_MAX;
+  t.a[1] = UINT16x4_MAX;
+  t.a[2] = UINT16x4_MAX;
+  t.a[3] = UINT16x4_MAX;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = paddush (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == UINT16x4_MAX);
+  assert (r.a[1] == UINT16x4_MAX);
+  assert (r.a[2] == UINT16x4_MAX);
+  assert (r.a[3] == UINT16x4_MAX);
+}
+
+static void test_paddusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 0;
+  s.a[3] = 1;
+  s.a[4] = 0;
+  s.a[5] = 1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = UINT8x8_MAX;
+  t.a[1] = UINT8x8_MAX;
+  t.a[2] = UINT8x8_MAX;
+  t.a[3] = UINT8x8_MAX;
+  t.a[4] = UINT8x8_MAX;
+  t.a[5] = UINT8x8_MAX;
+  t.a[6] = UINT8x8_MAX;
+  t.a[7] = UINT8x8_MAX;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = paddusb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == UINT8x8_MAX);
+  assert (r.a[1] == UINT8x8_MAX);
+  assert (r.a[2] == UINT8x8_MAX);
+  assert (r.a[3] == UINT8x8_MAX);
+  assert (r.a[4] == UINT8x8_MAX);
+  assert (r.a[5] == UINT8x8_MAX);
+  assert (r.a[6] == UINT8x8_MAX);
+  assert (r.a[7] == UINT8x8_MAX);
+}
+
+static void test_pandn_ud (void)
+{
+  uint64_t d1 = 0x0000ffff0000ffffull;
+  uint64_t d2 = 0x0000ffff0000ffffull;
+  uint64_t r;
+  r = pandn_ud (d1, d2);
+  assert (r == 0);
+}
+
+static void test_pandn_sd (void)
+{
+  int64_t d1 = (int64_t) 0x0000000000000000ull;
+  int64_t d2 = (int64_t) 0xfffffffffffffffeull;
+  int64_t r;
+  r = pandn_sd (d1, d2);
+  assert (r == -2);
+}
+
+static void test_pandn_uw (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0x00000000;
+  t.a[1] = 0xffffffff;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pandn_uw (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pandn_sw (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0x00000000;
+  t.a[0] = 0xffffffff;
+  t.a[1] = 0xfffffffe;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = pandn_sw (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+}
+
+static void test_pandn_uh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0x0000;
+  t.a[1] = 0xffff;
+  t.a[2] = 0x0000;
+  t.a[3] = 0xffff;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pandn_uh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pandn_sh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0x0000;
+  s.a[2] = 0xffff;
+  s.a[3] = 0x0000;
+  t.a[0] = 0xffff;
+  t.a[1] = 0xfffe;
+  t.a[2] = 0xffff;
+  t.a[3] = 0xfffe;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = pandn_sh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+}
+
+static void test_pandn_ub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0x00;
+  t.a[1] = 0xff;
+  t.a[2] = 0x00;
+  t.a[3] = 0xff;
+  t.a[4] = 0x00;
+  t.a[5] = 0xff;
+  t.a[6] = 0x00;
+  t.a[7] = 0xff;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pandn_ub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pandn_sb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = 0xff;
+  s.a[1] = 0x00;
+  s.a[2] = 0xff;
+  s.a[3] = 0x00;
+  s.a[4] = 0xff;
+  s.a[5] = 0x00;
+  s.a[6] = 0xff;
+  s.a[7] = 0x00;
+  t.a[0] = 0xff;
+  t.a[1] = 0xfe;
+  t.a[2] = 0xff;
+  t.a[3] = 0xfe;
+  t.a[4] = 0xff;
+  t.a[5] = 0xfe;
+  t.a[6] = 0xff;
+  t.a[7] = 0xfe;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = pandn_sb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -2);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -2);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -2);
+}
+
+static void test_pavgh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pavgh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+}
+
+static void test_pavgb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s.a[4] = 1;
+  s.a[5] = 2;
+  s.a[6] = 3;
+  s.a[7] = 4;
+  t.a[0] = 5;
+  t.a[1] = 6;
+  t.a[2] = 7;
+  t.a[3] = 8;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pavgb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 5);
+  assert (r.a[3] == 6);
+  assert (r.a[4] == 3);
+  assert (r.a[5] == 4);
+  assert (r.a[6] == 5);
+  assert (r.a[7] == 6);
+}
+
+static void test_pcmpeqw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pcmpeqw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpeqh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pcmpeqh_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0xffff);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpeqb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 42;
+  s.a[5] = 43;
+  s.a[6] = 42;
+  s.a[7] = 43;
+  t.a[0] = 43;
+  t.a[1] = 43;
+  t.a[2] = 43;
+  t.a[3] = 43;
+  t.a[4] = 43;
+  t.a[5] = 43;
+  t.a[6] = 43;
+  t.a[7] = 43;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pcmpeqb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0xff);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0xff);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0x00);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpeqw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = pcmpeqw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+}
+
+static void test_pcmpeqh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = pcmpeqh_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpeqb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = -42;
+  s.a[5] = -42;
+  s.a[6] = -42;
+  s.a[7] = -42;
+  t.a[0] = 42;
+  t.a[1] = -42;
+  t.a[2] = 42;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = -42;
+  t.a[6] = 42;
+  t.a[7] = -42;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = pcmpeqb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == -1);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == -1);
+}
+
+static void test_pcmpgtw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 43;
+  t.a[0] = 43;
+  t.a[1] = 42;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pcmpgtw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x00000000);
+  assert (r.a[1] == 0xffffffff);
+}
+
+static void test_pcmpgth_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  t.a[0] = 40;
+  t.a[1] = 41;
+  t.a[2] = 43;
+  t.a[3] = 42;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = pcmpgth_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0000);
+  assert (r.a[1] == 0x0000);
+  assert (r.a[2] == 0x0000);
+  assert (r.a[3] == 0xffff);
+}
+
+static void test_pcmpgtb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s.a[4] = 44;
+  s.a[5] = 45;
+  s.a[6] = 46;
+  s.a[7] = 47;
+  t.a[0] = 48;
+  t.a[1] = 47;
+  t.a[2] = 46;
+  t.a[3] = 45;
+  t.a[4] = 44;
+  t.a[5] = 43;
+  t.a[6] = 42;
+  t.a[7] = 41;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = pcmpgtb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x00);
+  assert (r.a[1] == 0x00);
+  assert (r.a[2] == 0x00);
+  assert (r.a[3] == 0x00);
+  assert (r.a[4] == 0x00);
+  assert (r.a[5] == 0xff);
+  assert (r.a[6] == 0xff);
+  assert (r.a[7] == 0xff);
+}
+
+static void test_pcmpgtw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = -42;
+  t.a[0] = -42;
+  t.a[1] = -42;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = pcmpgtw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 0);
+}
+
+static void test_pcmpgth_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  t.a[0] = 42;
+  t.a[1] = 43;
+  t.a[2] = 44;
+  t.a[3] = -43;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = pcmpgth_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == -1);
+}
+
+static void test_pcmpgtb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = -42;
+  s.a[2] = -42;
+  s.a[3] = -42;
+  s.a[4] = 42;
+  s.a[5] = 42;
+  s.a[6] = 42;
+  s.a[7] = 42;
+  t.a[0] = -45;
+  t.a[1] = -44;
+  t.a[2] = -43;
+  t.a[3] = -42;
+  t.a[4] = 42;
+  t.a[5] = 43;
+  t.a[6] = 41;
+  t.a[7] = 40;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = pcmpgtb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == -1);
+  assert (r.a[7] == -1);
+}
+
+static void test_pextrh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 40;
+  s.a[1] = 41;
+  s.a[2] = 42;
+  s.a[3] = 43;
+  s1 = vec_load_uh (&s.v);
+  r1 = pextrh_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 41);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pextrh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -40;
+  s.a[1] = -41;
+  s.a[2] = -42;
+  s.a[3] = -43;
+  s1 = vec_load_sh (&s.v);
+  r1 = pextrh_s (s1, 2);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pinsrh_0123_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  s1 = vec_load_uh (&s.v);
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  t1 = vec_load_uh (&t.v);
+  r1 = pinsrh_0_u (t1, s1);
+  r1 = pinsrh_1_u (r1, s1);
+  r1 = pinsrh_2_u (r1, s1);
+  r1 = pinsrh_3_u (r1, s1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 42);
+  assert (r.a[1] == 42);
+  assert (r.a[2] == 42);
+  assert (r.a[3] == 42);
+}
+
+static void test_pinsrh_0123_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -42;
+  s.a[1] = 0;
+  s.a[2] = 0;
+  s.a[3] = 0;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 0;
+  t.a[1] = 0;
+  t.a[2] = 0;
+  t.a[3] = 0;
+  t1 = vec_load_sh (&t.v);
+  r1 = pinsrh_0_s (t1, s1);
+  r1 = pinsrh_1_s (r1, s1);
+  r1 = pinsrh_2_s (r1, s1);
+  r1 = pinsrh_3_s (r1, s1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -42);
+  assert (r.a[1] == -42);
+  assert (r.a[2] == -42);
+  assert (r.a[3] == -42);
+}
+
+static void test_pmaddhw (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -5;
+  s.a[1] = -4;
+  s.a[2] = -3;
+  s.a[3] = -2;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 10;
+  t.a[1] = 11;
+  t.a[2] = 12;
+  t.a[3] = 13;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmaddhw (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == (-5*10 + -4*11));
+  assert (r.a[1] == (-3*12 + -2*13));
+}
+
+static void test_pmaxsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmaxsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 20);
+  assert (r.a[1] == 40);
+  assert (r.a[2] == 10);
+  assert (r.a[3] == 50);
+}
+
+static void test_pmaxub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = pmaxub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 80);
+  assert (r.a[1] == 70);
+  assert (r.a[2] == 60);
+  assert (r.a[3] == 50);
+  assert (r.a[4] == 50);
+  assert (r.a[5] == 60);
+  assert (r.a[6] == 70);
+  assert (r.a[7] == 80);
+}
+
+static void test_pminsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -20;
+  s.a[1] = 40;
+  s.a[2] = -10;
+  s.a[3] = 50;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = 20;
+  t.a[1] = -40;
+  t.a[2] = 10;
+  t.a[3] = -50;
+  t1 = vec_load_sh (&t.v);
+  r1 = pminsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -20);
+  assert (r.a[1] == -40);
+  assert (r.a[2] == -10);
+  assert (r.a[3] == -50);
+}
+
+static void test_pminub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = pminub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 10);
+  assert (r.a[1] == 20);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 40);
+  assert (r.a[4] == 40);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 20);
+  assert (r.a[7] == 10);
+}
+
+static void test_pmovmskb_u (void)
+{
+  uint8x8_encap_t s;
+  uint8x8_t s1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0xf0;
+  s.a[1] = 0x40;
+  s.a[2] = 0xf0;
+  s.a[3] = 0x40;
+  s.a[4] = 0xf0;
+  s.a[5] = 0x40;
+  s.a[6] = 0xf0;
+  s.a[7] = 0x40;
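+  /* The top bit is set in bytes 0, 2, 4 and 6, so the byte mask is 0x55.  */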
+  s1 = vec_load_ub (&s.v);
+  r1 = pmovmskb_u (s1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmovmskb_s (void)
+{
+  int8x8_encap_t s;
+  int8x8_t s1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 1;
+  s.a[2] = -1;
+  s.a[3] = 1;
+  s.a[4] = -1;
+  s.a[5] = 1;
+  s.a[6] = -1;
+  s.a[7] = 1;
+  s1 = vec_load_sb (&s.v);
+  r1 = pmovmskb_s (s1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == 0x55);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_pmulhuh (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xff00;
+  s.a[1] = 0xff00;
+  s.a[2] = 0xff00;
+  s.a[3] = 0xff00;
+  s1 = vec_load_uh (&s.v);
+  t.a[0] = 16;
+  t.a[1] = 16;
+  t.a[2] = 16;
+  t.a[3] = 16;
+  t1 = vec_load_uh (&t.v);
+  r1 = pmulhuh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x000f);
+  assert (r.a[1] == 0x000f);
+  assert (r.a[2] == 0x000f);
+  assert (r.a[3] == 0x000f);
+}
+
+static void test_pmulhh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmulhh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -16);
+  assert (r.a[1] == -16);
+  assert (r.a[2] == -16);
+  assert (r.a[3] == -16);
+}
+
+static void test_pmullh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = 0x0ff0;
+  s.a[1] = 0x0ff0;
+  s.a[2] = 0x0ff0;
+  s.a[3] = 0x0ff0;
+  s1 = vec_load_sh (&s.v);
+  t.a[0] = -16*16;
+  t.a[1] = -16*16;
+  t.a[2] = -16*16;
+  t.a[3] = -16*16;
+  t1 = vec_load_sh (&t.v);
+  r1 = pmullh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 4096);
+  assert (r.a[1] == 4096);
+  assert (r.a[2] == 4096);
+  assert (r.a[3] == 4096);
+}
+
+static void test_pmuluw (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint64_t r1;
+  s.a[0] = 0xdeadbeef;
+  s.a[1] = 0;
+  t.a[0] = 0x0f00baaa;
+  t.a[1] = 0;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = pmuluw (s1, t1);
+  assert (r1 == 0xd0cd08e1d1a70b6ull);
+}
+
+static void test_pasubub (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
+  r1 = pasubub (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 70);
+  assert (r.a[1] == 50);
+  assert (r.a[2] == 30);
+  assert (r.a[3] == 10);
+  assert (r.a[4] == 10);
+  assert (r.a[5] == 30);
+  assert (r.a[6] == 50);
+  assert (r.a[7] == 70);
+}
+
+static void test_biadd (void)
+{
+  uint8x8_encap_t s;
+  uint8x8_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  r1 = biadd (s1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 360);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psadbh (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 20;
+  s.a[2] = 30;
+  s.a[3] = 40;
+  s.a[4] = 50;
+  s.a[5] = 60;
+  s.a[6] = 70;
+  s.a[7] = 80;
+  s1 = vec_load_ub (&s.v);
+  t.a[0] = 80;
+  t.a[1] = 70;
+  t.a[2] = 60;
+  t.a[3] = 50;
+  t.a[4] = 40;
+  t.a[5] = 30;
+  t.a[6] = 20;
+  t.a[7] = 10;
+  t1 = vec_load_ub (&t.v);
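+  /* Sum of absolute differences:
+     70+50+30+10+10+30+50+70 == 320 == 0x140.  */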
+  r1 = psadbh (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x0140);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_pshufh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 2;
+  s.a[2] = 3;
+  s.a[3] = 4;
+  s1 = vec_load_uh (&s.v);
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r1 = vec_load_uh (&r.v);
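+  /* The selector 0xe5 == 0b11100101 picks fields 1, 1, 2 and 3 of s.  */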
+  r1 = pshufh_u (r1, s1, 0xe5);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_pshufh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 2;
+  s.a[2] = -3;
+  s.a[3] = 4;
+  s1 = vec_load_sh (&s.v);
+  r.a[0] = 0;
+  r.a[1] = 0;
+  r.a[2] = 0;
+  r.a[3] = 0;
+  r1 = vec_load_sh (&r.v);
+  r1 = pshufh_s (r1, s1, 0xe5);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == 2);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+}
+
+static void test_psllh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffff;
+  s.a[1] = 0xffff;
+  s.a[2] = 0xffff;
+  s.a[3] = 0xffff;
+  s1 = vec_load_uh (&s.v);
+  r1 = psllh_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0xfffe);
+  assert (r.a[1] == 0xfffe);
+  assert (r.a[2] == 0xfffe);
+  assert (r.a[3] == 0xfffe);
+}
+
+static void test_psllw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_t s1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffff;
+  s.a[1] = 0xffffffff;
+  s1 = vec_load_uw (&s.v);
+  r1 = psllw_u (s1, 2);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0xfffffffc);
+  assert (r.a[1] == 0xfffffffc);
+}
+
+static void test_psllh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  s1 = vec_load_sh (&s.v);
+  r1 = psllh_s (s1, 1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -2);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == -2);
+  assert (r.a[3] == -2);
+}
+
+static void test_psllw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_t s1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s1 = vec_load_sw (&s.v);
+  r1 = psllw_s (s1, 2);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -4);
+  assert (r.a[1] == -4);
+}
+
+static void test_psrah_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  s1 = vec_load_uh (&s.v);
+  r1 = psrah_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0xfff7);
+  assert (r.a[1] == 0xfff7);
+  assert (r.a[2] == 0xfff7);
+  assert (r.a[3] == 0xfff7);
+}
+
+static void test_psraw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_t s1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  s1 = vec_load_uw (&s.v);
+  r1 = psraw_u (s1, 1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0xfffffff7);
+  assert (r.a[1] == 0xfffffff7);
+}
+
+static void test_psrah_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  s.a[2] = -2;
+  s.a[3] = -2;
+  s1 = vec_load_sh (&s.v);
+  r1 = psrah_s (s1, 1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+  assert (r.a[2] == -1);
+  assert (r.a[3] == -1);
+}
+
+static void test_psraw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_t s1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -2;
+  s1 = vec_load_sw (&s.v);
+  r1 = psraw_s (s1, 1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -1);
+}
+
+static void test_psrlh_u (void)
+{
+  uint16x4_encap_t s;
+  uint16x4_t s1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0xffef;
+  s.a[1] = 0xffef;
+  s.a[2] = 0xffef;
+  s.a[3] = 0xffef;
+  s1 = vec_load_uh (&s.v);
+  r1 = psrlh_u (s1, 1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0x7ff7);
+  assert (r.a[1] == 0x7ff7);
+  assert (r.a[2] == 0x7ff7);
+  assert (r.a[3] == 0x7ff7);
+}
+
+static void test_psrlw_u (void)
+{
+  uint32x2_encap_t s;
+  uint32x2_t s1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 0xffffffef;
+  s.a[1] = 0xffffffef;
+  s1 = vec_load_uw (&s.v);
+  r1 = psrlw_u (s1, 1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 0x7ffffff7);
+  assert (r.a[1] == 0x7ffffff7);
+}
+
+static void test_psrlh_s (void)
+{
+  int16x4_encap_t s;
+  int16x4_t s1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s.a[2] = -1;
+  s.a[3] = -1;
+  s1 = vec_load_sh (&s.v);
+  r1 = psrlh_s (s1, 1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psrlw_s (void)
+{
+  int32x2_encap_t s;
+  int32x2_t s1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -1;
+  s1 = vec_load_sw (&s.v);
+  r1 = psrlw_s (s1, 1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == INT32x2_MAX);
+  assert (r.a[1] == INT32x2_MAX);
+}
+
+static void test_psubw_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 3;
+  s.a[1] = 4;
+  t.a[0] = 2;
+  t.a[1] = 1;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = psubw_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubw_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = -2;
+  s.a[1] = -1;
+  t.a[0] = 3;
+  t.a[1] = -4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = psubw_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == 3);
+}
+
+static void test_psubh_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 5;
+  s.a[1] = 6;
+  s.a[2] = 7;
+  s.a[3] = 8;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = psubh_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 4);
+  assert (r.a[1] == 4);
+  assert (r.a[2] == 4);
+  assert (r.a[3] == 4);
+}
+
+static void test_psubh_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = psubh_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+}
+
+static void test_psubb_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 10;
+  s.a[1] = 11;
+  s.a[2] = 12;
+  s.a[3] = 13;
+  s.a[4] = 14;
+  s.a[5] = 15;
+  s.a[6] = 16;
+  s.a[7] = 17;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = psubb_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 9);
+  assert (r.a[2] == 9);
+  assert (r.a[3] == 9);
+  assert (r.a[4] == 9);
+  assert (r.a[5] == 9);
+  assert (r.a[6] == 9);
+  assert (r.a[7] == 9);
+}
+
+static void test_psubb_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -10;
+  s.a[1] = -20;
+  s.a[2] = -30;
+  s.a[3] = -40;
+  s.a[4] = -50;
+  s.a[5] = -60;
+  s.a[6] = -70;
+  s.a[7] = -80;
+  t.a[0] = 1;
+  t.a[1] = 2;
+  t.a[2] = 3;
+  t.a[3] = 4;
+  t.a[4] = 5;
+  t.a[5] = 6;
+  t.a[6] = 7;
+  t.a[7] = 8;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = psubb_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -11);
+  assert (r.a[1] == -22);
+  assert (r.a[2] == -33);
+  assert (r.a[3] == -44);
+  assert (r.a[4] == -55);
+  assert (r.a[5] == -66);
+  assert (r.a[6] == -77);
+  assert (r.a[7] == -88);
+}
+
+static void test_psubd_u (void)
+{
+  uint64_t d = 789012;
+  uint64_t e = 123456;
+  uint64_t r;
+  r = psubd_u (d, e);
+  assert (r == 665556);
+}
+
+static void test_psubd_s (void)
+{
+  int64_t d = 123456;
+  int64_t e = -789012;
+  int64_t r;
+  r = psubd_s (d, e);
+  assert (r == 912468);
+}
+
+static void test_psubsh (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 0;
+  s.a[2] = 1;
+  s.a[3] = 2;
+  t.a[0] = -INT16x4_MAX;
+  t.a[1] = -INT16x4_MAX;
+  t.a[2] = -INT16x4_MAX;
+  t.a[3] = -INT16x4_MAX;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = psubsh (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == INT16x4_MAX - 1);
+  assert (r.a[1] == INT16x4_MAX);
+  assert (r.a[2] == INT16x4_MAX);
+  assert (r.a[3] == INT16x4_MAX);
+}
+
+static void test_psubsb (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -6;
+  s.a[1] = -5;
+  s.a[2] = -4;
+  s.a[3] = -3;
+  s.a[4] = -2;
+  s.a[5] = -1;
+  s.a[6] = 0;
+  s.a[7] = 1;
+  t.a[0] = -INT8x8_MAX;
+  t.a[1] = -INT8x8_MAX;
+  t.a[2] = -INT8x8_MAX;
+  t.a[3] = -INT8x8_MAX;
+  t.a[4] = -INT8x8_MAX;
+  t.a[5] = -INT8x8_MAX;
+  t.a[6] = -INT8x8_MAX;
+  t.a[7] = -INT8x8_MAX;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = psubsb (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == INT8x8_MAX - 6);
+  assert (r.a[1] == INT8x8_MAX - 5);
+  assert (r.a[2] == INT8x8_MAX - 4);
+  assert (r.a[3] == INT8x8_MAX - 3);
+  assert (r.a[4] == INT8x8_MAX - 2);
+  assert (r.a[5] == INT8x8_MAX - 1);
+  assert (r.a[6] == INT8x8_MAX);
+  assert (r.a[7] == INT8x8_MAX);
+}
+
+static void test_psubush (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = psubush (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+}
+
+static void test_psubusb (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 0;
+  s.a[1] = 1;
+  s.a[2] = 2;
+  s.a[3] = 3;
+  s.a[4] = 4;
+  s.a[5] = 5;
+  s.a[6] = 6;
+  s.a[7] = 7;
+  t.a[0] = 1;
+  t.a[1] = 1;
+  t.a[2] = 3;
+  t.a[3] = 3;
+  t.a[4] = 5;
+  t.a[5] = 5;
+  t.a[6] = 7;
+  t.a[7] = 7;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = psubusb (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 0);
+  assert (r.a[1] == 0);
+  assert (r.a[2] == 0);
+  assert (r.a[3] == 0);
+  assert (r.a[4] == 0);
+  assert (r.a[5] == 0);
+  assert (r.a[6] == 0);
+  assert (r.a[7] == 0);
+}
+
+static void test_punpckhbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = punpckhbh_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == -11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == -13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == -15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = punpckhbh_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 9);
+  assert (r.a[1] == 10);
+  assert (r.a[2] == 11);
+  assert (r.a[3] == 12);
+  assert (r.a[4] == 13);
+  assert (r.a[5] == 14);
+  assert (r.a[6] == 15);
+  assert (r.a[7] == 16);
+}
+
+static void test_punpckhhw_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = punpckhhw_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -5);
+  assert (r.a[1] == -6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhhw_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = punpckhhw_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 5);
+  assert (r.a[1] == 6);
+  assert (r.a[2] == 7);
+  assert (r.a[3] == 8);
+}
+
+static void test_punpckhwd_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = -4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = punpckhwd_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == -4);
+}
+
+static void test_punpckhwd_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = punpckhwd_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 3);
+  assert (r.a[1] == 4);
+}
+
+static void test_punpcklbh_s (void)
+{
+  int8x8_encap_t s, t;
+  int8x8_t s1, t1;
+  int8x8_t r1;
+  int8x8_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = -3;
+  s.a[2] = -5;
+  s.a[3] = -7;
+  s.a[4] = -9;
+  s.a[5] = -11;
+  s.a[6] = -13;
+  s.a[7] = -15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_sb (&s.v);
+  t1 = vec_load_sb (&t.v);
+  r1 = punpcklbh_s (s1, t1);
+  vec_store_sb (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == -3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == -5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == -7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklbh_u (void)
+{
+  uint8x8_encap_t s, t;
+  uint8x8_t s1, t1;
+  uint8x8_t r1;
+  uint8x8_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  s.a[4] = 9;
+  s.a[5] = 11;
+  s.a[6] = 13;
+  s.a[7] = 15;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  t.a[4] = 10;
+  t.a[5] = 12;
+  t.a[6] = 14;
+  t.a[7] = 16;
+  s1 = vec_load_ub (&s.v);
+  t1 = vec_load_ub (&t.v);
+  r1 = punpcklbh_u (s1, t1);
+  vec_store_ub (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+  assert (r.a[4] == 5);
+  assert (r.a[5] == 6);
+  assert (r.a[6] == 7);
+  assert (r.a[7] == 8);
+}
+
+static void test_punpcklhw_s (void)
+{
+  int16x4_encap_t s, t;
+  int16x4_t s1, t1;
+  int16x4_t r1;
+  int16x4_encap_t r;
+  s.a[0] = -1;
+  s.a[1] = 3;
+  s.a[2] = -5;
+  s.a[3] = 7;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  t.a[2] = -6;
+  t.a[3] = 8;
+  s1 = vec_load_sh (&s.v);
+  t1 = vec_load_sh (&t.v);
+  r1 = punpcklhw_s (s1, t1);
+  vec_store_sh (r1, &r.v);
+  assert (r.a[0] == -1);
+  assert (r.a[1] == -2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklhw_u (void)
+{
+  uint16x4_encap_t s, t;
+  uint16x4_t s1, t1;
+  uint16x4_t r1;
+  uint16x4_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  s.a[2] = 5;
+  s.a[3] = 7;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  t.a[2] = 6;
+  t.a[3] = 8;
+  s1 = vec_load_uh (&s.v);
+  t1 = vec_load_uh (&t.v);
+  r1 = punpcklhw_u (s1, t1);
+  vec_store_uh (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+  assert (r.a[2] == 3);
+  assert (r.a[3] == 4);
+}
+
+static void test_punpcklwd_s (void)
+{
+  int32x2_encap_t s, t;
+  int32x2_t s1, t1;
+  int32x2_t r1;
+  int32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = -2;
+  t.a[1] = 4;
+  s1 = vec_load_sw (&s.v);
+  t1 = vec_load_sw (&t.v);
+  r1 = punpcklwd_s (s1, t1);
+  vec_store_sw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == -2);
+}
+
+static void test_punpcklwd_u (void)
+{
+  uint32x2_encap_t s, t;
+  uint32x2_t s1, t1;
+  uint32x2_t r1;
+  uint32x2_encap_t r;
+  s.a[0] = 1;
+  s.a[1] = 3;
+  t.a[0] = 2;
+  t.a[1] = 4;
+  s1 = vec_load_uw (&s.v);
+  t1 = vec_load_uw (&t.v);
+  r1 = punpcklwd_u (s1, t1);
+  vec_store_uw (r1, &r.v);
+  assert (r.a[0] == 1);
+  assert (r.a[1] == 2);
+}
+
+int main (void)
+{
+  test_packsswh ();
+  test_packsshb ();
+  test_packushb ();
+  test_paddw_u ();
+  test_paddw_s ();
+  test_paddh_u ();
+  test_paddh_s ();
+  test_paddb_u ();
+  test_paddb_s ();
+  test_paddd_u ();
+  test_paddd_s ();
+  test_paddsh ();
+  test_paddsb ();
+  test_paddush ();
+  test_paddusb ();
+  test_pandn_ud ();
+  test_pandn_sd ();
+  test_pandn_uw ();
+  test_pandn_sw ();
+  test_pandn_uh ();
+  test_pandn_sh ();
+  test_pandn_ub ();
+  test_pandn_sb ();
+  test_pavgh ();
+  test_pavgb ();
+  test_pcmpeqw_u ();
+  test_pcmpeqh_u ();
+  test_pcmpeqb_u ();
+  test_pcmpeqw_s ();
+  test_pcmpeqh_s ();
+  test_pcmpeqb_s ();
+  test_pcmpgtw_u ();
+  test_pcmpgth_u ();
+  test_pcmpgtb_u ();
+  test_pcmpgtw_s ();
+  test_pcmpgth_s ();
+  test_pcmpgtb_s ();
+  test_pextrh_u ();
+  test_pextrh_s ();
+  test_pinsrh_0123_u ();
+  test_pinsrh_0123_s ();
+  test_pmaddhw ();
+  test_pmaxsh ();
+  test_pmaxub ();
+  test_pminsh ();
+  test_pminub ();
+  test_pmovmskb_u ();
+  test_pmovmskb_s ();
+  test_pmulhuh ();
+  test_pmulhh ();
+  test_pmullh ();
+  test_pmuluw ();
+  test_pasubub ();
+  test_biadd ();
+  test_psadbh ();
+  test_pshufh_u ();
+  test_pshufh_s ();
+  test_psllh_u ();
+  test_psllw_u ();
+  test_psllh_s ();
+  test_psllw_s ();
+  test_psrah_u ();
+  test_psraw_u ();
+  test_psrah_s ();
+  test_psraw_s ();
+  test_psrlh_u ();
+  test_psrlw_u ();
+  test_psrlh_s ();
+  test_psrlw_s ();
+  test_psubw_u ();
+  test_psubw_s ();
+  test_psubh_u ();
+  test_psubh_s ();
+  test_psubb_u ();
+  test_psubb_s ();
+  test_psubd_u ();
+  test_psubd_s ();
+  test_psubsh ();
+  test_psubsb ();
+  test_psubush ();
+  test_psubusb ();
+  test_punpckhbh_s ();
+  test_punpckhbh_u ();
+  test_punpckhhw_s ();
+  test_punpckhhw_u ();
+  test_punpckhwd_s ();
+  test_punpckhwd_u ();
+  test_punpcklbh_s ();
+  test_punpcklbh_u ();
+  test_punpcklhw_s ();
+  test_punpcklhw_u ();
+  test_punpcklwd_s ();
+  test_punpcklwd_u ();
+  return 0;
+}
--- testsuite/lib/target-supports.exp	(/local/gcc-trunk/gcc)	(revision 373)
+++ testsuite/lib/target-supports.exp	(/local/gcc-2/gcc)	(revision 373)
@@ -1252,6 +1252,17 @@ proc check_effective_target_arm_neon_hw 
     } "-mfpu=neon -mfloat-abi=softfp"]
 }
 
+# Return 1 if this is a Loongson-2E or -2F target using an ABI that supports
+# the Loongson vector modes.
+
+proc check_effective_target_mips_loongson { } {
+    return [check_no_compiler_messages loongson assembly {
+	#if !defined(__mips_loongson_vector_rev)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if this is a PowerPC target with floating-point registers.
 
 proc check_effective_target_powerpc_fprs { } {
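
For illustration, a run-time test can then gate itself on the new
effective-target keyword like this (a sketch only; the exact directives
in the committed tests may differ):

  /* { dg-do run } */
  /* { dg-require-effective-target mips_loongson } */
  #include "loongson.h"
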
--- config.gcc	(/local/gcc-trunk/gcc)	(revision 373)
+++ config.gcc	(/local/gcc-2/gcc)	(revision 373)
@@ -349,6 +349,7 @@ m68k-*-*)
 mips*-*-*)
 	cpu_type=mips
 	need_64bit_hwint=yes
+	extra_headers="loongson.h"
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
--- config/mips/loongson.md	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/loongson.md	(/local/gcc-2/gcc)	(revision 373)
@@ -0,0 +1,429 @@
+;; Machine description for ST Microelectronics Loongson-2E/2F.
+;; Copyright (C) 2008 Free Software Foundation, Inc.
+;; Contributed by CodeSourcery.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Mode iterators and attributes.
+
+;; 64-bit vectors of bytes.
+(define_mode_iterator VB [V8QI])
+
+;; 64-bit vectors of halfwords.
+(define_mode_iterator VH [V4HI])
+
+;; 64-bit vectors of words.
+(define_mode_iterator VW [V2SI])
+
+;; 64-bit vectors of halfwords and bytes.
+(define_mode_iterator VHB [V4HI V8QI])
+
+;; 64-bit vectors of words and halfwords.
+(define_mode_iterator VWH [V2SI V4HI])
+
+;; 64-bit vectors of words, halfwords and bytes.
+(define_mode_iterator VWHB [V2SI V4HI V8QI])
+
+;; 64-bit vectors of words, halfwords and bytes; and DImode.
+(define_mode_iterator VWHBDI [V2SI V4HI V8QI DI])
+
+;; The Loongson instruction suffixes corresponding to the modes in the
+;; VWHB iterator.
+(define_mode_attr V_suffix [(V2SI "w") (V4HI "h") (V8QI "b")])
+
+;; Given a vector type T, the mode of a vector half the size of T
+;; and with the same number of elements.
+(define_mode_attr V_squash [(V2SI "V2HI") (V4HI "V4QI")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with half as many elements.
+(define_mode_attr V_stretch_half [(V2SI "DI") (V4HI "V2SI") (V8QI "V4HI")])
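+;; For example, V8QI maps to V4HI: the same 64 bits, but as four
+;; halfwords instead of eight bytes.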
+
+;; The Loongson instruction suffixes corresponding to the transformation
+;; expressed by V_stretch_half.
+(define_mode_attr V_stretch_half_suffix [(V2SI "wd") (V4HI "hw") (V8QI "bh")])
+
+;; Given a vector type T, the mode of a vector the same size as T
+;; but with twice as many elements.
+(define_mode_attr V_squash_double [(V2SI "V4HI") (V4HI "V8QI")])
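+;; For example, V4HI maps to V8QI: the same 64 bits, but as eight
+;; bytes instead of four halfwords.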
+
+;; The Loongson instruction suffixes corresponding to the conversions
+;; specified by V_squash_double.
+(define_mode_attr V_squash_double_suffix [(V2SI "wh") (V4HI "hb")])
+
+;; Move patterns.
+
+;; Expander to legitimize moves involving values of vector modes.
+(define_expand "mov<mode>"
+  [(set (match_operand:VWHB 0)
+	(match_operand:VWHB 1))]
+  ""
+{
+  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
+    DONE;
+})
+
+;; Handle legitimized moves between values of vector modes.
+(define_insn "mov<mode>_internal"
+  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,d,f,d,m, f")
+	(match_operand:VWHB 1 "move_operand"          "f,m,f,f,d,d,YG,YG"))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  { return mips_output_move (operands[0], operands[1]); }
+  [(set_attr "type" "fpstore,fpload,fmove,mfc,mtc,move,fpstore,mtc")
+   (set_attr "mode" "DI")])
+
+;; Initialization of a vector.
+
+(define_expand "vec_init<mode>"
+  [(set (match_operand:VWHB 0 "register_operand")
+	(match_operand 1 ""))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+{
+  mips_expand_vector_init (operands[0], operands[1]);
+  DONE;
+})
+
+;; Instruction patterns for SIMD instructions.
+
+;; Pack with signed saturation.
+(define_insn "vec_pack_ssat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 1 "register_operand" "f"))
+	 (ss_truncate:<V_squash>
+	  (match_operand:VWH 2 "register_operand" "f"))))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "packss<V_squash_double_suffix>\t%0,%1,%2")
+
+;; Pack with unsigned saturation.
+(define_insn "vec_pack_usat_<mode>"
+  [(set (match_operand:<V_squash_double> 0 "register_operand" "=f")
+        (vec_concat:<V_squash_double>
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 1 "register_operand" "f"))
+	 (us_truncate:<V_squash>
+	  (match_operand:VH 2 "register_operand" "f"))))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "packus<V_squash_double_suffix>\t%0,%1,%2")
+
+;; Addition, treating overflow by wraparound.
+(define_insn "add<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (plus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		   (match_operand:VWHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "padd<V_suffix>\t%0,%1,%2")
+
+;; Addition of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+(define_insn "paddd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (plus:DI (match_operand:DI 1 "register_operand" "f")
+		 (match_operand:DI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "paddd\t%0,%1,%2")
+
+;; Addition, treating overflow by signed saturation.
+(define_insn "ssadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "padds<V_suffix>\t%0,%1,%2")
+
+;; Addition, treating overflow by unsigned saturation.
+(define_insn "usadd<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_plus:VHB (match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "paddus<V_suffix>\t%0,%1,%2")
+
+;; Logical AND NOT.
+(define_insn "loongson_and_not_<mode>"
+  [(set (match_operand:VWHBDI 0 "register_operand" "=f")
+        (and:VWHBDI
+	 (not:VWHBDI (match_operand:VWHBDI 1 "register_operand" "f"))
+	 (match_operand:VWHBDI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pandn\t%0,%1,%2")
+
+;; Average.
+(define_insn "loongson_average_<mode>"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (unspec:VHB [(match_operand:VHB 1 "register_operand" "f")
+		     (match_operand:VHB 2 "register_operand" "f")]
+		    UNSPEC_LOONGSON_AVERAGE))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pavg<V_suffix>\t%0,%1,%2")
+
+;; Equality test.
+(define_insn "loongson_eq_<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_EQ))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pcmpeq<V_suffix>\t%0,%1,%2")
+
+;; Greater-than test.
+(define_insn "loongson_gt_<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_GT))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pcmpgt<V_suffix>\t%0,%1,%2")
+
+;; Extract halfword.
+(define_insn "loongson_extract_halfword"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:SI 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_EXTRACT_HALFWORD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pextr<V_suffix>\t%0,%1,%2")
+
+;; Insert halfword.
+(define_insn "loongson_insert_halfword_0"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_0))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_0\t%0,%1,%2")
+
+(define_insn "loongson_insert_halfword_1"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_1))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_1\t%0,%1,%2")
+
+(define_insn "loongson_insert_halfword_2"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_2))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_2\t%0,%1,%2")
+
+(define_insn "loongson_insert_halfword_3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_INSERT_HALFWORD_3))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pinsr<V_suffix>_3\t%0,%1,%2")
+
+;; Multiply and add packed integers.
+(define_insn "loongson_mult_add"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VH 1 "register_operand" "f")
+				  (match_operand:VH 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_MULT_ADD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmadd<V_stretch_half_suffix>\t%0,%1,%2")
+
+;; Maximum of signed halfwords.
+(define_insn "smax<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smax:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmaxs<V_suffix>\t%0,%1,%2")
+
+;; Maximum of unsigned bytes.
+(define_insn "umax<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umax:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmaxu<V_suffix>\t%0,%1,%2")
+
+;; Minimum of signed halfwords.
+(define_insn "smin<mode>3"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (smin:VH (match_operand:VH 1 "register_operand" "f")
+		 (match_operand:VH 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmins<V_suffix>\t%0,%1,%2")
+
+;; Minimum of unsigned bytes.
+(define_insn "umin<mode>3"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (umin:VB (match_operand:VB 1 "register_operand" "f")
+		 (match_operand:VB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pminu<V_suffix>\t%0,%1,%2")
+
+;; Move byte mask.
+(define_insn "loongson_move_byte_mask"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")]
+		   UNSPEC_LOONGSON_MOVE_BYTE_MASK))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmovmsk<V_suffix>\t%0,%1")
+
+;; Multiply unsigned integers and store high result.
+(define_insn "umul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_UMUL_HIGHPART))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulhu<V_suffix>\t%0,%1,%2")
+
+;; Multiply signed integers and store high result.
+(define_insn "smul<mode>3_highpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_SMUL_HIGHPART))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulh<V_suffix>\t%0,%1,%2")
+
+;; Multiply signed integers and store low result.
+(define_insn "loongson_smul_lowpart"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "f")
+		    (match_operand:VH 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_SMUL_LOWPART))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmull<V_suffix>\t%0,%1,%2")
+
+;; Multiply unsigned word integers.
+(define_insn "loongson_umul_word"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (unspec:DI [(match_operand:VW 1 "register_operand" "f")
+		    (match_operand:VW 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_UMUL_WORD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pmulu<V_suffix>\t%0,%1,%2")
+
+;; Absolute difference.
+(define_insn "loongson_pasubub"
+  [(set (match_operand:VB 0 "register_operand" "=f")
+        (unspec:VB [(match_operand:VB 1 "register_operand" "f")
+		    (match_operand:VB 2 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PASUBUB))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pasubub\t%0,%1,%2")
+
+;; Sum of unsigned byte integers.
+(define_insn "reduc_uplus_<mode>"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")]
+				 UNSPEC_LOONGSON_BIADD))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "biadd\t%0,%1")
+
+;; Sum of absolute differences.
+(define_insn "loongson_psadbh"
+  [(set (match_operand:<V_stretch_half> 0 "register_operand" "=f")
+        (unspec:<V_stretch_half> [(match_operand:VB 1 "register_operand" "f")
+				  (match_operand:VB 2 "register_operand" "f")]
+				 UNSPEC_LOONGSON_PSADBH))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pasubub\t%0,%1,%2;biadd\t%0,%0")
+
+;; Shuffle halfwords.
+(define_insn "loongson_pshufh"
+  [(set (match_operand:VH 0 "register_operand" "=f")
+        (unspec:VH [(match_operand:VH 1 "register_operand" "0")
+		    (match_operand:VH 2 "register_operand" "f")
+		    (match_operand:SI 3 "register_operand" "f")]
+		   UNSPEC_LOONGSON_PSHUFH))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "pshufh\t%0,%2,%3")
+
+;; Shift left logical.
+(define_insn "loongson_psll<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashift:VWH (match_operand:VWH 1 "register_operand" "f")
+		    (match_operand:SI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psll<V_suffix>\t%0,%1,%2")
+
+;; Shift right arithmetic.
+(define_insn "loongson_psra<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (ashiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psra<V_suffix>\t%0,%1,%2")
+
+;; Shift right logical.
+(define_insn "loongson_psrl<mode>"
+  [(set (match_operand:VWH 0 "register_operand" "=f")
+        (lshiftrt:VWH (match_operand:VWH 1 "register_operand" "f")
+		      (match_operand:SI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psrl<V_suffix>\t%0,%1,%2")
+
+;; Subtraction, treating overflow by wraparound.
+(define_insn "sub<mode>3"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (minus:VWHB (match_operand:VWHB 1 "register_operand" "f")
+		    (match_operand:VWHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psub<V_suffix>\t%0,%1,%2")
+
+;; Subtraction of doubleword integers stored in FP registers.
+;; Overflow is treated by wraparound.
+(define_insn "psubd"
+  [(set (match_operand:DI 0 "register_operand" "=f")
+        (minus:DI (match_operand:DI 1 "register_operand" "f")
+		  (match_operand:DI 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubd\t%0,%1,%2")
+
+;; Subtraction, treating overflow by signed saturation.
+(define_insn "sssub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (ss_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubs<V_suffix>\t%0,%1,%2")
+
+;; Subtraction, treating overflow by unsigned saturation.
+(define_insn "ussub<mode>3"
+  [(set (match_operand:VHB 0 "register_operand" "=f")
+        (us_minus:VHB (match_operand:VHB 1 "register_operand" "f")
+		      (match_operand:VHB 2 "register_operand" "f")))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "psubus<V_suffix>\t%0,%1,%2")
+
+;; Unpack high data.
+(define_insn "vec_interleave_high<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_UNPACK_HIGH))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "punpckh<V_stretch_half_suffix>\t%0,%1,%2")
+
+;; Unpack low data.
+(define_insn "vec_interleave_low<mode>"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+        (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
+		      (match_operand:VWHB 2 "register_operand" "f")]
+		     UNSPEC_LOONGSON_UNPACK_LOW))]
+  "HAVE_LOONGSON_VECTOR_MODES"
+  "punpckl<V_stretch_half_suffix>\t%0,%1,%2")
--- config/mips/mips-ftypes.def	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips-ftypes.def	(/local/gcc-2/gcc)	(revision 373)
@@ -66,6 +66,24 @@ DEF_MIPS_FTYPE (1, (SF, SF))
 DEF_MIPS_FTYPE (2, (SF, SF, SF))
 DEF_MIPS_FTYPE (1, (SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (UDI, UDI, UDI))
+DEF_MIPS_FTYPE (2, (UDI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UQI))
+DEF_MIPS_FTYPE (2, (UV2SI, UV2SI, UV2SI))
+
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, UQI))
+DEF_MIPS_FTYPE (3, (UV4HI, UV4HI, UV4HI, USI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV4HI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV4HI, UV8QI, UV8QI))
+
+DEF_MIPS_FTYPE (2, (UV8QI, UV4HI, UV4HI))
+DEF_MIPS_FTYPE (1, (UV8QI, UV8QI))
+DEF_MIPS_FTYPE (2, (UV8QI, UV8QI, UV8QI))
+
 DEF_MIPS_FTYPE (1, (V2HI, SI))
 DEF_MIPS_FTYPE (2, (V2HI, SI, SI))
 DEF_MIPS_FTYPE (3, (V2HI, SI, SI, SI))
@@ -81,12 +99,27 @@ DEF_MIPS_FTYPE (2, (V2SF, V2SF, V2SF))
 DEF_MIPS_FTYPE (3, (V2SF, V2SF, V2SF, INT))
 DEF_MIPS_FTYPE (4, (V2SF, V2SF, V2SF, V2SF, V2SF))
 
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, UQI))
+DEF_MIPS_FTYPE (2, (V2SI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V2SI, V4HI, V4HI))
+
+DEF_MIPS_FTYPE (2, (V4HI, V2SI, V2SI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, USI))
+DEF_MIPS_FTYPE (2, (V4HI, V4HI, V4HI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, UQI))
+DEF_MIPS_FTYPE (3, (V4HI, V4HI, V4HI, USI))
+
 DEF_MIPS_FTYPE (1, (V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V2HI, V2HI))
 DEF_MIPS_FTYPE (1, (V4QI, V4QI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, SI))
 DEF_MIPS_FTYPE (2, (V4QI, V4QI, V4QI))
 
+DEF_MIPS_FTYPE (2, (V8QI, V4HI, V4HI))
+DEF_MIPS_FTYPE (1, (V8QI, V8QI))
+DEF_MIPS_FTYPE (2, (V8QI, V8QI, V8QI))
+
 DEF_MIPS_FTYPE (2, (VOID, SI, SI))
 DEF_MIPS_FTYPE (2, (VOID, V2HI, V2HI))
 DEF_MIPS_FTYPE (2, (VOID, V4QI, V4QI))
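
Each entry here records the arity and the (return, argument...) modes of
a builtin signature.  For instance, DEF_MIPS_FTYPE (2, (UV8QI, UV8QI, UV8QI))
covers two-operand builtins such as:

  uint8x8_t __builtin_loongson_paddb_u (uint8x8_t, uint8x8_t);
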
--- config/mips/mips-protos.h	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips-protos.h	(/local/gcc-2/gcc)	(revision 373)
@@ -303,4 +303,6 @@ union mips_gen_fn_ptrs
 extern void mips_expand_atomic_qihi (union mips_gen_fn_ptrs,
 				     rtx, rtx, rtx, rtx);
 
+extern void mips_expand_vector_init (rtx, rtx);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
--- config/mips/loongson.h	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/loongson.h	(/local/gcc-2/gcc)	(revision 373)
@@ -0,0 +1,769 @@
+/* Intrinsics for ST Microelectronics Loongson-2E/2F SIMD operations.
+
+   Copyright (C) 2008 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 2, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to the
+   Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+#ifndef _GCC_LOONGSON_H
+#define _GCC_LOONGSON_H
+
+#if !defined(__mips_loongson_vector_rev)
+# error "You must select -march=loongson2e or -march=loongson2f to use loongson.h"
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+/* Vectors of unsigned bytes, halfwords and words.  */
+typedef uint8_t uint8x8_t __attribute__((vector_size (8)));
+typedef uint16_t uint16x4_t __attribute__((vector_size (8)));
+typedef uint32_t uint32x2_t __attribute__((vector_size (8)));
+
+/* Vectors of signed bytes, halfwords and words.  */
+typedef int8_t int8x8_t __attribute__((vector_size (8)));
+typedef int16_t int16x4_t __attribute__((vector_size (8)));
+typedef int32_t int32x2_t __attribute__((vector_size (8)));
+
+/* Helpers for loading and storing vectors.  */
+
+/* Load from memory.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vec_load_uw (uint32x2_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vec_load_uh (uint16x4_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vec_load_ub (uint8x8_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vec_load_sw (int32x2_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vec_load_sh (int16x4_t *src)
+{
+  return *src;
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vec_load_sb (int8x8_t *src)
+{
+  return *src;
+}
+
+/* Store to memory.  */
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_uw (uint32x2_t v, uint32x2_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_uh (uint16x4_t v, uint16x4_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_ub (uint8x8_t v, uint8x8_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_sw (int32x2_t v, int32x2_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_sh (int16x4_t v, int16x4_t *dest)
+{
+  *dest = v;
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vec_store_sb (int8x8_t v, int8x8_t *dest)
+{
+  *dest = v;
+}
+
+/* SIMD intrinsics.
+   Unless otherwise noted, calls to the functions below will expand into
+   precisely one machine instruction, modulo any moves required to
+   satisfy register allocation constraints.  */
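+
+/* For example, given uint8x8_t values x and y, the call paddb_u (x, y)
+   compiles to a single paddb instruction.  */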
+
+/* Pack with signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+packsswh (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_packsswh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+packsshb (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_packsshb (s, t);
+}
+
+/* Pack with unsigned saturation.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+packushb (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_packushb (s, t);
+}
+
+/* Vector addition, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+paddw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_paddw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+paddw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_paddw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddb_s (s, t);
+}
+
+/* Addition of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+paddd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_paddd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+paddd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_paddd_s (s, t);
+}
+
+/* Vector addition, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+paddsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_paddsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+paddsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_paddsb (s, t);
+}
+
+/* Vector addition, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+paddush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_paddush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+paddusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_paddusb (s, t);
+}
+
+/* Logical AND NOT.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+pandn_ud (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_pandn_ud (s, t);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pandn_uw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pandn_uw (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pandn_uh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pandn_uh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pandn_ub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pandn_ub (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pandn_sd (int64_t s, int64_t t)
+{
+  return __builtin_loongson_pandn_sd (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pandn_sw (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pandn_sw (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pandn_sh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pandn_sh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pandn_sb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pandn_sb (s, t);
+}
+
+/* Average.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pavgh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pavgh (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pavgb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pavgb (s, t);
+}
+
+/* Equality test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpeqw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpeqw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpeqh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpeqh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpeqb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpeqb_s (s, t);
+}
+
+/* Greater-than test.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pcmpgth_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pcmpgtw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_pcmpgtw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pcmpgth_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pcmpgth_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pcmpgtb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_pcmpgtb_s (s, t);
+}
+
+/* Extract halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pextrh_u (uint16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_u (s, field);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pextrh_s (int16x4_t s, int field /* 0--3 */)
+{
+  return __builtin_loongson_pextrh_s (s, field);
+}
+
+/* Insert halfword.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_u (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_0_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_0_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_1_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_1_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_2_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_2_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pinsrh_3_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pinsrh_3_s (s, t);
+}
+
+/* Multiply and add.  */
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+pmaddhw (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaddhw (s, t);
+}
+
+/* Maximum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmaxsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmaxsh (s, t);
+}
+
+/* Maximum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmaxub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pmaxub (s, t);
+}
+
+/* Minimum of signed halfwords.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pminsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pminsh (s, t);
+}
+
+/* Minimum of unsigned bytes.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pminub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pminub (s, t);
+}
+
+/* Move byte mask.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pmovmskb_u (uint8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_u (s);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+pmovmskb_s (int8x8_t s)
+{
+  return __builtin_loongson_pmovmskb_s (s);
+}
+
+/* Multiply unsigned integers and store high result.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pmulhuh (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_pmulhuh (s, t);
+}
+
+/* Multiply signed integers and store high result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmulhh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmulhh (s, t);
+}
+
+/* Multiply signed integers and store low result.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pmullh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_pmullh (s, t);
+}
+
+/* Multiply unsigned word integers.  */
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+pmuluw (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_pmuluw (s, t);
+}
+
+/* Absolute difference.  */
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+pasubub (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_pasubub (s, t);
+}
+
+/* Sum of unsigned byte integers.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+biadd (uint8x8_t s)
+{
+  return __builtin_loongson_biadd (s);
+}
+
+/* Sum of absolute differences.
+   Note that this intrinsic expands into two machine instructions:
+   PASUBUB followed by BIADD.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psadbh (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psadbh (s, t);
+}
+
+/* Shuffle halfwords.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_u (dest, s, order);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order)
+{
+  return __builtin_loongson_pshufh_s (dest, s, order);
+}
+
+/* Shift left logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psllh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psllh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psllw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psllw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psllw_s (s, amount);
+}
+
+/* Shift right logical.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrlh_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrlh_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlh_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psrlw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psrlw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrlw_s (s, amount);
+}
+
+/* Shift right arithmetic.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psrah_u (uint16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_u (s, amount);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psrah_s (int16x4_t s, uint8_t amount)
+{
+  return __builtin_loongson_psrah_s (s, amount);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psraw_u (uint32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_u (s, amount);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psraw_s (int32x2_t s, uint8_t amount)
+{
+  return __builtin_loongson_psraw_s (s, amount);
+}
+
+/* Vector subtraction, treating overflow by wraparound.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+psubw_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_psubw_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubh_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubh_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubb_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubb_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+psubw_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_psubw_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubh_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubh_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubb_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubb_s (s, t);
+}
+
+/* Subtraction of doubleword integers, treating overflow by wraparound.  */
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+psubd_u (uint64_t s, uint64_t t)
+{
+  return __builtin_loongson_psubd_u (s, t);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+psubd_s (int64_t s, int64_t t)
+{
+  return __builtin_loongson_psubd_s (s, t);
+}
+
+/* Vector subtraction, treating overflow by signed saturation.  */
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+psubsh (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_psubsh (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+psubsb (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_psubsb (s, t);
+}
+
+/* Vector subtraction, treating overflow by unsigned saturation.  */
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+psubush (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_psubush (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+psubusb (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_psubusb (s, t);
+}
+
+/* Unpack high data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpckhwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpckhhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpckhbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpckhwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpckhwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpckhhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpckhhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpckhbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpckhbh_s (s, t);
+}
+
+/* Unpack low data.  */
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+punpcklwd_u (uint32x2_t s, uint32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_u (s, t);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+punpcklhw_u (uint16x4_t s, uint16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_u (s, t);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+punpcklbh_u (uint8x8_t s, uint8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_u (s, t);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+punpcklwd_s (int32x2_t s, int32x2_t t)
+{
+  return __builtin_loongson_punpcklwd_s (s, t);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+punpcklhw_s (int16x4_t s, int16x4_t t)
+{
+  return __builtin_loongson_punpcklhw_s (s, t);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+punpcklbh_s (int8x8_t s, int8x8_t t)
+{
+  return __builtin_loongson_punpcklbh_s (s, t);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
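
For the record, here is a minimal sketch of how the intrinsics are meant to be used from user code.  It is not part of the patch; it assumes the header above is installed as <loongson.h> and that the file is compiled with -march=loongson2e or -march=loongson2f:

#include <loongson.h>

/* Sum of absolute differences of two eight-byte blocks.  As the
   header's comment notes, psadbh expands to PASUBUB followed by
   BIADD.  */
static uint16x4_t
block_sad (uint8x8_t a, uint8x8_t b)
{
  return psadbh (a, b);
}

/* Saturating addition over an array of int16x4_t vectors, four
   signed halfwords per iteration.  */
static void
add_saturated (int16x4_t *dst, const int16x4_t *src, int n)
{
  int i;
  for (i = 0; i < n; i++)
    dst[i] = paddsh (dst[i], src[i]);
}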
--- config/mips/mips.h	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips.h	(/local/gcc-2/gcc)	(revision 373)
@@ -266,6 +266,11 @@ enum mips_code_readable_setting {
 				     || mips_tune == PROCESSOR_74KF3_2)
 #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
 
+/* Whether vector modes and intrinsics for ST Microelectronics
+   Loongson-2E/2F processors should be enabled.  In o32, pairs of
+   floating-point registers provide 64-bit values.  */
+#define HAVE_LOONGSON_VECTOR_MODES TARGET_LOONGSON_2EF
+
 /* True if the pre-reload scheduler should try to create chains of
    multiply-add or multiply-subtract instructions.  For example,
    suppose we have:
@@ -496,6 +501,10 @@ enum mips_code_readable_setting {
 	  builtin_define_std ("MIPSEL");				\
 	  builtin_define ("_MIPSEL");					\
 	}								\
+                                                                        \
+      /* Whether Loongson vector modes are enabled.  */                 \
+      if (HAVE_LOONGSON_VECTOR_MODES)                                   \
+        builtin_define ("__mips_loongson_vector_rev");                  \
 									\
       /* Macros dependent on the C dialect.  */				\
       if (preprocessing_asm_p ())					\
--- config/mips/mips-modes.def	(/local/gcc-trunk/gcc)	(revision 373)
+++ config/mips/mips-modes.def	(/local/gcc-2/gcc)	(revision 373)
@@ -26,6 +26,7 @@ RESET_FLOAT_FORMAT (DF, mips_double_form
 FLOAT_MODE (TF, 16, mips_quad_format);
 
 /* Vector modes.  */
+VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
 VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
 

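For reference, these 8-byte integer modes back the typedefs the intrinsics header uses, via GCC's generic vector extension.  A sketch of the presumable definitions (the exact spelling of the element types is an assumption):

/* 8-byte integer vectors; the element type together with
   vector_size (8) selects the mode: V8QI, V4HI and V2SI
   respectively.  */
typedef signed char    int8x8_t   __attribute__ ((vector_size (8)));
typedef short          int16x4_t  __attribute__ ((vector_size (8)));
typedef int            int32x2_t  __attribute__ ((vector_size (8)));
typedef unsigned char  uint8x8_t  __attribute__ ((vector_size (8)));
typedef unsigned short uint16x4_t __attribute__ ((vector_size (8)));
typedef unsigned int   uint32x2_t __attribute__ ((vector_size (8)));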