SSE5 patches part 2

Michael Meissner michael.meissner@amd.com
Thu Sep 6 20:43:00 GMT 2007


I sent an early version of this patch to Jan Hubicka, and he had some
comments on it.  I figured I would send this patch to the wider audience
to get other suggestions while I work on the changes Jan suggested.  This
is patch #2 of 2.  It assumes patch #1 of the SSE5 series is applied
(which isn't checked in right now).

-- 
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
michael.meissner@amd.com
-------------- next part --------------
<gcc changes>
2007-09-06  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* config/i386/i386.md (UNSPEC_PHADDBW): New constant for SSE5
	instruction generation.
	(UNSPEC_PHADDBD): Ditto.
	(UNSPEC_PHADDBQ): Ditto.
	(UNSPEC_PHADDWD): Ditto.
	(UNSPEC_PHADDWQ): Ditto.
	(UNSPEC_PHADDDQ): Ditto.
	(UNSPEC_PHADDUBW): Ditto.
	(UNSPEC_PHADDUBD): Ditto.
	(UNSPEC_PHADDUBQ): Ditto.
	(UNSPEC_PHADDUWD): Ditto.
	(UNSPEC_PHADDUWQ): Ditto.
	(UNSPEC_PHADDUDQ): Ditto.
	(UNSPEC_PHSUBBW): Ditto.
	(UNSPEC_PHSUBWD): Ditto.
	(UNSPEC_PHSUBDQ): Ditto.
	(UNSPEC_PROTB): Ditto.
	(UNSPEC_PROTW): Ditto.
	(UNSPEC_PROTD): Ditto.
	(UNSPEC_PROTQ): Ditto.
	(UNSPEC_PSHLB): Ditto.
	(UNSPEC_PSHLW): Ditto.
	(UNSPEC_PSHLD): Ditto.
	(UNSPEC_PSHLQ): Ditto.
	(UNSPEC_PSHAB): Ditto.
	(UNSPEC_PSHAW): Ditto.
	(UNSPEC_PSHAD): Ditto.
	(UNSPEC_PSHAQ): Ditto.
	(UNSPEC_FRCZ): Ditto.
	(UNSPEC_CVTPH2PS): Ditto.
	(UNSPEC_CVTPS2PH): Ditto.
	(PCOM_FALSE_B): Ditto.
	(PCOM_FALSE_W): Ditto.
	(PCOM_FALSE_D): Ditto.
	(PCOM_FALSE_Q): Ditto.
	(PCOM_FALSE_UB): Ditto.
	(PCOM_FALSE_UW): Ditto.
	(PCOM_FALSE_UD): Ditto.
	(PCOM_FALSE_UQ): Ditto.
	(PCOM_TRUE_B): Ditto.
	(PCOM_TRUE_W): Ditto.
	(PCOM_TRUE_D): Ditto.
	(PCOM_TRUE_Q): Ditto.
	(PCOM_TRUE_UB): Ditto.
	(PCOM_TRUE_UW): Ditto.
	(PCOM_TRUE_UD): Ditto.
	(PCOM_TRUE_UQ): Ditto.
	(COM_FALSE_S): Ditto.
	(COM_FALSE_P): Ditto.
	(COM_TRUE_S): Ditto.
	(COM_TRUE_P): Ditto.
	(UNSPEC_PPERM): New constant for integer
	multiply and add instructions.
	(UNSPEC_PERMPS): Ditto.
	(UNSPEC_PERMPD): Ditto.
	(UNSPEC_PMACSSWW): Ditto.
	(UNSPEC_PMACSWW): Ditto.
	(UNSPEC_PMACSSWD): Ditto.
	(UNSPEC_PMACSWD): Ditto.
	(UNSPEC_PMACSSDD): Ditto.
	(UNSPEC_PMACSDD): Ditto.
	(UNSPEC_PMACSSDQL): Ditto.
	(UNSPEC_PMACSSDQH): Ditto.
	(UNSPEC_PMACSDQL): Ditto.
	(UNSPEC_PMACSDQH): Ditto.
	(UNSPEC_PMADCSSWD): Ditto.
	(UNSPEC_PMADCSWD): Ditto.
	(UNSPEC_SSE5_INTRINSIC_UNS): New constant for unsigned comparison
	intrinsics.
	(UNSPEC_SSE5_TRUEFALSE): New constant for comtrue, comfalse,
	pcomtrue, and pcomfalse instruction intrinsics.

	* config/i386/sse.md (sse5_phaddbw): Add SSE5 horizontal
	add/subtract instruction support.
	(sse5_phaddbd): Ditto.
	(sse5_phaddbq): Ditto.
	(sse5_phaddwd): Ditto.
	(sse5_phaddwq): Ditto.
	(sse5_phadddq): Ditto.
	(sse5_phaddubw): Ditto.
	(sse5_phaddubd): Ditto.
	(sse5_phaddubq): Ditto.
	(sse5_phadduwd): Ditto.
	(sse5_phadduwq): Ditto.
	(sse5_phaddudq): Ditto.
	(sse5_phsubbw): Ditto.
	(sse5_phsubwd): Ditto.
	(sse5_phsubdq): Ditto.
	(rotl<mode>3): Add SSE5 rotate instruction support.
	(sse5_protw_imm): Ditto.
	(sse5_protd_imm): Ditto.
	(sse5_protb): Ditto.
	(sse5_protw): Ditto.
	(sse5_protd): Ditto.
	(sse5_protq): Ditto.
	(sse5_pshlb): Add SSE5 shift instruction support.
	(sse5_pshlw): Ditto.
	(sse5_pshld): Ditto.
	(sse5_pshlq): Ditto.
	(sse5_pshab): Ditto.
	(sse5_pshaw): Ditto.
	(sse5_pshad): Ditto.
	(sse5_pshaq): Ditto.
	(floorv4sf2): Add vector versions of round functions.
	(nearbyintv4sf2): Ditto.
	(ceilv4sf2): Ditto.
	(floorv2df2): Ditto.
	(nearbyintv2df2): Ditto.
	(ceilv2df2): Ditto.
	(sse5_frczpd): Add SSE5 FRCZ instruction support.
	(sse5_frczps): Ditto.
	(sse5_frczsd): Ditto.
	(sse5_frczss): Ditto.
	(sse5_cvtph2ps): Add SSE5 convert 16-bit floating instruction
	support.
	(sse5_cvtps2ph): Ditto.
	(sse5_maskcmp_s_<mode>): Add SSE5 vector compare instruction
	support.
	(sse5_com_tf<mode>3): Ditto.
	(sse5_maskcmp<mode>3): Ditto.
	(sse5_maskcmp_uns<mode>): Ditto.
	(sse5_maskcmp_uns2<mode>3): Ditto.
	(sse5_setccv2di_8): Ditto.
	(sse5_setccv2di_16): Ditto.
	(sse5_setccv2di_32): Ditto.
	(sse5_setccv2di_u8): Ditto.
	(sse5_setccv2di_u16): Ditto.
	(sse5_setccv2di_u32): Ditto.
	(sse5_pcom_tfdi3): Ditto.
	(sse5_mulv16qi3): New expander for vector 8-bit multiplies on SSE5.
	(mulv16qi3): Add SSE5 support.
	(mulv4si3): Ditto.
	(vec_unpacku_hi_v16qi): Ditto.
	(vec_unpacks_hi_v16qi): Ditto.
	(vec_unpacku_lo_v16qi): Ditto.
	(vec_unpacks_lo_v16qi): Ditto.
	(vec_unpacku_hi_v8hi): Ditto.
	(vec_unpacks_hi_v8hi): Ditto.
	(vec_unpacku_lo_v8hi): Ditto.
	(vec_unpacks_lo_v8hi): Ditto.
	(vec_unpacku_hi_v4si): Ditto.
	(vec_unpacks_hi_v4si): Ditto.
	(vec_unpacku_lo_v4si): Ditto.
	(vec_unpacks_lo_v4si): Ditto.
	(sse5_mulv4si3): New insn to fake 32-bit vector multiply with a
	32-bit vector multiply/add.
	(sse5_pmacsww_vector): Add SSE5 integer multiply/add instructions.
	(sse5_pmacsww_vector_b): Ditto.
	(sse5_pmacssww_vector): Ditto.
	(sse5_pmacssww_vector_b): Ditto.
	(sse5_pmacsdd_vector): Ditto.
	(sse5_pmacsdd_vector_b): Ditto.
	(sse5_pmacssdd_vector): Ditto.
	(sse5_pmacssdd_vector_b): Ditto.
	(sse5_pmacssww): Ditto.
	(sse5_pmacsww): Ditto.
	(sse5_pmacsswd): Ditto.
	(sse5_pmacswd): Ditto.
	(sse5_pmacssdd): Ditto.
	(sse5_pmacsdd): Ditto.
	(sse5_pmacssdql): Ditto.
	(sse5_pmacssdqh): Ditto.
	(sse5_pmacsdql): Ditto.
	(sse5_pmacsdqh): Ditto.
	(sse5_pmadcsswd): Ditto.
	(sse5_pmadcswd): Ditto.
	(sse5_pperm): Add SSE5 permute instructions.
	(sse5_pperm_unpack): Ditto.
	(sse5_permps): Ditto.
	(sse5_permpd): Ditto.

	* config/i386/i386-protos.h (ix86_expand_sse5_unpack): Add
	prototype.
	(ix86_expand_sse5_pack): Ditto.

	* config/i386/i386.c (ix86_expand_sse5_unpack): New function to
	unpack a vector int to the next larger size using the SSE5 pperm
	instruction.
	(ix86_expand_sse5_pack): New function to pack a vector int to the
	next smaller size using the SSE5 pperm instruction.
	(enum ix86_builtins): Add builtins for the SSE5 instructions.
	(enum multi_arg_type): New enum to describe SSE5 intrinsic
	function arguments.
	(bdesc_multi_arg): New array to describe the SSE5 intrinsic
	functions.
	(ix86_init_mmx_sse_builtins): Add SSE5 intrinsic support.
	(ix86_expand_multi_arg_builtin): New function to build the
	SSE5 intrinsics.
	(ix86_expand_builtin): Call ix86_expand_multi_arg_builtin.
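
	For reference, the lane semantics the pperm-based unpack implements
	can be modeled in scalar C.  This is an illustrative sketch only
	(the function names are hypothetical, not GCC's), showing the
	zero-extension vs. sign-extension the unsigned and signed unpacks
	perform on the low half of a V16QI:

```c
#include <stdint.h>

/* Scalar model of unpacking the low 8 bytes of a 16-byte vector into
   eight 16-bit lanes.  The unsigned variant zero-extends each byte;
   the signed variant sign-extends it.  */
static void
unpack_lo_u8_to_u16 (const uint8_t src[16], uint16_t dst[8])
{
  for (int i = 0; i < 8; i++)
    dst[i] = src[i];		/* high byte of each lane becomes 0 */
}

static void
unpack_lo_s8_to_s16 (const int8_t src[16], int16_t dst[8])
{
  for (int i = 0; i < 8; i++)
    dst[i] = src[i];		/* sign bit replicated into the lane */
}
```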

	* config/i386/predicates.md (const_0_to_31_operand): New predicate
	to recognize 0..31.

	* config.gcc (i[34567]86-*-*): Include bmmintrin.h.
	(x86_64-*-*): Ditto.

	* config/i386/cpuid.h (bit_SSE5): Define SSE5 bit.

	* config/i386/driver-i386.c (host_detect_local_cpu): Add basic
	SSE5 support.

	* config/i386/bmmintrin.h: New file, provide common x86 compiler
	intrinsics for SSE5.

	* doc/extend.texi (x86 intrinsics): Document new SSE5 intrinsics.

<gcc/testsuite changes>
2007-09-05  Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Michael Meissner  <michael.meissner@amd.com>

	* gcc.target/i386/sse5-hadduX.c: Add support for SSE5 tests.
	* gcc.target/i386/sse5-hsubX.c: Ditto.
	* gcc.target/i386/sse5-permpX.c: Ditto.
	* gcc.target/i386/sse5-haddX.c: Ditto.
	* gcc.target/i386/sse5-maccXX.c: Ditto.
	* gcc.target/i386/sse5-msubXX.c: Ditto.
	* gcc.target/i386/sse5-nmaccXX.c: Ditto.
	* gcc.target/i386/sse5-nmsubXX.c: Ditto.

	* gcc.target/i386/sse5-pcmov.c: New file to make sure the compiler
	optimizes floating point conditional moves into the pcmov
	instruction on SSE5.
	* gcc.target/i386/sse5-pcmov2.c: Ditto.

	* gcc.target/i386/sse5-ima-vector.c: New file to make sure the
	compiler optimizes vector 32-bit int (a*b)+c into pmacsdd on
	SSE5.
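
	A hypothetical reduction of the loop shape such a test exercises
	(not the testcase itself) looks like this; under -msse5 with
	vectorization the multiply-add should collapse into pmacsdd:

```c
/* Vector 32-bit integer (a*b)+c over arrays; each vectorized
   iteration maps onto a single multiply/add instruction.  */
#define N 256
static int a[N], b[N], c[N], d[N];

static void
ima_loop (void)
{
  for (int i = 0; i < N; i++)
    a[i] = b[i] * c[i] + d[i];
}
```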

	* gcc.target/i386/sse5-fma-vector.c: New file to make sure the
	compiler optimizes vector (a*b)+c into fmadd on SSE5.

	* gcc.target/i386/sse5-fma.c: New file to make sure the compiler
	optimizes (a*b)+c into fmadd on SSE5.
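
	The scalar shape in question is just a fused multiply-add
	candidate; a hypothetical reduction (the function name is
	illustrative, not from the testcase):

```c
/* Scalar (a*b)+c, the pattern the compiler should combine into a
   single fmadd instruction when SSE5 is enabled.  */
static float
fmadd_like (float a, float b, float c)
{
  return a * b + c;
}
```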

	* gcc.target/i386/i386.exp (check_effective_target_sse5): Check
	whether the SSE5 instructions can be generated.

	* gcc.target/i386/sse5-check.h: New file.  Add support for
	SSE5 tests.

*** gcc/config/i386/i386.md.~1~	2007-09-06 13:52:32.432974000 -0400
--- gcc/config/i386/i386.md	2007-09-06 11:24:19.827783000 -0400
***************
*** 181,186 ****
--- 181,232 ----
     (UNSPEC_SSE5_INTRINSIC_P	150)
     (UNSPEC_SSE5_INTRINSIC_S	151)
     (UNSPEC_SSE5_INTRINSIC_UNS	152)
+    (UNSPEC_SSE5_TRUEFALSE	153)
+    (UNSPEC_PPERM		154)
+    (UNSPEC_PERMPS		155)
+    (UNSPEC_PERMPD		156)
+    (UNSPEC_PMACSSWW		157)
+    (UNSPEC_PMACSWW		158)
+    (UNSPEC_PMACSSWD		159)
+    (UNSPEC_PMACSWD		160)
+    (UNSPEC_PMACSSDD		161)
+    (UNSPEC_PMACSDD		162)
+    (UNSPEC_PMACSSDQL		163)
+    (UNSPEC_PMACSSDQH		164)
+    (UNSPEC_PMACSDQL		165)
+    (UNSPEC_PMACSDQH		166)
+    (UNSPEC_PMADCSSWD		167)
+    (UNSPEC_PMADCSWD		168)
+    (UNSPEC_PHADDBW		169)
+    (UNSPEC_PHADDBD		170)
+    (UNSPEC_PHADDBQ		171)
+    (UNSPEC_PHADDWD		172)
+    (UNSPEC_PHADDWQ		173)
+    (UNSPEC_PHADDDQ		174)
+    (UNSPEC_PHADDUBW		175)
+    (UNSPEC_PHADDUBD		176)
+    (UNSPEC_PHADDUBQ		177)
+    (UNSPEC_PHADDUWD		178)
+    (UNSPEC_PHADDUWQ		179)
+    (UNSPEC_PHADDUDQ		180)
+    (UNSPEC_PHSUBBW		181)
+    (UNSPEC_PHSUBWD		182)
+    (UNSPEC_PHSUBDQ		183)
+    (UNSPEC_PROTB		184)
+    (UNSPEC_PROTW		185)
+    (UNSPEC_PROTD		186)
+    (UNSPEC_PROTQ		187)
+    (UNSPEC_PSHLB		188)
+    (UNSPEC_PSHLW		189)
+    (UNSPEC_PSHLD		190)
+    (UNSPEC_PSHLQ		191)
+    (UNSPEC_PSHAB		192)
+    (UNSPEC_PSHAW		193)
+    (UNSPEC_PSHAD		194)
+    (UNSPEC_PSHAQ		195)
+    (UNSPEC_FRCZ			196)
+    (UNSPEC_CVTPH2PS		197)
+    (UNSPEC_CVTPS2PH		198)
    ])
  
  (define_constants
***************
*** 201,206 ****
--- 247,275 ----
     (UNSPECV_PROLOGUE_USE	14)
    ])
  
+ ;; Constants to represent pcomtrue/pcomfalse variants
+ (define_constants
+   [(PCOM_FALSE_B		0)
+    (PCOM_FALSE_W		1)
+    (PCOM_FALSE_D		2)
+    (PCOM_FALSE_Q		3)
+    (PCOM_FALSE_UB		4)
+    (PCOM_FALSE_UW		5)
+    (PCOM_FALSE_UD		6)
+    (PCOM_FALSE_UQ		7)
+    (PCOM_TRUE_B			8)
+    (PCOM_TRUE_W			9)
+    (PCOM_TRUE_D			10)
+    (PCOM_TRUE_Q			11)
+    (PCOM_TRUE_UB		12)
+    (PCOM_TRUE_UW		13)
+    (PCOM_TRUE_UD		14)
+    (PCOM_TRUE_UQ		15)
+    (COM_FALSE_S			16)
+    (COM_FALSE_P			17)
+    (COM_TRUE_S			18)
+    (COM_TRUE_P			19)])
+ 
  ;; Registers by name.
  (define_constants
    [(BP_REG			 6)
*** gcc/config/i386/sse.md.~1~	2007-09-06 13:52:32.674863000 -0400
--- gcc/config/i386/sse.md	2007-09-06 11:25:10.687491000 -0400
***************
*** 3062,3067 ****
--- 3062,3106 ----
     (set_attr "prefix_data16" "1")
     (set_attr "mode" "TI")])
  
+ ;; SSE5 version of mulv16qi3 that uses pperm to do the unpacking and repacking
+ (define_expand "sse5_mulv16qi3"
+   [(set (match_operand:V16QI 0 "register_operand" "")
+ 	(mult:V16QI (match_operand:V16QI 1 "register_operand" "")
+ 		    (match_operand:V16QI 2 "register_operand" "")))]
+   "TARGET_SSE5"
+ {
+   rtx t[6];
+   rtx op[3];
+   int i;
+ 
+   for (i = 0; i < 6; ++i)
+     t[i] = gen_reg_rtx (V8HImode);
+ 
+   /* Unpack data such that we've got a source byte in each low byte of
+      each word.  We don't care what goes into the high byte, so put 0
+      there.  */
+   for (i = 0; i < 2; i++)
+     {
+       op[0] = t[i];
+       op[1] = operands[i+1];
+       ix86_expand_sse5_unpack (op, true, true);		/* high bytes */
+ 
+       op[0] = t[i+2];
+       ix86_expand_sse5_unpack (op, true, false);		/* low bytes */
+     }
+ 
+   /* Multiply words.  */
+   emit_insn (gen_mulv8hi3 (t[4], t[0], t[1]));		/* high bytes */
+   emit_insn (gen_mulv8hi3 (t[5], t[2], t[3]));		/* low  bytes */
+ 
+   /* Pack the low byte of each word back into a single xmm */
+   op[0] = operands[0];
+   op[1] = t[5];
+   op[2] = t[4];
+   ix86_expand_sse5_pack (op);
+   DONE;
+ })
+ 
  (define_expand "mulv16qi3"
    [(set (match_operand:V16QI 0 "register_operand" "")
  	(mult:V16QI (match_operand:V16QI 1 "register_operand" "")
***************
*** 3071,3076 ****
--- 3110,3121 ----
    rtx t[12], op0;
    int i;
  
+   if (TARGET_SSE5)
+     {
+       emit_insn (gen_sse5_mulv16qi3 (operands[0], operands[1], operands[2]));
+       DONE;
+     }
+ 
    for (i = 0; i < 12; ++i)
      t[i] = gen_reg_rtx (V16QImode);
  
***************
*** 3258,3264 ****
  		   (match_operand:V4SI 2 "register_operand" "")))]
    "TARGET_SSE2"
  {
!   if (TARGET_SSE4_1)
      ix86_fixup_binary_operands_no_copy (MULT, V4SImode, operands);
   else
     {
--- 3303,3309 ----
  		   (match_operand:V4SI 2 "register_operand" "")))]
    "TARGET_SSE2"
  {
!   if (TARGET_SSE4_1 || TARGET_SSE5)
      ix86_fixup_binary_operands_no_copy (MULT, V4SImode, operands);
   else
     {
***************
*** 3316,3321 ****
--- 3361,3387 ----
     (set_attr "prefix_extra" "1")
     (set_attr "mode" "TI")])
  
+ ;; We don't have a straight 32-bit parallel multiply on SSE5, so fake it with a
+ ;; multiply/add
+ (define_insn_and_split "*sse5_mulv4si3"
+   [(set (match_operand:V4SI 0 "register_operand" "=&x")
+ 	(mult:V4SI (match_operand:V4SI 1 "register_operand" "%x")
+ 		   (match_operand:V4SI 2 "nonimmediate_operand" "xm")))]
+   "TARGET_SSE5"
+   "#"
+   "&& reload_completed"
+   [(set (match_dup 0)
+ 	(match_dup 3))
+    (set (match_dup 0)
+ 	(plus:V4SI (mult:V4SI (match_dup 1)
+ 			      (match_dup 2))
+ 		   (match_dup 0)))]
+ {
+  operands[3] = CONST0_RTX (V4SImode);
+ }
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")])
+ 
  (define_expand "mulv2di3"
    [(set (match_operand:V2DI 0 "register_operand" "")
  	(mult:V2DI (match_operand:V2DI 1 "register_operand" "")
***************
*** 5148,5153 ****
--- 5214,5221 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, true, true);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, true, true);
    else
      ix86_expand_sse_unpack (operands, true, true);
    DONE;
***************
*** 5160,5165 ****
--- 5228,5235 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, false, true);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, false, true);
    else
      ix86_expand_sse_unpack (operands, false, true);
    DONE;
***************
*** 5172,5177 ****
--- 5242,5249 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, true, false);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, true, false);
    else
      ix86_expand_sse_unpack (operands, true, false);
    DONE;
***************
*** 5184,5189 ****
--- 5256,5263 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, false, false);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, false, false);
    else
      ix86_expand_sse_unpack (operands, false, false);
    DONE;
***************
*** 5196,5201 ****
--- 5270,5277 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, true, true);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, true, true);
    else
      ix86_expand_sse_unpack (operands, true, true);
    DONE;
***************
*** 5208,5213 ****
--- 5284,5291 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, false, true);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, false, true);
    else
      ix86_expand_sse_unpack (operands, false, true);
    DONE;
***************
*** 5220,5225 ****
--- 5298,5305 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, true, false);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, true, false);
    else
      ix86_expand_sse_unpack (operands, true, false);
    DONE;
***************
*** 5232,5237 ****
--- 5312,5319 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, false, false);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, false, false);
    else
      ix86_expand_sse_unpack (operands, false, false);
    DONE;
***************
*** 5244,5249 ****
--- 5326,5333 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, true, true);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, true, true);
    else
      ix86_expand_sse_unpack (operands, true, true);
    DONE;
***************
*** 5256,5261 ****
--- 5340,5347 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, false, true);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, false, true);
    else
      ix86_expand_sse_unpack (operands, false, true);
    DONE;
***************
*** 5268,5273 ****
--- 5354,5361 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, true, false);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, true, false);
    else
      ix86_expand_sse_unpack (operands, true, false);
    DONE;
***************
*** 5280,5285 ****
--- 5368,5375 ----
  {
    if (TARGET_SSE4_1)
      ix86_expand_sse4_unpack (operands, false, false);
+   else if (TARGET_SSE5)
+     ix86_expand_sse5_unpack (operands, false, false);
    else
      ix86_expand_sse_unpack (operands, false, false);
    DONE;
***************
*** 7041,7046 ****
--- 7131,7433 ----
     (set_attr "memory" "none,load,none,load")
     (set_attr "mode" "TI")])
  
+ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+ ;;
+ ;; SSE5 instructions
+ ;;
+ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+ 
+ ;; SSE5 parallel multiply instructions.
+ 
+ ;; Note the instruction does not allow the value being added to be a memory
+ ;; operand.  However, pretending via the nonimmediate_operand predicate that
+ ;; it does, and splitting it later, allows the following to be recognized:
+ ;;	a[i] = b[i] * c[i] + d[i];
+ (define_insn "*sse5_pmacsww_vector"
+   [(set (match_operand:V8HI 0 "register_operand" "=x,x,x")
+         (plus:V8HI (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x,x,m")
+ 			      (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x"))
+ 		   (match_operand:V8HI 3 "nonimmediate_operand" "0,0,0")))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "*sse5_pmacsww_vector_b"
+   [(set (match_operand:V8HI 0 "register_operand" "=x,x,x")
+         (plus:V8HI (match_operand:V8HI 3 "nonimmediate_operand" "0,0,0")
+ 		   (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x,x,m")
+ 			      (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x"))))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "*sse5_pmacssww_vector"
+   [(set (match_operand:V8HI 0 "register_operand" "=x,x,x")
+         (ss_plus:V8HI (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x,x,m")
+ 				 (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x"))
+ 		      (match_operand:V8HI 3 "nonimmediate_operand" "0,0,0")))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "*sse5_pmacssww_vector_b"
+   [(set (match_operand:V8HI 0 "register_operand" "=x,x,x")
+         (ss_plus:V8HI (match_operand:V8HI 3 "nonimmediate_operand" "0,0,0")
+ 		      (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x,x,m")
+ 				 (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x"))))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ ;; Note the instruction does not allow the value being added to be a memory
+ ;; operand.  However, pretending via the nonimmediate_operand predicate that
+ ;; it does, and splitting it later, allows the following to be recognized:
+ ;;	a[i] = b[i] * c[i] + d[i];
+ (define_insn "*sse5_pmacsdd_vector"
+   [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+         (plus:V4SI (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+ 			      (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x"))
+ 		   (match_operand:V4SI 3 "nonimmediate_operand" "0,0,0")))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "*sse5_pmacsdd_vector_b"
+   [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+         (plus:V4SI (match_operand:V4SI 3 "nonimmediate_operand" "0,0,0")
+ 		   (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+ 			      (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x"))))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "*sse5_pmacssdd_vector"
+   [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+         (ss_plus:V4SI (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+ 				 (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x"))
+ 		      (match_operand:V4SI 3 "nonimmediate_operand" "0,0,0")))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "*sse5_pmacssdd_vector_b"
+   [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+         (ss_plus:V4SI (match_operand:V4SI 3 "nonimmediate_operand" "0,0,0")
+ 		      (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+ 				 (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x"))))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ ;; SSE5 parallel integer multiply/add instructions for the intrinsics
+ (define_insn "sse5_pmacssww"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSSWW))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmacsww"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSWW))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmacsswd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSSWD))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmacswd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSWD))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmacssdd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSSDD))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmacsdd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSDD))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmacssdql"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSSDQL))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssdql\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmacssdqh"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSSDQH))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacssdqh\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmacsdql"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSDQL))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsdql\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmacsdqh"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMACSDQH))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmacsdqh\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmadcsswd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMADCSSWD))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmadcsswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
+ (define_insn "sse5_pmadcswd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,x,m")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x")
+ 		      (match_operand:V2DI 3 "register_operand" "0,0,0")] UNSPEC_PMADCSWD))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, false)"
+   "@
+    pmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+    pmadcswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "TI")
+    (set_attr "memory" "none,load,load")])
+ 
  ;; SSE5 parallel XMM conditional moves
  (define_insn "sse5_pcmov_<mode>"
    [(set (match_operand:SSEMODE 0 "register_operand" "=x,x,x,x,x,x")
***************
*** 7058,7060 ****
--- 7445,8026 ----
     andnps\t{%1, %0|%0, %1}"
    [(set_attr "type" "sse4arg")])
  
+ ;; SSE5 horizontal add/subtract instructions
+ (define_insn "sse5_phaddbw"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDBW))]
+   "TARGET_SSE5"
+   "phaddbw\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phaddbd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDBD))]
+   "TARGET_SSE5"
+   "phaddbd\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phaddbq"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDBQ))]
+   "TARGET_SSE5"
+   "phaddbq\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phaddwd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDWD))]
+   "TARGET_SSE5"
+   "phaddwd\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phaddwq"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDWQ))]
+   "TARGET_SSE5"
+   "phaddwq\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phadddq"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDDQ))]
+   "TARGET_SSE5"
+   "phadddq\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phaddubw"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDUBW))]
+   "TARGET_SSE5"
+   "phaddubw\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phaddubd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDUBD))]
+   "TARGET_SSE5"
+   "phaddubd\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phaddubq"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDUBQ))]
+   "TARGET_SSE5"
+   "phaddubq\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phadduwd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDUWD))]
+   "TARGET_SSE5"
+   "phadduwd\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phadduwq"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDUWQ))]
+   "TARGET_SSE5"
+   "phadduwq\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phaddudq"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHADDUDQ))]
+   "TARGET_SSE5"
+   "phaddudq\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phsubbw"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHSUBBW))]
+   "TARGET_SSE5"
+   "phsubbw\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phsubwd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHSUBWD))]
+   "TARGET_SSE5"
+   "phsubwd\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ (define_insn "sse5_phsubdq"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")] UNSPEC_PHSUBDQ))]
+   "TARGET_SSE5"
+   "phsubdq\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sseiadd1")])
+ 
+ ;; SSE5 permute instructions
+ (define_insn "sse5_pperm"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,0,xm,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "x,xm,0,x")
+ 		      (match_operand:V2DI 3 "nonimmediate_operand" "xm,x,x,0")] UNSPEC_PPERM))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "pperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "TI")])
+ 
+ ;; This is for the unpack instructions, which don't need the first source
+ ;; operand, so we can just use the output operand
+ (define_insn "sse5_pperm_unpack"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm,x")
+ 		      (match_operand:V16QI 2 "nonimmediate_operand" "x,xm")] UNSPEC_PPERM))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "pperm\t{%2, %1, %0, %0|%0, %0, %1, %2}"
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "TI")])
+ 
+ (define_insn "sse5_permps"
+   [(set (match_operand:V4SF 0 "register_operand" "=x,x,x,x")
+ 	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0,0,xm,xm")
+ 		      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm,0,x")
+ 		      (match_operand:V2DI 3 "nonimmediate_operand" "xm,x,x,0")] UNSPEC_PERMPS))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "permps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "V4SF")])
+ 
+ (define_insn "sse5_permpd"
+   [(set (match_operand:V2DF 0 "register_operand" "=x,x,x,x")
+ 	(unspec:V2DF [(match_operand:V2DF 1 "register_operand" "0,0,xm,xm")
+ 		      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm,0,x")
+ 		      (match_operand:V2DI 3 "nonimmediate_operand" "xm,x,x,0")] UNSPEC_PERMPD))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "permpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "V2DF")])
+ 
+ ;; SSE5 packed rotate instructions
+ (define_insn "rotl<mode>3"
+   [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
+ 	(rotate:SSEMODE1248 (match_operand:SSEMODE1248 1 "nonimmediate_operand" "xm")
+ 			    (match_operand:SI 2 "const_0_to_63_operand" "n")))]
+   "TARGET_SSE5"
+   "prot<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")
+    (set_attr "mode" "TI")])
+ 
+ (define_expand "sse5_protb_imm"
+   [(set (match_operand:V2DI 0 "register_operand" "")
+ 	(rotate:V2DI (match_operand:V2DI 1 "nonimmediate_operand" "")
+ 		     (match_operand:SI 2 "const_0_to_7_operand" "n")))]
+   "TARGET_SSE5"
+ {
+   rtx op0 = gen_rtx_SUBREG (V16QImode, operands[0], 0);
+   rtx op1 = gen_rtx_SUBREG (V16QImode, operands[1], 0);
+ 
+   emit_insn (gen_rotlv16qi3 (op0, op1, operands[2]));
+   DONE;
+ })
+ 
+ (define_expand "sse5_protw_imm"
+   [(set (match_operand:V2DI 0 "register_operand" "")
+ 	(rotate:V2DI (match_operand:V2DI 1 "nonimmediate_operand" "")
+ 		     (match_operand:SI 2 "const_0_to_15_operand" "n")))]
+   "TARGET_SSE5"
+ {
+   rtx op0 = gen_rtx_SUBREG (V8HImode, operands[0], 0);
+   rtx op1 = gen_rtx_SUBREG (V8HImode, operands[1], 0);
+ 
+   emit_insn (gen_rotlv8hi3 (op0, op1, operands[2]));
+   DONE;
+ })
+ 
+ (define_expand "sse5_protd_imm"
+   [(set (match_operand:V2DI 0 "register_operand" "")
+ 	(rotate:V2DI (match_operand:V2DI 1 "nonimmediate_operand" "")
+ 		     (match_operand:SI 2 "const_0_to_31_operand" "n")))]
+   "TARGET_SSE5"
+ {
+   rtx op0 = gen_rtx_SUBREG (V4SImode, operands[0], 0);
+   rtx op1 = gen_rtx_SUBREG (V4SImode, operands[1], 0);
+ 
+   emit_insn (gen_rotlv4si3 (op0, op1, operands[2]));
+   DONE;
+ })
+ 
+ ;; XXX these should not use UNSPEC, but should use the appropriate rotate rtl
+ (define_insn "sse5_protb"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PROTB))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "protb\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")
+    (set_attr "mode" "TI")])
+ 
+ (define_insn "sse5_protw"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PROTW))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "protw\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")
+    (set_attr "mode" "TI")])
+ 
+ (define_insn "sse5_protd"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PROTD))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "protd\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")
+    (set_attr "mode" "TI")])
+ 
+ (define_insn "sse5_protq"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PROTQ))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "protq\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")
+    (set_attr "mode" "TI")])
+ 
+ ;; SSE5 packed shift logical instructions
+ ;; XXX these should not use UNSPEC, but should use the appropriate shift rtl
+ (define_insn "sse5_pshlb"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PSHLB))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "pshlb\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")])
+ 
+ (define_insn "sse5_pshlw"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PSHLW))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "pshlw\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")])
+ 
+ (define_insn "sse5_pshld"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PSHLD))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "pshld\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")])
+ 
+ (define_insn "sse5_pshlq"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PSHLQ))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "pshlq\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")])
+ 
+ ;; SSE5 packed shift arithmetic instructions
+ ;; XXX these should not use UNSPEC, but should use the appropriate shift rtl
+ (define_insn "sse5_pshab"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PSHAB))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "pshab\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")])
+ 
+ (define_insn "sse5_pshaw"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PSHAW))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "pshaw\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")])
+ 
+ (define_insn "sse5_pshad"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PSHAD))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "pshad\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")])
+ 
+ (define_insn "sse5_pshaq"
+   [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "x,xm")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm,x")] UNSPEC_PSHAQ))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 3, true)"
+   "pshaq\t{%2, %1, %0|%0, %1, %2}"
+   [(set_attr "type" "sseishft")])
+ 
+ ;; SSE5 FRCZ support
+ (define_insn "sse5_frczpd"
+   [(set (match_operand:V2DF 0 "register_operand" "=x")
+ 	(unspec:V2DF [(match_operand:V2DF 1 "nonimmediate_operand" "xm")]
+ 		     UNSPEC_FRCZ))]
+   "TARGET_SSE5"
+   "frczpd\t{%1, %0|%0, %1}"
+   [(set_attr "type" "ssecvt1")
+    (set_attr "prefix_extra" "1")
+    (set_attr "mode" "V2DF")])
+ 
+ (define_insn "sse5_frczps"
+   [(set (match_operand:V4SF 0 "register_operand" "=x")
+ 	(unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm")]
+ 		     UNSPEC_FRCZ))]
+   "TARGET_SSE5"
+   "frczps\t{%1, %0|%0, %1}"
+   [(set_attr "type" "ssecvt1")
+    (set_attr "prefix_extra" "1")
+    (set_attr "mode" "V4SF")])
+ 
+ (define_insn "sse5_frczsd"
+   [(set (match_operand:V2DF 0 "register_operand" "=x")
+ 	(vec_merge:V2DF
+ 	  (unspec:V2DF [(match_operand:V2DF 2 "register_operand" "x")]
+ 		       UNSPEC_FRCZ)
+ 	  (match_operand:V2DF 1 "register_operand" "0")
+ 	  (const_int 1)))]
+   "TARGET_SSE5"
+   "frczsd\t{%2, %0|%0, %2}"
+   [(set_attr "type" "ssecvt1")
+    (set_attr "prefix_extra" "1")
+    (set_attr "mode" "V2DF")])
+ 
+ (define_insn "sse5_frczss"
+   [(set (match_operand:V4SF 0 "register_operand" "=x")
+ 	(vec_merge:V4SF
+ 	  (unspec:V4SF [(match_operand:V4SF 2 "register_operand" "x")]
+ 		       UNSPEC_FRCZ)
+ 	  (match_operand:V4SF 1 "register_operand" "0")
+ 	  (const_int 1)))]
+   "TARGET_SSE5"
+   "frczss\t{%2, %0|%0, %2}"
+   [(set_attr "type" "ssecvt1")
+    (set_attr "mode" "V4SF")])
+ 
+ (define_insn "sse5_cvtph2ps"
+   [(set (match_operand:V4SF 0 "register_operand" "=x")
+ 	(unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm")]
+ 		     UNSPEC_CVTPH2PS))]
+   "TARGET_SSE5"
+   "cvtph2ps\t{%1, %0|%0, %1}"
+   [(set_attr "type" "ssecvt")
+    (set_attr "mode" "V4SF")])
+ 
+ (define_insn "sse5_cvtps2ph"
+   [(set (match_operand:V4SF 0 "register_operand" "=x")
+ 	(unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm")]
+ 		     UNSPEC_CVTPS2PH))]
+   "TARGET_SSE5"
+   "cvtps2ph\t{%1, %0|%0, %1}"
+   [(set_attr "type" "ssecvt")
+    (set_attr "mode" "V4SF")])
+ 
+ ;; Scalar versions of the com instructions (operating on vector types) that
+ ;; are called from the intrinsics
+ (define_insn "sse5_maskcmp_s_<mode>"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x")
+ 	(vec_merge:SSEMODEF2P
+ 	 (match_operator:SSEMODEF2P 1 "sse5_comparison_float_operator"
+ 				    [(match_operand:SSEMODEF2P 2 "register_operand" "x")
+ 				     (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm")])
+ 	  (match_dup 2)
+ 	  (const_int 1)))]
+   "TARGET_SSE5"
+   "com%Y1<ssemodesuffixf2s>\t{%3, %2, %0|%0, %2, %3}"
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "<ssescalarmode>")])
+ 
+ ;; We don't have a comparison operator that always returns true/false, so
+ ;; handle comfalse and comtrue specially.
+ (define_insn "sse5_com_tf<mode>3"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x")
+ 	(unspec:SSEMODEF2P [(match_operand:SSEMODEF2P 1 "register_operand" "x")
+ 			    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "xm")
+ 			    (match_operand:SI 3 "const_int_operand" "n")]
+ 			   UNSPEC_SSE5_TRUEFALSE))]
+   "TARGET_SSE5"
+ {
+   const char *ret = NULL;
+ 
+   switch (INTVAL (operands[3]))
+     {
+     case COM_FALSE_S:  ret = \"comfalses<ssemodesuffixf2c>\t{%2, %1, %0|%0, %1, %2}\";  break;
+     case COM_FALSE_P:  ret = \"comfalsep<ssemodesuffixf2c>\t{%2, %1, %0|%0, %1, %2}\";  break;
+     case COM_TRUE_S:   ret = \"comtrues<ssemodesuffixf2c>\t{%2, %1, %0|%0, %1, %2}\";  break;
+     case COM_TRUE_P:   ret = \"comtruep<ssemodesuffixf2c>\t{%2, %1, %0|%0, %1, %2}\";  break;
+     default:
+       gcc_unreachable ();
+     }
+ 
+   return ret;
+ }
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "<MODE>")])
+ 
+ (define_insn "sse5_maskcmp<mode>3"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x")
+ 	(match_operator:SSEMODEF2P 1 "sse5_comparison_float_operator"
+ 				   [(match_operand:SSEMODEF2P 2 "register_operand" "x")
+ 				    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm")]))]
+   "TARGET_SSE5"
+   "com%Y1<ssemodesuffixf4>\t{%3, %2, %0|%0, %2, %3}"
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "<MODE>")])
+ 
+ (define_insn "sse5_maskcmp<mode>3"
+   [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
+ 	(match_operator:SSEMODE1248 1 "ix86_comparison_int_operator"
+ 				    [(match_operand:SSEMODE1248 2 "register_operand" "x")
+ 				     (match_operand:SSEMODE1248 3 "nonimmediate_operand" "xm")]))]
+   "TARGET_SSE5"
+   "pcom%Y1<ssevecsize>\t{%3, %2, %0|%0, %2, %3}"
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "TI")])
+ 
+ (define_insn "sse5_maskcmp_uns<mode>"
+   [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
+ 	(match_operator:SSEMODE1248 1 "ix86_comparison_uns_operator"
+ 				    [(match_operand:SSEMODE1248 2 "register_operand" "x")
+ 				     (match_operand:SSEMODE1248 3 "nonimmediate_operand" "xm")]))]
+   "TARGET_SSE5"
+   "pcom%Y1u<ssevecsize>\t{%3, %2, %0|%0, %2, %3}"
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "TI")])
+ 
+ ;; Version of pcom*u* used by the intrinsics.  It keeps pcomequ* and
+ ;; pcomneu* from being converted to the signed forms, in case somebody
+ ;; needs the exact instruction generated for the intrinsic.
+ (define_insn "sse5_maskcmp_uns2<mode>3"
+   [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
+ 	(unspec:SSEMODE1248 [(match_operator:SSEMODE1248 1 "ix86_comparison_uns_operator"
+ 							 [(match_operand:SSEMODE1248 2 "register_operand" "x")
+ 							  (match_operand:SSEMODE1248 3 "nonimmediate_operand" "xm")])]
+ 			    UNSPEC_SSE5_INTRINSIC_UNS))]
+   "TARGET_SSE5"
+   "pcom%Y1u<ssevecsize>\t{%3, %2, %0|%0, %2, %3}"
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "TI")])
+ 
+ ;; Mapper expanders to convert the intrinsics' V2DI type into the real
+ ;; vector type.
+ (define_expand "sse5_setccv2di_8"
+   [(set (match_operand:V2DI 0 "register_operand" "")
+ 	(match_operator:V2DI 1 "ix86_comparison_int_operator"
+ 					   [(match_operand:V2DI 2 "register_operand" "")
+ 					    (match_operand:V2DI 3 "nonimmediate_operand" "")]))]
+   "TARGET_SSE5"
+ {
+   rtx op0 = gen_rtx_SUBREG (V16QImode, operands[0], 0);
+   rtx op2 = gen_rtx_SUBREG (V16QImode, operands[2], 0);
+   rtx op3 = gen_rtx_SUBREG (V16QImode, operands[3], 0);
+ 
+   emit_insn (gen_sse5_maskcmpv16qi3 (op0, operands[1], op2, op3));
+   DONE;
+ })
+ 
+ (define_expand "sse5_setccv2di_16"
+   [(set (match_operand:V2DI 0 "register_operand" "")
+ 	(match_operator:V2DI 1 "ix86_comparison_int_operator"
+ 					   [(match_operand:V2DI 2 "register_operand" "")
+ 					    (match_operand:V2DI 3 "nonimmediate_operand" "")]))]
+   "TARGET_SSE5"
+ {
+   rtx op0 = gen_rtx_SUBREG (V8HImode, operands[0], 0);
+   rtx op2 = gen_rtx_SUBREG (V8HImode, operands[2], 0);
+   rtx op3 = gen_rtx_SUBREG (V8HImode, operands[3], 0);
+ 
+   emit_insn (gen_sse5_maskcmpv8hi3 (op0, operands[1], op2, op3));
+   DONE;
+ })
+ 
+ (define_expand "sse5_setccv2di_32"
+   [(set (match_operand:V2DI 0 "register_operand" "")
+ 	(match_operator:V2DI 1 "ix86_comparison_int_operator"
+ 					   [(match_operand:V2DI 2 "register_operand" "")
+ 					    (match_operand:V2DI 3 "nonimmediate_operand" "")]))]
+   "TARGET_SSE5"
+ {
+   rtx op0 = gen_rtx_SUBREG (V4SImode, operands[0], 0);
+   rtx op2 = gen_rtx_SUBREG (V4SImode, operands[2], 0);
+   rtx op3 = gen_rtx_SUBREG (V4SImode, operands[3], 0);
+ 
+   emit_insn (gen_sse5_maskcmpv4si3 (op0, operands[1], op2, op3));
+   DONE;
+ })
+ 
+ (define_expand "sse5_setccv2di_u8"
+   [(set (match_operand:V2DI 0 "register_operand" "")
+ 	(match_operator:V2DI 1 "ix86_comparison_int_operator"
+ 					   [(match_operand:V2DI 2 "register_operand" "")
+ 					    (match_operand:V2DI 3 "nonimmediate_operand" "")]))]
+   "TARGET_SSE5"
+ {
+   rtx op0 = gen_rtx_SUBREG (V16QImode, operands[0], 0);
+   rtx op2 = gen_rtx_SUBREG (V16QImode, operands[2], 0);
+   rtx op3 = gen_rtx_SUBREG (V16QImode, operands[3], 0);
+ 
+   emit_insn (gen_sse5_maskcmp_uns2v16qi3 (op0, operands[1], op2, op3));
+   DONE;
+ })
+ 
+ (define_expand "sse5_setccv2di_u16"
+   [(set (match_operand:V2DI 0 "register_operand" "")
+ 	(match_operator:V2DI 1 "ix86_comparison_int_operator"
+ 					   [(match_operand:V2DI 2 "register_operand" "")
+ 					    (match_operand:V2DI 3 "nonimmediate_operand" "")]))]
+   "TARGET_SSE5"
+ {
+   rtx op0 = gen_rtx_SUBREG (V8HImode, operands[0], 0);
+   rtx op2 = gen_rtx_SUBREG (V8HImode, operands[2], 0);
+   rtx op3 = gen_rtx_SUBREG (V8HImode, operands[3], 0);
+ 
+   emit_insn (gen_sse5_maskcmp_uns2v8hi3 (op0, operands[1], op2, op3));
+   DONE;
+ })
+ 
+ (define_expand "sse5_setccv2di_u32"
+   [(set (match_operand:V2DI 0 "register_operand" "")
+ 	(match_operator:V2DI 1 "ix86_comparison_int_operator"
+ 					   [(match_operand:V2DI 2 "register_operand" "")
+ 					    (match_operand:V2DI 3 "nonimmediate_operand" "")]))]
+   "TARGET_SSE5"
+ {
+   rtx op0 = gen_rtx_SUBREG (V4SImode, operands[0], 0);
+   rtx op2 = gen_rtx_SUBREG (V4SImode, operands[2], 0);
+   rtx op3 = gen_rtx_SUBREG (V4SImode, operands[3], 0);
+ 
+   emit_insn (gen_sse5_maskcmp_uns2v4si3 (op0, operands[1], op2, op3));
+   DONE;
+ })
+ 
+ ;; Pcomtrue and pcomfalse support.  These instructions are of little
+ ;; practical use, but are included here for completeness.
+ (define_insn "sse5_pcom_tfdi3"
+   [(set (match_operand:V2DI 0 "register_operand" "=x")
+ 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "x")
+ 		      (match_operand:V2DI 2 "nonimmediate_operand" "xm")
+ 		      (match_operand:SI 3 "const_int_operand" "n")]
+ 		     UNSPEC_SSE5_TRUEFALSE))]
+   "TARGET_SSE5"
+ {
+   const char *ret = NULL;
+ 
+   switch (INTVAL (operands[3]))
+     {
+     case PCOM_FALSE_B:  ret = \"pcomfalseb\t{%2, %1, %0|%0, %1, %2}\";  break;
+     case PCOM_FALSE_W:  ret = \"pcomfalsew\t{%2, %1, %0|%0, %1, %2}\";  break;
+     case PCOM_FALSE_D:  ret = \"pcomfalsed\t{%2, %1, %0|%0, %1, %2}\";  break;
+     case PCOM_FALSE_Q:  ret = \"pcomfalseq\t{%2, %1, %0|%0, %1, %2}\";  break;
+     case PCOM_FALSE_UB: ret = \"pcomfalseub\t{%2, %1, %0|%0, %1, %2}\"; break;
+     case PCOM_FALSE_UW: ret = \"pcomfalseuw\t{%2, %1, %0|%0, %1, %2}\"; break;
+     case PCOM_FALSE_UD: ret = \"pcomfalseud\t{%2, %1, %0|%0, %1, %2}\"; break;
+     case PCOM_FALSE_UQ: ret = \"pcomfalseuq\t{%2, %1, %0|%0, %1, %2}\"; break;
+     case PCOM_TRUE_B:   ret = \"pcomtrueb\t{%2, %1, %0|%0, %1, %2}\";   break;
+     case PCOM_TRUE_W:   ret = \"pcomtruew\t{%2, %1, %0|%0, %1, %2}\";   break;
+     case PCOM_TRUE_D:   ret = \"pcomtrued\t{%2, %1, %0|%0, %1, %2}\";   break;
+     case PCOM_TRUE_Q:   ret = \"pcomtrueq\t{%2, %1, %0|%0, %1, %2}\";   break;
+     case PCOM_TRUE_UB:  ret = \"pcomtrueub\t{%2, %1, %0|%0, %1, %2}\";  break;
+     case PCOM_TRUE_UW:  ret = \"pcomtrueuw\t{%2, %1, %0|%0, %1, %2}\";  break;
+     case PCOM_TRUE_UD:  ret = \"pcomtrueud\t{%2, %1, %0|%0, %1, %2}\";  break;
+     case PCOM_TRUE_UQ:  ret = \"pcomtrueuq\t{%2, %1, %0|%0, %1, %2}\";  break;
+     default:
+       gcc_unreachable ();
+     }
+ 
+   return ret;
+ }
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "TI")])
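As a side note for reviewers of the rotate patterns above: rotl<mode>3 maps GCC's rotate RTL onto the prot{b,w,d,q} immediate forms, which rotate each element independently. A minimal Python sketch of the per-element semantics as I read them from the SSE5 draft spec (not from anything in this patch, so treat it as an assumption):

```python
# Model of an immediate-count protb: rotate each of the 16 byte elements
# of an XMM register left by the same count.  My reading of the SSE5
# draft spec; not derived from the patch itself.

def rotl8(value, count):
    """Rotate an 8-bit value left by count (count taken modulo 8)."""
    count &= 7
    if count == 0:
        return value & 0xff
    return ((value << count) | (value >> (8 - count))) & 0xff

def protb_imm(vec16, count):
    """Apply the same left rotate to all 16 byte elements."""
    return [rotl8(b, count) for b in vec16]

print(rotl8(0x81, 1))   # prints 3: the top bit wraps around to bit 0
```

The const_0_to_7/15/31 predicates on the sse5_prot*_imm expanders match this: the rotate count is bounded by the element width, not the full 64-bit range that rotl<mode>3's const_0_to_63_operand allows.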
*** gcc/config/i386/i386-protos.h.~1~	2007-09-06 13:52:32.803731000 -0400
--- gcc/config/i386/i386-protos.h	2007-09-05 18:05:48.622877000 -0400
*************** extern bool ix86_expand_fp_vcond (rtx[])
*** 112,117 ****
--- 112,119 ----
  extern bool ix86_expand_int_vcond (rtx[]);
  extern void ix86_expand_sse_unpack (rtx[], bool, bool);
  extern void ix86_expand_sse4_unpack (rtx[], bool, bool);
+ extern void ix86_expand_sse5_unpack (rtx[], bool, bool);
+ extern void ix86_expand_sse5_pack (rtx[]);
  extern int ix86_expand_int_addcc (rtx[]);
  extern void ix86_expand_call (rtx, rtx, rtx, rtx, rtx, int);
  extern void x86_initialize_trampoline (rtx, rtx, rtx);
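Before the i386.c changes: the new ix86_expand_sse5_unpack below builds a 16-entry PPERM selector vector. A rough Python model of the V8HImode case, using the PPERM_* constants from the patch (the interpretation of the selector bytes is my own reading of the SSE5 draft, so this is a sketch, not authoritative):

```python
# Model of the pperm selector bytes ix86_expand_sse5_unpack builds for
# V8HImode (widening 16-bit elements to 32 bits).  Constants copied from
# the patch; the PPERM byte semantics are assumed from the SSE5 draft.

PPERM_SRC  = 0x00   # copy the selected source byte unchanged
PPERM_ZERO = 0x80   # force the result byte to zero (unsigned widen)
PPERM_SIGN = 0xc0   # propagate the sign bit of the source byte
PPERM_SRC2 = 0x10   # select the byte from the second source operand

def unpack_v8hi_selectors(unsigned_p, high_p):
    h = 8 if high_p else 0          # high_p picks the upper 4 elements
    sel = [0] * 16
    for i in range(4):
        ext = (PPERM_ZERO if unsigned_p
               else PPERM_SIGN | PPERM_SRC2 | (2 * i + 1 + h))
        sel[4 * i + 0] = PPERM_SRC | PPERM_SRC2 | (2 * i + 0 + h)
        sel[4 * i + 1] = PPERM_SRC | PPERM_SRC2 | (2 * i + 1 + h)
        sel[4 * i + 2] = ext        # extension bytes: zero or sign copies
        sel[4 * i + 3] = ext
    return sel

# Unsigned low half: each 16-bit element followed by two zero bytes.
print(unpack_v8hi_selectors(True, False))
```

This mirrors the V8HImode arm of the switch in ix86_expand_sse5_unpack; the V16QImode and V4SImode arms follow the same pattern with 1-byte and 4-byte elements.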
*** gcc/config/i386/i386.c.~1~	2007-09-06 13:52:32.891642000 -0400
--- gcc/config/i386/i386.c	2007-09-06 13:07:10.023366000 -0400
*************** ix86_expand_sse_movcc (rtx dest, rtx cmp
*** 12907,12913 ****
    enum machine_mode mode = GET_MODE (dest);
    rtx t2, t3, x;
  
!   if (op_false == CONST0_RTX (mode))
      {
        op_true = force_reg (mode, op_true);
        x = gen_rtx_AND (mode, cmp, op_true);
--- 12907,12921 ----
    enum machine_mode mode = GET_MODE (dest);
    rtx t2, t3, x;
  
!   if (TARGET_SSE5)
!     {
!       rtx pcmov = gen_rtx_SET (mode, dest,
! 			       gen_rtx_IF_THEN_ELSE (mode, cmp,
! 						     op_true,
! 						     op_false));
!       emit_insn (pcmov);
!     }
!   else if (op_false == CONST0_RTX (mode))
      {
        op_true = force_reg (mode, op_true);
        x = gen_rtx_AND (mode, cmp, op_true);
*************** ix86_expand_sse4_unpack (rtx operands[2]
*** 13279,13284 ****
--- 13287,13460 ----
    emit_insn (unpack (dest, src));
  }
  
+ /* This function performs the same task as ix86_expand_sse_unpack,
+    but with SSE5 instructions.  */
+ 
+ #define PPERM_SRC	0x00		/* copy source */
+ #define PPERM_INVERT	0x20		/* invert source */
+ #define PPERM_REVERSE	0x40		/* bit reverse source */
+ #define PPERM_REV_INV	0x60		/* bit reverse & invert src */
+ #define PPERM_ZERO	0x80		/* all 0's */
+ #define PPERM_ONES	0xa0		/* all 1's */
+ #define PPERM_SIGN	0xc0		/* propagate sign bit */
+ #define PPERM_INV_SIGN	0xe0		/* invert & propagate sign */
+ 
+ #define PPERM_SRC1	0x00		/* use first source byte */
+ #define PPERM_SRC2	0x10		/* use second source byte */
+ 
+ void
+ ix86_expand_sse5_unpack (rtx operands[2], bool unsigned_p, bool high_p)
+ {
+   enum machine_mode imode = GET_MODE (operands[1]);
+   int pperm_bytes[16];
+   int i;
+   int h = (high_p) ? 8 : 0;
+   int sign_extend;
+   rtvec v = rtvec_alloc (16);
+   rtx x;
+   rtx op0, op1;
+ 
+   switch (imode)
+     {
+     case V16QImode:
+       for (i = 0; i < 8; i++)
+ 	{
+ 	  pperm_bytes[2*i+0] = PPERM_SRC | PPERM_SRC2 | i | h;
+ 	  pperm_bytes[2*i+1] = ((unsigned_p)
+ 				? PPERM_ZERO
+ 				: PPERM_SIGN | PPERM_SRC2 | i | h);
+ 	}
+       break;
+ 
+     case V8HImode:
+       for (i = 0; i < 4; i++)
+ 	{
+ 	  sign_extend = ((unsigned_p)
+ 			 ? PPERM_ZERO
+ 			 : PPERM_SIGN | PPERM_SRC2 | ((2*i) + 1 + h));
+ 	  pperm_bytes[4*i+0] = PPERM_SRC | PPERM_SRC2 | ((2*i) + 0 + h);
+ 	  pperm_bytes[4*i+1] = PPERM_SRC | PPERM_SRC2 | ((2*i) + 1 + h);
+ 	  pperm_bytes[4*i+2] = sign_extend;
+ 	  pperm_bytes[4*i+3] = sign_extend;
+ 	}
+       break;
+ 
+     case V4SImode:
+       for (i = 0; i < 2; i++)
+ 	{
+ 	  sign_extend = ((unsigned_p)
+ 			 ? PPERM_ZERO
+ 			 : PPERM_SIGN | PPERM_SRC2 | ((4*i) + 3 + h));
+ 	  pperm_bytes[8*i+0] = PPERM_SRC | PPERM_SRC2 | ((4*i) + 0 + h);
+ 	  pperm_bytes[8*i+1] = PPERM_SRC | PPERM_SRC2 | ((4*i) + 1 + h);
+ 	  pperm_bytes[8*i+2] = PPERM_SRC | PPERM_SRC2 | ((4*i) + 2 + h);
+ 	  pperm_bytes[8*i+3] = PPERM_SRC | PPERM_SRC2 | ((4*i) + 3 + h);
+ 	  pperm_bytes[8*i+4] = sign_extend;
+ 	  pperm_bytes[8*i+5] = sign_extend;
+ 	  pperm_bytes[8*i+6] = sign_extend;
+ 	  pperm_bytes[8*i+7] = sign_extend;
+ 	}
+       break;
+ 
+     default:
+       gcc_unreachable ();
+     }
+ 
+   for (i = 0; i < 16; i++)
+     {
+       RTVEC_ELT (v, i) = GEN_INT (pperm_bytes[i]);
+     }
+ 
+   x = force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, v));
+ 
+   /* Use same-size subregs to change the vector mode.  */
+   op0 = ((GET_MODE (operands[0]) == V2DImode)
+ 	 ? operands[0]
+ 	 : gen_lowpart (V2DImode, operands[0]));
+ 
+   op1 = ((GET_MODE (operands[1]) == V2DImode)
+ 	 ? operands[1]
+ 	 : gen_lowpart (V2DImode, operands[1]));
+ 
+   emit_insn (gen_sse5_pperm_unpack (op0, op1, x));
+   return;
+ }
+ 
+ /* Pack the low halves of the elements of OPERANDS[1] and OPERANDS[2] into
+    the next narrower integer vector type, with OPERANDS[1] filling the low
+    half of the result and OPERANDS[2] the high half.  */
+ void
+ ix86_expand_sse5_pack (rtx operands[3])
+ {
+   enum machine_mode imode = GET_MODE (operands[0]);
+   int pperm_bytes[16];
+   int i;
+   rtvec v = rtvec_alloc (16);
+   rtx x;
+   rtx op0, op1, op2;
+ 
+   switch (imode)
+     {
+     case V16QImode:
+       for (i = 0; i < 8; i++)
+ 	{
+ 	  pperm_bytes[i+0] = PPERM_SRC | PPERM_SRC1 | (i*2);
+ 	  pperm_bytes[i+8] = PPERM_SRC | PPERM_SRC2 | (i*2);
+ 	}
+       break;
+ 
+     case V8HImode:
+       for (i = 0; i < 4; i++)
+ 	{
+ 	  pperm_bytes[(2*i)+0] = PPERM_SRC | PPERM_SRC1 | ((i*4) + 0);
+ 	  pperm_bytes[(2*i)+1] = PPERM_SRC | PPERM_SRC1 | ((i*4) + 1);
+ 	  pperm_bytes[(2*i)+8] = PPERM_SRC | PPERM_SRC2 | ((i*4) + 0);
+ 	  pperm_bytes[(2*i)+9] = PPERM_SRC | PPERM_SRC2 | ((i*4) + 1);
+ 	}
+       break;
+ 
+     case V4SImode:
+       for (i = 0; i < 2; i++)
+ 	{
+ 	  pperm_bytes[(4*i)+0]  = PPERM_SRC | PPERM_SRC1 | ((i*8) + 0);
+ 	  pperm_bytes[(4*i)+1]  = PPERM_SRC | PPERM_SRC1 | ((i*8) + 1);
+ 	  pperm_bytes[(4*i)+2]  = PPERM_SRC | PPERM_SRC1 | ((i*8) + 2);
+ 	  pperm_bytes[(4*i)+3]  = PPERM_SRC | PPERM_SRC1 | ((i*8) + 3);
+ 	  pperm_bytes[(4*i)+8]  = PPERM_SRC | PPERM_SRC2 | ((i*8) + 0);
+ 	  pperm_bytes[(4*i)+9]  = PPERM_SRC | PPERM_SRC2 | ((i*8) + 1);
+ 	  pperm_bytes[(4*i)+10] = PPERM_SRC | PPERM_SRC2 | ((i*8) + 2);
+ 	  pperm_bytes[(4*i)+11] = PPERM_SRC | PPERM_SRC2 | ((i*8) + 3);
+ 	}
+       break;
+ 
+     default:
+       gcc_unreachable ();
+     }
+ 
+   for (i = 0; i < 16; i++)
+     {
+       RTVEC_ELT (v, i) = GEN_INT (pperm_bytes[i]);
+     }
+ 
+   x = gen_lowpart (V2DImode,
+ 		   force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, v)));
+ 
+   /* Use same-size subregs to change the vector mode.  */
+   op0 = ((GET_MODE (operands[0]) == V2DImode)
+ 	 ? operands[0]
+ 	 : gen_lowpart (V2DImode, operands[0]));
+ 
+   op1 = ((GET_MODE (operands[1]) == V2DImode)
+ 	 ? operands[1]
+ 	 : gen_lowpart (V2DImode, operands[1]));
+ 
+   op2 = ((GET_MODE (operands[2]) == V2DImode)
+ 	 ? operands[2]
+ 	 : gen_lowpart (V2DImode, operands[2]));
+ 
+   emit_insn (gen_sse5_pperm (op0, op1, op2, x));
+   return;
+ }
+ 
  /* Expand conditional increment or decrement using adb/sbb instructions.
     The default case using setcc followed by the conditional move can be
     done by generic code.  */
*************** enum ix86_builtins
*** 17012,17017 ****
--- 17188,17400 ----
    IX86_BUILTIN_FABSQ,
    IX86_BUILTIN_COPYSIGNQ,
  
+   /* SSE5 instructions */
+   IX86_BUILTIN_FMADDSS,
+   IX86_BUILTIN_FMADDSD,
+   IX86_BUILTIN_FMADDPS,
+   IX86_BUILTIN_FMADDPD,
+   IX86_BUILTIN_FMSUBSS,
+   IX86_BUILTIN_FMSUBSD,
+   IX86_BUILTIN_FMSUBPS,
+   IX86_BUILTIN_FMSUBPD,
+   IX86_BUILTIN_FNMADDSS,
+   IX86_BUILTIN_FNMADDSD,
+   IX86_BUILTIN_FNMADDPS,
+   IX86_BUILTIN_FNMADDPD,
+   IX86_BUILTIN_FNMSUBSS,
+   IX86_BUILTIN_FNMSUBSD,
+   IX86_BUILTIN_FNMSUBPS,
+   IX86_BUILTIN_FNMSUBPD,
+   IX86_BUILTIN_PCMOV,
+   IX86_BUILTIN_PCMOV_V4SF,
+   IX86_BUILTIN_PCMOV_V2DF,
+   IX86_BUILTIN_PPERM,
+   IX86_BUILTIN_PERMPS,
+   IX86_BUILTIN_PERMPD,
+   IX86_BUILTIN_PMACSSWW,
+   IX86_BUILTIN_PMACSWW,
+   IX86_BUILTIN_PMACSSWD,
+   IX86_BUILTIN_PMACSWD,
+   IX86_BUILTIN_PMACSSDD,
+   IX86_BUILTIN_PMACSDD,
+   IX86_BUILTIN_PMACSSDQL,
+   IX86_BUILTIN_PMACSSDQH,
+   IX86_BUILTIN_PMACSDQL,
+   IX86_BUILTIN_PMACSDQH,
+   IX86_BUILTIN_PMADCSSWD,
+   IX86_BUILTIN_PMADCSWD,
+   IX86_BUILTIN_PHADDBW,
+   IX86_BUILTIN_PHADDBD,
+   IX86_BUILTIN_PHADDBQ,
+   IX86_BUILTIN_PHADDWD,
+   IX86_BUILTIN_PHADDWQ,
+   IX86_BUILTIN_PHADDDQ,
+   IX86_BUILTIN_PHADDUBW,
+   IX86_BUILTIN_PHADDUBD,
+   IX86_BUILTIN_PHADDUBQ,
+   IX86_BUILTIN_PHADDUWD,
+   IX86_BUILTIN_PHADDUWQ,
+   IX86_BUILTIN_PHADDUDQ,
+   IX86_BUILTIN_PHSUBBW,
+   IX86_BUILTIN_PHSUBWD,
+   IX86_BUILTIN_PHSUBDQ,
+   IX86_BUILTIN_PROTB,
+   IX86_BUILTIN_PROTW,
+   IX86_BUILTIN_PROTD,
+   IX86_BUILTIN_PROTQ,
+   IX86_BUILTIN_PROTB_IMM,
+   IX86_BUILTIN_PROTW_IMM,
+   IX86_BUILTIN_PROTD_IMM,
+   IX86_BUILTIN_PROTQ_IMM,
+   IX86_BUILTIN_PSHLB,
+   IX86_BUILTIN_PSHLW,
+   IX86_BUILTIN_PSHLD,
+   IX86_BUILTIN_PSHLQ,
+   IX86_BUILTIN_PSHAB,
+   IX86_BUILTIN_PSHAW,
+   IX86_BUILTIN_PSHAD,
+   IX86_BUILTIN_PSHAQ,
+   IX86_BUILTIN_FRCZSS,
+   IX86_BUILTIN_FRCZSD,
+   IX86_BUILTIN_FRCZPS,
+   IX86_BUILTIN_FRCZPD,
+   IX86_BUILTIN_CVTPH2PS,
+   IX86_BUILTIN_CVTPS2PH,
+ 
+   IX86_BUILTIN_COMEQSS,
+   IX86_BUILTIN_COMNESS,
+   IX86_BUILTIN_COMLTSS,
+   IX86_BUILTIN_COMLESS,
+   IX86_BUILTIN_COMGTSS,
+   IX86_BUILTIN_COMGESS,
+   IX86_BUILTIN_COMUEQSS,
+   IX86_BUILTIN_COMUNESS,
+   IX86_BUILTIN_COMULTSS,
+   IX86_BUILTIN_COMULESS,
+   IX86_BUILTIN_COMUGTSS,
+   IX86_BUILTIN_COMUGESS,
+   IX86_BUILTIN_COMORDSS,
+   IX86_BUILTIN_COMUNORDSS,
+   IX86_BUILTIN_COMFALSESS,
+   IX86_BUILTIN_COMTRUESS,
+ 
+   IX86_BUILTIN_COMEQSD,
+   IX86_BUILTIN_COMNESD,
+   IX86_BUILTIN_COMLTSD,
+   IX86_BUILTIN_COMLESD,
+   IX86_BUILTIN_COMGTSD,
+   IX86_BUILTIN_COMGESD,
+   IX86_BUILTIN_COMUEQSD,
+   IX86_BUILTIN_COMUNESD,
+   IX86_BUILTIN_COMULTSD,
+   IX86_BUILTIN_COMULESD,
+   IX86_BUILTIN_COMUGTSD,
+   IX86_BUILTIN_COMUGESD,
+   IX86_BUILTIN_COMORDSD,
+   IX86_BUILTIN_COMUNORDSD,
+   IX86_BUILTIN_COMFALSESD,
+   IX86_BUILTIN_COMTRUESD,
+ 
+   IX86_BUILTIN_COMEQPS,
+   IX86_BUILTIN_COMNEPS,
+   IX86_BUILTIN_COMLTPS,
+   IX86_BUILTIN_COMLEPS,
+   IX86_BUILTIN_COMGTPS,
+   IX86_BUILTIN_COMGEPS,
+   IX86_BUILTIN_COMUEQPS,
+   IX86_BUILTIN_COMUNEPS,
+   IX86_BUILTIN_COMULTPS,
+   IX86_BUILTIN_COMULEPS,
+   IX86_BUILTIN_COMUGTPS,
+   IX86_BUILTIN_COMUGEPS,
+   IX86_BUILTIN_COMORDPS,
+   IX86_BUILTIN_COMUNORDPS,
+   IX86_BUILTIN_COMFALSEPS,
+   IX86_BUILTIN_COMTRUEPS,
+ 
+   IX86_BUILTIN_COMEQPD,
+   IX86_BUILTIN_COMNEPD,
+   IX86_BUILTIN_COMLTPD,
+   IX86_BUILTIN_COMLEPD,
+   IX86_BUILTIN_COMGTPD,
+   IX86_BUILTIN_COMGEPD,
+   IX86_BUILTIN_COMUEQPD,
+   IX86_BUILTIN_COMUNEPD,
+   IX86_BUILTIN_COMULTPD,
+   IX86_BUILTIN_COMULEPD,
+   IX86_BUILTIN_COMUGTPD,
+   IX86_BUILTIN_COMUGEPD,
+   IX86_BUILTIN_COMORDPD,
+   IX86_BUILTIN_COMUNORDPD,
+   IX86_BUILTIN_COMFALSEPD,
+   IX86_BUILTIN_COMTRUEPD,
+ 
+   IX86_BUILTIN_PCOMEQUB,
+   IX86_BUILTIN_PCOMNEUB,
+   IX86_BUILTIN_PCOMLTUB,
+   IX86_BUILTIN_PCOMLEUB,
+   IX86_BUILTIN_PCOMGTUB,
+   IX86_BUILTIN_PCOMGEUB,
+   IX86_BUILTIN_PCOMFALSEUB,
+   IX86_BUILTIN_PCOMTRUEUB,
+   IX86_BUILTIN_PCOMEQUW,
+   IX86_BUILTIN_PCOMNEUW,
+   IX86_BUILTIN_PCOMLTUW,
+   IX86_BUILTIN_PCOMLEUW,
+   IX86_BUILTIN_PCOMGTUW,
+   IX86_BUILTIN_PCOMGEUW,
+   IX86_BUILTIN_PCOMFALSEUW,
+   IX86_BUILTIN_PCOMTRUEUW,
+   IX86_BUILTIN_PCOMEQUD,
+   IX86_BUILTIN_PCOMNEUD,
+   IX86_BUILTIN_PCOMLTUD,
+   IX86_BUILTIN_PCOMLEUD,
+   IX86_BUILTIN_PCOMGTUD,
+   IX86_BUILTIN_PCOMGEUD,
+   IX86_BUILTIN_PCOMFALSEUD,
+   IX86_BUILTIN_PCOMTRUEUD,
+   IX86_BUILTIN_PCOMEQUQ,
+   IX86_BUILTIN_PCOMNEUQ,
+   IX86_BUILTIN_PCOMLTUQ,
+   IX86_BUILTIN_PCOMLEUQ,
+   IX86_BUILTIN_PCOMGTUQ,
+   IX86_BUILTIN_PCOMGEUQ,
+   IX86_BUILTIN_PCOMFALSEUQ,
+   IX86_BUILTIN_PCOMTRUEUQ,
+ 
+   IX86_BUILTIN_PCOMEQB,
+   IX86_BUILTIN_PCOMNEB,
+   IX86_BUILTIN_PCOMLTB,
+   IX86_BUILTIN_PCOMLEB,
+   IX86_BUILTIN_PCOMGTB,
+   IX86_BUILTIN_PCOMGEB,
+   IX86_BUILTIN_PCOMFALSEB,
+   IX86_BUILTIN_PCOMTRUEB,
+   IX86_BUILTIN_PCOMEQW,
+   IX86_BUILTIN_PCOMNEW,
+   IX86_BUILTIN_PCOMLTW,
+   IX86_BUILTIN_PCOMLEW,
+   IX86_BUILTIN_PCOMGTW,
+   IX86_BUILTIN_PCOMGEW,
+   IX86_BUILTIN_PCOMFALSEW,
+   IX86_BUILTIN_PCOMTRUEW,
+   IX86_BUILTIN_PCOMEQD,
+   IX86_BUILTIN_PCOMNED,
+   IX86_BUILTIN_PCOMLTD,
+   IX86_BUILTIN_PCOMLED,
+   IX86_BUILTIN_PCOMGTD,
+   IX86_BUILTIN_PCOMGED,
+   IX86_BUILTIN_PCOMFALSED,
+   IX86_BUILTIN_PCOMTRUED,
+   IX86_BUILTIN_PCOMEQQ,
+   IX86_BUILTIN_PCOMNEQ,
+   IX86_BUILTIN_PCOMLTQ,
+   IX86_BUILTIN_PCOMLEQ,
+   IX86_BUILTIN_PCOMGTQ,
+   IX86_BUILTIN_PCOMGEQ,
+   IX86_BUILTIN_PCOMFALSEQ,
+   IX86_BUILTIN_PCOMTRUEQ,
+ 
    IX86_BUILTIN_MAX
  };
  
*************** static const struct builtin_description 
*** 17537,17542 ****
--- 17920,18175 ----
    { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundps, 0, IX86_BUILTIN_ROUNDPS, UNKNOWN, 0 },
  };
  
+ /* SSE5 */
+ enum multi_arg_type {
+   MULTI_ARG_UNKNOWN,
+   MULTI_ARG_3_SF,
+   MULTI_ARG_3_DF,
+   MULTI_ARG_3_DI,
+   MULTI_ARG_3_PERMPS,
+   MULTI_ARG_3_PERMPD,
+   MULTI_ARG_2_SF,
+   MULTI_ARG_2_DF,
+   MULTI_ARG_2_DI,
+   MULTI_ARG_2_DI_IMM,
+   MULTI_ARG_1_SF,
+   MULTI_ARG_1_DF,
+   MULTI_ARG_1_DI,
+   MULTI_ARG_2_SF_CMP,
+   MULTI_ARG_2_DF_CMP,
+   MULTI_ARG_2_DI_CMP,
+   MULTI_ARG_2_DI_TF,
+   MULTI_ARG_2_SF_TF,
+   MULTI_ARG_2_DF_TF
+ };
+ 
+ static const struct builtin_description bdesc_multi_arg[] =
+ {
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5is_fmaddv4sf4,      "__builtin_ia32_fmaddss",    IX86_BUILTIN_FMADDSS,    0,            (int)MULTI_ARG_3_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5is_fmaddv2df4,      "__builtin_ia32_fmaddsd",    IX86_BUILTIN_FMADDSD,    0,            (int)MULTI_ARG_3_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5ip_fmaddv4sf4,      "__builtin_ia32_fmaddps",    IX86_BUILTIN_FMADDPS,    0,            (int)MULTI_ARG_3_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5ip_fmaddv2df4,      "__builtin_ia32_fmaddpd",    IX86_BUILTIN_FMADDPD,    0,            (int)MULTI_ARG_3_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5is_fmsubv4sf4,      "__builtin_ia32_fmsubss",    IX86_BUILTIN_FMSUBSS,    0,            (int)MULTI_ARG_3_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5is_fmsubv2df4,      "__builtin_ia32_fmsubsd",    IX86_BUILTIN_FMSUBSD,    0,            (int)MULTI_ARG_3_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5ip_fmsubv4sf4,      "__builtin_ia32_fmsubps",    IX86_BUILTIN_FMSUBPS,    0,            (int)MULTI_ARG_3_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5ip_fmsubv2df4,      "__builtin_ia32_fmsubpd",    IX86_BUILTIN_FMSUBPD,    0,            (int)MULTI_ARG_3_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5is_fnmaddv4sf4,     "__builtin_ia32_fnmaddss",   IX86_BUILTIN_FNMADDSS,   0,            (int)MULTI_ARG_3_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5is_fnmaddv2df4,     "__builtin_ia32_fnmaddsd",   IX86_BUILTIN_FNMADDSD,   0,            (int)MULTI_ARG_3_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5ip_fnmaddv4sf4,     "__builtin_ia32_fnmaddps",   IX86_BUILTIN_FNMADDPS,   0,            (int)MULTI_ARG_3_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5ip_fnmaddv2df4,     "__builtin_ia32_fnmaddpd",   IX86_BUILTIN_FNMADDPD,   0,            (int)MULTI_ARG_3_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5is_fnmsubv4sf4,     "__builtin_ia32_fnmsubss",   IX86_BUILTIN_FNMSUBSS,   0,            (int)MULTI_ARG_3_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5is_fnmsubv2df4,     "__builtin_ia32_fnmsubsd",   IX86_BUILTIN_FNMSUBSD,   0,            (int)MULTI_ARG_3_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5ip_fnmsubv4sf4,     "__builtin_ia32_fnmsubps",   IX86_BUILTIN_FNMSUBPS,   0,            (int)MULTI_ARG_3_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5ip_fnmsubv2df4,     "__builtin_ia32_fnmsubpd",   IX86_BUILTIN_FNMSUBPD,   0,            (int)MULTI_ARG_3_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcmov_v2di,        "__builtin_ia32_pcmov",      IX86_BUILTIN_PCMOV,      0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcmov_v2df,        "__builtin_ia32_pcmov_v2df", IX86_BUILTIN_PCMOV_V2DF, 0,            (int)MULTI_ARG_3_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcmov_v4sf,        "__builtin_ia32_pcmov_v4sf", IX86_BUILTIN_PCMOV_V4SF, 0,            (int)MULTI_ARG_3_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pperm,             "__builtin_ia32_pperm",      IX86_BUILTIN_PPERM,      0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_permps,            "__builtin_ia32_permps",     IX86_BUILTIN_PERMPS,     0,            (int)MULTI_ARG_3_PERMPS },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_permpd,            "__builtin_ia32_permpd",     IX86_BUILTIN_PERMPD,     0,            (int)MULTI_ARG_3_PERMPD },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmacssww,          "__builtin_ia32_pmacssww",   IX86_BUILTIN_PMACSSWW,   0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmacsww,           "__builtin_ia32_pmacsww",    IX86_BUILTIN_PMACSWW,    0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmacsswd,          "__builtin_ia32_pmacsswd",   IX86_BUILTIN_PMACSSWD,   0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmacswd,           "__builtin_ia32_pmacswd",    IX86_BUILTIN_PMACSWD,    0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmacssdd,          "__builtin_ia32_pmacssdd",   IX86_BUILTIN_PMACSSDD,   0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmacsdd,           "__builtin_ia32_pmacsdd",    IX86_BUILTIN_PMACSDD,    0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmacssdql,         "__builtin_ia32_pmacssdql",  IX86_BUILTIN_PMACSSDQL,  0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmacssdqh,         "__builtin_ia32_pmacssdqh",  IX86_BUILTIN_PMACSSDQH,  0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmacsdql,          "__builtin_ia32_pmacsdql",   IX86_BUILTIN_PMACSDQL,   0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmacsdqh,          "__builtin_ia32_pmacsdqh",   IX86_BUILTIN_PMACSDQH,   0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmadcsswd,         "__builtin_ia32_pmadcsswd",  IX86_BUILTIN_PMADCSSWD,  0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pmadcswd,          "__builtin_ia32_pmadcswd",   IX86_BUILTIN_PMADCSWD,   0,            (int)MULTI_ARG_3_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_protb,             "__builtin_ia32_protb",      IX86_BUILTIN_PROTB,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_protw,             "__builtin_ia32_protw",      IX86_BUILTIN_PROTW,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_protd,             "__builtin_ia32_protd",      IX86_BUILTIN_PROTD,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_protq,             "__builtin_ia32_protq",      IX86_BUILTIN_PROTQ,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_protb_imm,         "__builtin_ia32_protbi",     IX86_BUILTIN_PROTB_IMM,  0,            (int)MULTI_ARG_2_DI_IMM },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_protw_imm,         "__builtin_ia32_protwi",     IX86_BUILTIN_PROTW_IMM,  0,            (int)MULTI_ARG_2_DI_IMM },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_protd_imm,         "__builtin_ia32_protdi",     IX86_BUILTIN_PROTD_IMM,  0,            (int)MULTI_ARG_2_DI_IMM },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_rotlv2di3,              "__builtin_ia32_protqi",     IX86_BUILTIN_PROTQ_IMM,  0,            (int)MULTI_ARG_2_DI_IMM },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pshlb,             "__builtin_ia32_pshlb",      IX86_BUILTIN_PSHLB,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pshlw,             "__builtin_ia32_pshlw",      IX86_BUILTIN_PSHLW,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pshld,             "__builtin_ia32_pshld",      IX86_BUILTIN_PSHLD,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pshlq,             "__builtin_ia32_pshlq",      IX86_BUILTIN_PSHLQ,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pshab,             "__builtin_ia32_pshab",      IX86_BUILTIN_PSHAB,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pshaw,             "__builtin_ia32_pshaw",      IX86_BUILTIN_PSHAW,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pshad,             "__builtin_ia32_pshad",      IX86_BUILTIN_PSHAD,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pshaq,             "__builtin_ia32_pshaq",      IX86_BUILTIN_PSHAQ,      0,            (int)MULTI_ARG_2_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_frczss,            "__builtin_ia32_frczss",     IX86_BUILTIN_FRCZSS,     0,            (int)MULTI_ARG_1_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_frczsd,            "__builtin_ia32_frczsd",     IX86_BUILTIN_FRCZSD,     0,            (int)MULTI_ARG_1_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_frczps,            "__builtin_ia32_frczps",     IX86_BUILTIN_FRCZPS,     0,            (int)MULTI_ARG_1_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_frczpd,            "__builtin_ia32_frczpd",     IX86_BUILTIN_FRCZPD,     0,            (int)MULTI_ARG_1_DF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_cvtph2ps,          "__builtin_ia32_cvtph2ps",   IX86_BUILTIN_CVTPH2PS,   0,            (int)MULTI_ARG_2_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_cvtps2ph,          "__builtin_ia32_cvtps2ph",   IX86_BUILTIN_CVTPS2PH,   0,            (int)MULTI_ARG_1_SF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phaddbw,           "__builtin_ia32_phaddbw",    IX86_BUILTIN_PHADDBW,    0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phaddbd,           "__builtin_ia32_phaddbd",    IX86_BUILTIN_PHADDBD,    0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phaddbq,           "__builtin_ia32_phaddbq",    IX86_BUILTIN_PHADDBQ,    0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phaddwd,           "__builtin_ia32_phaddwd",    IX86_BUILTIN_PHADDWD,    0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phaddwq,           "__builtin_ia32_phaddwq",    IX86_BUILTIN_PHADDWQ,    0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phadddq,           "__builtin_ia32_phadddq",    IX86_BUILTIN_PHADDDQ,    0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phaddubw,          "__builtin_ia32_phaddubw",   IX86_BUILTIN_PHADDUBW,   0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phaddubd,          "__builtin_ia32_phaddubd",   IX86_BUILTIN_PHADDUBD,   0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phaddubq,          "__builtin_ia32_phaddubq",   IX86_BUILTIN_PHADDUBQ,   0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phadduwd,          "__builtin_ia32_phadduwd",   IX86_BUILTIN_PHADDUWD,   0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phadduwq,          "__builtin_ia32_phadduwq",   IX86_BUILTIN_PHADDUWQ,   0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phaddudq,          "__builtin_ia32_phaddudq",   IX86_BUILTIN_PHADDUDQ,   0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phsubbw,           "__builtin_ia32_phsubbw",    IX86_BUILTIN_PHSUBBW,    0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phsubwd,           "__builtin_ia32_phsubwd",    IX86_BUILTIN_PHSUBWD,    0,            (int)MULTI_ARG_1_DI },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_phsubdq,           "__builtin_ia32_phsubdq",    IX86_BUILTIN_PHSUBDQ,    0,            (int)MULTI_ARG_1_DI },
+ 
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comeqss",    IX86_BUILTIN_COMEQSS,    EQ,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comness",    IX86_BUILTIN_COMNESS,    NE,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comneqss",   IX86_BUILTIN_COMNESS,    NE,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comltss",    IX86_BUILTIN_COMLTSS,    LT,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comless",    IX86_BUILTIN_COMLESS,    LE,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comgtss",    IX86_BUILTIN_COMGTSS,    GT,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comgess",    IX86_BUILTIN_COMGESS,    GE,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comueqss",   IX86_BUILTIN_COMUEQSS,   UNEQ,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comuness",   IX86_BUILTIN_COMUNESS,   LTGT,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comuneqss",  IX86_BUILTIN_COMUNESS,   LTGT,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comunltss",  IX86_BUILTIN_COMULTSS,   UNLT,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comunless",  IX86_BUILTIN_COMULESS,   UNLE,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comungtss",  IX86_BUILTIN_COMUGTSS,   UNGT,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comungess",  IX86_BUILTIN_COMUGESS,   UNGE,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comordss",   IX86_BUILTIN_COMORDSS,   ORDERED,      (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v4sf,    "__builtin_ia32_comunordss", IX86_BUILTIN_COMUNORDSS, UNORDERED,    (int)MULTI_ARG_2_SF_CMP },
+ 
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comeqsd",    IX86_BUILTIN_COMEQSD,    EQ,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comnesd",    IX86_BUILTIN_COMNESD,    NE,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comneqsd",   IX86_BUILTIN_COMNESD,    NE,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comltsd",    IX86_BUILTIN_COMLTSD,    LT,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comlesd",    IX86_BUILTIN_COMLESD,    LE,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comgtsd",    IX86_BUILTIN_COMGTSD,    GT,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comgesd",    IX86_BUILTIN_COMGESD,    GE,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comueqsd",   IX86_BUILTIN_COMUEQSD,   UNEQ,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comunesd",   IX86_BUILTIN_COMUNESD,   LTGT,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comuneqsd",  IX86_BUILTIN_COMUNESD,   LTGT,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comunltsd",  IX86_BUILTIN_COMULTSD,   UNLT,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comunlesd",  IX86_BUILTIN_COMULESD,   UNLE,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comungtsd",  IX86_BUILTIN_COMUGTSD,   UNGT,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comungesd",  IX86_BUILTIN_COMUGESD,   UNGE,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comordsd",   IX86_BUILTIN_COMORDSD,   ORDERED,      (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_s_v2df,    "__builtin_ia32_comunordsd", IX86_BUILTIN_COMUNORDSD, UNORDERED,    (int)MULTI_ARG_2_DF_CMP },
+ 
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comeqps",    IX86_BUILTIN_COMEQPS,    EQ,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comneps",    IX86_BUILTIN_COMNEPS,    NE,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comneqps",   IX86_BUILTIN_COMNEPS,    NE,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comltps",    IX86_BUILTIN_COMLTPS,    LT,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comleps",    IX86_BUILTIN_COMLEPS,    LE,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comgtps",    IX86_BUILTIN_COMGTPS,    GT,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comgeps",    IX86_BUILTIN_COMGEPS,    GE,           (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comueqps",   IX86_BUILTIN_COMUEQPS,   UNEQ,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comuneps",   IX86_BUILTIN_COMUNEPS,   LTGT,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comuneqps",  IX86_BUILTIN_COMUNEPS,   LTGT,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comunltps",  IX86_BUILTIN_COMULTPS,   UNLT,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comunleps",  IX86_BUILTIN_COMULEPS,   UNLE,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comungtps",  IX86_BUILTIN_COMUGTPS,   UNGT,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comungeps",  IX86_BUILTIN_COMUGEPS,   UNGE,         (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comordps",   IX86_BUILTIN_COMORDPS,   ORDERED,      (int)MULTI_ARG_2_SF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv4sf3,      "__builtin_ia32_comunordps", IX86_BUILTIN_COMUNORDPS, UNORDERED,    (int)MULTI_ARG_2_SF_CMP },
+ 
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comeqpd",    IX86_BUILTIN_COMEQPD,    EQ,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comnepd",    IX86_BUILTIN_COMNEPD,    NE,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comneqpd",   IX86_BUILTIN_COMNEPD,    NE,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comltpd",    IX86_BUILTIN_COMLTPD,    LT,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comlepd",    IX86_BUILTIN_COMLEPD,    LE,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comgtpd",    IX86_BUILTIN_COMGTPD,    GT,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comgepd",    IX86_BUILTIN_COMGEPD,    GE,           (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comueqpd",   IX86_BUILTIN_COMUEQPD,   UNEQ,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comunepd",   IX86_BUILTIN_COMUNEPD,   LTGT,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comuneqpd",  IX86_BUILTIN_COMUNEPD,   LTGT,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comunltpd",  IX86_BUILTIN_COMULTPD,   UNLT,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comunlepd",  IX86_BUILTIN_COMULEPD,   UNLE,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comungtpd",  IX86_BUILTIN_COMUGTPD,   UNGT,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comungepd",  IX86_BUILTIN_COMUGEPD,   UNGE,         (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comordpd",   IX86_BUILTIN_COMORDPD,   ORDERED,      (int)MULTI_ARG_2_DF_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2df3,      "__builtin_ia32_comunordpd", IX86_BUILTIN_COMUNORDPD, UNORDERED,    (int)MULTI_ARG_2_DF_CMP },
+ 
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_8,       "__builtin_ia32_pcomeqb",    IX86_BUILTIN_PCOMEQB,    EQ,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_8,       "__builtin_ia32_pcomneb",    IX86_BUILTIN_PCOMNEB,    NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_8,       "__builtin_ia32_pcomneqb",   IX86_BUILTIN_PCOMNEB,    NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_8,       "__builtin_ia32_pcomltb",    IX86_BUILTIN_PCOMLTB,    LT,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_8,       "__builtin_ia32_pcomleb",    IX86_BUILTIN_PCOMLEB,    LE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_8,       "__builtin_ia32_pcomgtb",    IX86_BUILTIN_PCOMGTB,    GT,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_8,       "__builtin_ia32_pcomgeb",    IX86_BUILTIN_PCOMGEB,    GE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_16,      "__builtin_ia32_pcomeqw",    IX86_BUILTIN_PCOMEQW,    EQ,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_16,      "__builtin_ia32_pcomnew",    IX86_BUILTIN_PCOMNEW,    NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_16,      "__builtin_ia32_pcomneqw",   IX86_BUILTIN_PCOMNEW,    NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_16,      "__builtin_ia32_pcomltw",    IX86_BUILTIN_PCOMLTW,    LT,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_16,      "__builtin_ia32_pcomlew",    IX86_BUILTIN_PCOMLEW,    LE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_16,      "__builtin_ia32_pcomgtw",    IX86_BUILTIN_PCOMGTW,    GT,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_16,      "__builtin_ia32_pcomgew",    IX86_BUILTIN_PCOMGEW,    GE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_32,      "__builtin_ia32_pcomeqd",    IX86_BUILTIN_PCOMEQD,    EQ,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_32,      "__builtin_ia32_pcomned",    IX86_BUILTIN_PCOMNED,    NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_32,      "__builtin_ia32_pcomneqd",   IX86_BUILTIN_PCOMNED,    NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_32,      "__builtin_ia32_pcomltd",    IX86_BUILTIN_PCOMLTD,    LT,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_32,      "__builtin_ia32_pcomled",    IX86_BUILTIN_PCOMLED,    LE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_32,      "__builtin_ia32_pcomgtd",    IX86_BUILTIN_PCOMGTD,    GT,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_32,      "__builtin_ia32_pcomged",    IX86_BUILTIN_PCOMGED,    GE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2di3,      "__builtin_ia32_pcomeqq",    IX86_BUILTIN_PCOMEQQ,    EQ,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2di3,      "__builtin_ia32_pcomneq",    IX86_BUILTIN_PCOMNEQ,    NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2di3,      "__builtin_ia32_pcomneqq",   IX86_BUILTIN_PCOMNEQ,    NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2di3,      "__builtin_ia32_pcomltq",    IX86_BUILTIN_PCOMLTQ,    LT,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2di3,      "__builtin_ia32_pcomleq",    IX86_BUILTIN_PCOMLEQ,    LE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2di3,      "__builtin_ia32_pcomgtq",    IX86_BUILTIN_PCOMGTQ,    GT,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmpv2di3,      "__builtin_ia32_pcomgeq",    IX86_BUILTIN_PCOMGEQ,    GE,           (int)MULTI_ARG_2_DI_CMP },
+ 
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u8,      "__builtin_ia32_pcomequb",   IX86_BUILTIN_PCOMEQUB,   EQ,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u8,      "__builtin_ia32_pcomneub",   IX86_BUILTIN_PCOMNEUB,   NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u8,      "__builtin_ia32_pcomnequb",  IX86_BUILTIN_PCOMNEUB,   NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u8,      "__builtin_ia32_pcomltub",   IX86_BUILTIN_PCOMLTUB,   LTU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u8,      "__builtin_ia32_pcomleub",   IX86_BUILTIN_PCOMLEUB,   LEU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u8,      "__builtin_ia32_pcomgtub",   IX86_BUILTIN_PCOMGTUB,   GTU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u8,      "__builtin_ia32_pcomgeub",   IX86_BUILTIN_PCOMGEUB,   GEU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u16,     "__builtin_ia32_pcomequw",   IX86_BUILTIN_PCOMEQUW,   EQ,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u16,     "__builtin_ia32_pcomneuw",   IX86_BUILTIN_PCOMNEUW,   NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u16,     "__builtin_ia32_pcomnequw",  IX86_BUILTIN_PCOMNEUW,   NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u16,     "__builtin_ia32_pcomltuw",   IX86_BUILTIN_PCOMLTUW,   LTU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u16,     "__builtin_ia32_pcomleuw",   IX86_BUILTIN_PCOMLEUW,   LEU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u16,     "__builtin_ia32_pcomgtuw",   IX86_BUILTIN_PCOMGTUW,   GTU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u16,     "__builtin_ia32_pcomgeuw",   IX86_BUILTIN_PCOMGEUW,   GEU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u32,     "__builtin_ia32_pcomequd",   IX86_BUILTIN_PCOMEQUD,   EQ,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u32,     "__builtin_ia32_pcomneud",   IX86_BUILTIN_PCOMNEUD,   NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u32,     "__builtin_ia32_pcomnequd",  IX86_BUILTIN_PCOMNEUD,   NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u32,     "__builtin_ia32_pcomltud",   IX86_BUILTIN_PCOMLTUD,   LTU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u32,     "__builtin_ia32_pcomleud",   IX86_BUILTIN_PCOMLEUD,   LEU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u32,     "__builtin_ia32_pcomgtud",   IX86_BUILTIN_PCOMGTUD,   GTU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_setccv2di_u32,     "__builtin_ia32_pcomgeud",   IX86_BUILTIN_PCOMGEUD,   GEU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_uns2v2di3, "__builtin_ia32_pcomequq",   IX86_BUILTIN_PCOMEQUQ,   EQ,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_uns2v2di3, "__builtin_ia32_pcomneuq",   IX86_BUILTIN_PCOMNEUQ,   NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_uns2v2di3, "__builtin_ia32_pcomnequq",  IX86_BUILTIN_PCOMNEUQ,   NE,           (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_uns2v2di3, "__builtin_ia32_pcomltuq",   IX86_BUILTIN_PCOMLTUQ,   LTU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_uns2v2di3, "__builtin_ia32_pcomleuq",   IX86_BUILTIN_PCOMLEUQ,   LEU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_uns2v2di3, "__builtin_ia32_pcomgtuq",   IX86_BUILTIN_PCOMGTUQ,   GTU,          (int)MULTI_ARG_2_DI_CMP },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_maskcmp_uns2v2di3, "__builtin_ia32_pcomgeuq",   IX86_BUILTIN_PCOMGEUQ,   GEU,          (int)MULTI_ARG_2_DI_CMP },
+ 
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_com_tfv4sf3,       "__builtin_ia32_comfalsess", IX86_BUILTIN_COMFALSESS, COM_FALSE_S,  (int)MULTI_ARG_2_SF_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_com_tfv4sf3,       "__builtin_ia32_comtruess",  IX86_BUILTIN_COMTRUESS,  COM_TRUE_S,   (int)MULTI_ARG_2_SF_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_com_tfv4sf3,       "__builtin_ia32_comfalseps", IX86_BUILTIN_COMFALSEPS, COM_FALSE_P,  (int)MULTI_ARG_2_SF_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_com_tfv4sf3,       "__builtin_ia32_comtrueps",  IX86_BUILTIN_COMTRUEPS,  COM_TRUE_P,   (int)MULTI_ARG_2_SF_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_com_tfv2df3,       "__builtin_ia32_comfalsesd", IX86_BUILTIN_COMFALSESD, COM_FALSE_S,  (int)MULTI_ARG_2_DF_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_com_tfv2df3,       "__builtin_ia32_comtruesd",  IX86_BUILTIN_COMTRUESD,  COM_TRUE_S,   (int)MULTI_ARG_2_DF_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_com_tfv2df3,       "__builtin_ia32_comfalsepd", IX86_BUILTIN_COMFALSEPD, COM_FALSE_P,  (int)MULTI_ARG_2_DF_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_com_tfv2df3,       "__builtin_ia32_comtruepd",  IX86_BUILTIN_COMTRUEPD,  COM_TRUE_P,   (int)MULTI_ARG_2_DF_TF },
+ 
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomfalseb", IX86_BUILTIN_PCOMFALSEB, PCOM_FALSE_B, (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomtrueb",  IX86_BUILTIN_PCOMTRUEB,  PCOM_TRUE_B,  (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomfalsew", IX86_BUILTIN_PCOMFALSEW, PCOM_FALSE_W, (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomtruew",  IX86_BUILTIN_PCOMTRUEW,  PCOM_TRUE_W,  (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomfalsed", IX86_BUILTIN_PCOMFALSED, PCOM_FALSE_D, (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomtrued",  IX86_BUILTIN_PCOMTRUED,  PCOM_TRUE_D,  (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomfalseq", IX86_BUILTIN_PCOMFALSEQ, PCOM_FALSE_Q, (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomtrueq",  IX86_BUILTIN_PCOMTRUEQ,  PCOM_TRUE_Q,  (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomfalseub",IX86_BUILTIN_PCOMFALSEUB,PCOM_FALSE_UB,(int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomtrueub", IX86_BUILTIN_PCOMTRUEUB, PCOM_TRUE_UB, (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomfalseuw",IX86_BUILTIN_PCOMFALSEUW,PCOM_FALSE_UW,(int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomtrueuw", IX86_BUILTIN_PCOMTRUEUW, PCOM_TRUE_UW, (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomfalseud",IX86_BUILTIN_PCOMFALSEUD,PCOM_FALSE_UD,(int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomtrueud", IX86_BUILTIN_PCOMTRUEUD, PCOM_TRUE_UD, (int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomfalseuq",IX86_BUILTIN_PCOMFALSEUQ,PCOM_FALSE_UQ,(int)MULTI_ARG_2_DI_TF },
+   { OPTION_MASK_ISA_SSE5, CODE_FOR_sse5_pcom_tfdi3,        "__builtin_ia32_pcomtrueuq", IX86_BUILTIN_PCOMTRUEUQ, PCOM_TRUE_UQ, (int)MULTI_ARG_2_DI_TF },
+ };
+ 
  /* Set up all the MMX/SSE builtins.  This is not called if TARGET_MMX
     is zero.  Otherwise, if TARGET_SSE is not set, only expand the MMX
     builtins.  */
*************** ix86_init_mmx_sse_builtins (void)
*** 17915,17920 ****
--- 18548,18585 ----
  				V16QI_type_node,
  				integer_type_node,
  				NULL_TREE);
+ 
+   /* SSE5 instructions.  */
+   tree v2di_ftype_v2di_v2di_v2di
+     = build_function_type_list (V2DI_type_node,
+ 				V2DI_type_node,
+ 				V2DI_type_node,
+ 				V2DI_type_node,
+ 				NULL_TREE);
+ 
+   tree v2df_ftype_v2df_v2df_v2di
+     = build_function_type_list (V2DF_type_node,
+ 				V2DF_type_node,
+ 				V2DF_type_node,
+ 				V2DI_type_node,
+ 				NULL_TREE);
+ 
+   tree v4sf_ftype_v4sf_v4sf_v2di
+     = build_function_type_list (V4SF_type_node,
+ 				V4SF_type_node,
+ 				V4SF_type_node,
+ 				V2DI_type_node,
+ 				NULL_TREE);
+ 
+   tree v2di_ftype_v2di_si
+     = build_function_type_list (V2DI_type_node,
+ 				V2DI_type_node,
+ 				integer_type_node,
+ 				NULL_TREE);
+ 
+   tree v2di_ftype_v2di
+     = build_function_type_list (V2DI_type_node, V2DI_type_node, NULL_TREE);
+ 
    tree ftype;
  
    /* The __float80 type.  */
*************** ix86_init_mmx_sse_builtins (void)
*** 18480,18485 ****
--- 19145,19188 ----
  				    intQI_type_node,
  				    integer_type_node, NULL_TREE);
    def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_vec_set_v16qi", ftype, IX86_BUILTIN_VEC_SET_V16QI);
+ 
+   /* Add SSE5 multi-arg builtin functions.  */
+   for (i = 0, d = bdesc_multi_arg; i < ARRAY_SIZE (bdesc_multi_arg); i++, d++)
+     {
+       tree mtype = NULL_TREE;
+ 
+       if (d->name == 0)
+ 	continue;
+ 
+       switch ((enum multi_arg_type)d->flag)
+ 	{
+ 	case MULTI_ARG_3_SF:     mtype = v4sf_ftype_v4sf_v4sf_v4sf; break;
+ 	case MULTI_ARG_3_DF:     mtype = v2df_ftype_v2df_v2df_v2df; break;
+ 	case MULTI_ARG_3_DI:     mtype = v2di_ftype_v2di_v2di_v2di; break;
+ 	case MULTI_ARG_3_PERMPS: mtype = v4sf_ftype_v4sf_v4sf_v2di; break;
+ 	case MULTI_ARG_3_PERMPD: mtype = v2df_ftype_v2df_v2df_v2di; break;
+ 	case MULTI_ARG_2_SF:     mtype = v4sf_ftype_v4sf_v4sf;      break;
+ 	case MULTI_ARG_2_DF:     mtype = v2df_ftype_v2df_v2df;      break;
+ 	case MULTI_ARG_2_DI:     mtype = v2di_ftype_v2di_v2di;      break;
+ 	case MULTI_ARG_2_DI_IMM: mtype = v2di_ftype_v2di_si;        break;
+ 	case MULTI_ARG_1_SF:     mtype = v4sf_ftype_v4sf;           break;
+ 	case MULTI_ARG_1_DF:     mtype = v2df_ftype_v2df;           break;
+ 	case MULTI_ARG_1_DI:     mtype = v2di_ftype_v2di;           break;
+ 	case MULTI_ARG_2_SF_CMP: mtype = v4sf_ftype_v4sf_v4sf;      break;
+ 	case MULTI_ARG_2_DF_CMP: mtype = v2df_ftype_v2df_v2df;      break;
+ 	case MULTI_ARG_2_DI_CMP: mtype = v2di_ftype_v2di_v2di;      break;
+ 	case MULTI_ARG_2_SF_TF:  mtype = v4sf_ftype_v4sf_v4sf;      break;
+ 	case MULTI_ARG_2_DF_TF:  mtype = v2df_ftype_v2df_v2df;      break;
+ 	case MULTI_ARG_2_DI_TF:  mtype = v2di_ftype_v2di_v2di;      break;
+ 
+ 	case MULTI_ARG_UNKNOWN:
+ 	default:
+ 	  gcc_unreachable ();
+ 	}
+ 
+       if (mtype)
+ 	def_builtin_const (d->mask, d->name, mtype, d->code);
+     }
  }
  
  static void
*************** ix86_expand_binop_builtin (enum insn_cod
*** 18663,18668 ****
--- 19366,19508 ----
    return target;
  }
  
+ /* Subroutine of ix86_expand_builtin to take care of SSE5 multi-arg builtins
+    taking 1 to 3 operands.  */
+ 
+ static rtx
+ ix86_expand_multi_arg_builtin (enum insn_code icode, tree exp, rtx target,
+ 			       enum multi_arg_type m_type,
+ 			       enum insn_code sub_code)
+ {
+   rtx pat;
+   int i;
+   int nargs;
+   bool comparison_p = false;
+   bool tf_p = false;
+   bool last_arg_constant = (m_type == MULTI_ARG_2_DI_IMM);
+   struct {
+     rtx op;
+     enum machine_mode mode;
+   } args[4];
+ 
+   enum machine_mode tmode = insn_data[icode].operand[0].mode;
+ 
+   switch (m_type)
+     {
+     case MULTI_ARG_3_SF:
+     case MULTI_ARG_3_DF:
+     case MULTI_ARG_3_DI:
+     case MULTI_ARG_3_PERMPS:
+     case MULTI_ARG_3_PERMPD:
+       nargs = 3;
+       break;
+ 
+     case MULTI_ARG_2_SF:
+     case MULTI_ARG_2_DF:
+     case MULTI_ARG_2_DI:
+     case MULTI_ARG_2_DI_IMM:
+       nargs = 2;
+       break;
+ 
+     case MULTI_ARG_1_SF:
+     case MULTI_ARG_1_DF:
+     case MULTI_ARG_1_DI:
+       nargs = 1;
+       break;
+ 
+     case MULTI_ARG_2_SF_CMP:
+     case MULTI_ARG_2_DF_CMP:
+     case MULTI_ARG_2_DI_CMP:
+       nargs = 2;
+       comparison_p = true;
+       break;
+ 
+     case MULTI_ARG_2_SF_TF:
+     case MULTI_ARG_2_DF_TF:
+     case MULTI_ARG_2_DI_TF:
+       nargs = 2;
+       tf_p = true;
+       break;
+ 
+     case MULTI_ARG_UNKNOWN:
+     default:
+       gcc_unreachable ();
+     }
+ 
+   if (optimize || !target
+       || GET_MODE (target) != tmode
+       || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+     target = gen_reg_rtx (tmode);
+ 
+   gcc_assert (nargs <= 4);
+ 
+   for (i = 0; i < nargs; i++)
+     {
+       tree arg = CALL_EXPR_ARG (exp, i);
+       rtx op = expand_normal (arg);
+       int adjust = (comparison_p) ? 1 : 0;
+       enum machine_mode mode = insn_data[icode].operand[i+adjust+1].mode;
+ 
+       if (last_arg_constant && i == nargs-1)
+ 	{
+ 	  if (GET_CODE (op) != CONST_INT)
+ 	    {
+ 	      error ("last argument must be an immediate");
+ 	      return gen_reg_rtx (tmode);
+ 	    }
+ 	}
+       else
+ 	{
+ 	  if (VECTOR_MODE_P (mode))
+ 	    op = safe_vector_operand (op, mode);
+ 
+ 	  if (optimize 
+ 	      || GET_MODE (op) != mode
+ 	      || ! (*insn_data[icode].operand[i+1].predicate) (op, mode))
+ 	    op = force_reg (mode, op);
+ 	}
+ 
+       gcc_assert (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode);
+       args[i].op = op;
+       args[i].mode = mode;
+     }
+ 
+   switch (nargs)
+     {
+     case 1:
+       pat = GEN_FCN (icode) (target, args[0].op);
+       break;
+ 
+     case 2:
+       if (tf_p)
+ 	pat = GEN_FCN (icode) (target, args[0].op, args[1].op,
+ 			       GEN_INT ((int)sub_code));
+       else if (! comparison_p)
+ 	pat = GEN_FCN (icode) (target, args[0].op, args[1].op);
+       else
+ 	{
+ 	  rtx cmp_op = gen_rtx_fmt_ee (sub_code, GET_MODE (target),
+ 				       args[0].op,
+ 				       args[1].op);
+ 
+ 	  pat = GEN_FCN (icode) (target, cmp_op, args[0].op, args[1].op);
+ 	}
+       break;
+ 
+     case 3:
+       pat = GEN_FCN (icode) (target, args[0].op, args[1].op, args[2].op);
+       break;
+ 
+     default:
+       gcc_unreachable ();
+     }
+ 
+   if (! pat)
+     return 0;
+ 
+   emit_insn (pat);
+   return target;
+ }
+ 
  /* Subroutine of ix86_expand_builtin to take care of stores.  */
  
  static rtx
*************** ix86_expand_builtin (tree exp, rtx targe
*** 19995,20000 ****
--- 20835,20846 ----
      if (d->code == fcode)
        return ix86_expand_sse_pcmpistr (d, exp, target);
  
+   for (i = 0, d = bdesc_multi_arg; i < ARRAY_SIZE (bdesc_multi_arg); i++, d++)
+     if (d->code == fcode)
+       return ix86_expand_multi_arg_builtin (d->icode, exp, target,
+ 					    (enum multi_arg_type)d->flag,
+ 					    d->comparison);
+ 
    gcc_unreachable ();
  }
  
*** gcc/config/i386/predicates.md.~1~	2007-09-06 13:52:33.053480000 -0400
--- gcc/config/i386/predicates.md	2007-09-05 18:29:12.383269000 -0400
***************
*** 600,605 ****
--- 600,610 ----
    (and (match_code "const_int")
         (match_test "IN_RANGE (INTVAL (op), 0, 15)")))
  
+ ;; Match 0 to 31.
+ (define_predicate "const_0_to_31_operand"
+   (and (match_code "const_int")
+        (match_test "IN_RANGE (INTVAL (op), 0, 31)")))
+ 
  ;; Match 0 to 63.
  (define_predicate "const_0_to_63_operand"
    (and (match_code "const_int")
*** gcc/config/i386/cpuid.h.~1~	2007-09-06 13:52:33.083459000 -0400
--- gcc/config/i386/cpuid.h	2007-09-06 13:29:00.166796000 -0400
***************
*** 51,56 ****
--- 51,57 ----
  /* %ecx */
  #define bit_LAHF_LM	(1 << 0)
  #define bit_SSE4a	(1 << 6)
+ #define bit_SSE5	(1 << 11)
  
  /* %edx */
  #define bit_LM		(1 << 29)
*** gcc/config/i386/driver-i386.c.~1~	2007-09-06 13:52:33.103459000 -0400
--- gcc/config/i386/driver-i386.c	2007-09-06 13:31:06.913409000 -0400
*************** const char *host_detect_local_cpu (int a
*** 182,188 ****
    unsigned int has_cmpxchg8b, has_cmov, has_mmx, has_sse, has_sse2;
  
    /* Extended features */
!   unsigned int has_lahf_lm = 0, has_sse4a = 0;
    unsigned int has_longmode = 0, has_3dnowp = 0, has_3dnow = 0;
  
    bool arch;
--- 182,188 ----
    unsigned int has_cmpxchg8b, has_cmov, has_mmx, has_sse, has_sse2;
  
    /* Extended features */
!   unsigned int has_lahf_lm = 0, has_sse4a = 0, has_sse5 = 0;
    unsigned int has_longmode = 0, has_3dnowp = 0, has_3dnow = 0;
  
    bool arch;
*************** const char *host_detect_local_cpu (int a
*** 223,228 ****
--- 223,229 ----
  
        has_lahf_lm = ecx & bit_LAHF_LM;
        has_sse4a = ecx & bit_SSE4a;
+       has_sse5 = ecx & bit_SSE5;
  
        has_longmode = edx & bit_LM;
        has_3dnowp = edx & bit_3DNOWP;
*************** const char *host_detect_local_cpu (int a
*** 382,387 ****
--- 383,390 ----
  	options = concat (options, "-mcx16 ", NULL);
        if (has_lahf_lm)
  	options = concat (options, "-msahf ", NULL);
+       if (has_sse5)
+ 	options = concat (options, "-msse5 ", NULL);
      }
  
  done:
*** gcc/config.gcc.~1~	2007-09-06 13:52:33.132461000 -0400
--- gcc/config.gcc	2007-09-06 13:24:12.769094000 -0400
*************** i[34567]86-*-*)
*** 280,292 ****
  	cpu_type=i386
  	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
  		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
! 		       nmmintrin.h"
  	;;
  x86_64-*-*)
  	cpu_type=i386
  	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
  		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
! 		       nmmintrin.h"
  	need_64bit_hwint=yes
  	;;
  ia64-*-*)
--- 280,292 ----
  	cpu_type=i386
  	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
  		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
! 		       nmmintrin.h bmmintrin.h"
  	;;
  x86_64-*-*)
  	cpu_type=i386
  	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
  		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
! 		       nmmintrin.h bmmintrin.h"
  	need_64bit_hwint=yes
  	;;
  ia64-*-*)
*** gcc/doc/extend.texi.~1~	2007-09-06 13:52:33.222459000 -0400
--- gcc/doc/extend.texi	2007-09-06 13:06:56.500902000 -0400
*************** v2di __builtin_ia32_insertq (v2di, v2di)
*** 7755,7760 ****
--- 7755,7972 ----
  v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int)
  @end smallexample
  
+ The following built-in functions are available when @option{-msse5} is used.
+ All of them generate the machine instruction that is part of the name.
+ 
+ @smallexample
+ v2df __builtin_ia32_comeqpd (v2df, v2df)
+ v4sf __builtin_ia32_comeqps (v4sf, v4sf)
+ v2df __builtin_ia32_comeqsd (v2df, v2df)
+ v4sf __builtin_ia32_comeqss (v4sf, v4sf)
+ v2df __builtin_ia32_comfalsepd (v2df, v2df)
+ v4sf __builtin_ia32_comfalseps (v4sf, v4sf)
+ v2df __builtin_ia32_comfalsesd (v2df, v2df)
+ v4sf __builtin_ia32_comfalsess (v4sf, v4sf)
+ v2df __builtin_ia32_comgepd (v2df, v2df)
+ v4sf __builtin_ia32_comgeps (v4sf, v4sf)
+ v2df __builtin_ia32_comgesd (v2df, v2df)
+ v4sf __builtin_ia32_comgess (v4sf, v4sf)
+ v2df __builtin_ia32_comgtpd (v2df, v2df)
+ v4sf __builtin_ia32_comgtps (v4sf, v4sf)
+ v2df __builtin_ia32_comgtsd (v2df, v2df)
+ v4sf __builtin_ia32_comgtss (v4sf, v4sf)
+ v2df __builtin_ia32_comlepd (v2df, v2df)
+ v4sf __builtin_ia32_comleps (v4sf, v4sf)
+ v2df __builtin_ia32_comlesd (v2df, v2df)
+ v4sf __builtin_ia32_comless (v4sf, v4sf)
+ v2df __builtin_ia32_comltpd (v2df, v2df)
+ v4sf __builtin_ia32_comltps (v4sf, v4sf)
+ v2df __builtin_ia32_comltsd (v2df, v2df)
+ v4sf __builtin_ia32_comltss (v4sf, v4sf)
+ v2df __builtin_ia32_comnepd (v2df, v2df)
+ v4sf __builtin_ia32_comneps (v4sf, v4sf)
+ v2df __builtin_ia32_comnesd (v2df, v2df)
+ v4sf __builtin_ia32_comness (v4sf, v4sf)
+ v2df __builtin_ia32_comordpd (v2df, v2df)
+ v4sf __builtin_ia32_comordps (v4sf, v4sf)
+ v2df __builtin_ia32_comordsd (v2df, v2df)
+ v4sf __builtin_ia32_comordss (v4sf, v4sf)
+ v2df __builtin_ia32_comtruepd (v2df, v2df)
+ v4sf __builtin_ia32_comtrueps (v4sf, v4sf)
+ v2df __builtin_ia32_comtruesd (v2df, v2df)
+ v4sf __builtin_ia32_comtruess (v4sf, v4sf)
+ v2df __builtin_ia32_comueqpd (v2df, v2df)
+ v4sf __builtin_ia32_comueqps (v4sf, v4sf)
+ v2df __builtin_ia32_comueqsd (v2df, v2df)
+ v4sf __builtin_ia32_comueqss (v4sf, v4sf)
+ v2df __builtin_ia32_comugepd (v2df, v2df)
+ v4sf __builtin_ia32_comugeps (v4sf, v4sf)
+ v2df __builtin_ia32_comugesd (v2df, v2df)
+ v4sf __builtin_ia32_comugess (v4sf, v4sf)
+ v2df __builtin_ia32_comugtpd (v2df, v2df)
+ v4sf __builtin_ia32_comugtps (v4sf, v4sf)
+ v2df __builtin_ia32_comugtsd (v2df, v2df)
+ v4sf __builtin_ia32_comugtss (v4sf, v4sf)
+ v2df __builtin_ia32_comulepd (v2df, v2df)
+ v4sf __builtin_ia32_comuleps (v4sf, v4sf)
+ v2df __builtin_ia32_comulesd (v2df, v2df)
+ v4sf __builtin_ia32_comuless (v4sf, v4sf)
+ v2df __builtin_ia32_comultpd (v2df, v2df)
+ v4sf __builtin_ia32_comultps (v4sf, v4sf)
+ v2df __builtin_ia32_comultsd (v2df, v2df)
+ v4sf __builtin_ia32_comultss (v4sf, v4sf)
+ v2df __builtin_ia32_comunepd (v2df, v2df)
+ v4sf __builtin_ia32_comuneps (v4sf, v4sf)
+ v2df __builtin_ia32_comunesd (v2df, v2df)
+ v4sf __builtin_ia32_comuness (v4sf, v4sf)
+ v2df __builtin_ia32_comunordpd (v2df, v2df)
+ v4sf __builtin_ia32_comunordps (v4sf, v4sf)
+ v2df __builtin_ia32_comunordsd (v2df, v2df)
+ v4sf __builtin_ia32_comunordss (v4sf, v4sf)
+ v2df __builtin_ia32_fmaddpd (v2df, v2df, v2df)
+ v4sf __builtin_ia32_fmaddps (v4sf, v4sf, v4sf)
+ v2df __builtin_ia32_fmaddsd (v2df, v2df, v2df)
+ v4sf __builtin_ia32_fmaddss (v4sf, v4sf, v4sf)
+ v2df __builtin_ia32_fmsubpd (v2df, v2df, v2df)
+ v4sf __builtin_ia32_fmsubps (v4sf, v4sf, v4sf)
+ v2df __builtin_ia32_fmsubsd (v2df, v2df, v2df)
+ v4sf __builtin_ia32_fmsubss (v4sf, v4sf, v4sf)
+ v2df __builtin_ia32_fnmaddpd (v2df, v2df, v2df)
+ v4sf __builtin_ia32_fnmaddps (v4sf, v4sf, v4sf)
+ v2df __builtin_ia32_fnmaddsd (v2df, v2df, v2df)
+ v4sf __builtin_ia32_fnmaddss (v4sf, v4sf, v4sf)
+ v2df __builtin_ia32_fnmsubpd (v2df, v2df, v2df)
+ v4sf __builtin_ia32_fnmsubps (v4sf, v4sf, v4sf)
+ v2df __builtin_ia32_fnmsubsd (v2df, v2df, v2df)
+ v4sf __builtin_ia32_fnmsubss (v4sf, v4sf, v4sf)
+ v2df __builtin_ia32_frczpd (v2df)
+ v4sf __builtin_ia32_frczps (v4sf)
+ v2df __builtin_ia32_frczsd (v2df)
+ v4sf __builtin_ia32_frczss (v4sf)
+ v2di __builtin_ia32_pcmov (v2di, v2di, v2di)
+ v2df __builtin_ia32_pcmov_v2df (v2df, v2df, v2df)
+ v4sf __builtin_ia32_pcmov_v4sf (v4sf, v4sf, v4sf)
+ v2di __builtin_ia32_pcomeqb (v2di, v2di)
+ v2di __builtin_ia32_pcomeqd (v2di, v2di)
+ v2di __builtin_ia32_pcomeqq (v2di, v2di)
+ v2di __builtin_ia32_pcomequb (v2di, v2di)
+ v2di __builtin_ia32_pcomequd (v2di, v2di)
+ v2di __builtin_ia32_pcomequq (v2di, v2di)
+ v2di __builtin_ia32_pcomequw (v2di, v2di)
+ v2di __builtin_ia32_pcomeqw (v2di, v2di)
+ v2di __builtin_ia32_pcomfalseb (v2di, v2di)
+ v2di __builtin_ia32_pcomfalsed (v2di, v2di)
+ v2di __builtin_ia32_pcomfalseq (v2di, v2di)
+ v2di __builtin_ia32_pcomfalseub (v2di, v2di)
+ v2di __builtin_ia32_pcomfalseud (v2di, v2di)
+ v2di __builtin_ia32_pcomfalseuq (v2di, v2di)
+ v2di __builtin_ia32_pcomfalseuw (v2di, v2di)
+ v2di __builtin_ia32_pcomfalsew (v2di, v2di)
+ v2di __builtin_ia32_pcomgeb (v2di, v2di)
+ v2di __builtin_ia32_pcomged (v2di, v2di)
+ v2di __builtin_ia32_pcomgeq (v2di, v2di)
+ v2di __builtin_ia32_pcomgeub (v2di, v2di)
+ v2di __builtin_ia32_pcomgeud (v2di, v2di)
+ v2di __builtin_ia32_pcomgeuq (v2di, v2di)
+ v2di __builtin_ia32_pcomgeuw (v2di, v2di)
+ v2di __builtin_ia32_pcomgew (v2di, v2di)
+ v2di __builtin_ia32_pcomgtb (v2di, v2di)
+ v2di __builtin_ia32_pcomgtd (v2di, v2di)
+ v2di __builtin_ia32_pcomgtq (v2di, v2di)
+ v2di __builtin_ia32_pcomgtub (v2di, v2di)
+ v2di __builtin_ia32_pcomgtud (v2di, v2di)
+ v2di __builtin_ia32_pcomgtuq (v2di, v2di)
+ v2di __builtin_ia32_pcomgtuw (v2di, v2di)
+ v2di __builtin_ia32_pcomgtw (v2di, v2di)
+ v2di __builtin_ia32_pcomleb (v2di, v2di)
+ v2di __builtin_ia32_pcomled (v2di, v2di)
+ v2di __builtin_ia32_pcomleq (v2di, v2di)
+ v2di __builtin_ia32_pcomleub (v2di, v2di)
+ v2di __builtin_ia32_pcomleud (v2di, v2di)
+ v2di __builtin_ia32_pcomleuq (v2di, v2di)
+ v2di __builtin_ia32_pcomleuw (v2di, v2di)
+ v2di __builtin_ia32_pcomlew (v2di, v2di)
+ v2di __builtin_ia32_pcomltb (v2di, v2di)
+ v2di __builtin_ia32_pcomltd (v2di, v2di)
+ v2di __builtin_ia32_pcomltq (v2di, v2di)
+ v2di __builtin_ia32_pcomltub (v2di, v2di)
+ v2di __builtin_ia32_pcomltud (v2di, v2di)
+ v2di __builtin_ia32_pcomltuq (v2di, v2di)
+ v2di __builtin_ia32_pcomltuw (v2di, v2di)
+ v2di __builtin_ia32_pcomltw (v2di, v2di)
+ v2di __builtin_ia32_pcomneb (v2di, v2di)
+ v2di __builtin_ia32_pcomned (v2di, v2di)
+ v2di __builtin_ia32_pcomneq (v2di, v2di)
+ v2di __builtin_ia32_pcomneub (v2di, v2di)
+ v2di __builtin_ia32_pcomneud (v2di, v2di)
+ v2di __builtin_ia32_pcomneuq (v2di, v2di)
+ v2di __builtin_ia32_pcomneuw (v2di, v2di)
+ v2di __builtin_ia32_pcomnew (v2di, v2di)
+ v2di __builtin_ia32_pcomtrueb (v2di, v2di)
+ v2di __builtin_ia32_pcomtrued (v2di, v2di)
+ v2di __builtin_ia32_pcomtrueq (v2di, v2di)
+ v2di __builtin_ia32_pcomtrueub (v2di, v2di)
+ v2di __builtin_ia32_pcomtrueud (v2di, v2di)
+ v2di __builtin_ia32_pcomtrueuq (v2di, v2di)
+ v2di __builtin_ia32_pcomtrueuw (v2di, v2di)
+ v2di __builtin_ia32_pcomtruew (v2di, v2di)
+ v2df __builtin_ia32_permpd (v2df, v2df, v2di)
+ v4sf __builtin_ia32_permps (v4sf, v4sf, v2di)
+ v2di __builtin_ia32_phaddbd (v2di)
+ v2di __builtin_ia32_phaddbq (v2di)
+ v2di __builtin_ia32_phaddbw (v2di)
+ v2di __builtin_ia32_phadddq (v2di)
+ v2di __builtin_ia32_phaddubd (v2di)
+ v2di __builtin_ia32_phaddubq (v2di)
+ v2di __builtin_ia32_phaddubw (v2di)
+ v2di __builtin_ia32_phaddudq (v2di)
+ v2di __builtin_ia32_phadduwd (v2di)
+ v2di __builtin_ia32_phadduwq (v2di)
+ v2di __builtin_ia32_phaddwd (v2di)
+ v2di __builtin_ia32_phaddwq (v2di)
+ v2di __builtin_ia32_phsubbw (v2di)
+ v2di __builtin_ia32_phsubdq (v2di)
+ v2di __builtin_ia32_phsubwd (v2di)
+ v2di __builtin_ia32_pmacsdd (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmacsdqh (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmacsdql (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmacssdd (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmacssdqh (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmacssdql (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmacsswd (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmacssww (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmacswd (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmacsww (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmadcsswd (v2di, v2di, v2di)
+ v2di __builtin_ia32_pmadcswd (v2di, v2di, v2di)
+ v2di __builtin_ia32_pperm (v2di, v2di, v2di)
+ v2di __builtin_ia32_protb (v2di, v2di)
+ v2di __builtin_ia32_protd (v2di, v2di)
+ v2di __builtin_ia32_protq (v2di, v2di)
+ v2di __builtin_ia32_protw (v2di, v2di)
+ v2di __builtin_ia32_pshab (v2di, v2di)
+ v2di __builtin_ia32_pshad (v2di, v2di)
+ v2di __builtin_ia32_pshaq (v2di, v2di)
+ v2di __builtin_ia32_pshaw (v2di, v2di)
+ v2di __builtin_ia32_pshlb (v2di, v2di)
+ v2di __builtin_ia32_pshld (v2di, v2di)
+ v2di __builtin_ia32_pshlq (v2di, v2di)
+ v2di __builtin_ia32_pshlw (v2di, v2di)
+ @end smallexample
+ 
+ The following built-in functions are available when @option{-msse5}
+ is used.  The second argument must be an integer constant; these
+ functions generate the machine instruction that is part of the name
+ with the @samp{_imm} suffix removed.
+ 
+ @smallexample
+ v2di __builtin_ia32_protb_imm (v2di, int)
+ v2di __builtin_ia32_protd_imm (v2di, int)
+ v2di __builtin_ia32_protq_imm (v2di, int)
+ v2di __builtin_ia32_protw_imm (v2di, int)
+ @end smallexample
+ 
  The following built-in functions are available when @option{-m3dnow} is used.
  All of them generate the machine instruction that is part of the name.
  
*** gcc/testsuite/gcc.target/i386/i386.exp.~1~	2007-09-06 13:52:33.301459000 -0400
--- gcc/testsuite/gcc.target/i386/i386.exp	2007-09-06 13:46:14.900249000 -0400
*************** proc check_effective_target_sse4a { } {
*** 64,69 ****
--- 64,84 ----
      } "-O2 -msse4a" ]
  }
  
+ # Return 1 if sse5 instructions can be compiled.
+ proc check_effective_target_sse5 { } {
+     return [check_no_compiler_messages sse5 object {
+ 	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+ 	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+ 
+ 	__m128i _mm_maccs_epi16(__m128i __A, __m128i __B, __m128i __C)
+ 	{
+ 	    return (__m128i) __builtin_ia32_pmacssww ((__v2di)__A,
+ 						      (__v2di)__B,
+ 						      (__v2di)__C);
+ 	}
+     } "-O2 -msse5" ]
+ }
+ 
  # If a testcase doesn't have special options, use these.
  global DEFAULT_CFLAGS
  if ![info exists DEFAULT_CFLAGS] then {
*** gcc/config/i386/bmmintrin.h.~1~	2007-09-06 13:52:33.318458000 -0400
--- gcc/config/i386/bmmintrin.h	2007-09-06 12:58:13.600914000 -0400
***************
*** 0 ****
--- 1,1361 ----
+ /* Copyright (C) 2007 Free Software Foundation, Inc.
+ 
+    This file is part of GCC.
+ 
+    GCC is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2, or (at your option)
+    any later version.
+ 
+    GCC is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+ 
+    You should have received a copy of the GNU General Public License
+    along with GCC; see the file COPYING.  If not, write to
+    the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+    Boston, MA 02110-1301, USA.  */
+ 
+ /* As a special exception, if you include this header file into source
+    files compiled by GCC, this header file does not by itself cause
+    the resulting executable to be covered by the GNU General Public
+    License.  This exception does not however invalidate any other
+    reasons why the executable file might be covered by the GNU General
+    Public License.  */
+ 
+ #ifndef _BMMINTRIN_H_INCLUDED
+ #define _BMMINTRIN_H_INCLUDED
+ 
+ #ifndef __SSE5__
+ # error "SSE5 instruction set not enabled"
+ #else
+ 
+ /* We need definitions from the SSE4A, SSE3, SSE2 and SSE header files.  */
+ #include <ammintrin.h>
+ 
+ /* Rounding mode macros. */
+ #define _MM_FROUND_TO_NEAREST_INT	0x00
+ #define _MM_FROUND_TO_NEG_INF		0x01
+ #define _MM_FROUND_TO_POS_INF		0x02
+ #define _MM_FROUND_TO_ZERO		0x03
+ #define _MM_FROUND_CUR_DIRECTION	0x04
+ 
+ #define _MM_FROUND_RAISE_EXC		0x00
+ #define _MM_FROUND_NO_EXC		0x08
+ 
+ #define _MM_FROUND_NINT		\
+   (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
+ #define _MM_FROUND_FLOOR	\
+   (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
+ #define _MM_FROUND_CEIL		\
+   (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
+ #define _MM_FROUND_TRUNC	\
+   (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
+ #define _MM_FROUND_RINT		\
+   (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
+ #define _MM_FROUND_NEARBYINT	\
+   (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
+ 
+ /* Floating point multiply/add type instructions.  */
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_macc_ps(__m128 __A, __m128 __B, __m128 __C)
+ {
+   return (__m128) __builtin_ia32_fmaddps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_macc_pd(__m128d __A, __m128d __B, __m128d __C)
+ {
+   return (__m128d) __builtin_ia32_fmaddpd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_macc_ss(__m128 __A, __m128 __B, __m128 __C)
+ {
+   return  (__m128) __builtin_ia32_fmaddss ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_macc_sd(__m128d __A, __m128d __B, __m128d __C)
+ {
+   return (__m128d) __builtin_ia32_fmaddsd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_msub_ps(__m128 __A, __m128 __B, __m128 __C)
+ {
+   return (__m128) __builtin_ia32_fmsubps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_msub_pd(__m128d __A, __m128d __B, __m128d __C)
+ {
+   return (__m128d) __builtin_ia32_fmsubpd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_msub_ss(__m128 __A, __m128 __B, __m128 __C)
+ {
+   return (__m128) __builtin_ia32_fmsubss ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_msub_sd(__m128d __A, __m128d __B, __m128d __C)
+ {
+   return (__m128d) __builtin_ia32_fmsubsd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_nmacc_ps(__m128 __A, __m128 __B, __m128 __C)
+ {
+   return (__m128) __builtin_ia32_fnmaddps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_nmacc_pd(__m128d __A, __m128d __B, __m128d __C)
+ {
+   return (__m128d) __builtin_ia32_fnmaddpd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_nmacc_ss(__m128 __A, __m128 __B, __m128 __C)
+ {
+   return (__m128) __builtin_ia32_fnmaddss ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_nmacc_sd(__m128d __A, __m128d __B, __m128d __C)
+ {
+   return (__m128d) __builtin_ia32_fnmaddsd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_nmsub_ps(__m128 __A, __m128 __B, __m128 __C)
+ {
+   return (__m128) __builtin_ia32_fnmsubps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_nmsub_pd(__m128d __A, __m128d __B, __m128d __C)
+ {
+   return (__m128d) __builtin_ia32_fnmsubpd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_nmsub_ss(__m128 __A, __m128 __B, __m128 __C)
+ {
+   return (__m128) __builtin_ia32_fnmsubss ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_nmsub_sd(__m128d __A, __m128d __B, __m128d __C)
+ {
+   return (__m128d) __builtin_ia32_fnmsubsd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+ }
+ 
+ /* Integer multiply/add instructions.  */
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_maccs_epi16(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return (__m128i) __builtin_ia32_pmacssww ((__v2di)__A,(__v2di)__B, (__v2di)__C);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_macc_epi16(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return (__m128i) __builtin_ia32_pmacsww ((__v2di)__A,(__v2di)__B, (__v2di)__C);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_maccsd_epi16(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pmacsswd ((__v2di)__A,(__v2di)__B, (__v2di)__C);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_maccd_epi16(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pmacswd ((__v2di)__A,(__v2di)__B, (__v2di)__C); 
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_maccs_epi32(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pmacssdd ((__v2di)__A,(__v2di)__B, (__v2di)__C);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_macc_epi32(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pmacsdd ((__v2di)__A,(__v2di)__B, (__v2di)__C);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_maccslo_epi32(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pmacssdql ((__v2di)__A,(__v2di)__B, (__v2di)__C);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_macclo_epi32(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pmacsdql ((__v2di)__A,(__v2di)__B, (__v2di)__C); 
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_maccshi_epi32(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pmacssdqh ((__v2di)__A,(__v2di)__B, (__v2di)__C); 
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_macchi_epi32(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pmacsdqh ((__v2di)__A,(__v2di)__B, (__v2di)__C); 
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_maddsd_epi16(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pmadcsswd ((__v2di)__A,(__v2di)__B,(__v2di)__C);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_maddd_epi16(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pmadcswd ((__v2di)__A,(__v2di)__B,(__v2di)__C);
+ }
+ 
+ /* Packed Integer Horizontal Add and Subtract */
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddw_epi8(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phaddbw ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddd_epi8(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phaddbd ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddq_epi8(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phaddbq ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddd_epi16(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phaddwd ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddq_epi16(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phaddwq ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddq_epi32(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phadddq ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddw_epu8(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phaddubw ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddd_epu8(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phaddubd ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddq_epu8(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phaddubq ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddd_epu16(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phadduwd ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddq_epu16(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phadduwq ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_haddq_epu32(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phaddudq ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_hsubw_epi8(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phsubbw ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_hsubd_epi16(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phsubwd ((__v2di)__A);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_hsubq_epi32(__m128i __A)
+ {
+   return  (__m128i) __builtin_ia32_phsubdq ((__v2di)__A);
+ }
+ 
+ /* Vector conditional move and permute */
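+ 
+ /* Note: per the SSE5 draft specification, pcmov is a per-bit select:
+    each result bit comes from __A where the corresponding bit of __C
+    is set, and from __B where it is clear, i.e.
+    (__A & __C) | (__B & ~__C).  */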
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_cmov_si128(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pcmov (__A, __B, __C);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
+ {
+   return  (__m128i) __builtin_ia32_pperm (__A, __B, __C);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_perm_ps(__m128 __A, __m128 __B, __m128i __C)
+ {
+   return  (__m128) __builtin_ia32_permps ((__v4sf)__A, (__v4sf)__B, (__v2di)__C);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_perm_pd(__m128d __A, __m128d __B, __m128i __C)
+ {
+   return  (__m128d) __builtin_ia32_permpd ((__v2df)__A, (__v2df)__B, (__v2di)__C);
+ }
+ 
+ /* Packed Integer Rotates and Shifts */
+ 
+ /* Rotates - Non-Immediate form */
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_rot_epi8(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_protb ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_rot_epi16(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_protw ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_rot_epi32(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_protd ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_rot_epi64(__m128i __A,  __m128i __B)
+ {
+   return (__m128i)  __builtin_ia32_protq ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ 
+ /* Rotates - Immediate form */
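+ /* Note: for these immediate forms the count __B is encoded in the
+    instruction, so it should be a compile-time constant; per the SSE5
+    draft specification a negative count rotates right.  */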
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_roti_epi8(__m128i __A,  int __B)
+ {
+   return  (__m128i) __builtin_ia32_protbi ((__v2di)__A, __B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_roti_epi16(__m128i __A, int __B)
+ {
+   return  (__m128i) __builtin_ia32_protwi ((__v2di)__A, __B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_roti_epi32(__m128i __A, int __B)
+ {
+   return  (__m128i) __builtin_ia32_protdi ((__v2di)__A, __B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_roti_epi64(__m128i __A, int __B)
+ {
+   return  (__m128i) __builtin_ia32_protqi ((__v2di)__A, __B);
+ }
+ 
+ /* pshl */
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_shl_epi8(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_pshlb ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_shl_epi16(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_pshlw ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_shl_epi32(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_pshld ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_shl_epi64(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_pshlq ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ /* psha */
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_sha_epi8(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_pshab ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_sha_epi16(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_pshaw ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_sha_epi32(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_pshad ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ static __inline __m128i __attribute__((__always_inline__)) 
+ _mm_sha_epi64(__m128i __A,  __m128i __B)
+ {
+   return  (__m128i) __builtin_ia32_pshaq ((__v2di)__A, (__v2di)__B);
+ }
+ 
+ /* Compare and Predicate Generation */
+ 
+ /* com (floating point, packed single) */
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comeq_ps(__m128 __A, __m128 __B)
+ {
+   return  (__m128) __builtin_ia32_comeqps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comlt_ps(__m128 __A, __m128 __B)
+ {
+   return  (__m128) __builtin_ia32_comltps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comle_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comleps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comunord_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comunordps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comneq_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comuneqps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comnlt_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comunltps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comnle_ps(__m128 __A, __m128 __B) 
+ {
+   return (__m128)  __builtin_ia32_comunleps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comord_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comordps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comueq_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comueqps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comnge_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comungeps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comngt_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comungtps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comfalse_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comfalseps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comoneq_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comneqps ((__v4sf)__A, (__v4sf)__B); 
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comge_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comgeps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comgt_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comgtps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comtrue_ps(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comtrueps ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ /* com (floating point, packed double) */
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comeq_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comeqpd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comlt_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comltpd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comle_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comlepd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comunord_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comunordpd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comneq_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comuneqpd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comnlt_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comunltpd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comnle_pd(__m128d __A, __m128d __B) 
+ {
+   return (__m128d) __builtin_ia32_comunlepd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comord_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comordpd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comueq_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comueqpd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comnge_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comungepd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comngt_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comungtpd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comfalse_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comfalsepd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comoneq_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comneqpd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comge_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comgepd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comgt_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comgtpd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comtrue_pd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comtruepd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ /* com (floating point, scalar single) */
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comeq_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128)  __builtin_ia32_comeqss ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comlt_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comltss ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comle_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comless ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comunord_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comunordss ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comneq_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comuneqss ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comnlt_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comunltss ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comnle_ss(__m128 __A, __m128 __B) 
+ {
+   return (__m128) __builtin_ia32_comunless ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comord_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comordss ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_comueq_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comueqss ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comnge_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comungess ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comngt_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comungtss ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comfalse_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comfalsess ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comoneq_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comneqss ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comge_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comgess ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comgt_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comgtss ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__)) 
+ _mm_comtrue_ss(__m128 __A, __m128 __B)
+ {
+   return (__m128) __builtin_ia32_comtruess ((__v4sf)__A, (__v4sf)__B);
+ }
+ 
+ /* com (floating point, scalar double) */
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comeq_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comeqsd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comlt_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comltsd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comle_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comlesd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comunord_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comunordsd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comneq_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comuneqsd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comnlt_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comunltsd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comnle_sd(__m128d __A, __m128d __B) 
+ {
+   return (__m128d) __builtin_ia32_comunlesd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comord_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comordsd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comueq_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comueqsd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comnge_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comungesd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comngt_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comungtsd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comfalse_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comfalsesd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_comoneq_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comneqsd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comge_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comgesd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comgt_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comgtsd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__)) 
+ _mm_comtrue_sd(__m128d __A, __m128d __B)
+ {
+   return (__m128d) __builtin_ia32_comtruesd ((__v2df)__A, (__v2df)__B);
+ }
+ 
+ 
+ /* pcom (integer, unsigned bytes) */
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comlt_epu8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomltub ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comle_epu8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomleub ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comgt_epu8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgtub ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comge_epu8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgeub ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comeq_epu8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomequb ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comneq_epu8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomnequb ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comfalse_epu8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomfalseub ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comtrue_epu8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomtrueub ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ /* pcom (integer, unsigned words) */
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comlt_epu16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomltuw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comle_epu16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomleuw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comgt_epu16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgtuw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comge_epu16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgeuw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comeq_epu16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomequw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comneq_epu16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomnequw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comfalse_epu16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomfalseuw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comtrue_epu16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomtrueuw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ /* pcom (integer, unsigned double words) */
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comlt_epu32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomltud ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comle_epu32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomleud ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comgt_epu32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgtud ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comge_epu32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgeud ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comeq_epu32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomequd ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comneq_epu32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomnequd ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comfalse_epu32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomfalseud ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comtrue_epu32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomtrueud ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ /* pcom (integer, unsigned quad words) */
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comlt_epu64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomltuq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comle_epu64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomleuq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comgt_epu64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgtuq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comge_epu64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgeuq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comeq_epu64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomequq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comneq_epu64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomnequq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comfalse_epu64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomfalseuq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comtrue_epu64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomtrueuq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ /* pcom (integer, signed bytes) */
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comlt_epi8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomltb ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comle_epi8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomleb ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comgt_epi8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgtb ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comge_epi8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgeb ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comeq_epi8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomeqb ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comneq_epi8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomneqb ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comfalse_epi8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomfalseb ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comtrue_epi8(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomtrueb ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ /* pcom (integer, signed words) */
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comlt_epi16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomltw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comle_epi16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomlew ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comgt_epi16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgtw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comge_epi16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgew ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comeq_epi16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomeqw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comneq_epi16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomneqw ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comfalse_epi16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomfalsew ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comtrue_epi16(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomtruew ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ /* pcom (integer, signed double words) */
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comlt_epi32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomltd ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comle_epi32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomled ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comgt_epi32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgtd ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comge_epi32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomged ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comeq_epi32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomeqd ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comneq_epi32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomneqd ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comfalse_epi32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomfalsed ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comtrue_epi32(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomtrued ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ /* pcom (integer, signed quad words) */
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comlt_epi64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomltq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comle_epi64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomleq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comgt_epi64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgtq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comge_epi64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomgeq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comeq_epi64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomeqq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comneq_epi64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomneqq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comfalse_epi64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomfalseq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ static __inline __m128i __attribute__((__always_inline__))
+ _mm_comtrue_epi64(__m128i __A, __m128i __B)
+ {
+   return (__m128i) __builtin_ia32_pcomtrueq ((__v2di)__A, (__v2di)__B);
+ } 
+ 
+ /* Test instructions.  These are already provided in smmintrin.h.  */
+ /* Packed integer 128-bit bitwise comparison. Return 1 if
+    (__V & __M) == 0.  */
+ static __inline int __attribute__((__always_inline__))
+ _mm_testz_si128 (__m128i __M, __m128i __V)
+ {
+   return __builtin_ia32_ptestz128 ((__v2di)__M, (__v2di)__V);
+ }
+ 
+ /* Packed integer 128-bit bitwise comparison. Return 1 if
+    (__V & ~__M) == 0.  */
+ static __inline int __attribute__((__always_inline__))
+ _mm_testc_si128 (__m128i __M, __m128i __V)
+ {
+   return __builtin_ia32_ptestc128 ((__v2di)__M, (__v2di)__V);
+ }
+ 
+ /* Packed integer 128-bit bitwise comparison. Return 1 if
+    (__V & __M) != 0 && (__V & ~__M) != 0.  */
+ static __inline int __attribute__((__always_inline__))
+ _mm_testnzc_si128 (__m128i __M, __m128i __V)
+ {
+   return __builtin_ia32_ptestnzc128 ((__v2di)__M, (__v2di)__V);
+ }
+ 
+ 
+ /* Packed/scalar double precision floating point rounding.  */
+ 
+ #ifdef __OPTIMIZE__
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_round_pd (__m128d __V, const int __M)
+ {
+   return (__m128d) __builtin_ia32_roundpd ((__v2df)__V, __M);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_round_sd(__m128d __D, __m128d __V, const int __M)
+ {
+   return (__m128d) __builtin_ia32_roundsd ((__v2df)__D,
+ 					   (__v2df)__V,
+ 					   __M);
+ }
+ #else
+ #define _mm_round_pd(V, M) \
+   ((__m128d) __builtin_ia32_roundpd ((__v2df)(V), (M)))
+ 
+ #define _mm_round_sd(D, V, M) \
+   ((__m128d) __builtin_ia32_roundsd ((__v2df)(D), (__v2df)(V), (M)))
+ #endif
+ 
+ /* Packed/scalar single precision floating point rounding.  */
+ 
+ #ifdef __OPTIMIZE__
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_round_ps (__m128 __V, const int __M)
+ {
+   return (__m128) __builtin_ia32_roundps ((__v4sf)__V, __M);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_round_ss (__m128 __D, __m128 __V, const int __M)
+ {
+   return (__m128) __builtin_ia32_roundss ((__v4sf)__D,
+ 					  (__v4sf)__V,
+ 					  __M);
+ }
+ #else
+ #define _mm_round_ps(V, M) \
+   ((__m128) __builtin_ia32_roundps ((__v4sf)(V), (M)))
+ 
+ #define _mm_round_ss(D, V, M) \
+   ((__m128) __builtin_ia32_roundss ((__v4sf)(D), (__v4sf)(V), (M)))
+ #endif
+ 
+ /* Macros for ceil/floor intrinsics.  */
+ #define _mm_ceil_pd(V)	   _mm_round_pd ((V), _MM_FROUND_CEIL)
+ #define _mm_ceil_sd(D, V)  _mm_round_sd ((D), (V), _MM_FROUND_CEIL)
+ 
+ #define _mm_floor_pd(V)	   _mm_round_pd((V), _MM_FROUND_FLOOR)
+ #define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR)
+ 
+ #define _mm_ceil_ps(V)	   _mm_round_ps ((V), _MM_FROUND_CEIL)
+ #define _mm_ceil_ss(D, V)  _mm_round_ss ((D), (V), _MM_FROUND_CEIL)
+ 
+ #define _mm_floor_ps(V)	   _mm_round_ps ((V), _MM_FROUND_FLOOR)
+ #define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR)
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_frcz_ps (__m128 __A)
+ {
+   return (__m128) __builtin_ia32_frczps ((__v4sf)__A);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_frcz_pd (__m128d __A)
+ {
+   return (__m128d) __builtin_ia32_frczpd ((__v2df)__A);
+ }
+ 
+ static __inline __m128 __attribute__((__always_inline__))
+ _mm_frcz_ss (__m128 __A)
+ {
+   return (__m128) __builtin_ia32_frczss ((__v4sf)__A);
+ }
+ 
+ static __inline __m128d __attribute__((__always_inline__))
+ _mm_frcz_sd (__m128d __A)
+ {
+   return (__m128d) __builtin_ia32_frczsd ((__v2df)__A);
+ }
+ 
+ #endif /* __SSE5__ */
+ 
+ #endif /* _BMMINTRIN_H_INCLUDED */
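As background for the pcom* intrinsics in the header above: each comparison fills a lane with all-ones when its predicate holds and all-zeros otherwise. A minimal scalar sketch of one pcomltd lane (the helper name is hypothetical, not part of bmmintrin.h):

```c
#include <stdint.h>

/* Scalar model of one pcomltd lane: all-ones when a < b under a signed
   compare, all-zeros otherwise.  The instruction applies this to each
   of the four 32-bit lanes of its vector operands.  */
static uint32_t
pcomltd_lane (int32_t a, int32_t b)
{
  return (a < b) ? 0xFFFFFFFFu : 0u;
}
```

The all-ones/all-zeros convention is what lets the results be used directly as select masks.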
*** gcc/testsuite/gcc.target/i386/sse5-check.h.~1~	2007-09-06 13:52:33.340458000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-check.h	2007-09-06 13:49:01.491057000 -0400
***************
*** 0 ****
--- 1,20 ----
+ #include <stdlib.h>
+ 
+ #include "cpuid.h"
+ 
+ static void sse5_test (void);
+ 
+ int
+ main ()
+ {
+   unsigned int eax, ebx, ecx, edx;
+  
+   if (!__get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx))
+     return 0;
+ 
+   /* Run SSE5 test only if host has SSE5 support.  */
+   if (ecx & bit_SSE5)
+     sse5_test ();
+ 
+   exit (0);
+ }
*** gcc/testsuite/gcc.target/i386/sse5-fma.c.~1~	2007-09-06 13:52:33.353458000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-fma.c	2007-09-06 13:44:03.313209000 -0400
***************
*** 0 ****
--- 1,81 ----
+ /* Test that the compiler properly optimizes floating point multiply and add
+    instructions into fmaddss on SSE5 systems.  */
+ 
+ /* { dg-do compile { target x86_64-*-* } } */
+ /* { dg-options "-O2 -msse5 -mfused-madd" } */
+ 
+ extern void exit (int);
+ 
+ float
+ flt_mul_add (float a, float b, float c)
+ {
+   return (a * b) + c;
+ }
+ 
+ double
+ dbl_mul_add (double a, double b, double c)
+ {
+   return (a * b) + c;
+ }
+ 
+ float
+ flt_mul_sub (float a, float b, float c)
+ {
+   return (a * b) - c;
+ }
+ 
+ double
+ dbl_mul_sub (double a, double b, double c)
+ {
+   return (a * b) - c;
+ }
+ 
+ float
+ flt_neg_mul_add (float a, float b, float c)
+ {
+   return (-(a * b)) + c;
+ }
+ 
+ double
+ dbl_neg_mul_add (double a, double b, double c)
+ {
+   return (-(a * b)) + c;
+ }
+ 
+ float
+ flt_neg_mul_sub (float a, float b, float c)
+ {
+   return (-(a * b)) - c;
+ }
+ 
+ double
+ dbl_neg_mul_sub (double a, double b, double c)
+ {
+   return (-(a * b)) - c;
+ }
+ 
+ float  f[10] = { 2, 3, 4 };
+ double d[10] = { 2, 3, 4 };
+ 
+ int main ()
+ {
+   f[3] = flt_mul_add (f[0], f[1], f[2]);
+   f[4] = flt_mul_sub (f[0], f[1], f[2]);
+   f[5] = flt_neg_mul_add (f[0], f[1], f[2]);
+   f[6] = flt_neg_mul_sub (f[0], f[1], f[2]);
+ 
+   d[3] = dbl_mul_add (d[0], d[1], d[2]);
+   d[4] = dbl_mul_sub (d[0], d[1], d[2]);
+   d[5] = dbl_neg_mul_add (d[0], d[1], d[2]);
+   d[6] = dbl_neg_mul_sub (d[0], d[1], d[2]);
+   exit (0);
+ }
+ 
+ /* { dg-final { scan-assembler "fmaddss" } } */
+ /* { dg-final { scan-assembler "fmaddsd" } } */
+ /* { dg-final { scan-assembler "fmsubss" } } */
+ /* { dg-final { scan-assembler "fmsubsd" } } */
+ /* { dg-final { scan-assembler "fnmaddss" } } */
+ /* { dg-final { scan-assembler "fnmaddsd" } } */
+ /* { dg-final { scan-assembler "fnmsubss" } } */
+ /* { dg-final { scan-assembler "fnmsubsd" } } */
*** gcc/testsuite/gcc.target/i386/sse5-fma-vector.c.~1~	2007-09-06 13:52:33.367458000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-fma-vector.c	2007-09-06 13:44:03.330228000 -0400
***************
*** 0 ****
--- 1,92 ----
+ /* Test that the compiler properly optimizes floating point multiply and add
+    vector instructions into fmaddps on SSE5 systems.  */
+ 
+ /* { dg-do compile { target x86_64-*-* } } */
+ /* { dg-options "-O2 -msse5 -mfused-madd -ftree-vectorize" } */
+ 
+ extern void exit (int);
+ 
+ typedef float     __m128  __attribute__ ((__vector_size__ (16), __may_alias__));
+ typedef double    __m128d __attribute__ ((__vector_size__ (16), __may_alias__));
+ 
+ #define SIZE 10240
+ 
+ union {
+   __m128 f_align;
+   __m128d d_align;
+   float f[SIZE];
+   double d[SIZE];
+ } a, b, c, d;
+ 
+ void
+ flt_mul_add (void)
+ {
+   int i;
+ 
+   for (i = 0; i < SIZE; i++)
+     a.f[i] = (b.f[i] * c.f[i]) + d.f[i];
+ }
+ 
+ void
+ dbl_mul_add (void)
+ {
+   int i;
+ 
+   for (i = 0; i < SIZE; i++)
+     a.d[i] = (b.d[i] * c.d[i]) + d.d[i];
+ }
+ 
+ void
+ flt_mul_sub (void)
+ {
+   int i;
+ 
+   for (i = 0; i < SIZE; i++)
+     a.f[i] = (b.f[i] * c.f[i]) - d.f[i];
+ }
+ 
+ void
+ dbl_mul_sub (void)
+ {
+   int i;
+ 
+   for (i = 0; i < SIZE; i++)
+     a.d[i] = (b.d[i] * c.d[i]) - d.d[i];
+ }
+ 
+ void
+ flt_neg_mul_add (void)
+ {
+   int i;
+ 
+   for (i = 0; i < SIZE; i++)
+     a.f[i] = (-(b.f[i] * c.f[i])) + d.f[i];
+ }
+ 
+ void
+ dbl_neg_mul_add (void)
+ {
+   int i;
+ 
+   for (i = 0; i < SIZE; i++)
+     a.d[i] = (-(b.d[i] * c.d[i])) + d.d[i];
+ }
+ 
+ int main ()
+ {
+   flt_mul_add ();
+   flt_mul_sub ();
+   flt_neg_mul_add ();
+ 
+   dbl_mul_add ();
+   dbl_mul_sub ();
+   dbl_neg_mul_add ();
+   exit (0);
+ }
+ 
+ /* { dg-final { scan-assembler "fmaddps" } } */
+ /* { dg-final { scan-assembler "fmaddpd" } } */
+ /* { dg-final { scan-assembler "fmsubps" } } */
+ /* { dg-final { scan-assembler "fmsubpd" } } */
+ /* { dg-final { scan-assembler "fnmaddps" } } */
+ /* { dg-final { scan-assembler "fnmaddpd" } } */
*** gcc/testsuite/gcc.target/i386/sse5-hadduX.c.~1~	2007-09-06 13:52:33.379459000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-hadduX.c	2007-09-06 13:44:03.348245000 -0400
***************
*** 0 ****
--- 1,207 ----
+ /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+ /* { dg-require-effective-target sse5 } */
+ /* { dg-options "-O2 -msse5" } */
+ 
+ #include "sse5-check.h"
+ 
+ #include <bmmintrin.h>
+ #include <string.h>
+ 
+ #define NUM 10
+ 
+ union
+ {
+   __m128i x[NUM];
+   unsigned char  ssi[NUM * 16];
+   unsigned short si[NUM * 8];
+   unsigned int li[NUM * 4];
+   unsigned long long  lli[NUM * 2];
+ } dst, res, src1;
+ 
+ static void
+ init_byte ()
+ {
+   int i;
+   for (i=0; i < NUM * 16; i++)
+     src1.ssi[i] = i;
+ }
+ 
+ static void
+ init_word ()
+ {
+   int i;
+   for (i=0; i < NUM * 8; i++)
+     src1.si[i] = i;
+ }
+ 
+ 
+ static void
+ init_dword ()
+ {
+   int i;
+   for (i=0; i < NUM * 4; i++)
+     src1.li[i] = i;
+ }
+ 
+ static int 
+ check_byte2word ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < NUM * 16; i = i + 16)
+     {
+       for (j = 0; j < 8; j++)
+ 	{
+ 	  t = i + (2 * j);
+ 	  s = (i / 2) + j;
+ 	  res.si[s] = src1.ssi[t] + src1.ssi[t + 1] ;
+ 	  if (res.si[s] != dst.si[s]) 
+ 	    check_fails++;	
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int 
+ check_byte2dword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < NUM * 16; i = i + 16)
+     {
+       for (j = 0; j < 4; j++)
+ 	{
+ 	  t = i + (4 * j);
+ 	  s = (i / 4) + j;
+ 	  res.li[s] = (src1.ssi[t] + src1.ssi[t + 1]) + (src1.ssi[t + 2]
+ 	              + src1.ssi[t + 3]); 
+ 	  if (res.li[s] != dst.li[s]) 
+ 	    check_fails++;
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int
+ check_byte2qword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < NUM * 16; i = i + 16)
+     {
+       for (j = 0; j < 2; j++)
+ 	{
+ 	  t = i + (8 * j);
+ 	  s = (i / 8) + j;
+ 	  res.lli[s] = ((src1.ssi[t] + src1.ssi[t + 1]) + (src1.ssi[t + 2] 
+ 		       + src1.ssi[t + 3])) + ((src1.ssi[t + 4] + src1.ssi[t +5])
+ 	               + (src1.ssi[t + 6] + src1.ssi[t + 7])); 
+ 	  if (res.lli[s] != dst.lli[s]) 
+ 	    check_fails++;
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int
+ check_word2dword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < (NUM * 8); i = i + 8)
+     {
+       for (j = 0; j < 4; j++)
+ 	{
+ 	  t = i + (2 * j);
+ 	  s = (i / 2) + j;
+ 	  res.li[s] = src1.si[t] + src1.si[t + 1] ;
+ 	  if (res.li[s] != dst.li[s]) 
+ 	    check_fails++;	
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int 
+ check_word2qword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < NUM * 8; i = i + 8)
+     {
+       for (j = 0; j < 2; j++)
+ 	{
+ 	  t = i + (4 * j);
+ 	  s = (i / 4) + j;
+ 	  res.lli[s] = (src1.si[t] + src1.si[t + 1]) + (src1.si[t + 2]
+ 	               + src1.si[t + 3]); 
+ 	  if (res.lli[s] != dst.lli[s]) 
+ 	    check_fails++;
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int
+ check_dword2qword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < (NUM * 4); i = i + 4)
+     {
+       for (j = 0; j < 2; j++)
+ 	{
+ 	  t = i + (2 * j);
+ 	  s = (i / 2) + j;
+ 	  res.lli[s] = src1.li[t] + src1.li[t + 1] ;
+ 	  if (res.lli[s] != dst.lli[s]) 
+ 	    check_fails++;	
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static void
+ sse5_test (void)
+ {
+   int i;
+   
+   /* Check haddubw */
+   init_byte ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddw_epu8 (src1.x[i]);
+   
+   if (check_byte2word())
+     abort ();
+   
+   /* Check haddubd */
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddd_epu8 (src1.x[i]);
+   
+   if (check_byte2dword())
+     abort (); 
+   
+   /* Check haddubq */
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddq_epu8 (src1.x[i]);
+   
+   if (check_byte2qword())
+     abort ();
+ 
+   /* Check hadduwd */
+   init_word ();
+ 
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddd_epu16 (src1.x[i]);
+   
+   if (check_word2dword())
+     abort (); 
+    
+   /* Check hadduwq */
+  
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddq_epu16 (src1.x[i]);
+   
+   if (check_word2qword())
+     abort ();
+  
+   /* Check haddudq */
+   init_dword ();
+ 
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddq_epu32 (src1.x[i]);
+   
+   if (check_dword2qword())
+     abort ();
+ }
*** gcc/testsuite/gcc.target/i386/sse5-haddX.c.~1~	2007-09-06 13:52:33.391458000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-haddX.c	2007-09-06 13:44:03.366265000 -0400
***************
*** 0 ****
--- 1,208 ----
+ /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+ /* { dg-require-effective-target sse5 } */
+ /* { dg-options "-O2 -msse5" } */
+ 
+ #include "sse5-check.h"
+ 
+ #include <bmmintrin.h>
+ #include <string.h>
+ 
+ #define NUM 10
+ 
+ union
+ {
+   __m128i x[NUM];
+   int8_t ssi[NUM * 16];
+   int16_t si[NUM * 8];
+   int32_t li[NUM * 4];
+   int64_t lli[NUM * 2];
+ } dst, res, src1;
+ 
+ static void
+ init_sbyte ()
+ {
+   int i;
+   for (i=0; i < NUM * 16; i++)
+     src1.ssi[i] = i;
+ }
+ 
+ static void
+ init_sword ()
+ {
+   int i;
+   for (i=0; i < NUM * 8; i++)
+     src1.si[i] = i;
+ }
+ 
+ 
+ static void
+ init_sdword ()
+ {
+   int i;
+   for (i=0; i < NUM * 4; i++)
+     src1.li[i] = i;
+ }
+ 
+ static int 
+ check_sbyte2word ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < NUM * 16; i = i + 16)
+     {
+       for (j = 0; j < 8; j++)
+ 	{
+ 	  t = i + (2 * j);
+ 	  s = (i / 2) + j;
+ 	  res.si[s] = src1.ssi[t] + src1.ssi[t + 1] ;
+ 	  if (res.si[s] != dst.si[s]) 
+ 	    check_fails++;	
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int 
+ check_sbyte2dword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < NUM * 16; i = i + 16)
+     {
+       for (j = 0; j < 4; j++)
+ 	{
+ 	  t = i + (4 * j);
+ 	  s = (i / 4) + j;
+ 	  res.li[s] = (src1.ssi[t] + src1.ssi[t + 1]) + (src1.ssi[t + 2]
+ 	              + src1.ssi[t + 3]); 
+ 	  if (res.li[s] != dst.li[s]) 
+ 	    check_fails++;
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int
+ check_sbyte2qword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < NUM * 16; i = i + 16)
+     {
+       for (j = 0; j < 2; j++)
+ 	{
+ 	  t = i + (8 * j);
+ 	  s = (i / 8) + j;
+ 	  res.lli[s] = ((src1.ssi[t] + src1.ssi[t + 1]) + (src1.ssi[t + 2] 
+ 		       + src1.ssi[t + 3])) + ((src1.ssi[t + 4] + src1.ssi[t +5])
+ 	               + (src1.ssi[t + 6] + src1.ssi[t + 7])); 
+ 	  if (res.lli[s] != dst.lli[s]) 
+ 	    check_fails++;
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int
+ check_sword2dword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < (NUM * 8); i = i + 8)
+     {
+       for (j = 0; j < 4; j++)
+ 	{
+ 	  t = i + (2 * j);
+ 	  s = (i / 2) + j;
+ 	  res.li[s] = src1.si[t] + src1.si[t + 1] ;
+ 	  if (res.li[s] != dst.li[s]) 
+ 	    check_fails++;	
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int 
+ check_sword2qword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < NUM * 8; i = i + 8)
+     {
+       for (j = 0; j < 2; j++)
+ 	{
+ 	  t = i + (4 * j);
+ 	  s = (i / 4) + j;
+ 	  res.lli[s] = (src1.si[t] + src1.si[t + 1]) + (src1.si[t + 2]
+ 	               + src1.si[t + 3]); 
+ 	  if (res.lli[s] != dst.lli[s]) 
+ 	    check_fails++;
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int
+ check_dword2qword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < (NUM * 4); i = i + 4)
+     {
+       for (j = 0; j < 2; j++)
+ 	{
+ 	  t = i + (2 * j);
+ 	  s = (i / 2) + j;
+ 	  res.lli[s] = src1.li[t] + src1.li[t + 1] ;
+ 	  if (res.lli[s] != dst.lli[s]) 
+ 	    check_fails++;	
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static void
+ sse5_test (void)
+ {
+   int i;
+   
+   /* Check haddbw */
+   init_sbyte ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddw_epi8 (src1.x[i]);
+   
+   if (check_sbyte2word())
+     abort ();
+   
+   /* Check haddbd */
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddd_epi8 (src1.x[i]);
+   
+   if (check_sbyte2dword())
+     abort (); 
+   
+   /* Check haddbq */
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddq_epi8 (src1.x[i]);
+   
+   if (check_sbyte2qword())
+     abort ();
+ 
+   /* Check haddwd */
+   init_sword ();
+ 
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddd_epi16 (src1.x[i]);
+   
+   if (check_sword2dword())
+     abort (); 
+    
+   /* Check haddwq */
+  
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddq_epi16 (src1.x[i]);
+   
+   if (check_sword2qword())
+     abort ();
+  
+   /* Check hadddq */
+   init_sdword ();
+ 
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_haddq_epi32 (src1.x[i]);
+   
+   if (check_dword2qword())
+     abort ();
+ }
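The check_* routines in this test recompute what the horizontal-add instructions produce: adjacent narrower elements summed into the next wider type. A scalar sketch of one phaddbw result lane (hypothetical helper, mirroring check_sbyte2word above):

```c
#include <stdint.h>

/* One phaddbw result lane: two adjacent signed bytes are sign-extended
   and summed into a 16-bit word, so the sum cannot wrap around.  */
static int16_t
phaddbw_lane (int8_t lo, int8_t hi)
{
  return (int16_t) lo + (int16_t) hi;
}
```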
*** gcc/testsuite/gcc.target/i386/sse5-hsubX.c.~1~	2007-09-06 13:52:33.403458000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-hsubX.c	2007-09-06 13:44:03.399295000 -0400
***************
*** 0 ****
--- 1,128 ----
+ /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+ /* { dg-require-effective-target sse5 } */
+ /* { dg-options "-O2 -msse5" } */
+ 
+ #include "sse5-check.h"
+ 
+ #include <bmmintrin.h>
+ #include <string.h>
+ 
+ #define NUM 10
+ 
+ union
+ {
+   __m128i x[NUM];
+   int8_t ssi[NUM * 16];
+   int16_t si[NUM * 8];
+   int32_t li[NUM * 4];
+   int64_t lli[NUM * 2];
+ } dst, res, src1;
+ 
+ static void
+ init_sbyte ()
+ {
+   int i;
+   for (i=0; i < NUM * 16; i++)
+     src1.ssi[i] = i;
+ }
+ 
+ static void
+ init_sword ()
+ {
+   int i;
+   for (i=0; i < NUM * 8; i++)
+     src1.si[i] = i;
+ }
+ 
+ 
+ static void
+ init_sdword ()
+ {
+   int i;
+   for (i=0; i < NUM * 4; i++)
+     src1.li[i] = i;
+ }
+ 
+ static int 
+ check_sbyte2word ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < NUM * 16; i = i + 16)
+     {
+       for (j = 0; j < 8; j++)
+ 	{
+ 	  t = i + (2 * j);
+ 	  s = (i / 2) + j;
+ 	  res.si[s] = src1.ssi[t] - src1.ssi[t + 1] ;
+ 	  if (res.si[s] != dst.si[s]) 
+ 	    check_fails++;	
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int
+ check_sword2dword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < (NUM * 8); i = i + 8)
+     {
+       for (j = 0; j < 4; j++)
+ 	{
+ 	  t = i + (2 * j);
+ 	  s = (i / 2) + j;
+ 	  res.li[s] = src1.si[t] - src1.si[t + 1] ;
+ 	  if (res.li[s] != dst.li[s]) 
+ 	    check_fails++;	
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static int
+ check_dword2qword ()
+ {
+   int i, j, s, t, check_fails = 0;
+   for (i = 0; i < (NUM * 4); i = i + 4)
+     {
+       for (j = 0; j < 2; j++)
+ 	{
+ 	  t = i + (2 * j);
+ 	  s = (i / 2) + j;
+ 	  res.lli[s] = src1.li[t] - src1.li[t + 1] ;
+ 	  if (res.lli[s] != dst.lli[s]) 
+ 	    check_fails++;	
+ 	}
+     }
+   return check_fails;
+ }
+ 
+ static void
+ sse5_test (void)
+ {
+   int i;
+   
+   /* Check hsubbw */
+   init_sbyte ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_hsubw_epi8 (src1.x[i]);
+   
+   if (check_sbyte2word())
+     abort ();
+   
+ 
+   /* Check hsubwd */
+   init_sword ();
+ 
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_hsubd_epi16 (src1.x[i]);
+   
+   if (check_sword2dword())
+     abort (); 
+    
+   /* Check hsubdq */
+   init_sdword ();
+ 
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_hsubq_epi32 (src1.x[i]);
+   
+   if (check_dword2qword())
+     abort ();
+ }
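The horizontal-subtract checks mirror the adds, with the second element of each adjacent pair subtracted from the first, as in check_sbyte2word above. A scalar sketch of one phsubbw result lane (hypothetical helper name):

```c
#include <stdint.h>

/* One phsubbw result lane, following the arithmetic in the test above:
   the second of two adjacent signed bytes is subtracted from the first,
   with the difference sign-extended into a 16-bit word.  */
static int16_t
phsubbw_lane (int8_t first, int8_t second)
{
  return (int16_t) first - (int16_t) second;
}
```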
*** gcc/testsuite/gcc.target/i386/sse5-ima-vector.c.~1~	2007-09-06 13:52:33.428459000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-ima-vector.c	2007-09-06 13:44:03.419317000 -0400
***************
*** 0 ****
--- 1,33 ----
+ /* Test that the compiler properly optimizes integer multiply and add
+    vector instructions into pmacsdd on SSE5 systems.  */
+ 
+ /* { dg-do compile { target x86_64-*-* } } */
+ /* { dg-options "-O2 -msse5 -mfused-madd -ftree-vectorize" } */
+ 
+ extern void exit (int);
+ 
+ typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
+ 
+ #define SIZE 10240
+ 
+ union {
+   __m128i align;
+   int i[SIZE];
+ } a, b, c, d;
+ 
+ void
+ int_mul_add (void)
+ {
+   int i;
+ 
+   for (i = 0; i < SIZE; i++)
+     a.i[i] = (b.i[i] * c.i[i]) + d.i[i];
+ }
+ 
+ int main ()
+ {
+   int_mul_add ();
+   exit (0);
+ }
+ 
+ /* { dg-final { scan-assembler "pmacsdd" } } */
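The loop above is expected to vectorize into pmacsdd, a 32-bit integer multiply-accumulate. A scalar sketch of one lane, assuming the usual low-32-bit (wrap-around) product semantics; the helper name is illustrative:

```c
#include <stdint.h>

/* One pmacsdd lane: signed 32 x 32 multiply, with the low 32 bits of
   the product added to the accumulator using modulo-2^32 arithmetic.  */
static int32_t
pmacsdd_lane (int32_t a, int32_t b, int32_t c)
{
  return (int32_t) ((uint32_t) a * (uint32_t) b + (uint32_t) c);
}
```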
*** gcc/testsuite/gcc.target/i386/sse5-maccXX.c.~1~	2007-09-06 13:52:33.439459000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-maccXX.c	2007-09-06 13:44:03.434331000 -0400
***************
*** 0 ****
--- 1,140 ----
+ /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+ /* { dg-require-effective-target sse5 } */
+ /* { dg-options "-O2 -msse5" } */
+ 
+ #include "sse5-check.h"
+ 
+ #include <bmmintrin.h>
+ #include <string.h>
+ 
+ #define NUM 20
+ 
+ union
+ {
+   __m128 x[NUM];
+   float f[NUM * 4];
+   __m128d y[NUM];
+   double d[NUM * 2];
+ } dst, res, src1, src2, src3;
+ 
+ 
+ /* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+    intermediate product is not rounded; only the addition is rounded.  */
+ 
+ static void
+ init_maccps ()
+ {
+   int i;
+   for (i = 0; i < NUM * 4; i++)
+     {
+       src1.f[i] = i;
+       src2.f[i] = i + 10;
+       src3.f[i] = i + 20;
+     }
+ }
+ 
+ static void
+ init_maccpd ()
+ {
+   int i;
+   for (i = 0; i < NUM * 2; i++)
+     {
+       src1.d[i] = i;
+       src2.d[i] = i + 10;
+       src3.d[i] = i + 20;
+     }
+ }
+ 
+ static int
+ check_maccps ()
+ {
+   int i, j, check_fails = 0;
+   for (i = 0; i < NUM * 4; i = i + 4)
+     for (j = 0; j < 4; j++)
+       {
+ 	res.f[i + j] = (src1.f[i + j] * src2.f[i + j]) + src3.f[i + j];
+ 	if (dst.f[i + j] != res.f[i + j]) 
+ 	  check_fails++;
+       }
+   return check_fails;
+ }
+ 
+ static int
+ check_maccpd ()
+ {
+   int i, j, check_fails = 0;
+   for (i = 0; i < NUM * 2; i = i + 2)
+     for (j = 0; j < 2; j++)
+       {
+ 	res.d[i + j] = (src1.d[i + j] * src2.d[i + j]) + src3.d[i + j];
+ 	if (dst.d[i + j] != res.d[i + j]) 
+ 	  check_fails++;
+       }
+   return check_fails;
+ }
+ 
+ 
+ static int
+ check_maccss ()
+ {
+   int i, check_fails = 0;
+   for (i = 0; i < NUM * 4; i= i + 4)
+     {
+       res.f[i] = (src1.f[i] * src2.f[i]) + src3.f[i];
+       if (dst.f[i] != res.f[i]) 
+ 	check_fails++;
+     }	
+   return check_fails;
+ }
+ 
+ static int
+ check_maccsd ()
+ {
+   int i, check_fails = 0;
+   for (i = 0; i < NUM * 2; i = i + 2)
+     {
+       res.d[i] = (src1.d[i] * src2.d[i]) + src3.d[i];
+       if (dst.d[i] != res.d[i]) 
+ 	check_fails++;
+     }
+   return check_fails;
+ }
+ 
+ static void
+ sse5_test (void)
+ {
+   int i;
+   
+   /* Check maccps */
+   init_maccps ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_macc_ps (src1.x[i], src2.x[i], src3.x[i]);
+   
+   if (check_maccps ()) 
+     abort ();
+   
+   /* check maccss */
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_macc_ss (src1.x[i], src2.x[i], src3.x[i]);
+   
+   if (check_maccss ()) 
+     abort ();
+   
+   /* Check maccpd */
+   init_maccpd ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.y[i] = _mm_macc_pd (src1.y[i], src2.y[i], src3.y[i]);
+   
+   if (check_maccpd ()) 
+     abort ();
+   
+   /* Check maccsd */
+   for (i = 0; i < NUM; i++)
+     dst.y[i] = _mm_macc_sd (src1.y[i], src2.y[i], src3.y[i]);
+   
+   if (check_maccsd ()) 
+     abort ();
+   
+ }
*** gcc/testsuite/gcc.target/i386/sse5-msubXX.c.~1~	2007-09-06 13:52:33.451459000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-msubXX.c	2007-09-06 13:44:03.455353000 -0400
***************
*** 0 ****
--- 1,139 ----
+ /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+ /* { dg-require-effective-target sse5 } */
+ /* { dg-options "-O2 -msse5" } */
+ 
+ #include "sse5-check.h"
+ 
+ #include <bmmintrin.h>
+ #include <string.h>
+ 
+ #define NUM 20
+ 
+ union
+ {
+   __m128 x[NUM];
+   float f[NUM * 4];
+   __m128d y[NUM];
+   double d[NUM * 2];
+ } dst, res, src1, src2, src3;
+ 
+ /* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+    intermediate product is not rounded; only the addition is rounded.  */
+ 
+ static void
+ init_msubps ()
+ {
+   int i;
+   for (i = 0; i < NUM * 4; i++)
+     {
+       src1.f[i] = i;
+       src2.f[i] = i + 10;
+       src3.f[i] = i + 20;
+     }
+ }
+ 
+ static void
+ init_msubpd ()
+ {
+   int i;
+   for (i = 0; i < NUM * 2; i++)
+     {
+       src1.d[i] = i;
+       src2.d[i] = i + 10;
+       src3.d[i] = i + 20;
+     }
+ }
+ 
+ static int
+ check_msubps ()
+ {
+   int i, j, check_fails = 0;
+   for (i = 0; i < NUM * 4; i = i + 4)
+     for (j = 0; j < 4; j++)
+       {
+ 	res.f[i + j] = (src1.f[i + j] * src2.f[i + j]) - src3.f[i + j];
+ 	if (dst.f[i + j] != res.f[i + j]) 
+ 	  check_fails++;
+       }
+   return check_fails;
+ }
+ 
+ static int
+ check_msubpd ()
+ {
+   int i, j, check_fails = 0;
+   for (i = 0; i < NUM * 2; i = i + 2)
+     for (j = 0; j < 2; j++)
+       {
+ 	res.d[i + j] = (src1.d[i + j] * src2.d[i + j]) - src3.d[i + j];
+ 	if (dst.d[i + j] != res.d[i + j]) 
+ 	  check_fails++;
+       }
+   return check_fails;
+ }
+ 
+ 
+ static int
+ check_msubss ()
+ {
+   int i, check_fails = 0;
+   for (i = 0; i < NUM * 4; i = i + 4)
+     {
+       res.f[i] = (src1.f[i] * src2.f[i]) - src3.f[i];
+       if (dst.f[i] != res.f[i]) 
+ 	check_fails++;
+     }	
+   return check_fails;
+ }
+ 
+ static int
+ check_msubsd ()
+ {
+   int i, check_fails = 0;
+   for (i = 0; i < NUM * 2; i = i + 2)
+     {
+       res.d[i] = (src1.d[i] * src2.d[i]) - src3.d[i];
+       if (dst.d[i] != res.d[i]) 
+ 	check_fails++;
+     }
+   return check_fails;
+ }
+ 
+ static void
+ sse5_test (void)
+ {
+   int i;
+   
+   /* Check msubps */
+   init_msubps ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_msub_ps (src1.x[i], src2.x[i], src3.x[i]);
+   
+   if (check_msubps ()) 
+     abort ();
+   
+   /* check msubss */
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_msub_ss (src1.x[i], src2.x[i], src3.x[i]);
+   
+   if (check_msubss ()) 
+     abort ();
+   
+   /* Check msubpd */
+   init_msubpd ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.y[i] = _mm_msub_pd (src1.y[i], src2.y[i], src3.y[i]);
+   
+   if (check_msubpd ()) 
+     abort ();
+   
+   /* Check msubsd */
+   for (i = 0; i < NUM; i++)
+     dst.y[i] = _mm_msub_sd (src1.y[i], src2.y[i], src3.y[i]);
+   
+   if (check_msubsd ()) 
+     abort ();
+   
+ }
*** gcc/testsuite/gcc.target/i386/sse5-nmaccXX.c.~1~	2007-09-06 13:52:33.462459000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-nmaccXX.c	2007-09-06 13:44:03.481378000 -0400
***************
*** 0 ****
--- 1,139 ----
+ /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+ /* { dg-require-effective-target sse5 } */
+ /* { dg-options "-O2 -msse5" } */
+ 
+ #include "sse5-check.h"
+ 
+ #include <bmmintrin.h>
+ #include <string.h>
+ 
+ #define NUM 20
+ 
+ union
+ {
+   __m128 x[NUM];
+   float f[NUM * 4];
+   __m128d y[NUM];
+   double d[NUM * 2];
+ } dst, res, src1, src2, src3;
+ 
+ /* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+    intermediate product is not rounded; only the addition is rounded.  */
+ 
+ static void
+ init_nmaccps ()
+ {
+   int i;
+   for (i = 0; i < NUM * 4; i++)
+     {
+       src1.f[i] = i;
+       src2.f[i] = i + 10;
+       src3.f[i] = i + 20;
+     }
+ }
+ 
+ static void
+ init_nmaccpd ()
+ {
+   int i;
+   for (i = 0; i < NUM * 2; i++)
+     {
+       src1.d[i] = i;
+       src2.d[i] = i + 10;
+       src3.d[i] = i + 20;
+     }
+ }
+ 
+ static int
+ check_nmaccps ()
+ {
+   int i, j, check_fails = 0;
+   for (i = 0; i < NUM * 4; i = i + 4)
+     for (j = 0; j < 4; j++)
+       {
+ 	res.f[i + j] = - (src1.f[i + j] * src2.f[i + j]) + src3.f[i + j];
+ 	if (dst.f[i + j] != res.f[i + j]) 
+ 	  check_fails++;
+       }
+   return check_fails;
+ }
+ 
+ static int
+ check_nmaccpd ()
+ {
+   int i, j, check_fails = 0;
+   for (i = 0; i < NUM * 2; i = i + 2)
+     for (j = 0; j < 2; j++)
+       {
+ 	res.d[i + j] = - (src1.d[i + j] * src2.d[i + j]) + src3.d[i + j];
+ 	if (dst.d[i + j] != res.d[i + j]) 
+ 	  check_fails++;
+       }
+   return check_fails;
+ }
+ 
+ 
+ static int
+ check_nmaccss ()
+ {
+   int i, check_fails = 0;
+   for (i = 0; i < NUM * 4; i = i + 4)
+     {
+       res.f[i] = - (src1.f[i] * src2.f[i]) + src3.f[i];
+       if (dst.f[i] != res.f[i]) 
+ 	check_fails++;
+     }	
+   return check_fails;
+ }
+ 
+ static int
+ check_nmaccsd ()
+ {
+   int i, check_fails = 0;
+   for (i = 0; i < NUM * 2; i = i + 2)
+     {
+       res.d[i] = - (src1.d[i] * src2.d[i]) + src3.d[i];
+       if (dst.d[i] != res.d[i]) 
+ 	check_fails++;
+     }
+   return check_fails;
+ }
+ 
+ static void
+ sse5_test (void)
+ {
+   int i;
+   
+   /* Check nmaccps */
+   init_nmaccps ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_nmacc_ps (src1.x[i], src2.x[i], src3.x[i]);
+   
+   if (check_nmaccps ()) 
+     abort ();
+   
+   /* check nmaccss */
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_nmacc_ss (src1.x[i], src2.x[i], src3.x[i]);
+   
+   if (check_nmaccss ()) 
+     abort ();
+   
+   /* Check nmaccpd */
+   init_nmaccpd ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.y[i] = _mm_nmacc_pd (src1.y[i], src2.y[i], src3.y[i]);
+   
+   if (check_nmaccpd ()) 
+     abort ();
+   
+   /* Check nmaccsd */
+   for (i = 0; i < NUM; i++)
+     dst.y[i] = _mm_nmacc_sd (src1.y[i], src2.y[i], src3.y[i]);
+   
+   if (check_nmaccsd ()) 
+     abort ();
+   
+ }
*** gcc/testsuite/gcc.target/i386/sse5-nmsubXX.c.~1~	2007-09-06 13:52:33.474458000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-nmsubXX.c	2007-09-06 13:44:03.498395000 -0400
***************
*** 0 ****
--- 1,139 ----
+ /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+ /* { dg-require-effective-target sse5 } */
+ /* { dg-options "-O2 -msse5" } */
+ 
+ #include "sse5-check.h"
+ 
+ #include <bmmintrin.h>
+ #include <string.h>
+ 
+ #define NUM 20
+ 
+ union
+ {
+   __m128 x[NUM];
+   float f[NUM * 4];
+   __m128d y[NUM];
+   double d[NUM * 2];
+ } dst, res, src1, src2, src3;
+ 
+ /* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+    intermediate product is not rounded; only the addition is rounded.  */
+ 
+ static void
+ init_nmsubps ()
+ {
+   int i;
+   for (i = 0; i < NUM * 4; i++)
+     {
+       src1.f[i] = i;
+       src2.f[i] = i + 10;
+       src3.f[i] = i + 20;
+     }
+ }
+ 
+ static void
+ init_nmsubpd ()
+ {
+   int i;
+   for (i = 0; i < NUM * 2; i++)
+     {
+       src1.d[i] = i;
+       src2.d[i] = i + 10;
+       src3.d[i] = i + 20;
+     }
+ }
+ 
+ static int
+ check_nmsubps ()
+ {
+   int i, j, check_fails = 0;
+   for (i = 0; i < NUM * 4; i = i + 4)
+     for (j = 0; j < 4; j++)
+       {
+ 	res.f[i + j] = - (src1.f[i + j] * src2.f[i + j]) - src3.f[i + j];
+ 	if (dst.f[i + j] != res.f[i + j]) 
+ 	  check_fails++;
+       }
+   return check_fails;
+ }
+ 
+ static int
+ check_nmsubpd ()
+ {
+   int i, j, check_fails = 0;
+   for (i = 0; i < NUM * 2; i = i + 2)
+     for (j = 0; j < 2; j++)
+       {
+ 	res.d[i + j] = - (src1.d[i + j] * src2.d[i + j]) - src3.d[i + j];
+ 	if (dst.d[i + j] != res.d[i + j]) 
+ 	  check_fails++;
+       }
+   return check_fails;
+ }
+ 
+ 
+ static int
+ check_nmsubss ()
+ {
+   int i, check_fails = 0;
+   for (i = 0; i < NUM * 4; i = i + 4)
+     {
+       res.f[i] = - (src1.f[i] * src2.f[i]) - src3.f[i];
+       if (dst.f[i] != res.f[i]) 
+ 	check_fails++;
+     }	
+   return check_fails;
+ }
+ 
+ static int
+ check_nmsubsd ()
+ {
+   int i, check_fails = 0;
+   for (i = 0; i < NUM * 2; i = i + 2)
+     {
+       res.d[i] = - (src1.d[i] * src2.d[i]) - src3.d[i];
+       if (dst.d[i] != res.d[i]) 
+ 	check_fails++;
+     }
+   return check_fails;
+ }
+ 
+ static void
+ sse5_test (void)
+ {
+   int i;
+   
+   /* Check nmsubps */
+   init_nmsubps ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_nmsub_ps (src1.x[i], src2.x[i], src3.x[i]);
+   
+   if (check_nmsubps ())
+     abort ();
+   
+   /* check nmsubss */
+   for (i = 0; i < NUM; i++)
+     dst.x[i] = _mm_nmsub_ss (src1.x[i], src2.x[i], src3.x[i]);
+   
+   if (check_nmsubss ())
+     abort ();
+   
+   /* Check nmsubpd */
+   init_nmsubpd ();
+   
+   for (i = 0; i < NUM; i++)
+     dst.y[i] = _mm_nmsub_pd (src1.y[i], src2.y[i], src3.y[i]);
+   
+   if (check_nmsubpd ())
+     abort ();
+   
+   /* Check nmsubsd */
+   for (i = 0; i < NUM; i++)
+     dst.y[i] = _mm_nmsub_sd (src1.y[i], src2.y[i], src3.y[i]);
+   
+   if (check_nmsubsd ())
+     abort ();
+   
+ }
*** gcc/testsuite/gcc.target/i386/sse5-pcmov2.c.~1~	2007-09-06 13:52:33.486459000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-pcmov2.c	2007-09-06 13:44:03.504403000 -0400
***************
*** 0 ****
--- 1,22 ----
+ /* Test that the compiler properly optimizes conditional floating point moves
+    into the pcmov instruction on SSE5 systems.  */
+ 
+ /* { dg-do compile { target x86_64-*-* } } */
+ /* { dg-options "-O2 -msse5" } */
+ 
+ extern void exit (int);
+ 
+ float flt_test (float a, float b, float c, float d)
+ {
+   return (a > b) ? c : d;
+ }
+ 
+ float flt_a = 1, flt_b = 2, flt_c = 3, flt_d = 4, flt_e;
+ 
+ int main()
+ {
+   flt_e = flt_test (flt_a, flt_b, flt_c, flt_d);
+   exit (0);
+ }
+ 
+ /* { dg-final { scan-assembler "pcmov" } } */
*** gcc/testsuite/gcc.target/i386/sse5-pcmov.c.~1~	2007-09-06 13:52:33.497458000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-pcmov.c	2007-09-06 13:44:03.525422000 -0400
***************
*** 0 ****
--- 1,22 ----
+ /* Test that the compiler properly optimizes conditional floating point moves
+    into the pcmov instruction on SSE5 systems.  */
+ 
+ /* { dg-do compile { target x86_64-*-* } } */
+ /* { dg-options "-O2 -msse5" } */
+ 
+ extern void exit (int);
+ 
+ double dbl_test (double a, double b, double c, double d)
+ {
+   return (a > b) ? c : d;
+ }
+ 
+ double dbl_a = 1, dbl_b = 2, dbl_c = 3, dbl_d = 4, dbl_e;
+ 
+ int main()
+ {
+   dbl_e = dbl_test (dbl_a, dbl_b, dbl_c, dbl_d);
+   exit (0);
+ }
+ 
+ /* { dg-final { scan-assembler "pcmov" } } */
*** gcc/testsuite/gcc.target/i386/sse5-permpX.c.~1~	2007-09-06 13:52:33.507459000 -0400
--- gcc/testsuite/gcc.target/i386/sse5-permpX.c	2007-09-06 13:44:03.541441000 -0400
***************
*** 0 ****
--- 1,120 ----
+ /* { dg-do run { target i?86-*-* x86_64-*-* } } */
+ /* { dg-require-effective-target sse5 } */
+ /* { dg-options "-O2 -msse5" } */
+ 
+ #include "sse5-check.h"
+ 
+ #include <bmmintrin.h>
+ #include <string.h>
+ 
+ union
+ {
+   __m128 x[2];
+   __m128d y[2];
+   __m128i z[2];
+   float f[8];
+   double d[4];
+   int i[8];
+   long long li[4];
+ } dst, res, src1, src2, src3;
+ 
+ 
+ static void
+ init_ddata ()
+ {
+   int i;
+   for (i = 0; i < 4; i++)
+     {
+       src1.d[i] = i;
+       src2.d[i] = i + 2;
+     }
+  
+   src3.li[0] = 3;
+   src3.li[1] = 0;
+   src3.li[2] = 1;
+   src3.li[3] = 2;
+ 
+   res.d[0] = 3.0;
+   res.d[1] = 0.0;
+   res.d[2] = 3.0;
+   res.d[3] = 4.0;
+ }
+ 
+ 
+ static void 
+ init_fdata ()
+ {
+   int i;
+   for (i = 0; i < 8; i++)
+     {
+       src1.f[i] = i;
+       src2.f[i] = i + 2;
+     }
+ 
+   src3.i[0] = 7;
+   src3.i[1] = 5;
+   src3.i[2] = 1;
+   src3.i[3] = 2;
+   src3.i[4] = 0;
+   src3.i[5] = 4;
+   src3.i[6] = 3;
+   src3.i[7] = 6; 
+ 
+   res.f[0] = 5.0;
+   res.f[1] = 3.0;
+   res.f[2] = 1.0;
+   res.f[3] = 2.0;
+   res.f[4] = 4.0;
+   res.f[5] = 6.0;
+   res.f[6] = 7.0;
+   res.f[7] = 8.0;
+ }
+ 
+ static int
+ check_permpd ()
+ {
+   int i, check_fails = 0;
+ 
+   for (i = 0; i < 4; i++)
+     {
+       if (res.d[i] != dst.d[i])
+ 	check_fails++;
+     }
+   return check_fails;
+ }
+ 
+ static int
+ check_permps ()
+ {
+   int i, check_fails = 0;
+ 
+   for (i = 0; i < 8; i++)
+     {
+       if (res.f[i] != dst.f[i])
+ 	check_fails++;
+     }
+   return check_fails;
+ }
+ 
+ static void
+ sse5_test (void)
+ {
+   int i;
+   init_ddata();
+ 
+   for (i = 0; i < 2; i++)
+     dst.y[i] = _mm_perm_pd (src1.y[i], src2.y[i], src3.z[i]);
+   
+   if (check_permpd ())
+     abort ();
+   
+   init_fdata();
+   
+   for (i = 0; i < 2; i++)
+     dst.x[i] = _mm_perm_ps (src1.x[i], src2.x[i], src3.z[i]);
+    
+   if (check_permps ())
+     abort (); 
+ }
+ 
+ 

