This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



PATCH: Add SSE4.2 support


I added a dummy nmmintrin.h for the SSE4.2 intrinsics and put the
SSE4.2 intrinsics themselves in smmintrin.h, since icc will make the
SSE4.2 intrinsics available in smmintrin.h in a future release.

The pcmp[ei]str[im] instructions in SSE4.2 set the flags register in a
non-standard way.  I added 2 CC modes, CCPCMPESTR and CCPCMPISTR, to
deal with it.  They are only compatible with themselves.

The _mm_cmp[ei]str[acosz] intrinsics only return flags status.  They
can be implemented with either pcmp[ei]stri or pcmp[ei]strm, but
depending on whether there is a compatible pcmp[ei]stri or
pcmp[ei]strm nearby, one choice may be optimized out while the other
can't.  I implemented them with pcmp[ei]stri and tried to split that
to pcmp[ei]strm when it may be optimized out.  The split condition is
checked in ix86_pcmpstrm_ok, which only looks for a compatible
pcmp[ei]strm without checking whether the operands are the same.  This
means we may generate pcmp[ei]strm instead of pcmp[ei]stri when we see
a pcmp[ei]strm, yet it can't be optimized out since the 2 pcmp[ei]strm
take different operands.

I added -msse4/-mno-sse4.  -msse4 enables both SSE4.1 and SSE4.2.
Ideally, -mno-sse4 should disable both SSE4.1 and SSE4.2.  But since
we have run out of bits in the x86 target flags, I used

msse4
Target Report Mask(SSE4_1|MASK_SSE4_2) MaskExists

It looks odd, but it works.  What is the best way to deal with this?

I will submit a patch for the SSE4.2 intrinsic tests later.


H.J.
---
2007-05-22  H.J. Lu  <hongjiu.lu@intel.com>

	* config.gcc (i[34567]86-*-*): Add nmmintrin.h to
	extra_headers.
	(x86_64-*-*): Likewise.

	* config/i386/i386-modes.def (CCPCMPESTR): New.
	(CCPCMPISTR): Likewise.

	* config/i386/i386-protos.h (ix86_pcmpstrm_ok): New.

	* config/i386/i386.c (ix86_handle_option): Handle SSE4.2.
	(override_options): Support SSE4.2.
	(put_condition_code): Handle CCPCMPESTRmode and CCPCMPISTRmode.
	(ix86_cc_modes_compatible): Likewise.
	(IX86_BUILTIN_CRC32QI): New for SSE4.2.
	(IX86_BUILTIN_CRC32HI): Likewise.
	(IX86_BUILTIN_CRC32SI): Likewise.
	(IX86_BUILTIN_CRC32DI): Likewise.
	(IX86_BUILTIN_PCMPESTRI128): Likewise.
	(IX86_BUILTIN_PCMPESTRM128): Likewise.
	(IX86_BUILTIN_PCMPESTRA128): Likewise.
	(IX86_BUILTIN_PCMPESTRC128): Likewise.
	(IX86_BUILTIN_PCMPESTRO128): Likewise.
	(IX86_BUILTIN_PCMPESTRS128): Likewise.
	(IX86_BUILTIN_PCMPESTRZ128): Likewise.
	(IX86_BUILTIN_PCMPISTRI128): Likewise.
	(IX86_BUILTIN_PCMPISTRM128): Likewise.
	(IX86_BUILTIN_PCMPISTRA128): Likewise.
	(IX86_BUILTIN_PCMPISTRC128): Likewise.
	(IX86_BUILTIN_PCMPISTRO128): Likewise.
	(IX86_BUILTIN_PCMPISTRS128): Likewise.
	(IX86_BUILTIN_PCMPISTRZ128): Likewise.
	(IX86_BUILTIN_PCMPGTQ): Likewise.
	(BUILTIN_DESC_PCMPSTR_CC): Likewise.
	(bdesc_pcmpestr): Likewise.
	(bdesc_pcmpistr): Likewise.
	(bdesc_crc32): Likewise.
	(bdesc_sse_3arg): Likewise.
	(ix86_expand_crc32): Likewise.
	(ix86_check_pcmpstrm): Likewise.
	(ix86_pcmpstrm_ok): Likewise.
	(ix86_expand_sse_pcmpestr): Likewise.
	(ix86_expand_sse_pcmpistr): Likewise.
	(ix86_init_mmx_sse_builtins): Support SSE4.2.
	(ix86_expand_builtin): Likewise.

	* config/i386/i386.h (TARGET_CPU_CPP_BUILTINS): Define
	__SSE4_2__ for -msse4.2.
	(REVERSIBLE_CC_MODE): Return 0 for CCPCMPESTRmode or
	CCPCMPISTRmode.

	* config/i386/i386.md (UNSPEC_CRC32): New for SSE4.2.
	(UNSPEC_PCMPESTR): Likewise.
	(UNSPEC_PCMPISTR): Likewise.
	(CRC32MODE): Likewise.
	(crc32modesuffix): Likewise.
	(crc32modeconstraint): Likewise.
	(sse4_2_crc32<mode>): Likewise.
	(sse4_2_crc32di): Likewise.

	* config/i386/i386.opt (msse4.2): New for SSE4.2.
	(msse4): Likewise.

	* config/i386/nmmintrin.h: New. The dummy SSE4.2 intrinsic header
	file.

	* config/i386/predicates.md (fcmov_comparison_operator): Handle
	CCPCMPESTRmode and CCPCMPISTRmode.
	(ix86_comparison_operator): Likewise.

	* config/i386/smmintrin.h: Add SSE4.2 intrinsics.

	* config/i386/sse.md (sse4_2_gtv2di3): New pattern for
	SSE4.2.
	(vcondv2di): Likewise.
	(vconduv2di): Likewise.
	(sse4_2_pcmpestri): Likewise.
	(sse4_2_pcmpestrm): Likewise.
	(sse4_2_pcmpistri): Likewise.
	(sse4_2_pcmpistrm): Likewise.
	Split sse4_2_pcmpestri to sse4_2_pcmpestrm if profitable.
	Split sse4_2_pcmpistri to sse4_2_pcmpistrm if profitable.

	* doc/extend.texi: Document SSE4.2 built-in functions.

	* doc/invoke.texi: Document -msse4.2/-msse4.

--- gcc/config.gcc.nni	2007-05-22 07:43:24.000000000 -0700
+++ gcc/config.gcc	2007-05-22 13:31:31.000000000 -0700
@@ -276,12 +276,14 @@ xscale-*-*)
 i[34567]86-*-*)
 	cpu_type=i386
 	extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
-		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h"
+		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
+		       nmmintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
 	extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
-		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h"
+		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
+		       nmmintrin.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
--- gcc/config/i386/i386-modes.def.nni	2007-05-22 07:43:24.000000000 -0700
+++ gcc/config/i386/i386-modes.def	2007-05-22 13:31:31.000000000 -0700
@@ -53,7 +53,17 @@ ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG
    mode is used to simulate comparisons of (a-b) and (a+b)
    against zero using sub/cmp/add operations.
 
-   Add CCZ to indicate that only the Zero flag is valid.  */
+   Add CCZ to indicate that only the Zero flag is valid.
+
+   Add CCPCMPESTR/CCPCMPISTR for pcmp[ei]str[im] instructions:
+   
+   	suffix	      rtx_code
+	  a		GTU
+	  c		LTU
+	  o		UNGT	OF == 1
+	  s		UNLT	SF == 1
+	  z		EQ
+ */
 
 CC_MODE (CCGC);
 CC_MODE (CCGOC);
@@ -61,6 +71,8 @@ CC_MODE (CCNO);
 CC_MODE (CCZ);
 CC_MODE (CCFP);
 CC_MODE (CCFPU);
+CC_MODE (CCPCMPESTR);
+CC_MODE (CCPCMPISTR);
 
 /* Vector modes.  */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
--- gcc/config/i386/i386-protos.h.nni	2007-05-22 13:31:31.000000000 -0700
+++ gcc/config/i386/i386-protos.h	2007-05-22 13:31:31.000000000 -0700
@@ -89,6 +89,7 @@ extern void ix86_expand_binary_operator 
 extern int ix86_binary_operator_ok (enum rtx_code, enum machine_mode, rtx[]);
 extern void ix86_expand_unary_operator (enum rtx_code, enum machine_mode,
 					rtx[]);
+extern int ix86_pcmpstrm_ok (rtx);
 extern rtx ix86_build_const_vector (enum machine_mode, bool, rtx);
 extern void ix86_split_convert_uns_si_sse (rtx[]);
 extern void ix86_expand_convert_uns_didf_sse (rtx, rtx);
--- gcc/config/i386/i386.c.nni	2007-05-22 13:31:31.000000000 -0700
+++ gcc/config/i386/i386.c	2007-05-22 13:31:31.000000000 -0700
@@ -1596,14 +1596,24 @@ ix86_handle_option (size_t code, const c
     case OPT_mssse3:
       if (!value)
 	{
-	  target_flags &= ~(MASK_SSE4_1 | MASK_SSE4A);
-	  target_flags_explicit |= MASK_SSE4_1 | MASK_SSE4A;
+	  target_flags &= ~(MASK_SSE4_1 | MASK_SSE4_2 | MASK_SSE4A);
+	  target_flags_explicit |= (MASK_SSE4_1 | MASK_SSE4_2
+				    | MASK_SSE4A);
 	}
       return true;
 
     case OPT_msse4_1:
       if (!value)
 	{
+	  target_flags &= ~(MASK_SSE4_2 | MASK_SSE4A);
+	  target_flags_explicit |= MASK_SSE4_2 | MASK_SSE4A;
+	}
+      return true;
+
+    case OPT_msse4:
+    case OPT_msse4_2:
+      if (!value)
+	{
 	  target_flags &= ~MASK_SSE4A;
 	  target_flags_explicit |= MASK_SSE4A;
 	}
@@ -1612,8 +1622,8 @@ ix86_handle_option (size_t code, const c
     case OPT_msse4a:
       if (!value)
 	{
-	  target_flags &= ~MASK_SSE4_1;
-	  target_flags_explicit |= MASK_SSE4_1;
+	  target_flags &= ~(MASK_SSE4_1 | MASK_SSE4_2);
+	  target_flags_explicit |= MASK_SSE4_1 | MASK_SSE4_2;
 	}
       return true;
 
@@ -1691,7 +1701,8 @@ override_options (void)
 	  PTA_ABM = 1 << 11,
  	  PTA_SSE4A = 1 << 12,
 	  PTA_NO_SAHF = 1 << 13,
- 	  PTA_SSE4_1 = 1 << 14
+ 	  PTA_SSE4_1 = 1 << 14,
+ 	  PTA_SSE4_2 = 1 << 15
 	} flags;
     }
   const processor_alias_table[] =
@@ -1956,6 +1967,9 @@ override_options (void)
 	if (processor_alias_table[i].flags & PTA_SSE4_1
 	    && !(target_flags_explicit & MASK_SSE4_1))
 	  target_flags |= MASK_SSE4_1;
+	if (processor_alias_table[i].flags & PTA_SSE4_2
+	    && !(target_flags_explicit & MASK_SSE4_2))
+	  target_flags |= MASK_SSE4_2;
 	if (processor_alias_table[i].flags & PTA_PREFETCH_SSE)
 	  x86_prefetch_sse = true;
 	if (processor_alias_table[i].flags & PTA_CX16)
@@ -2161,6 +2175,10 @@ override_options (void)
   if (!TARGET_80387)
     target_flags |= MASK_NO_FANCY_MATH_387;
 
+  /* Turn on SSE4.1 builtins and popcnt instruction for -msse4.2.  */
+  if (TARGET_SSE4_2)
+    target_flags |= MASK_SSE4_1 | MASK_POPCNT;
+
   /* Turn on SSSE3 builtins for -msse4.1.  */
   if (TARGET_SSE4_1)
     target_flags |= MASK_SSSE3;
@@ -8067,7 +8085,10 @@ put_condition_code (enum rtx_code code, 
       mode = CCmode;
     }
   if (reverse)
-    code = reverse_condition (code);
+    {
+      gcc_assert (mode != CCPCMPESTRmode && mode != CCPCMPISTRmode);
+      code = reverse_condition (code);
+    }
 
   switch (code)
     {
@@ -8075,6 +8096,7 @@ put_condition_code (enum rtx_code code, 
       suffix = "e";
       break;
     case NE:
+      gcc_assert (mode != CCPCMPESTRmode && mode != CCPCMPISTRmode);
       suffix = "ne";
       break;
     case GT:
@@ -8084,10 +8106,13 @@ put_condition_code (enum rtx_code code, 
     case GTU:
       /* ??? Use "nbe" instead of "a" for fcmov lossage on some assemblers.
 	 Those same assemblers have the same but opposite lossage on cmov.  */
-      gcc_assert (mode == CCmode);
+      gcc_assert (mode == CCmode
+		  || mode == CCPCMPESTRmode
+		  || mode == CCPCMPISTRmode);
       suffix = fp ? "nbe" : "a";
       break;
     case LT:
+      gcc_assert (mode != CCPCMPESTRmode && mode != CCPCMPISTRmode);
       switch (mode)
 	{
 	case CCNOmode:
@@ -8105,10 +8130,13 @@ put_condition_code (enum rtx_code code, 
 	}
       break;
     case LTU:
-      gcc_assert (mode == CCmode);
+      gcc_assert (mode == CCmode
+		  || mode == CCPCMPESTRmode
+		  || mode == CCPCMPISTRmode );
       suffix = "b";
       break;
     case GE:
+      gcc_assert (mode != CCPCMPESTRmode && mode != CCPCMPISTRmode);
       switch (mode)
 	{
 	case CCNOmode:
@@ -8144,6 +8172,14 @@ put_condition_code (enum rtx_code code, 
     case ORDERED:
       suffix = fp ? "nu" : "np";
       break;
+    case UNGT:
+      gcc_assert (mode == CCPCMPESTRmode || mode == CCPCMPISTRmode);
+      suffix = "o";
+      break;
+    case UNLT:
+      gcc_assert (mode == CCPCMPESTRmode || mode == CCPCMPISTRmode);
+      suffix = "s";
+      break;
     default:
       gcc_unreachable ();
     }
@@ -10863,6 +10899,8 @@ ix86_cc_modes_compatible (enum machine_m
 
     case CCFPmode:
     case CCFPUmode:
+    case CCPCMPESTRmode:
+    case CCPCMPISTRmode:
       /* These are only compatible with themselves, which we already
 	 checked above.  */
       return VOIDmode;
@@ -16558,6 +16596,29 @@ enum ix86_builtins
   IX86_BUILTIN_VEC_SET_V4HI,
   IX86_BUILTIN_VEC_SET_V16QI,
 
+  /* SSE4.2.  */
+  IX86_BUILTIN_CRC32QI,
+  IX86_BUILTIN_CRC32HI,
+  IX86_BUILTIN_CRC32SI,
+  IX86_BUILTIN_CRC32DI,
+
+  IX86_BUILTIN_PCMPESTRI128,
+  IX86_BUILTIN_PCMPESTRM128,
+  IX86_BUILTIN_PCMPESTRA128,
+  IX86_BUILTIN_PCMPESTRC128,
+  IX86_BUILTIN_PCMPESTRO128,
+  IX86_BUILTIN_PCMPESTRS128,
+  IX86_BUILTIN_PCMPESTRZ128,
+  IX86_BUILTIN_PCMPISTRI128,
+  IX86_BUILTIN_PCMPISTRM128,
+  IX86_BUILTIN_PCMPISTRA128,
+  IX86_BUILTIN_PCMPISTRC128,
+  IX86_BUILTIN_PCMPISTRO128,
+  IX86_BUILTIN_PCMPISTRS128,
+  IX86_BUILTIN_PCMPISTRZ128,
+
+  IX86_BUILTIN_PCMPGTQ,
+
   IX86_BUILTIN_MAX
 };
 
@@ -16603,6 +16664,9 @@ def_builtin_const (int mask, const char 
    swap_comparison in order to support it.  */
 #define BUILTIN_DESC_SWAP_OPERANDS	1
 
+/* Set when we check FLAGS_REG for pcmp[ei]strm.  */
+#define BUILTIN_DESC_PCMPSTR_CC		2
+
 struct builtin_description
 {
   const unsigned int mask;
@@ -16649,6 +16713,39 @@ static const struct builtin_description 
   { MASK_SSE4_1, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestnzc128", IX86_BUILTIN_PTESTNZC, GTU, 0 },
 };
 
+static const struct builtin_description bdesc_pcmpestr[] =
+{
+  /* SSE4.2 */
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestri128", IX86_BUILTIN_PCMPESTRI128, 0, 0 },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestrm, "__builtin_ia32_pcmpestrm128", IX86_BUILTIN_PCMPESTRM128, 0, 0 },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestria128", IX86_BUILTIN_PCMPESTRA128, GTU, BUILTIN_DESC_PCMPSTR_CC },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestric128", IX86_BUILTIN_PCMPESTRC128, LTU, BUILTIN_DESC_PCMPSTR_CC },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestrio128", IX86_BUILTIN_PCMPESTRO128, UNGT, BUILTIN_DESC_PCMPSTR_CC },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestris128", IX86_BUILTIN_PCMPESTRS128, UNLT, BUILTIN_DESC_PCMPSTR_CC },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestriz128", IX86_BUILTIN_PCMPESTRZ128, EQ, BUILTIN_DESC_PCMPSTR_CC },
+};
+
+static const struct builtin_description bdesc_pcmpistr[] =
+{
+  /* SSE4.2 */
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistri128", IX86_BUILTIN_PCMPISTRI128, 0, 0 },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistrm, "__builtin_ia32_pcmpistrm128", IX86_BUILTIN_PCMPISTRM128, 0, 0 },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistria128", IX86_BUILTIN_PCMPISTRA128, GTU, BUILTIN_DESC_PCMPSTR_CC },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistric128", IX86_BUILTIN_PCMPISTRC128, LTU, BUILTIN_DESC_PCMPSTR_CC },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistrio128", IX86_BUILTIN_PCMPISTRO128, UNGT, BUILTIN_DESC_PCMPSTR_CC },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistris128", IX86_BUILTIN_PCMPISTRS128, UNLT, BUILTIN_DESC_PCMPSTR_CC },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistriz128", IX86_BUILTIN_PCMPISTRZ128, EQ, BUILTIN_DESC_PCMPSTR_CC },
+};
+
+static const struct builtin_description bdesc_crc32[] =
+{
+  /* SSE4.2 */
+  { MASK_SSE4_2, CODE_FOR_sse4_2_crc32qi, 0, IX86_BUILTIN_CRC32QI, 0, 0 },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_crc32hi, 0, IX86_BUILTIN_CRC32HI, 0, 0 },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_crc32si, 0, IX86_BUILTIN_CRC32SI, 0, 0 },
+  { MASK_SSE4_2, CODE_FOR_sse4_2_crc32di, 0, IX86_BUILTIN_CRC32DI, 0, 0 },
+};
+
 /* SSE builtins with 3 arguments and the last argument must be a 8 bit
    constant or xmm0.  */
 static const struct builtin_description bdesc_sse_3arg[] =
@@ -16981,6 +17078,9 @@ static const struct builtin_description 
   { MASK_SSE4_1, CODE_FOR_uminv8hi3, "__builtin_ia32_pminuw128", IX86_BUILTIN_PMINUW128, 0, 0 },
   { MASK_SSE4_1, CODE_FOR_sse4_1_mulv2siv2di3, 0, IX86_BUILTIN_PMULDQ128, 0, 0 },
   { MASK_SSE4_1, CODE_FOR_mulv4si3, "__builtin_ia32_pmulld128", IX86_BUILTIN_PMULLD128, 0, 0 },
+
+  /* SSE4.2 */
+  { MASK_SSE4_2, CODE_FOR_sse4_2_gtv2di3, "__builtin_ia32_pcmpgtq", IX86_BUILTIN_PCMPGTQ, 0, 0 },
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -17410,6 +17510,28 @@ ix86_init_mmx_sse_builtins (void)
     = build_function_type_list (integer_type_node,
 				V2DI_type_node, V2DI_type_node,
 				NULL_TREE);
+  tree int_ftype_v16qi_int_v16qi_int_int
+    = build_function_type_list (integer_type_node,
+				V16QI_type_node,
+				integer_type_node,
+				V16QI_type_node,
+				integer_type_node,
+				integer_type_node,
+				NULL_TREE);
+  tree v16qi_ftype_v16qi_int_v16qi_int_int
+    = build_function_type_list (V16QI_type_node,
+				V16QI_type_node,
+				integer_type_node,
+				V16QI_type_node,
+				integer_type_node,
+				integer_type_node,
+				NULL_TREE);
+  tree int_ftype_v16qi_v16qi_int
+    = build_function_type_list (integer_type_node,
+				V16QI_type_node,
+				V16QI_type_node,
+				integer_type_node,
+				NULL_TREE);
 
   tree float80_type;
   tree float128_type;
@@ -17627,6 +17749,30 @@ ix86_init_mmx_sse_builtins (void)
   for (i = 0, d = bdesc_ptest; i < ARRAY_SIZE (bdesc_ptest); i++, d++)
     def_builtin (d->mask, d->name, int_ftype_v2di_v2di, d->code);
 
+  /* pcmpestr[im] insns.  */
+  for (i = 0, d = bdesc_pcmpestr;
+       i < ARRAY_SIZE (bdesc_pcmpestr);
+       i++, d++)
+    {
+      if (d->icode == CODE_FOR_sse4_2_pcmpestrm)
+	ftype = v16qi_ftype_v16qi_int_v16qi_int_int;
+      else
+	ftype = int_ftype_v16qi_int_v16qi_int_int;
+      def_builtin (d->mask, d->name, ftype, d->code);
+    }
+
+  /* pcmpistr[im] insns.  */
+  for (i = 0, d = bdesc_pcmpistr;
+       i < ARRAY_SIZE (bdesc_pcmpistr);
+       i++, d++)
+    {
+      if (d->icode == CODE_FOR_sse4_2_pcmpistrm)
+	ftype = v16qi_ftype_v16qi_v16qi_int;
+      else
+	ftype = int_ftype_v16qi_v16qi_int;
+      def_builtin (d->mask, d->name, ftype, d->code);
+    }
+
   def_builtin (MASK_MMX, "__builtin_ia32_packsswb", v8qi_ftype_v4hi_v4hi, IX86_BUILTIN_PACKSSWB);
   def_builtin (MASK_MMX, "__builtin_ia32_packssdw", v4hi_ftype_v2si_v2si, IX86_BUILTIN_PACKSSDW);
   def_builtin (MASK_MMX, "__builtin_ia32_packuswb", v8qi_ftype_v4hi_v4hi, IX86_BUILTIN_PACKUSWB);
@@ -17838,6 +17984,32 @@ ix86_init_mmx_sse_builtins (void)
   def_builtin_const (MASK_SSE4_1, "__builtin_ia32_roundss",
 		     v4sf_ftype_v4sf_v4sf_int, IX86_BUILTIN_ROUNDSS);
 
+  /* SSE4.2. */
+  ftype = build_function_type_list (unsigned_type_node,
+				    unsigned_type_node,
+				    unsigned_char_type_node,
+				    NULL_TREE);
+  def_builtin (MASK_SSE4_2, "__builtin_ia32_crc32qi",
+	       ftype, IX86_BUILTIN_CRC32QI);
+  ftype = build_function_type_list (unsigned_type_node,
+				    unsigned_type_node,
+				    short_unsigned_type_node,
+				    NULL_TREE);
+  def_builtin (MASK_SSE4_2, "__builtin_ia32_crc32hi",
+	       ftype, IX86_BUILTIN_CRC32HI);
+  ftype = build_function_type_list (unsigned_type_node,
+				    unsigned_type_node,
+				    unsigned_type_node,
+				    NULL_TREE);
+  def_builtin (MASK_SSE4_2, "__builtin_ia32_crc32si",
+	       ftype, IX86_BUILTIN_CRC32SI);
+  ftype = build_function_type_list (long_long_unsigned_type_node,
+				    long_long_unsigned_type_node,
+				    long_long_unsigned_type_node,
+				    NULL_TREE);
+  def_builtin (MASK_SSE4_2, "__builtin_ia32_crc32di",
+	       ftype, IX86_BUILTIN_CRC32DI);
+
   /* AMDFAM10 SSE4A New built-ins  */
   def_builtin (MASK_SSE4A, "__builtin_ia32_movntsd",
                void_ftype_pdouble_v2df, IX86_BUILTIN_MOVNTSD);
@@ -18039,6 +18211,41 @@ ix86_expand_sse_4_operands_builtin (enum
   return target;
 }
 
+/* Subroutine of ix86_expand_builtin to take care of crc32 insns.  */
+
+static rtx
+ix86_expand_crc32 (enum insn_code icode, tree exp, rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  enum machine_mode tmode = insn_data[icode].operand[0].mode;
+  enum machine_mode mode0 = insn_data[icode].operand[1].mode;
+  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
+
+  if (optimize
+      || !target
+      || GET_MODE (target) != tmode
+      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+    target = gen_reg_rtx (tmode);
+
+  if (!(*insn_data[icode].operand[1].predicate) (op0, mode0))
+    op0 = copy_to_mode_reg (mode0, op0);
+  if (!(*insn_data[icode].operand[2].predicate) (op1, mode1))
+    {
+      op1 = copy_to_reg (op1);
+      op1 = simplify_gen_subreg (mode1, op1, GET_MODE (op1), 0);
+    }
+
+  pat = GEN_FCN (icode) (target, op0, op1);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
 /* Subroutine of ix86_expand_builtin to take care of binop insns.  */
 
 static rtx
@@ -18372,6 +18579,303 @@ ix86_expand_sse_ptest (const struct buil
   return SUBREG_REG (target);
 }
 
+/* Return TRUE if we should continue search. Otherwise return FALSE
+   and set REPLACE_P to 1 if we can replace pcmp[ei]stri with
+   pcmp[ei]strm.  */
+
+static bool
+ix86_check_pcmpstrm (rtx insn, int index, int *replace_p)
+{
+  rtx set = PATTERN (insn);
+  rtx src;
+  int unspec;
+
+  if (GET_CODE (set) != PARALLEL)
+    return true;
+
+  set = XVECEXP (set, 0, 0);
+  if (GET_CODE (set) != SET)
+    return true;
+
+  src = SET_SRC (set);
+  if (GET_CODE (src) != UNSPEC)
+    return true;
+
+  unspec = XINT (src, 1);
+  if (unspec != UNSPEC_PCMPESTR && unspec != UNSPEC_PCMPISTR)
+    return true;
+
+  /* We only replace pcmp[ei]stri if the next pcmp[ei]strm is
+     compatible.  */
+  if (unspec == index)
+    {
+      rtx dest = SET_DEST (set);
+
+      /* It can be either pcmp[ei]stri or pcmp[ei]strm.  We only
+	 replace pcmp[ei]stri with pcmp[ei]strm if this one is
+	 pcmp[ei]strm.  */
+      *replace_p = reg_or_subregno (dest) == FIRST_SSE_REG;
+    }
+
+  return false;
+}
+
+/* When we only use FLAGS_REG, there is no difference between those
+   pcmp[ei]stri or pcmp[ei]strm. We will replace pcmp[ei]stri with
+   pcmp[ei]strm if it can be optimized out later.  */
+
+int
+ix86_pcmpstrm_ok (rtx insn)
+{
+  rtx set, src, p;
+  int index;
+  int replace = 0;
+
+  set = PATTERN (insn);
+  gcc_assert (GET_CODE (set) == PARALLEL);
+  set = XVECEXP (set, 0, 0);
+  src = SET_SRC (set);
+  gcc_assert (GET_CODE (set) == SET && GET_CODE (src) == UNSPEC);
+  index = XINT (src, 1);
+  gcc_assert (index == UNSPEC_PCMPESTR || index == UNSPEC_PCMPISTR);
+
+  /* We can't use pcmp[ei]strm if ecx is used.  */
+  if (! find_regno_note (insn, REG_UNUSED, 2))
+    return replace;
+
+  /* If there is a compatible pcmp[ei]strm insn after this one, we
+     will replace pcmp[ei]stri with pcmp[ei]strm. We hope that 2 are
+     the same and one of them will be optimized out later.  */
+  for (p = NEXT_INSN (insn); p; p = NEXT_INSN (p))
+    if (INSN_P (p))
+      {
+	/* We don't replace pcmp[ei]strm if we hit a label or a jump
+	   insn.  */
+	if (LABEL_P (p) || !NONJUMP_INSN_P (p))
+	  break;
+
+	if (!ix86_check_pcmpstrm (p, index, &replace))
+	  {
+	    if (replace)
+	      return replace;
+	    break;
+	  }
+      }
+
+  /* Search for a compatible pcmp[ei]strm insn before this one.  */
+  for (p = PREV_INSN (insn); p; p = PREV_INSN (p))
+    if (INSN_P (p))
+      {
+	/* We don't replace pcmp[ei]strm if we hit a label or a jump
+	   insn.  */
+	if (LABEL_P (p) || !NONJUMP_INSN_P (p))
+	  return replace;
+
+	if (!ix86_check_pcmpstrm (p, index, &replace))
+	  return replace;
+      }
+  
+  return replace;
+}
+
+/* Subroutine of ix86_expand_builtin to take care of pcmpestr[im]
+   insns.  */
+
+static rtx
+ix86_expand_sse_pcmpestr (const struct builtin_description *d,
+			   tree exp, rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  tree arg2 = CALL_EXPR_ARG (exp, 2);
+  tree arg3 = CALL_EXPR_ARG (exp, 3);
+  tree arg4 = CALL_EXPR_ARG (exp, 4);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx op2 = expand_normal (arg2);
+  rtx op3 = expand_normal (arg3);
+  rtx op4 = expand_normal (arg4);
+  enum machine_mode modev0, modev1, modeimm;
+
+  modev0 = insn_data[d->icode].operand[0].mode;
+  modev1 = insn_data[d->icode].operand[1].mode;
+  modeimm = insn_data[d->icode].operand[2].mode;
+
+  if (VECTOR_MODE_P (modev0))
+    op0 = safe_vector_operand (op0, modev0);
+  if (VECTOR_MODE_P (modev1))
+    op2 = safe_vector_operand (op2, modev1);
+
+  if ((optimize && !register_operand (op0, modev0))
+      || !(*insn_data[d->icode].operand[0].predicate) (op0, modev0))
+    op0 = copy_to_mode_reg (modev0, op0);
+  if ((optimize && !register_operand (op2, modev1))
+      || !(*insn_data[d->icode].operand[1].predicate) (op2, modev1))
+    op2 = copy_to_mode_reg (modev1, op2);
+
+  if (! (*insn_data[d->icode].operand[2].predicate) (op4, modeimm))
+    {
+      error ("the fifth argument must be an 8-bit immediate");
+      return const0_rtx;
+    }
+
+  /* OP1 is the length of the first string which should be in eax.  */
+  pat = gen_rtx_REG (SImode, 0);
+  emit_move_insn (pat, op1);
+  
+  /* OP3 is the length of the second string which should be in edx.  */
+  pat = gen_rtx_REG (SImode, 1);
+  emit_move_insn (pat, op3);
+  
+  if ((d->flag & BUILTIN_DESC_PCMPSTR_CC) == 0)
+    {
+      enum machine_mode tmode;
+      unsigned int tregno;
+
+      if (d->icode == CODE_FOR_sse4_2_pcmpestri)
+	{
+	  /* Index is returned in ecx.  */
+	  tmode = SImode;
+	  tregno = 2;
+	}
+      else
+	{
+	  /* Mask is returned in xmm0.  */
+	  tmode = V16QImode;
+	  tregno = FIRST_SSE_REG;
+	}
+
+      if (optimize
+	  || !target
+	  || GET_MODE (target) != tmode
+	  || ! (*insn_data[d->icode].operand[0].predicate) (target,
+							    tmode)
+	  || reg_or_subregno (target) != tregno)
+	target = gen_rtx_REG (tmode, tregno);
+
+      pat = GEN_FCN (d->icode) (op0, op2, op4);
+      if (! pat)
+	return 0;
+      emit_insn (pat);
+      return target;
+    }
+  else
+    {
+      target = gen_reg_rtx (SImode);
+      emit_move_insn (target, const0_rtx);
+      target = gen_rtx_SUBREG (QImode, target, 0);
+
+      pat = GEN_FCN (d->icode) (op0, op2, op4);
+      if (! pat)
+	return 0;
+      emit_insn (pat);
+      gcc_assert (GET_CODE (pat) == PARALLEL);
+      pat = XVECEXP (pat, 0, 1);
+      gcc_assert (GET_CODE (pat) == SET);
+      gcc_assert (GET_MODE (SET_DEST (pat)) == CCPCMPESTRmode);
+      emit_insn (gen_rtx_SET (VOIDmode,
+			      gen_rtx_STRICT_LOW_PART (VOIDmode, target),
+			      gen_rtx_fmt_ee (d->comparison, QImode,
+					      SET_DEST (pat),
+					      const0_rtx)));
+      return SUBREG_REG (target);
+    }
+}
+
+/* Subroutine of ix86_expand_builtin to take care of pcmpistr[im]
+   insns.  */
+
+static rtx
+ix86_expand_sse_pcmpistr (const struct builtin_description *d,
+			   tree exp, rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  tree arg2 = CALL_EXPR_ARG (exp, 2);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx op2 = expand_normal (arg2);
+  enum machine_mode modev0, modev1, modeimm;
+
+  modev0 = insn_data[d->icode].operand[0].mode;
+  modev1 = insn_data[d->icode].operand[1].mode;
+  modeimm = insn_data[d->icode].operand[2].mode;
+
+  if (VECTOR_MODE_P (modev0))
+    op0 = safe_vector_operand (op0, modev0);
+  if (VECTOR_MODE_P (modev1))
+    op2 = safe_vector_operand (op2, modev1);
+
+  if ((optimize && !register_operand (op0, modev0))
+      || !(*insn_data[d->icode].operand[0].predicate) (op0, modev0))
+    op0 = copy_to_mode_reg (modev0, op0);
+  if ((optimize && !register_operand (op1, modev1))
+      || !(*insn_data[d->icode].operand[1].predicate) (op1, modev1))
+    op1 = copy_to_mode_reg (modev1, op1);
+
+  if (! (*insn_data[d->icode].operand[2].predicate) (op2, modeimm))
+    {
+      error ("the third argument must be an 8-bit immediate");
+      return const0_rtx;
+    }
+
+  if ((d->flag & BUILTIN_DESC_PCMPSTR_CC) == 0)
+    {
+      enum machine_mode tmode;
+      unsigned int tregno;
+
+      if (d->icode == CODE_FOR_sse4_2_pcmpistri)
+	{
+	  /* Index is returned in ecx.  */
+	  tmode = SImode;
+	  tregno = 2;
+	}
+      else
+	{
+	  /* Mask is returned in xmm0.  */
+	  tmode = V16QImode;
+	  tregno = FIRST_SSE_REG;
+	}
+
+      if (optimize
+	  || !target
+	  || GET_MODE (target) != tmode
+	  || ! (*insn_data[d->icode].operand[0].predicate) (target,
+							    tmode)
+	  || reg_or_subregno (target) != tregno)
+	target = gen_rtx_REG (tmode, tregno);
+
+      pat = GEN_FCN (d->icode) (op0, op1, op2);
+      if (! pat)
+	return 0;
+      emit_insn (pat);
+      return target;
+    }
+  else
+    {
+      target = gen_reg_rtx (SImode);
+      emit_move_insn (target, const0_rtx);
+      target = gen_rtx_SUBREG (QImode, target, 0);
+
+      pat = GEN_FCN (d->icode) (op0, op1, op2);
+      if (! pat)
+	return 0;
+      emit_insn (pat);
+      gcc_assert (GET_CODE (pat) == PARALLEL);
+      pat = XVECEXP (pat, 0, 1);
+      gcc_assert (GET_CODE (pat) == SET);
+      gcc_assert (GET_MODE (SET_DEST (pat)) == CCPCMPISTRmode);
+      emit_insn (gen_rtx_SET (VOIDmode,
+			      gen_rtx_STRICT_LOW_PART (VOIDmode, target),
+			      gen_rtx_fmt_ee (d->comparison, QImode,
+					      SET_DEST (pat),
+					      const0_rtx)));
+      return SUBREG_REG (target);
+    }
+}
+
 /* Return the integer constant in ARG.  Constrain it to be in the range
    of the subparts of VEC_TYPE; issue an error if not.  */
 
@@ -19198,6 +19702,22 @@ ix86_expand_builtin (tree exp, rtx targe
     if (d->code == fcode)
       return ix86_expand_sse_ptest (d, exp, target);
 
+  for (i = 0, d = bdesc_crc32; i < ARRAY_SIZE (bdesc_crc32); i++, d++)
+    if (d->code == fcode)
+      return ix86_expand_crc32 (d->icode, exp, target);
+
+  for (i = 0, d = bdesc_pcmpestr;
+       i < ARRAY_SIZE (bdesc_pcmpestr);
+       i++, d++)
+    if (d->code == fcode)
+      return ix86_expand_sse_pcmpestr (d, exp, target);
+
+  for (i = 0, d = bdesc_pcmpistr;
+       i < ARRAY_SIZE (bdesc_pcmpistr);
+       i++, d++)
+    if (d->code == fcode)
+      return ix86_expand_sse_pcmpistr (d, exp, target);
+
   gcc_unreachable ();
 }
 
--- gcc/config/i386/i386.h.nni	2007-05-22 13:31:31.000000000 -0700
+++ gcc/config/i386/i386.h	2007-05-22 13:31:31.000000000 -0700
@@ -542,6 +542,8 @@ extern const char *host_detect_local_cpu
 	builtin_define ("__SSSE3__");				\
       if (TARGET_SSE4_1)					\
 	builtin_define ("__SSE4_1__");				\
+      if (TARGET_SSE4_2)					\
+	builtin_define ("__SSE4_2__");				\
       if (TARGET_SSE4A)						\
  	builtin_define ("__SSE4A__");		                \
       if (TARGET_SSE_MATH && TARGET_SSE)			\
@@ -2029,7 +2031,8 @@ do {							\
 /* Return nonzero if MODE implies a floating point inequality can be
    reversed.  */
 
-#define REVERSIBLE_CC_MODE(MODE) 1
+#define REVERSIBLE_CC_MODE(MODE) \
+  ((MODE) != CCPCMPESTRmode && (MODE) != CCPCMPISTRmode)
 
 /* A C expression whose value is reversed condition code of the CODE for
    comparison done in CC_MODE mode.  */
--- gcc/config/i386/i386.md.nni	2007-05-22 07:43:24.000000000 -0700
+++ gcc/config/i386/i386.md	2007-05-22 13:31:31.000000000 -0700
@@ -173,6 +173,11 @@
    (UNSPEC_PTEST		140)
    (UNSPEC_ROUNDP		141)
    (UNSPEC_ROUNDS		142)
+
+   ; For SSE4.2 support
+   (UNSPEC_CRC32		143)
+   (UNSPEC_PCMPESTR		144)
+   (UNSPEC_PCMPISTR		145)
   ])
 
 (define_constants
@@ -20893,6 +20898,36 @@
   }
   [(set_attr "type" "multi")])
 
+(define_mode_macro CRC32MODE [QI HI SI])
+(define_mode_attr crc32modesuffix [(QI "b") (HI "w") (SI "l")])
+(define_mode_attr crc32modeconstraint [(QI "qm") (HI "rm") (SI "rm")])
+
+(define_insn "sse4_2_crc32<mode>"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "register_operand" "0")
+	   (match_operand:CRC32MODE 2 "nonimmediate_operand" "<crc32modeconstraint>")]
+	  UNSPEC_CRC32))]
+  "TARGET_SSE4_2"
+  "crc32<crc32modesuffix>\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_rep" "1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "SI")])
+
+(define_insn "sse4_2_crc32di"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(unspec:DI
+	  [(match_operand:DI 1 "register_operand" "0")
+	   (match_operand:DI 2 "nonimmediate_operand" "rm")]
+	  UNSPEC_CRC32))]
+  "TARGET_SSE4_2 && TARGET_64BIT"
+  "crc32q\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_rep" "1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "DI")])
+
 (include "mmx.md")
 (include "sse.md")
 (include "sync.md")
--- gcc/config/i386/i386.opt.nni	2007-05-22 07:43:24.000000000 -0700
+++ gcc/config/i386/i386.opt	2007-05-22 14:41:28.000000000 -0700
@@ -191,6 +191,14 @@ msse4.1
 Target Report Mask(SSE4_1)
 Support MMX, SSE, SSE2, SSE3, SSSE3 and SSE4.1 built-in functions and code generation
 
+msse4.2
+Target Report Mask(SSE4_2)
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and code generation
+
+msse4
+Target Report Mask(SSE4_1|MASK_SSE4_2) MaskExists
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and code generation
+
 msse4a
 Target Report Mask(SSE4A)
 Support MMX, SSE, SSE2, SSE3 and SSE4A built-in functions and code generation
--- gcc/config/i386/nmmintrin.h.nni	2007-05-22 13:31:31.000000000 -0700
+++ gcc/config/i386/nmmintrin.h	2007-05-22 13:31:31.000000000 -0700
@@ -0,0 +1,40 @@
+/* Copyright (C) 2007 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to
+   the Free Software Foundation, 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 10.0.  */
+
+#ifndef _NMMINTRIN_H_INCLUDED
+#define _NMMINTRIN_H_INCLUDED
+
+#ifndef __SSE4_2__
+# error "SSE4.2 instruction set not enabled"
+#else
+/* We just include the SSE4.1 header file.  */
+#include <smmintrin.h>
+#endif /* __SSE4_2__ */
+
+#endif /* _NMMINTRIN_H_INCLUDED */
--- gcc/config/i386/predicates.md.nni	2007-05-22 07:43:24.000000000 -0700
+++ gcc/config/i386/predicates.md	2007-05-22 13:31:31.000000000 -0700
@@ -856,7 +856,22 @@
   enum machine_mode inmode = GET_MODE (XEXP (op, 0));
   enum rtx_code code = GET_CODE (op);
 
-  if (inmode == CCFPmode || inmode == CCFPUmode)
+  if (inmode == CCPCMPESTRmode || inmode == CCPCMPISTRmode)
+    {
+      switch (code)
+	{
+	case GTU:
+	case LTU:
+	case EQ:
+	  return 1;
+	case UNGT:
+	case UNLT:
+	  return 0;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else if (inmode == CCFPmode || inmode == CCFPUmode)
     {
       enum rtx_code second_code, bypass_code;
       ix86_fp_comparison_codes (code, &bypass_code, &code, &second_code);
@@ -906,14 +921,23 @@
     }
   switch (code)
     {
-    case EQ: case NE:
+    case NE:
+      return inmode != CCPCMPESTRmode && inmode != CCPCMPISTRmode;
+    case EQ:
       return 1;
     case LT: case GE:
       if (inmode == CCmode || inmode == CCGCmode
 	  || inmode == CCGOCmode || inmode == CCNOmode)
 	return 1;
       return 0;
-    case LTU: case GTU: case LEU: case ORDERED: case UNORDERED: case GEU:
+    case LTU:
+    case GTU:
+      if (inmode == CCmode
+	  || inmode == CCPCMPESTRmode
+	  || inmode == CCPCMPISTRmode)
+	return 1;
+      return 0;
+    case LEU: case ORDERED: case UNORDERED: case GEU:
       if (inmode == CCmode)
 	return 1;
       return 0;
@@ -921,6 +945,10 @@
       if (inmode == CCmode || inmode == CCGCmode || inmode == CCNOmode)
 	return 1;
       return 0;
+    case UNGT:
+    case UNLT:
+      if (inmode == CCPCMPESTRmode || inmode == CCPCMPISTRmode)
+	return 1;
     default:
       return 0;
     }
--- gcc/config/i386/smmintrin.h.nni	2007-05-22 07:43:24.000000000 -0700
+++ gcc/config/i386/smmintrin.h	2007-05-22 13:31:31.000000000 -0700
@@ -573,6 +573,246 @@ _mm_stream_load_si128 (__m128i *__X)
   return (__m128i) __builtin_ia32_movntdqa ((__v2di *) __X);
 }
 
+#ifdef __SSE4_2__
+
+/* These macros specify the source data format.  */
+#define SIDD_UBYTE_OPS			0x00
+#define SIDD_UWORD_OPS			0x01
+#define SIDD_SBYTE_OPS			0x02
+#define SIDD_SWORD_OPS			0x03
+
+/* These macros specify the comparison operation.  */
+#define SIDD_CMP_EQUAL_ANY		0x00
+#define SIDD_CMP_RANGES			0x04
+#define SIDD_CMP_EQUAL_EACH		0x08
+#define SIDD_CMP_EQUAL_ORDERED		0x0c
+
+/* These macros specify the polarity.  */
+#define SIDD_POSITIVE_POLARITY		0x00
+#define SIDD_NEGATIVE_POLARITY		0x10
+#define SIDD_MASKED_POSITIVE_POLARITY	0x20
+#define SIDD_MASKED_NEGATIVE_POLARITY	0x30
+
+/* These macros specify the output selection in _mm_cmpXstri ().  */
+#define SIDD_LEAST_SIGNIFICANT		0x00
+#define SIDD_MOST_SIGNIFICANT		0x40
+
+/* These macros specify the output selection in _mm_cmpXstrm ().  */
+#define SIDD_BIT_MASK			0x00
+#define SIDD_UNIT_MASK			0x40
+
+/* Intrinsics for text/string processing.  */
+
+#ifdef __OPTIMIZE__
+static __inline __m128i __attribute__((__always_inline__))
+_mm_cmpistrm (__m128i __X, __m128i __Y, const int __M)
+{
+  return (__m128i) __builtin_ia32_pcmpistrm128 ((__v16qi)__X,
+						(__v16qi)__Y,
+						__M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistri (__m128i __X, __m128i __Y, const int __M)
+{
+  return __builtin_ia32_pcmpistri128 ((__v16qi)__X,
+				      (__v16qi)__Y,
+				      __M);
+}
+
+static __inline __m128i __attribute__((__always_inline__))
+_mm_cmpestrm (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+  return (__m128i) __builtin_ia32_pcmpestrm128 ((__v16qi)__X, __LX,
+						(__v16qi)__Y, __LY,
+						__M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestri (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+  return __builtin_ia32_pcmpestri128 ((__v16qi)__X, __LX,
+				      (__v16qi)__Y, __LY,
+				      __M);
+}
+#else
+#define _mm_cmpistrm(X, Y, M) \
+  ((__m128i) __builtin_ia32_pcmpistrm128 ((__v16qi)(X), (__v16qi)(Y), (M)))
+#define _mm_cmpistri(X, Y, M) \
+  __builtin_ia32_pcmpistri128 ((__v16qi)(X), (__v16qi)(Y), (M))
+
+#define _mm_cmpestrm(X, LX, Y, LY, M) \
+  ((__m128i) __builtin_ia32_pcmpestrm128 ((__v16qi)(X), (int)(LX), \
+					  (__v16qi)(Y), (int)(LY), (M)))
+#define _mm_cmpestri(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestri128 ((__v16qi)(X), (int)(LX), \
+			       (__v16qi)(Y), (int)(LY), (M))
+#endif
+
+/* Intrinsics for text/string processing and reading values of
+   EFlags.  */
+
+#ifdef __OPTIMIZE__
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistra (__m128i __X, __m128i __Y, const int __M)
+{
+  return __builtin_ia32_pcmpistria128 ((__v16qi)__X,
+				       (__v16qi)__Y,
+				       __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistrc (__m128i __X, __m128i __Y, const int __M)
+{
+  return __builtin_ia32_pcmpistric128 ((__v16qi)__X,
+				       (__v16qi)__Y,
+				       __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistro (__m128i __X, __m128i __Y, const int __M)
+{
+  return __builtin_ia32_pcmpistrio128 ((__v16qi)__X,
+				       (__v16qi)__Y,
+				       __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistrs (__m128i __X, __m128i __Y, const int __M)
+{
+  return __builtin_ia32_pcmpistris128 ((__v16qi)__X,
+				       (__v16qi)__Y,
+				       __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistrz (__m128i __X, __m128i __Y, const int __M)
+{
+  return __builtin_ia32_pcmpistriz128 ((__v16qi)__X,
+				       (__v16qi)__Y,
+				       __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestra (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+  return __builtin_ia32_pcmpestria128 ((__v16qi)__X, __LX,
+				       (__v16qi)__Y, __LY,
+				       __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestrc (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+  return __builtin_ia32_pcmpestric128 ((__v16qi)__X, __LX,
+				       (__v16qi)__Y, __LY,
+				       __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestro (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+  return __builtin_ia32_pcmpestrio128 ((__v16qi)__X, __LX,
+				       (__v16qi)__Y, __LY,
+				       __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestrs (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+  return __builtin_ia32_pcmpestris128 ((__v16qi)__X, __LX,
+				       (__v16qi)__Y, __LY,
+				       __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestrz (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+  return __builtin_ia32_pcmpestriz128 ((__v16qi)__X, __LX,
+				       (__v16qi)__Y, __LY,
+				       __M);
+}
+#else
+#define _mm_cmpistra(X, Y, M) \
+  __builtin_ia32_pcmpistria128 ((__v16qi)(X), (__v16qi)(Y), (M))
+#define _mm_cmpistrc(X, Y, M) \
+  __builtin_ia32_pcmpistric128 ((__v16qi)(X), (__v16qi)(Y), (M))
+#define _mm_cmpistro(X, Y, M) \
+  __builtin_ia32_pcmpistrio128 ((__v16qi)(X), (__v16qi)(Y), (M))
+#define _mm_cmpistrs(X, Y, M) \
+  __builtin_ia32_pcmpistris128 ((__v16qi)(X), (__v16qi)(Y), (M))
+#define _mm_cmpistrz(X, Y, M) \
+  __builtin_ia32_pcmpistriz128 ((__v16qi)(X), (__v16qi)(Y), (M))
+
+#define _mm_cmpestra(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestria128 ((__v16qi)(X), (int)(LX), \
+				(__v16qi)(Y), (int)(LY), (M))
+#define _mm_cmpestrc(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestric128 ((__v16qi)(X), (int)(LX), \
+				(__v16qi)(Y), (int)(LY), (M))
+#define _mm_cmpestro(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestrio128 ((__v16qi)(X), (int)(LX), \
+				(__v16qi)(Y), (int)(LY), (M))
+#define _mm_cmpestrs(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestris128 ((__v16qi)(X), (int)(LX), \
+				(__v16qi)(Y), (int)(LY), (M))
+#define _mm_cmpestrz(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestriz128 ((__v16qi)(X), (int)(LX), \
+				(__v16qi)(Y), (int)(LY), (M))
+#endif
+
+/* Packed integer 64-bit comparison, zeroing or filling with ones
+   corresponding parts of result.  */
+static __inline __m128i __attribute__((__always_inline__))
+_mm_cmpgt_epi64 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_pcmpgtq ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Calculate the number of bits set to 1.  */
+static __inline int __attribute__((__always_inline__))
+_mm_popcnt_u32 (unsigned int __X)
+{
+  return __builtin_popcount (__X);
+}
+
+#ifdef __x86_64__
+static __inline long long  __attribute__((__always_inline__))
+_mm_popcnt_u64 (unsigned long long __X)
+{
+  return __builtin_popcountll (__X);
+}
+#endif
+
+/* Accumulate CRC32 (polynomial 0x11EDC6F41) value.  */
+static __inline unsigned int __attribute__((__always_inline__))
+_mm_crc32_u8 (unsigned int __C, unsigned char __V)
+{
+  return __builtin_ia32_crc32qi (__C, __V);
+}
+
+static __inline unsigned int __attribute__((__always_inline__))
+_mm_crc32_u16 (unsigned int __C, unsigned short __V)
+{
+  return __builtin_ia32_crc32hi (__C, __V);
+}
+
+static __inline unsigned int __attribute__((__always_inline__))
+_mm_crc32_u32 (unsigned int __C, unsigned int __V)
+{
+  return __builtin_ia32_crc32si (__C, __V);
+}
+
+#ifdef __x86_64__
+static __inline unsigned long long __attribute__((__always_inline__))
+_mm_crc32_u64 (unsigned long long __C, unsigned long long __V)
+{
+  return __builtin_ia32_crc32di (__C, __V);
+}
+#endif
+
+#endif /* __SSE4_2__ */
+
 #endif /* __SSE4_1__ */
 
 #endif /* _SMMINTRIN_H_INCLUDED */
--- gcc/config/i386/sse.md.nni	2007-05-22 13:31:31.000000000 -0700
+++ gcc/config/i386/sse.md	2007-05-22 13:31:31.000000000 -0700
@@ -3633,6 +3633,16 @@
    (set_attr "prefix_data16" "1")
    (set_attr "mode" "TI")])
 
+(define_insn "sse4_2_gtv2di3"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(gt:V2DI
+	  (match_operand:V2DI 1 "nonimmediate_operand" "0")
+	  (match_operand:V2DI 2 "nonimmediate_operand" "xm")))]
+  "TARGET_SSE4_2"
+  "pcmpgtq\t{%2, %0|%0, %2}"
+  [(set_attr "type" "ssecmp")
+   (set_attr "mode" "TI")])
+
 (define_expand "vcond<mode>"
   [(set (match_operand:SSEMODE124 0 "register_operand" "")
         (if_then_else:SSEMODE124
@@ -3649,6 +3659,22 @@
     FAIL;
 })
 
+(define_expand "vcondv2di"
+  [(set (match_operand:V2DI 0 "register_operand" "")
+        (if_then_else:V2DI
+          (match_operator 3 ""
+            [(match_operand:V2DI 4 "nonimmediate_operand" "")
+             (match_operand:V2DI 5 "nonimmediate_operand" "")])
+          (match_operand:V2DI 1 "general_operand" "")
+          (match_operand:V2DI 2 "general_operand" "")))]
+  "TARGET_SSE4_2"
+{
+  if (ix86_expand_int_vcond (operands))
+    DONE;
+  else
+    FAIL;
+})
+
 (define_expand "vcondu<mode>"
   [(set (match_operand:SSEMODE124 0 "register_operand" "")
         (if_then_else:SSEMODE124
@@ -3665,6 +3691,22 @@
     FAIL;
 })
 
+(define_expand "vconduv2di"
+  [(set (match_operand:V2DI 0 "register_operand" "")
+        (if_then_else:V2DI
+          (match_operator 3 ""
+            [(match_operand:V2DI 4 "nonimmediate_operand" "")
+             (match_operand:V2DI 5 "nonimmediate_operand" "")])
+          (match_operand:V2DI 1 "general_operand" "")
+          (match_operand:V2DI 2 "general_operand" "")))]
+  "TARGET_SSE4_2"
+{
+  if (ix86_expand_int_vcond (operands))
+    DONE;
+  else
+    FAIL;
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel integral logical operations
@@ -6372,3 +6414,157 @@
   [(set_attr "type" "ssecvt")
    (set_attr "prefix_extra" "1")
    (set_attr "mode" "V4SF")])
+
+(define_insn "sse4_2_pcmpestri"
+  [(set (reg:SI 2)
+	(unspec:SI
+	  [(match_operand:V16QI 0 "register_operand" "x")
+	   (reg:SI 0)
+	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	   (reg:SI 1)
+	   (match_operand:SI 2 "const_0_to_255_operand" "n")]
+	  UNSPEC_PCMPESTR))
+   (set (reg:CCPCMPESTR FLAGS_REG)
+	(unspec:CCPCMPESTR
+	  [(match_dup 0)
+	   (reg:SI 0)
+	   (match_dup 1)
+	   (reg:SI 1)
+	   (match_dup 2)]
+	  UNSPEC_PCMPESTR))]
+  "TARGET_SSE4_2"
+  "pcmpestri\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_data16" "1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "sse4_2_pcmpestrm"
+  [(set (reg:V16QI 21)
+	(unspec:V16QI
+	  [(match_operand:V16QI 0 "register_operand" "x")
+	   (reg:SI 0)
+	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	   (reg:SI 1)
+	   (match_operand:SI 2 "const_0_to_255_operand" "n")]
+	  UNSPEC_PCMPESTR))
+   (set (reg:CCPCMPESTR FLAGS_REG)
+	(unspec:CCPCMPESTR
+	  [(match_dup 0)
+	   (reg:SI 0)
+	   (match_dup 1)
+	   (reg:SI 1)
+	   (match_dup 2)]
+	  UNSPEC_PCMPESTR))]
+  "TARGET_SSE4_2"
+  "pcmpestrm\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_data16" "1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_split
+  [(set (reg:SI 2)
+	(unspec:SI
+	  [(match_operand:V16QI 0 "register_operand" "")
+	   (reg:SI 0)
+	   (match_operand:V16QI 1 "nonimmediate_operand" "")
+	   (reg:SI 1)
+	   (match_operand:SI 2 "const_0_to_255_operand" "")]
+	  UNSPEC_PCMPESTR))
+   (set (reg:CCPCMPESTR FLAGS_REG)
+	(unspec:CCPCMPESTR
+	  [(match_dup 0)
+	   (reg:SI 0)
+	   (match_dup 1)
+	   (reg:SI 1)
+	   (match_dup 2)]
+	  UNSPEC_PCMPESTR))]
+  "TARGET_SSE4_2 && ix86_pcmpstrm_ok (insn)"
+  [(parallel
+    [(set (reg:V16QI 21)
+	  (unspec:V16QI
+	    [(match_dup 0)
+	     (reg:SI 0)
+	     (match_dup 1)
+	     (reg:SI 1)
+	     (match_dup 2)]
+	    UNSPEC_PCMPESTR))
+     (set (reg:CCPCMPESTR FLAGS_REG)
+	  (unspec:CCPCMPESTR
+	    [(match_dup 0)
+	     (reg:SI 0)
+	     (match_dup 1)
+	     (reg:SI 1)
+	     (match_dup 2)]
+	    UNSPEC_PCMPESTR))])]
+  "")
+
+(define_insn "sse4_2_pcmpistri"
+  [(set (reg:SI 2)
+	(unspec:SI
+	  [(match_operand:V16QI 0 "register_operand" "x")
+	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	   (match_operand:SI 2 "const_0_to_255_operand" "n")]
+	  UNSPEC_PCMPISTR))
+   (set (reg:CCPCMPISTR FLAGS_REG)
+	(unspec:CCPCMPISTR
+	  [(match_dup 0)
+	   (match_dup 1)
+	   (match_dup 2)]
+	  UNSPEC_PCMPISTR))]
+  "TARGET_SSE4_2"
+  "pcmpistri\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_data16" "1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "sse4_2_pcmpistrm"
+  [(set (reg:V16QI 21)
+	(unspec:V16QI
+	  [(match_operand:V16QI 0 "register_operand" "x")
+	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	   (match_operand:SI 2 "const_0_to_255_operand" "n")]
+	  UNSPEC_PCMPISTR))
+   (set (reg:CCPCMPISTR FLAGS_REG)
+	(unspec:CCPCMPISTR
+	  [(match_dup 0)
+	   (match_dup 1)
+	   (match_dup 2)]
+	  UNSPEC_PCMPISTR))]
+  "TARGET_SSE4_2"
+  "pcmpistrm\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_data16" "1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_split
+  [(set (reg:SI 2)
+	(unspec:SI
+	  [(match_operand:V16QI 0 "register_operand" "")
+	   (match_operand:V16QI 1 "nonimmediate_operand" "")
+	   (match_operand:SI 2 "const_0_to_255_operand" "")]
+	  UNSPEC_PCMPISTR))
+   (set (reg:CCPCMPISTR FLAGS_REG)
+	(unspec:CCPCMPISTR
+	  [(match_dup 0)
+	   (match_dup 1)
+	   (match_dup 2)]
+	  UNSPEC_PCMPISTR))]
+  "TARGET_SSE4_2 && ix86_pcmpstrm_ok (insn)"
+  [(parallel
+    [(set (reg:V16QI 21)
+	  (unspec:V16QI
+	    [(match_dup 0)
+	     (match_dup 1)
+	     (match_dup 2)]
+	    UNSPEC_PCMPISTR))
+     (set (reg:CCPCMPISTR FLAGS_REG)
+	  (unspec:CCPCMPISTR
+	    [(match_dup 0)
+	     (match_dup 1)
+	     (match_dup 2)]
+	    UNSPEC_PCMPISTR))])]
+  "")
--- gcc/doc/extend.texi.nni	2007-05-22 13:31:31.000000000 -0700
+++ gcc/doc/extend.texi	2007-05-22 13:31:31.000000000 -0700
@@ -7481,6 +7481,54 @@ Generates the @code{pextrd} machine inst
 Generates the @code{pextrq} machine instruction in 64bit mode.
 @end table
 
+The following built-in functions are available when @option{-msse4.2} is
+used.  All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int)
+v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int)
+__v2di __builtin_ia32_pcmpgtq (__v2di, __v2di)
+@end smallexample
+
+The following built-in functions are available when @option{-msse4.2} is
+used.
+
+@table @code
+unsigned int __builtin_ia32_crc32qi (unsigned int, unsigned char)
+Generates the @code{crc32b} machine instruction.
+unsigned int __builtin_ia32_crc32hi (unsigned int, unsigned short)
+Generates the @code{crc32w} machine instruction.
+unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int)
+Generates the @code{crc32l} machine instruction.
+unsigned long long __builtin_ia32_crc32di (unsigned long long, unsigned long long)
+Generates the @code{crc32q} machine instruction in 64bit mode.
+@end table
+
+The following built-in functions are changed to generate new SSE4.2
+instructions when @option{-msse4.2} is used.
+
+@table @code
+int __builtin_popcount (unsigned int)
+Generates the @code{popcntl} machine instruction.
+int __builtin_popcountl (unsigned long)
+Generates the @code{popcntl} or @code{popcntq} machine instruction,
+depending on the size of @code{unsigned long}.
+int __builtin_popcountll (unsigned long long)
+Generates the @code{popcntq} machine instruction.
+@end table
+
 The following built-in functions are available when @option{-msse4a} is used.
 
 @smallexample
--- gcc/doc/invoke.texi.nni	2007-05-22 07:43:24.000000000 -0700
+++ gcc/doc/invoke.texi	2007-05-22 13:31:31.000000000 -0700
@@ -547,7 +547,7 @@ Objective-C and Objective-C++ Dialects}.
 -mno-fp-ret-in-387  -msoft-float @gol
 -mno-wide-multiply  -mrtd  -malign-double @gol
 -mpreferred-stack-boundary=@var{num} -mcx16 -msahf @gol
--mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 @gol
+-mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
 -msse4a -m3dnow -mpopcnt -mabm @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -10263,6 +10263,10 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-ssse3
 @item -msse4.1
 @itemx -mno-sse4.1
+@item -msse4.2
+@itemx -mno-sse4.2
+@item -msse4
+@itemx -mno-sse4
 @item -msse4a
 @item -mno-sse4a
 @item -m3dnow

