This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: PATCH: Add SSE4.2 support
On Mon, May 28, 2007 at 04:54:37PM +0200, Uros Bizjak wrote:
> Hello!
>
> > * config/i386/sse.md (sse4_2_gtv2di3): New pattern for
> > SSE4.2.
> > (vcondv2di): Likewise.
> > (vconduv2di): Likewise.
>
> >+(define_expand "vcondv2di"
>
> This pattern should be merged with the existing vcond<mode> pattern, where
> the mode macro is changed to SSEMODEI. The logic to reject unsupported
> patterns should be contained in ix86_expand_int_vcond(). Please note
> that even in SSE4.1, we can expand vector conditions for EQ and NE
> modes. Also, ix86_expand_int_vcond() generates code that simulates a
> conditional move, among other things.
>
> >+(define_expand "vconduv2di"
>
> This expander should be merged with existing vcondu in the same way.
I am enclosing an updated patch to handle them.
>
> BTW: If it is not too much trouble, could the string/text processing
> intrinsics be split out into a separate patch? The first patch would
> then implement only the SSE4.2 flags handling and the logic/CRC
> operations that we are all somehow familiar with, and the second would
> add the string processing.
>
I would prefer to use one patch for SSE4.2 if at all possible, but I
will split it into two if there is no way around it.
Thanks.
H.J.
---
2007-05-29 H.J. Lu <hongjiu.lu@intel.com>
* config.gcc (i[34567]86-*-*): Add nmmintrin.h to
extra_headers.
(x86_64-*-*): Likewise.
* config/i386/i386-modes.def (CCPCMPESTR): New.
(CCPCMPISTR): Likewise.
* config/i386/i386-protos.h (ix86_pcmpstrm_ok): New.
* config/i386/i386.c (ix86_handle_option): Handle SSE4.2.
(override_options): Support SSE4.2.
(put_condition_code): Handle CCPCMPESTRmode and CCPCMPISTRmode.
(ix86_cc_modes_compatible): Likewise.
(ix86_expand_int_vcond): Support V2DImode.
(IX86_BUILTIN_CRC32QI): New for SSE4.2.
(IX86_BUILTIN_CRC32HI): Likewise.
(IX86_BUILTIN_CRC32SI): Likewise.
(IX86_BUILTIN_CRC32DI): Likewise.
(IX86_BUILTIN_PCMPESTRI128): Likewise.
(IX86_BUILTIN_PCMPESTRM128): Likewise.
(IX86_BUILTIN_PCMPESTRA128): Likewise.
(IX86_BUILTIN_PCMPESTRC128): Likewise.
(IX86_BUILTIN_PCMPESTRO128): Likewise.
(IX86_BUILTIN_PCMPESTRS128): Likewise.
(IX86_BUILTIN_PCMPESTRZ128): Likewise.
(IX86_BUILTIN_PCMPISTRI128): Likewise.
(IX86_BUILTIN_PCMPISTRM128): Likewise.
(IX86_BUILTIN_PCMPISTRA128): Likewise.
(IX86_BUILTIN_PCMPISTRC128): Likewise.
(IX86_BUILTIN_PCMPISTRO128): Likewise.
(IX86_BUILTIN_PCMPISTRS128): Likewise.
(IX86_BUILTIN_PCMPISTRZ128): Likewise.
(IX86_BUILTIN_PCMPGTQ): Likewise.
(BUILTIN_DESC_PCMPSTR_CC): Likewise.
(bdesc_pcmpestr): Likewise.
(bdesc_pcmpistr): Likewise.
(bdesc_crc32): Likewise.
(bdesc_sse_3arg): Likewise.
(ix86_expand_crc32): Likewise.
(ix86_check_pcmpstrm): Likewise.
(ix86_pcmpstrm_ok): Likewise.
(ix86_expand_sse_pcmpestr): Likewise.
(ix86_expand_sse_pcmpistr): Likewise.
(ix86_init_mmx_sse_builtins): Support SSE4.2.
(ix86_expand_builtin): Likewise.
* config/i386/i386.h (TARGET_CPU_CPP_BUILTINS): Define
__SSE4_2__ for -msse4.2.
(REVERSIBLE_CC_MODE): Return 0 for CCPCMPESTRmode or
CCPCMPISTRmode.
* config/i386/i386.md (UNSPEC_CRC32): New for SSE4.2.
(UNSPEC_PCMPESTR): Likewise.
(UNSPEC_PCMPISTR): Likewise.
(CRC32MODE): Likewise.
(crc32modesuffix): Likewise.
(crc32modeconstraint): Likewise.
(sse4_2_crc32<mode>): Likewise.
(sse4_2_crc32di): Likewise.
* config/i386/i386.opt (msse4.2): New for SSE4.2.
(msse4): Likewise.
* config/i386/nmmintrin.h: New. The dummy SSE4.2 intrinsic header
file.
* config/i386/predicates.md (fcmov_comparison_operator): Handle
CCPCMPESTRmode and CCPCMPISTRmode.
(ix86_comparison_operator): Likewise.
* config/i386/smmintrin.h: Add SSE4.2 intrinsics.
* config/i386/sse.md (sse4_2_gtv2di3): New pattern for
SSE4.2.
(sse4_2_pcmpestri): Likewise.
(sse4_2_pcmpestrm): Likewise.
(sse4_2_pcmpistri): Likewise.
(sse4_2_pcmpistrm): Likewise.
(vcond<mode>): Use SSEMODEI instead of SSEMODE124.
(vcondu<mode>): Likewise.
Split sse4_2_pcmpestri to sse4_2_pcmpestrm if profitable.
Split sse4_2_pcmpistri to sse4_2_pcmpistrm if profitable.
* doc/extend.texi: Document SSE4.2 built-in functions.
* doc/invoke.texi: Document -msse4.2/-msse4.
--- gcc/config.gcc.nni 2007-05-29 09:15:47.000000000 -0700
+++ gcc/config.gcc 2007-05-29 09:29:01.000000000 -0700
@@ -276,12 +276,14 @@ xscale-*-*)
i[34567]86-*-*)
cpu_type=i386
extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
- pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h"
+ pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
+ nmmintrin.h"
;;
x86_64-*-*)
cpu_type=i386
extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
- pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h"
+ pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
+ nmmintrin.h"
need_64bit_hwint=yes
;;
ia64-*-*)
--- gcc/config/i386/i386-modes.def.nni 2007-05-22 07:43:24.000000000 -0700
+++ gcc/config/i386/i386-modes.def 2007-05-29 09:29:01.000000000 -0700
@@ -53,7 +53,17 @@ ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG
mode is used to simulate comparisons of (a-b) and (a+b)
against zero using sub/cmp/add operations.
- Add CCZ to indicate that only the Zero flag is valid. */
+ Add CCZ to indicate that only the Zero flag is valid.
+
+ Add CCPCMPESTR/CCPCMPISTR for pcmp[ei]str[im] instructions:
+
+ suffix rtx_code
+ a GTU
+ c LTU
+ o UNGT OF == 1
+ s UNLT SF == 1
+ z EQ
+ */
CC_MODE (CCGC);
CC_MODE (CCGOC);
@@ -61,6 +71,8 @@ CC_MODE (CCNO);
CC_MODE (CCZ);
CC_MODE (CCFP);
CC_MODE (CCFPU);
+CC_MODE (CCPCMPESTR);
+CC_MODE (CCPCMPISTR);
/* Vector modes. */
VECTOR_MODES (INT, 4); /* V4QI V2HI */
--- gcc/config/i386/i386-protos.h.nni 2007-05-29 09:15:46.000000000 -0700
+++ gcc/config/i386/i386-protos.h 2007-05-29 09:29:01.000000000 -0700
@@ -89,6 +89,7 @@ extern void ix86_expand_binary_operator
extern int ix86_binary_operator_ok (enum rtx_code, enum machine_mode, rtx[]);
extern void ix86_expand_unary_operator (enum rtx_code, enum machine_mode,
rtx[]);
+extern int ix86_pcmpstrm_ok (rtx);
extern rtx ix86_build_const_vector (enum machine_mode, bool, rtx);
extern void ix86_split_convert_uns_si_sse (rtx[]);
extern void ix86_expand_convert_uns_didf_sse (rtx, rtx);
--- gcc/config/i386/i386.c.nni 2007-05-29 09:15:46.000000000 -0700
+++ gcc/config/i386/i386.c 2007-05-29 10:29:56.000000000 -0700
@@ -1571,9 +1571,10 @@ ix86_handle_option (size_t code, const c
if (!value)
{
target_flags &= ~(MASK_SSE2 | MASK_SSE3 | MASK_SSSE3
- | MASK_SSE4_1 | MASK_SSE4A);
+ | MASK_SSE4_1 | MASK_SSE4_2 | MASK_SSE4A);
target_flags_explicit |= (MASK_SSE2 | MASK_SSE3 | MASK_SSSE3
- | MASK_SSE4_1 | MASK_SSE4A);
+ | MASK_SSE4_1 | MASK_SSE4_2
+ | MASK_SSE4A);
}
return true;
@@ -1581,9 +1582,10 @@ ix86_handle_option (size_t code, const c
if (!value)
{
target_flags &= ~(MASK_SSE3 | MASK_SSSE3 | MASK_SSE4_1
- | MASK_SSE4A);
+ | MASK_SSE4_2 | MASK_SSE4A);
target_flags_explicit |= (MASK_SSE3 | MASK_SSSE3
- | MASK_SSE4_1 | MASK_SSE4A);
+ | MASK_SSE4_1 | MASK_SSE4_2
+ | MASK_SSE4A);
}
return true;
@@ -1592,21 +1594,31 @@ ix86_handle_option (size_t code, const c
{
target_flags &= ~(MASK_SSSE3 | MASK_SSE4_1 | MASK_SSE4_2 | MASK_SSE4A);
target_flags_explicit |= (MASK_SSSE3 | MASK_SSE4_1
- | MASK_SSE4A);
+ | MASK_SSE4_2 | MASK_SSE4A);
}
return true;
case OPT_mssse3:
if (!value)
{
- target_flags &= ~(MASK_SSE4_1 | MASK_SSE4A);
- target_flags_explicit |= MASK_SSE4_1 | MASK_SSE4A;
+ target_flags &= ~(MASK_SSE4_1 | MASK_SSE4_2 | MASK_SSE4A);
+ target_flags_explicit |= (MASK_SSE4_1 | MASK_SSE4_2
+ | MASK_SSE4A);
}
return true;
case OPT_msse4_1:
if (!value)
{
+ target_flags &= ~(MASK_SSE4_2 | MASK_SSE4A);
+ target_flags_explicit |= MASK_SSE4_2 | MASK_SSE4A;
+ }
+ return true;
+
+ case OPT_msse4:
+ case OPT_msse4_2:
+ if (!value)
+ {
target_flags &= ~MASK_SSE4A;
target_flags_explicit |= MASK_SSE4A;
}
@@ -1615,8 +1627,8 @@ ix86_handle_option (size_t code, const c
case OPT_msse4a:
if (!value)
{
- target_flags &= ~MASK_SSE4_1;
- target_flags_explicit |= MASK_SSE4_1;
+ target_flags &= ~(MASK_SSE4_1 | MASK_SSE4_2);
+ target_flags_explicit |= MASK_SSE4_1 | MASK_SSE4_2;
}
return true;
@@ -1694,7 +1706,8 @@ override_options (void)
PTA_ABM = 1 << 11,
PTA_SSE4A = 1 << 12,
PTA_NO_SAHF = 1 << 13,
- PTA_SSE4_1 = 1 << 14
+ PTA_SSE4_1 = 1 << 14,
+ PTA_SSE4_2 = 1 << 15
} flags;
}
const processor_alias_table[] =
@@ -1959,6 +1972,9 @@ override_options (void)
if (processor_alias_table[i].flags & PTA_SSE4_1
&& !(target_flags_explicit & MASK_SSE4_1))
target_flags |= MASK_SSE4_1;
+ if (processor_alias_table[i].flags & PTA_SSE4_2
+ && !(target_flags_explicit & MASK_SSE4_2))
+ target_flags |= MASK_SSE4_2;
if (processor_alias_table[i].flags & PTA_PREFETCH_SSE)
x86_prefetch_sse = true;
if (processor_alias_table[i].flags & PTA_CX16)
@@ -2164,6 +2180,10 @@ override_options (void)
if (!TARGET_80387)
target_flags |= MASK_NO_FANCY_MATH_387;
+ /* Turn on SSE4.1 builtins and popcnt instruction for -msse4.2. */
+ if (TARGET_SSE4_2)
+ target_flags |= MASK_SSE4_1 | MASK_POPCNT;
+
/* Turn on SSSE3 builtins for -msse4.1. */
if (TARGET_SSE4_1)
target_flags |= MASK_SSSE3;
@@ -8070,7 +8090,10 @@ put_condition_code (enum rtx_code code,
mode = CCmode;
}
if (reverse)
- code = reverse_condition (code);
+ {
+ gcc_assert (mode != CCPCMPESTRmode && mode != CCPCMPISTRmode);
+ code = reverse_condition (code);
+ }
switch (code)
{
@@ -8078,6 +8101,7 @@ put_condition_code (enum rtx_code code,
suffix = "e";
break;
case NE:
+ gcc_assert (mode != CCPCMPESTRmode && mode != CCPCMPISTRmode);
suffix = "ne";
break;
case GT:
@@ -8087,10 +8111,13 @@ put_condition_code (enum rtx_code code,
case GTU:
/* ??? Use "nbe" instead of "a" for fcmov lossage on some assemblers.
Those same assemblers have the same but opposite lossage on cmov. */
- gcc_assert (mode == CCmode);
+ gcc_assert (mode == CCmode
+ || mode == CCPCMPESTRmode
+ || mode == CCPCMPISTRmode);
suffix = fp ? "nbe" : "a";
break;
case LT:
+ gcc_assert (mode != CCPCMPESTRmode && mode != CCPCMPISTRmode);
switch (mode)
{
case CCNOmode:
@@ -8108,10 +8135,13 @@ put_condition_code (enum rtx_code code,
}
break;
case LTU:
- gcc_assert (mode == CCmode);
+ gcc_assert (mode == CCmode
+ || mode == CCPCMPESTRmode
+ || mode == CCPCMPISTRmode);
suffix = "b";
break;
case GE:
+ gcc_assert (mode != CCPCMPESTRmode && mode != CCPCMPISTRmode);
switch (mode)
{
case CCNOmode:
@@ -8147,6 +8177,14 @@ put_condition_code (enum rtx_code code,
case ORDERED:
suffix = fp ? "nu" : "np";
break;
+ case UNGT:
+ gcc_assert (mode == CCPCMPESTRmode || mode == CCPCMPISTRmode);
+ suffix = "o";
+ break;
+ case UNLT:
+ gcc_assert (mode == CCPCMPESTRmode || mode == CCPCMPISTRmode);
+ suffix = "s";
+ break;
default:
gcc_unreachable ();
}
@@ -10896,6 +10934,8 @@ ix86_cc_modes_compatible (enum machine_m
case CCFPmode:
case CCFPUmode:
+ case CCPCMPESTRmode:
+ case CCPCMPISTRmode:
/* These are only compatible with themselves, which we already
checked above. */
return VOIDmode;
@@ -12693,7 +12733,7 @@ ix86_expand_fp_vcond (rtx operands[])
return true;
}
-/* Expand a signed integral vector conditional move. */
+/* Expand a signed/unsigned integral vector conditional move. */
bool
ix86_expand_int_vcond (rtx operands[])
@@ -12737,6 +12777,29 @@ ix86_expand_int_vcond (rtx operands[])
gcc_unreachable ();
}
+ /* Only SSE4.1/SSE4.2 supports V2DImode. */
+ if (mode == V2DImode)
+ {
+ switch (code)
+ {
+ case EQ:
+ /* SSE4.1 supports EQ. */
+ if (!TARGET_SSE4_1)
+ return false;
+ break;
+
+ case GT:
+ case GTU:
+ /* SSE4.2 supports GT/GTU. */
+ if (!TARGET_SSE4_2)
+ return false;
+ break;
+
+ default:
+ gcc_unreachable ();
+ }
+ }
+
/* Unsigned parallel compare is not supported by the hardware. Play some
tricks to turn this into a signed comparison against 0. */
if (code == GTU)
@@ -16591,6 +16654,29 @@ enum ix86_builtins
IX86_BUILTIN_VEC_SET_V4HI,
IX86_BUILTIN_VEC_SET_V16QI,
+ /* SSE4.2. */
+ IX86_BUILTIN_CRC32QI,
+ IX86_BUILTIN_CRC32HI,
+ IX86_BUILTIN_CRC32SI,
+ IX86_BUILTIN_CRC32DI,
+
+ IX86_BUILTIN_PCMPESTRI128,
+ IX86_BUILTIN_PCMPESTRM128,
+ IX86_BUILTIN_PCMPESTRA128,
+ IX86_BUILTIN_PCMPESTRC128,
+ IX86_BUILTIN_PCMPESTRO128,
+ IX86_BUILTIN_PCMPESTRS128,
+ IX86_BUILTIN_PCMPESTRZ128,
+ IX86_BUILTIN_PCMPISTRI128,
+ IX86_BUILTIN_PCMPISTRM128,
+ IX86_BUILTIN_PCMPISTRA128,
+ IX86_BUILTIN_PCMPISTRC128,
+ IX86_BUILTIN_PCMPISTRO128,
+ IX86_BUILTIN_PCMPISTRS128,
+ IX86_BUILTIN_PCMPISTRZ128,
+
+ IX86_BUILTIN_PCMPGTQ,
+
IX86_BUILTIN_MAX
};
@@ -16636,6 +16722,9 @@ def_builtin_const (int mask, const char
swap_comparison in order to support it. */
#define BUILTIN_DESC_SWAP_OPERANDS 1
+/* Set when we check FLAGS_REG for pcmp[ei]strm. */
+#define BUILTIN_DESC_PCMPSTR_CC 2
+
struct builtin_description
{
const unsigned int mask;
@@ -16682,6 +16771,39 @@ static const struct builtin_description
{ MASK_SSE4_1, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestnzc128", IX86_BUILTIN_PTESTNZC, GTU, 0 },
};
+static const struct builtin_description bdesc_pcmpestr[] =
+{
+ /* SSE4.2 */
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestri128", IX86_BUILTIN_PCMPESTRI128, 0, 0 },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestrm, "__builtin_ia32_pcmpestrm128", IX86_BUILTIN_PCMPESTRM128, 0, 0 },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestria128", IX86_BUILTIN_PCMPESTRA128, GTU, BUILTIN_DESC_PCMPSTR_CC },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestric128", IX86_BUILTIN_PCMPESTRC128, LTU, BUILTIN_DESC_PCMPSTR_CC },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestrio128", IX86_BUILTIN_PCMPESTRO128, UNGT, BUILTIN_DESC_PCMPSTR_CC },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestris128", IX86_BUILTIN_PCMPESTRS128, UNLT, BUILTIN_DESC_PCMPSTR_CC },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpestri, "__builtin_ia32_pcmpestriz128", IX86_BUILTIN_PCMPESTRZ128, EQ, BUILTIN_DESC_PCMPSTR_CC },
+};
+
+static const struct builtin_description bdesc_pcmpistr[] =
+{
+ /* SSE4.2 */
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistri128", IX86_BUILTIN_PCMPISTRI128, 0, 0 },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistrm, "__builtin_ia32_pcmpistrm128", IX86_BUILTIN_PCMPISTRM128, 0, 0 },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistria128", IX86_BUILTIN_PCMPISTRA128, GTU, BUILTIN_DESC_PCMPSTR_CC },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistric128", IX86_BUILTIN_PCMPISTRC128, LTU, BUILTIN_DESC_PCMPSTR_CC },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistrio128", IX86_BUILTIN_PCMPISTRO128, UNGT, BUILTIN_DESC_PCMPSTR_CC },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistris128", IX86_BUILTIN_PCMPISTRS128, UNLT, BUILTIN_DESC_PCMPSTR_CC },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_pcmpistri, "__builtin_ia32_pcmpistriz128", IX86_BUILTIN_PCMPISTRZ128, EQ, BUILTIN_DESC_PCMPSTR_CC },
+};
+
+static const struct builtin_description bdesc_crc32[] =
+{
+ /* SSE4.2 */
+ { MASK_SSE4_2, CODE_FOR_sse4_2_crc32qi, 0, IX86_BUILTIN_CRC32QI, 0, 0 },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_crc32hi, 0, IX86_BUILTIN_CRC32HI, 0, 0 },
+ { MASK_SSE4_2, CODE_FOR_sse4_2_crc32si, 0, IX86_BUILTIN_CRC32SI, 0, 0 },
+ { MASK_SSE4_2 | MASK_64BIT, CODE_FOR_sse4_2_crc32di, 0, IX86_BUILTIN_CRC32DI, 0, 0 },
+};
+
/* SSE builtins with 3 arguments and the last argument must be a 8 bit
constant or xmm0. */
static const struct builtin_description bdesc_sse_3arg[] =
@@ -17014,6 +17136,9 @@ static const struct builtin_description
{ MASK_SSE4_1, CODE_FOR_uminv8hi3, "__builtin_ia32_pminuw128", IX86_BUILTIN_PMINUW128, 0, 0 },
{ MASK_SSE4_1, CODE_FOR_sse4_1_mulv2siv2di3, 0, IX86_BUILTIN_PMULDQ128, 0, 0 },
{ MASK_SSE4_1, CODE_FOR_mulv4si3, "__builtin_ia32_pmulld128", IX86_BUILTIN_PMULLD128, 0, 0 },
+
+ /* SSE4.2 */
+ { MASK_SSE4_2, CODE_FOR_sse4_2_gtv2di3, "__builtin_ia32_pcmpgtq", IX86_BUILTIN_PCMPGTQ, 0, 0 },
};
static const struct builtin_description bdesc_1arg[] =
@@ -17443,6 +17568,28 @@ ix86_init_mmx_sse_builtins (void)
= build_function_type_list (integer_type_node,
V2DI_type_node, V2DI_type_node,
NULL_TREE);
+ tree int_ftype_v16qi_int_v16qi_int_int
+ = build_function_type_list (integer_type_node,
+ V16QI_type_node,
+ integer_type_node,
+ V16QI_type_node,
+ integer_type_node,
+ integer_type_node,
+ NULL_TREE);
+ tree v16qi_ftype_v16qi_int_v16qi_int_int
+ = build_function_type_list (V16QI_type_node,
+ V16QI_type_node,
+ integer_type_node,
+ V16QI_type_node,
+ integer_type_node,
+ integer_type_node,
+ NULL_TREE);
+ tree int_ftype_v16qi_v16qi_int
+ = build_function_type_list (integer_type_node,
+ V16QI_type_node,
+ V16QI_type_node,
+ integer_type_node,
+ NULL_TREE);
tree float80_type;
tree float128_type;
@@ -17660,6 +17807,30 @@ ix86_init_mmx_sse_builtins (void)
for (i = 0, d = bdesc_ptest; i < ARRAY_SIZE (bdesc_ptest); i++, d++)
def_builtin (d->mask, d->name, int_ftype_v2di_v2di, d->code);
+ /* pcmpestr[im] insns. */
+ for (i = 0, d = bdesc_pcmpestr;
+ i < ARRAY_SIZE (bdesc_pcmpestr);
+ i++, d++)
+ {
+ if (d->icode == CODE_FOR_sse4_2_pcmpestrm)
+ ftype = v16qi_ftype_v16qi_int_v16qi_int_int;
+ else
+ ftype = int_ftype_v16qi_int_v16qi_int_int;
+ def_builtin (d->mask, d->name, ftype, d->code);
+ }
+
+ /* pcmpistr[im] insns. */
+ for (i = 0, d = bdesc_pcmpistr;
+ i < ARRAY_SIZE (bdesc_pcmpistr);
+ i++, d++)
+ {
+ if (d->icode == CODE_FOR_sse4_2_pcmpistrm)
+ ftype = v16qi_ftype_v16qi_v16qi_int;
+ else
+ ftype = int_ftype_v16qi_v16qi_int;
+ def_builtin (d->mask, d->name, ftype, d->code);
+ }
+
def_builtin (MASK_MMX, "__builtin_ia32_packsswb", v8qi_ftype_v4hi_v4hi, IX86_BUILTIN_PACKSSWB);
def_builtin (MASK_MMX, "__builtin_ia32_packssdw", v4hi_ftype_v2si_v2si, IX86_BUILTIN_PACKSSDW);
def_builtin (MASK_MMX, "__builtin_ia32_packuswb", v8qi_ftype_v4hi_v4hi, IX86_BUILTIN_PACKUSWB);
@@ -17871,6 +18042,32 @@ ix86_init_mmx_sse_builtins (void)
def_builtin_const (MASK_SSE4_1, "__builtin_ia32_roundss",
v4sf_ftype_v4sf_v4sf_int, IX86_BUILTIN_ROUNDSS);
+ /* SSE4.2. */
+ ftype = build_function_type_list (unsigned_type_node,
+ unsigned_type_node,
+ unsigned_char_type_node,
+ NULL_TREE);
+ def_builtin (MASK_SSE4_2, "__builtin_ia32_crc32qi",
+ ftype, IX86_BUILTIN_CRC32QI);
+ ftype = build_function_type_list (unsigned_type_node,
+ unsigned_type_node,
+ short_unsigned_type_node,
+ NULL_TREE);
+ def_builtin (MASK_SSE4_2, "__builtin_ia32_crc32hi",
+ ftype, IX86_BUILTIN_CRC32HI);
+ ftype = build_function_type_list (unsigned_type_node,
+ unsigned_type_node,
+ unsigned_type_node,
+ NULL_TREE);
+ def_builtin (MASK_SSE4_2, "__builtin_ia32_crc32si",
+ ftype, IX86_BUILTIN_CRC32SI);
+ ftype = build_function_type_list (long_long_unsigned_type_node,
+ long_long_unsigned_type_node,
+ long_long_unsigned_type_node,
+ NULL_TREE);
+ def_builtin (MASK_SSE4_2, "__builtin_ia32_crc32di",
+ ftype, IX86_BUILTIN_CRC32DI);
+
/* AMDFAM10 SSE4A New built-ins */
def_builtin (MASK_SSE4A, "__builtin_ia32_movntsd",
void_ftype_pdouble_v2df, IX86_BUILTIN_MOVNTSD);
@@ -18072,6 +18269,41 @@ ix86_expand_sse_4_operands_builtin (enum
return target;
}
+/* Subroutine of ix86_expand_builtin to take care of crc32 insns. */
+
+static rtx
+ix86_expand_crc32 (enum insn_code icode, tree exp, rtx target)
+{
+ rtx pat;
+ tree arg0 = CALL_EXPR_ARG (exp, 0);
+ tree arg1 = CALL_EXPR_ARG (exp, 1);
+ rtx op0 = expand_normal (arg0);
+ rtx op1 = expand_normal (arg1);
+ enum machine_mode tmode = insn_data[icode].operand[0].mode;
+ enum machine_mode mode0 = insn_data[icode].operand[1].mode;
+ enum machine_mode mode1 = insn_data[icode].operand[2].mode;
+
+ if (optimize
+ || !target
+ || GET_MODE (target) != tmode
+ || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+ target = gen_reg_rtx (tmode);
+
+ if (!(*insn_data[icode].operand[1].predicate) (op0, mode0))
+ op0 = copy_to_mode_reg (mode0, op0);
+ if (!(*insn_data[icode].operand[2].predicate) (op1, mode1))
+ {
+ op1 = copy_to_reg (op1);
+ op1 = simplify_gen_subreg (mode1, op1, GET_MODE (op1), 0);
+ }
+
+ pat = GEN_FCN (icode) (target, op0, op1);
+ if (! pat)
+ return 0;
+ emit_insn (pat);
+ return target;
+}
+
/* Subroutine of ix86_expand_builtin to take care of binop insns. */
static rtx
@@ -18405,6 +18637,303 @@ ix86_expand_sse_ptest (const struct buil
return SUBREG_REG (target);
}
+/* Return TRUE if we should continue the search. Otherwise return FALSE
+ and set REPLACE_P to 1 if we can replace pcmp[ei]stri with
+ pcmp[ei]strm. */
+
+static bool
+ix86_check_pcmpstrm (rtx insn, int index, int *replace_p)
+{
+ rtx set = PATTERN (insn);
+ rtx src;
+ int unspec;
+
+ if (GET_CODE (set) != PARALLEL)
+ return true;
+
+ set = XVECEXP (set, 0, 0);
+ if (GET_CODE (set) != SET)
+ return true;
+
+ src = SET_SRC (set);
+ if (GET_CODE (src) != UNSPEC)
+ return true;
+
+ unspec = XINT (src, 1);
+ if (unspec != UNSPEC_PCMPESTR && unspec != UNSPEC_PCMPISTR)
+ return true;
+
+ /* We only replace pcmp[ei]stri if the next pcmp[ei]strm is
+ compatible. */
+ if (unspec == index)
+ {
+ rtx dest = SET_DEST (set);
+
+ /* It can be either pcmp[ei]stri or pcmp[ei]strm. We only
+ replace pcmp[ei]stri with pcmp[ei]strm if this one is
+ pcmp[ei]strm. */
+ *replace_p = reg_or_subregno (dest) == FIRST_SSE_REG;
+ }
+
+ return false;
+}
+
+/* When we only use FLAGS_REG, there is no difference between those
+ pcmp[ei]stri or pcmp[ei]strm. We will replace pcmp[ei]stri with
+ pcmp[ei]strm if it can be optimized out later. */
+
+int
+ix86_pcmpstrm_ok (rtx insn)
+{
+ rtx set, src, p;
+ int index;
+ int replace = 0;
+
+ set = PATTERN (insn);
+ gcc_assert (GET_CODE (set) == PARALLEL);
+ set = XVECEXP (set, 0, 0);
+ src = SET_SRC (set);
+ gcc_assert (GET_CODE (set) == SET && GET_CODE (src) == UNSPEC);
+ index = XINT (src, 1);
+ gcc_assert (index == UNSPEC_PCMPESTR || index == UNSPEC_PCMPISTR);
+
+ /* We can't use pcmp[ei]strm if ecx is used. */
+ if (! find_regno_note (insn, REG_UNUSED, 2))
+ return replace;
+
+ /* If there is a compatible pcmp[ei]strm insn after this one, we
+ will replace pcmp[ei]stri with pcmp[ei]strm. We hope that the two are
+ the same and one of them will be optimized out later. */
+ for (p = NEXT_INSN (insn); p; p = NEXT_INSN (p))
+ if (INSN_P (p))
+ {
+ /* We don't replace pcmp[ei]strm if we hit a label or a jump
+ insn. */
+ if (LABEL_P (p) || !NONJUMP_INSN_P (p))
+ break;
+
+ if (!ix86_check_pcmpstrm (p, index, &replace))
+ {
+ if (replace)
+ return replace;
+ break;
+ }
+ }
+
+ /* Search for a compatible pcmp[ei]strm insn before this one. */
+ for (p = PREV_INSN (insn); p; p = PREV_INSN (p))
+ if (INSN_P (p))
+ {
+ /* We don't replace pcmp[ei]strm if we hit a label or a jump
+ insn. */
+ if (LABEL_P (p) || !NONJUMP_INSN_P (p))
+ return replace;
+
+ if (!ix86_check_pcmpstrm (p, index, &replace))
+ return replace;
+ }
+
+ return replace;
+}
+
+/* Subroutine of ix86_expand_builtin to take care of pcmpestr[im]
+ insns. */
+
+static rtx
+ix86_expand_sse_pcmpestr (const struct builtin_description *d,
+ tree exp, rtx target)
+{
+ rtx pat;
+ tree arg0 = CALL_EXPR_ARG (exp, 0);
+ tree arg1 = CALL_EXPR_ARG (exp, 1);
+ tree arg2 = CALL_EXPR_ARG (exp, 2);
+ tree arg3 = CALL_EXPR_ARG (exp, 3);
+ tree arg4 = CALL_EXPR_ARG (exp, 4);
+ rtx op0 = expand_normal (arg0);
+ rtx op1 = expand_normal (arg1);
+ rtx op2 = expand_normal (arg2);
+ rtx op3 = expand_normal (arg3);
+ rtx op4 = expand_normal (arg4);
+ enum machine_mode modev0, modev1, modeimm;
+
+ modev0 = insn_data[d->icode].operand[0].mode;
+ modev1 = insn_data[d->icode].operand[1].mode;
+ modeimm = insn_data[d->icode].operand[2].mode;
+
+ if (VECTOR_MODE_P (modev0))
+ op0 = safe_vector_operand (op0, modev0);
+ if (VECTOR_MODE_P (modev1))
+ op2 = safe_vector_operand (op2, modev1);
+
+ if ((optimize && !register_operand (op0, modev0))
+ || !(*insn_data[d->icode].operand[0].predicate) (op0, modev0))
+ op0 = copy_to_mode_reg (modev0, op0);
+ if ((optimize && !register_operand (op2, modev1))
+ || !(*insn_data[d->icode].operand[1].predicate) (op2, modev1))
+ op2 = copy_to_mode_reg (modev1, op2);
+
+ if (! (*insn_data[d->icode].operand[2].predicate) (op4, modeimm))
+ {
+ error ("the fifth argument must be a 8-bit immediate");
+ return const0_rtx;
+ }
+
+ /* OP1 is the length of the first string which should be in eax. */
+ pat = gen_rtx_REG (SImode, 0);
+ emit_move_insn (pat, op1);
+
+ /* OP3 is the length of the second string which should be in edx. */
+ pat = gen_rtx_REG (SImode, 1);
+ emit_move_insn (pat, op3);
+
+ if ((d->flag & BUILTIN_DESC_PCMPSTR_CC) == 0)
+ {
+ enum machine_mode tmode;
+ unsigned int tregno;
+
+ if (d->icode == CODE_FOR_sse4_2_pcmpestri)
+ {
+ /* Index is returned in ecx. */
+ tmode = SImode;
+ tregno = 2;
+ }
+ else
+ {
+ /* Mask is returned in xmm0. */
+ tmode = V16QImode;
+ tregno = FIRST_SSE_REG;
+ }
+
+ if (optimize
+ || !target
+ || GET_MODE (target) != tmode
+ || ! (*insn_data[d->icode].operand[0].predicate) (target,
+ tmode)
+ || reg_or_subregno (target) != tregno)
+ target = gen_rtx_REG (tmode, tregno);
+
+ pat = GEN_FCN (d->icode) (op0, op2, op4);
+ if (! pat)
+ return 0;
+ emit_insn (pat);
+ return target;
+ }
+ else
+ {
+ target = gen_reg_rtx (SImode);
+ emit_move_insn (target, const0_rtx);
+ target = gen_rtx_SUBREG (QImode, target, 0);
+
+ pat = GEN_FCN (d->icode) (op0, op2, op4);
+ if (! pat)
+ return 0;
+ emit_insn (pat);
+ gcc_assert (GET_CODE (pat) == PARALLEL);
+ pat = XVECEXP (pat, 0, 1);
+ gcc_assert (GET_CODE (pat) == SET);
+ gcc_assert (GET_MODE (SET_DEST (pat)) == CCPCMPESTRmode);
+ emit_insn (gen_rtx_SET (VOIDmode,
+ gen_rtx_STRICT_LOW_PART (VOIDmode, target),
+ gen_rtx_fmt_ee (d->comparison, QImode,
+ SET_DEST (pat),
+ const0_rtx)));
+ return SUBREG_REG (target);
+ }
+}
+
+/* Subroutine of ix86_expand_builtin to take care of pcmpistr[im]
+ insns. */
+
+static rtx
+ix86_expand_sse_pcmpistr (const struct builtin_description *d,
+ tree exp, rtx target)
+{
+ rtx pat;
+ tree arg0 = CALL_EXPR_ARG (exp, 0);
+ tree arg1 = CALL_EXPR_ARG (exp, 1);
+ tree arg2 = CALL_EXPR_ARG (exp, 2);
+ rtx op0 = expand_normal (arg0);
+ rtx op1 = expand_normal (arg1);
+ rtx op2 = expand_normal (arg2);
+ enum machine_mode modev0, modev1, modeimm;
+
+ modev0 = insn_data[d->icode].operand[0].mode;
+ modev1 = insn_data[d->icode].operand[1].mode;
+ modeimm = insn_data[d->icode].operand[2].mode;
+
+ if (VECTOR_MODE_P (modev0))
+ op0 = safe_vector_operand (op0, modev0);
+ if (VECTOR_MODE_P (modev1))
+ op2 = safe_vector_operand (op2, modev1);
+
+ if ((optimize && !register_operand (op0, modev0))
+ || !(*insn_data[d->icode].operand[0].predicate) (op0, modev0))
+ op0 = copy_to_mode_reg (modev0, op0);
+ if ((optimize && !register_operand (op1, modev1))
+ || !(*insn_data[d->icode].operand[1].predicate) (op1, modev1))
+ op1 = copy_to_mode_reg (modev1, op1);
+
+ if (! (*insn_data[d->icode].operand[2].predicate) (op2, modeimm))
+ {
+ error ("the third argument must be a 8-bit immediate");
+ return const0_rtx;
+ }
+
+ if ((d->flag & BUILTIN_DESC_PCMPSTR_CC) == 0)
+ {
+ enum machine_mode tmode;
+ unsigned int tregno;
+
+ if (d->icode == CODE_FOR_sse4_2_pcmpistri)
+ {
+ /* Index is returned in ecx. */
+ tmode = SImode;
+ tregno = 2;
+ }
+ else
+ {
+ /* Mask is returned in xmm0. */
+ tmode = V16QImode;
+ tregno = FIRST_SSE_REG;
+ }
+
+ if (optimize
+ || !target
+ || GET_MODE (target) != tmode
+ || ! (*insn_data[d->icode].operand[0].predicate) (target,
+ tmode)
+ || reg_or_subregno (target) != tregno)
+ target = gen_rtx_REG (tmode, tregno);
+
+ pat = GEN_FCN (d->icode) (op0, op1, op2);
+ if (! pat)
+ return 0;
+ emit_insn (pat);
+ return target;
+ }
+ else
+ {
+ target = gen_reg_rtx (SImode);
+ emit_move_insn (target, const0_rtx);
+ target = gen_rtx_SUBREG (QImode, target, 0);
+
+ pat = GEN_FCN (d->icode) (op0, op1, op2);
+ if (! pat)
+ return 0;
+ emit_insn (pat);
+ gcc_assert (GET_CODE (pat) == PARALLEL);
+ pat = XVECEXP (pat, 0, 1);
+ gcc_assert (GET_CODE (pat) == SET);
+ gcc_assert (GET_MODE (SET_DEST (pat)) == CCPCMPISTRmode);
+ emit_insn (gen_rtx_SET (VOIDmode,
+ gen_rtx_STRICT_LOW_PART (VOIDmode, target),
+ gen_rtx_fmt_ee (d->comparison, QImode,
+ SET_DEST (pat),
+ const0_rtx)));
+ return SUBREG_REG (target);
+ }
+}
+
/* Return the integer constant in ARG. Constrain it to be in the range
of the subparts of VEC_TYPE; issue an error if not. */
@@ -19231,6 +19760,22 @@ ix86_expand_builtin (tree exp, rtx targe
if (d->code == fcode)
return ix86_expand_sse_ptest (d, exp, target);
+ for (i = 0, d = bdesc_crc32; i < ARRAY_SIZE (bdesc_crc32); i++, d++)
+ if (d->code == fcode)
+ return ix86_expand_crc32 (d->icode, exp, target);
+
+ for (i = 0, d = bdesc_pcmpestr;
+ i < ARRAY_SIZE (bdesc_pcmpestr);
+ i++, d++)
+ if (d->code == fcode)
+ return ix86_expand_sse_pcmpestr (d, exp, target);
+
+ for (i = 0, d = bdesc_pcmpistr;
+ i < ARRAY_SIZE (bdesc_pcmpistr);
+ i++, d++)
+ if (d->code == fcode)
+ return ix86_expand_sse_pcmpistr (d, exp, target);
+
gcc_unreachable ();
}
--- gcc/config/i386/i386.h.nni 2007-05-29 09:29:01.000000000 -0700
+++ gcc/config/i386/i386.h 2007-05-29 09:29:01.000000000 -0700
@@ -542,6 +542,8 @@ extern const char *host_detect_local_cpu
builtin_define ("__SSSE3__"); \
if (TARGET_SSE4_1) \
builtin_define ("__SSE4_1__"); \
+ if (TARGET_SSE4_2) \
+ builtin_define ("__SSE4_2__"); \
if (TARGET_SSE4A) \
builtin_define ("__SSE4A__"); \
if (TARGET_SSE_MATH && TARGET_SSE) \
@@ -2029,7 +2031,8 @@ do { \
/* Return nonzero if MODE implies a floating point inequality can be
reversed. */
-#define REVERSIBLE_CC_MODE(MODE) 1
+#define REVERSIBLE_CC_MODE(MODE) \
+ ((MODE) != CCPCMPESTRmode && (MODE) != CCPCMPISTRmode)
/* A C expression whose value is reversed condition code of the CODE for
comparison done in CC_MODE mode. */
--- gcc/config/i386/i386.md.nni 2007-05-29 09:15:46.000000000 -0700
+++ gcc/config/i386/i386.md 2007-05-29 09:29:01.000000000 -0700
@@ -173,6 +173,11 @@
(UNSPEC_PTEST 140)
(UNSPEC_ROUNDP 141)
(UNSPEC_ROUNDS 142)
+
+ ; For SSE4.2 support
+ (UNSPEC_CRC32 143)
+ (UNSPEC_PCMPESTR 144)
+ (UNSPEC_PCMPISTR 145)
])
(define_constants
@@ -20895,6 +20900,36 @@
}
[(set_attr "type" "multi")])
+(define_mode_macro CRC32MODE [QI HI SI])
+(define_mode_attr crc32modesuffix [(QI "b") (HI "w") (SI "l")])
+(define_mode_attr crc32modeconstraint [(QI "qm") (HI "rm") (SI "rm")])
+
+(define_insn "sse4_2_crc32<mode>"
+ [(set (match_operand:SI 0 "register_operand" "=r")
+ (unspec:SI
+ [(match_operand:SI 1 "register_operand" "0")
+ (match_operand:CRC32MODE 2 "nonimmediate_operand" "<crc32modeconstraint>")]
+ UNSPEC_CRC32))]
+ "TARGET_SSE4_2"
+ "crc32<crc32modesuffix>\t{%2, %0|%0, %2}"
+ [(set_attr "type" "sselog1")
+ (set_attr "prefix_rep" "1")
+ (set_attr "prefix_extra" "1")
+ (set_attr "mode" "SI")])
+
+(define_insn "sse4_2_crc32di"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+ (unspec:DI
+ [(match_operand:DI 1 "register_operand" "0")
+ (match_operand:DI 2 "nonimmediate_operand" "rm")]
+ UNSPEC_CRC32))]
+ "TARGET_SSE4_2 && TARGET_64BIT"
+ "crc32q\t{%2, %0|%0, %2}"
+ [(set_attr "type" "sselog1")
+ (set_attr "prefix_rep" "1")
+ (set_attr "prefix_extra" "1")
+ (set_attr "mode" "DI")])
+
(include "mmx.md")
(include "sse.md")
(include "sync.md")
--- gcc/config/i386/i386.opt.nni 2007-05-22 07:43:24.000000000 -0700
+++ gcc/config/i386/i386.opt 2007-05-29 09:29:01.000000000 -0700
@@ -191,6 +191,14 @@ msse4.1
Target Report Mask(SSE4_1)
Support MMX, SSE, SSE2, SSE3, SSSE3 and SSE4.1 built-in functions and code generation
+msse4.2
+Target Report Mask(SSE4_2)
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and code generation
+
+msse4
Target Report Mask(SSE4_1|SSE4_2) MaskExists
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and code generation
+
msse4a
Target Report Mask(SSE4A)
Support MMX, SSE, SSE2, SSE3 and SSE4A built-in functions and code generation
--- gcc/config/i386/nmmintrin.h.nni 2007-05-29 09:29:01.000000000 -0700
+++ gcc/config/i386/nmmintrin.h 2007-05-29 09:29:01.000000000 -0700
@@ -0,0 +1,40 @@
+/* Copyright (C) 2007 Free Software Foundation, Inc.
+
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2, or (at your option)
+ any later version.
+
+ GCC is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING. If not, write to
+ the Free Software Foundation, 59 Temple Place - Suite 330,
+ Boston, MA 02111-1307, USA. */
+
+/* As a special exception, if you include this header file into source
+ files compiled by GCC, this header file does not by itself cause
+ the resulting executable to be covered by the GNU General Public
+ License. This exception does not however invalidate any other
+ reasons why the executable file might be covered by the GNU General
+ Public License. */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+ User Guide and Reference, version 10.0. */
+
+#ifndef _NMMINTRIN_H_INCLUDED
+#define _NMMINTRIN_H_INCLUDED
+
+#ifndef __SSE4_2__
+# error "SSE4.2 instruction set not enabled"
+#else
+/* We just include the SSE4.1 header file.  */
+#include <smmintrin.h>
+#endif /* __SSE4_2__ */
+
+#endif /* _NMMINTRIN_H_INCLUDED */
--- gcc/config/i386/predicates.md.nni 2007-05-22 07:43:24.000000000 -0700
+++ gcc/config/i386/predicates.md 2007-05-29 09:29:01.000000000 -0700
@@ -856,7 +856,22 @@
enum machine_mode inmode = GET_MODE (XEXP (op, 0));
enum rtx_code code = GET_CODE (op);
- if (inmode == CCFPmode || inmode == CCFPUmode)
+ if (inmode == CCPCMPESTRmode || inmode == CCPCMPISTRmode)
+ {
+ switch (code)
+ {
+ case GTU:
+ case LTU:
+ case EQ:
+ return 1;
+ case UNGT:
+ case UNLT:
+ return 0;
+ default:
+ gcc_unreachable ();
+ }
+ }
+ else if (inmode == CCFPmode || inmode == CCFPUmode)
{
enum rtx_code second_code, bypass_code;
ix86_fp_comparison_codes (code, &bypass_code, &code, &second_code);
@@ -906,14 +921,23 @@
}
switch (code)
{
- case EQ: case NE:
+ case NE:
+ return inmode != CCPCMPESTRmode && inmode != CCPCMPISTRmode;
+ case EQ:
return 1;
case LT: case GE:
if (inmode == CCmode || inmode == CCGCmode
|| inmode == CCGOCmode || inmode == CCNOmode)
return 1;
return 0;
- case LTU: case GTU: case LEU: case ORDERED: case UNORDERED: case GEU:
+ case LTU:
+ case GTU:
+ if (inmode == CCmode
+ || inmode == CCPCMPESTRmode
+ || inmode == CCPCMPISTRmode)
+ return 1;
+ return 0;
+ case LEU: case ORDERED: case UNORDERED: case GEU:
if (inmode == CCmode)
return 1;
return 0;
@@ -921,6 +945,10 @@
if (inmode == CCmode || inmode == CCGCmode || inmode == CCNOmode)
return 1;
return 0;
+ case UNGT:
+ case UNLT:
+ if (inmode == CCPCMPESTRmode || inmode == CCPCMPISTRmode)
+ return 1;
default:
return 0;
}
--- gcc/config/i386/smmintrin.h.nni 2007-05-22 07:43:24.000000000 -0700
+++ gcc/config/i386/smmintrin.h 2007-05-29 09:29:01.000000000 -0700
@@ -573,6 +573,246 @@ _mm_stream_load_si128 (__m128i *__X)
return (__m128i) __builtin_ia32_movntdqa ((__v2di *) __X);
}
+#ifdef __SSE4_2__
+
+/* These macros specify the source data format. */
+#define SIDD_UBYTE_OPS 0x00
+#define SIDD_UWORD_OPS 0x01
+#define SIDD_SBYTE_OPS 0x02
+#define SIDD_SWORD_OPS 0x03
+
+/* These macros specify the comparison operation. */
+#define SIDD_CMP_EQUAL_ANY 0x00
+#define SIDD_CMP_RANGES 0x04
+#define SIDD_CMP_EQUAL_EACH 0x08
+#define SIDD_CMP_EQUAL_ORDERED 0x0c
+
+/* These macros specify the polarity. */
+#define SIDD_POSITIVE_POLARITY 0x00
+#define SIDD_NEGATIVE_POLARITY 0x10
+#define SIDD_MASKED_POSITIVE_POLARITY 0x20
+#define SIDD_MASKED_NEGATIVE_POLARITY 0x30
+
+/* These macros specify the output selection in _mm_cmpXstri (). */
+#define SIDD_LEAST_SIGNIFICANT 0x00
+#define SIDD_MOST_SIGNIFICANT 0x40
+
+/* These macros specify the output selection in _mm_cmpXstrm (). */
+#define SIDD_BIT_MASK 0x00
+#define SIDD_UNIT_MASK 0x40
+
+/* Intrinsics for text/string processing. */
+
+#ifdef __OPTIMIZE__
+static __inline __m128i __attribute__((__always_inline__))
+_mm_cmpistrm (__m128i __X, __m128i __Y, const int __M)
+{
+ return (__m128i) __builtin_ia32_pcmpistrm128 ((__v16qi)__X,
+ (__v16qi)__Y,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistri (__m128i __X, __m128i __Y, const int __M)
+{
+ return __builtin_ia32_pcmpistri128 ((__v16qi)__X,
+ (__v16qi)__Y,
+ __M);
+}
+
+static __inline __m128i __attribute__((__always_inline__))
+_mm_cmpestrm (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+ return (__m128i) __builtin_ia32_pcmpestrm128 ((__v16qi)__X, __LX,
+ (__v16qi)__Y, __LY,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestri (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+ return __builtin_ia32_pcmpestri128 ((__v16qi)__X, __LX,
+ (__v16qi)__Y, __LY,
+ __M);
+}
+#else
+#define _mm_cmpistrm(X, Y, M) \
+ ((__m128i) __builtin_ia32_pcmpistrm128 ((__v16qi)(X), (__v16qi)(Y), (M)))
+#define _mm_cmpistri(X, Y, M) \
+ __builtin_ia32_pcmpistri128 ((__v16qi)(X), (__v16qi)(Y), (M))
+
+#define _mm_cmpestrm(X, LX, Y, LY, M) \
+ ((__m128i) __builtin_ia32_pcmpestrm128 ((__v16qi)(X), (int)(LX), \
+ (__v16qi)(Y), (int)(LY), (M)))
+#define _mm_cmpestri(X, LX, Y, LY, M) \
+ __builtin_ia32_pcmpestri128 ((__v16qi)(X), (int)(LX), \
+ (__v16qi)(Y), (int)(LY), (M))
+#endif
+
+/* Intrinsics for text/string processing and reading values of
+ EFlags. */
+
+#ifdef __OPTIMIZE__
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistra (__m128i __X, __m128i __Y, const int __M)
+{
+ return __builtin_ia32_pcmpistria128 ((__v16qi)__X,
+ (__v16qi)__Y,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistrc (__m128i __X, __m128i __Y, const int __M)
+{
+ return __builtin_ia32_pcmpistric128 ((__v16qi)__X,
+ (__v16qi)__Y,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistro (__m128i __X, __m128i __Y, const int __M)
+{
+ return __builtin_ia32_pcmpistrio128 ((__v16qi)__X,
+ (__v16qi)__Y,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistrs (__m128i __X, __m128i __Y, const int __M)
+{
+ return __builtin_ia32_pcmpistris128 ((__v16qi)__X,
+ (__v16qi)__Y,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpistrz (__m128i __X, __m128i __Y, const int __M)
+{
+ return __builtin_ia32_pcmpistriz128 ((__v16qi)__X,
+ (__v16qi)__Y,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestra (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+ return __builtin_ia32_pcmpestria128 ((__v16qi)__X, __LX,
+ (__v16qi)__Y, __LY,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestrc (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+ return __builtin_ia32_pcmpestric128 ((__v16qi)__X, __LX,
+ (__v16qi)__Y, __LY,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestro (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+ return __builtin_ia32_pcmpestrio128 ((__v16qi)__X, __LX,
+ (__v16qi)__Y, __LY,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestrs (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+ return __builtin_ia32_pcmpestris128 ((__v16qi)__X, __LX,
+ (__v16qi)__Y, __LY,
+ __M);
+}
+
+static __inline int __attribute__((__always_inline__))
+_mm_cmpestrz (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M)
+{
+ return __builtin_ia32_pcmpestriz128 ((__v16qi)__X, __LX,
+ (__v16qi)__Y, __LY,
+ __M);
+}
+#else
+#define _mm_cmpistra(X, Y, M) \
+ __builtin_ia32_pcmpistria128 ((__v16qi)(X), (__v16qi)(Y), (M))
+#define _mm_cmpistrc(X, Y, M) \
+ __builtin_ia32_pcmpistric128 ((__v16qi)(X), (__v16qi)(Y), (M))
+#define _mm_cmpistro(X, Y, M) \
+ __builtin_ia32_pcmpistrio128 ((__v16qi)(X), (__v16qi)(Y), (M))
+#define _mm_cmpistrs(X, Y, M) \
+ __builtin_ia32_pcmpistris128 ((__v16qi)(X), (__v16qi)(Y), (M))
+#define _mm_cmpistrz(X, Y, M) \
+ __builtin_ia32_pcmpistriz128 ((__v16qi)(X), (__v16qi)(Y), (M))
+
+#define _mm_cmpestra(X, LX, Y, LY, M) \
+ __builtin_ia32_pcmpestria128 ((__v16qi)(X), (int)(LX), \
+ (__v16qi)(Y), (int)(LY), (M))
+#define _mm_cmpestrc(X, LX, Y, LY, M) \
+ __builtin_ia32_pcmpestric128 ((__v16qi)(X), (int)(LX), \
+ (__v16qi)(Y), (int)(LY), (M))
+#define _mm_cmpestro(X, LX, Y, LY, M) \
+ __builtin_ia32_pcmpestrio128 ((__v16qi)(X), (int)(LX), \
+ (__v16qi)(Y), (int)(LY), (M))
+#define _mm_cmpestrs(X, LX, Y, LY, M) \
+ __builtin_ia32_pcmpestris128 ((__v16qi)(X), (int)(LX), \
+ (__v16qi)(Y), (int)(LY), (M))
+#define _mm_cmpestrz(X, LX, Y, LY, M) \
+ __builtin_ia32_pcmpestriz128 ((__v16qi)(X), (int)(LX), \
+ (__v16qi)(Y), (int)(LY), (M))
+#endif
+
+/* Packed integer 64-bit comparison, zeroing or filling with ones the
+   corresponding parts of the result. */
+static __inline __m128i __attribute__((__always_inline__))
+_mm_cmpgt_epi64 (__m128i __X, __m128i __Y)
+{
+ return (__m128i) __builtin_ia32_pcmpgtq ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Calculate the number of bits set to 1. */
+static __inline int __attribute__((__always_inline__))
+_mm_popcnt_u32 (unsigned int __X)
+{
+ return __builtin_popcount (__X);
+}
+
+#ifdef __x86_64__
+static __inline long long __attribute__((__always_inline__))
+_mm_popcnt_u64 (unsigned long long __X)
+{
+ return __builtin_popcountll (__X);
+}
+#endif
+
+/* Accumulate CRC32 (polynomial 0x11EDC6F41) value. */
+static __inline unsigned int __attribute__((__always_inline__))
+_mm_crc32_u8 (unsigned int __C, unsigned char __V)
+{
+ return __builtin_ia32_crc32qi (__C, __V);
+}
+
+static __inline unsigned int __attribute__((__always_inline__))
+_mm_crc32_u16 (unsigned int __C, unsigned short __V)
+{
+ return __builtin_ia32_crc32hi (__C, __V);
+}
+
+static __inline unsigned int __attribute__((__always_inline__))
+_mm_crc32_u32 (unsigned int __C, unsigned int __V)
+{
+ return __builtin_ia32_crc32si (__C, __V);
+}
+
+#ifdef __x86_64__
+static __inline unsigned long long __attribute__((__always_inline__))
+_mm_crc32_u64 (unsigned long long __C, unsigned long long __V)
+{
+ return __builtin_ia32_crc32di (__C, __V);
+}
+#endif
+
+#endif /* __SSE4_2__ */
+
#endif /* __SSE4_1__ */
#endif /* _SMMINTRIN_H_INCLUDED */
--- gcc/config/i386/sse.md.nni 2007-05-29 09:15:46.000000000 -0700
+++ gcc/config/i386/sse.md 2007-05-29 10:04:27.000000000 -0700
@@ -3633,14 +3633,24 @@
(set_attr "prefix_data16" "1")
(set_attr "mode" "TI")])
+(define_insn "sse4_2_gtv2di3"
+ [(set (match_operand:V2DI 0 "register_operand" "=x")
+ (gt:V2DI
+ (match_operand:V2DI 1 "nonimmediate_operand" "0")
+ (match_operand:V2DI 2 "nonimmediate_operand" "xm")))]
+ "TARGET_SSE4_2"
+ "pcmpgtq\t{%2, %0|%0, %2}"
+ [(set_attr "type" "ssecmp")
+ (set_attr "mode" "TI")])
+
(define_expand "vcond<mode>"
- [(set (match_operand:SSEMODE124 0 "register_operand" "")
- (if_then_else:SSEMODE124
+ [(set (match_operand:SSEMODEI 0 "register_operand" "")
+ (if_then_else:SSEMODEI
(match_operator 3 ""
- [(match_operand:SSEMODE124 4 "nonimmediate_operand" "")
- (match_operand:SSEMODE124 5 "nonimmediate_operand" "")])
- (match_operand:SSEMODE124 1 "general_operand" "")
- (match_operand:SSEMODE124 2 "general_operand" "")))]
+ [(match_operand:SSEMODEI 4 "nonimmediate_operand" "")
+ (match_operand:SSEMODEI 5 "nonimmediate_operand" "")])
+ (match_operand:SSEMODEI 1 "general_operand" "")
+ (match_operand:SSEMODEI 2 "general_operand" "")))]
"TARGET_SSE2"
{
if (ix86_expand_int_vcond (operands))
@@ -3650,13 +3660,13 @@
})
(define_expand "vcondu<mode>"
- [(set (match_operand:SSEMODE124 0 "register_operand" "")
- (if_then_else:SSEMODE124
+ [(set (match_operand:SSEMODEI 0 "register_operand" "")
+ (if_then_else:SSEMODEI
(match_operator 3 ""
- [(match_operand:SSEMODE124 4 "nonimmediate_operand" "")
- (match_operand:SSEMODE124 5 "nonimmediate_operand" "")])
- (match_operand:SSEMODE124 1 "general_operand" "")
- (match_operand:SSEMODE124 2 "general_operand" "")))]
+ [(match_operand:SSEMODEI 4 "nonimmediate_operand" "")
+ (match_operand:SSEMODEI 5 "nonimmediate_operand" "")])
+ (match_operand:SSEMODEI 1 "general_operand" "")
+ (match_operand:SSEMODEI 2 "general_operand" "")))]
"TARGET_SSE2"
{
if (ix86_expand_int_vcond (operands))
@@ -6373,3 +6383,157 @@
[(set_attr "type" "ssecvt")
(set_attr "prefix_extra" "1")
(set_attr "mode" "V4SF")])
+
+(define_insn "sse4_2_pcmpestri"
+ [(set (reg:SI 2)
+ (unspec:SI
+ [(match_operand:V16QI 0 "register_operand" "x")
+ (reg:SI 0)
+ (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+ (reg:SI 1)
+ (match_operand:SI 2 "const_0_to_255_operand" "n")]
+ UNSPEC_PCMPESTR))
+ (set (reg:CCPCMPESTR FLAGS_REG)
+ (unspec:CCPCMPESTR
+ [(match_dup 0)
+ (reg:SI 0)
+ (match_dup 1)
+ (reg:SI 1)
+ (match_dup 2)]
+ UNSPEC_PCMPESTR))]
+ "TARGET_SSE4_2"
+ "pcmpestri\t{%2, %1, %0|%0, %1, %2}"
+ [(set_attr "type" "sselog1")
+ (set_attr "prefix_data16" "1")
+ (set_attr "prefix_extra" "1")
+ (set_attr "mode" "TI")])
+
+(define_insn "sse4_2_pcmpestrm"
+ [(set (reg:V16QI 21)
+ (unspec:V16QI
+ [(match_operand:V16QI 0 "register_operand" "x")
+ (reg:SI 0)
+ (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+ (reg:SI 1)
+ (match_operand:SI 2 "const_0_to_255_operand" "n")]
+ UNSPEC_PCMPESTR))
+ (set (reg:CCPCMPESTR FLAGS_REG)
+ (unspec:CCPCMPESTR
+ [(match_dup 0)
+ (reg:SI 0)
+ (match_dup 1)
+ (reg:SI 1)
+ (match_dup 2)]
+ UNSPEC_PCMPESTR))]
+ "TARGET_SSE4_2"
+ "pcmpestrm\t{%2, %1, %0|%0, %1, %2}"
+ [(set_attr "type" "sselog1")
+ (set_attr "prefix_data16" "1")
+ (set_attr "prefix_extra" "1")
+ (set_attr "mode" "TI")])
+
+(define_split
+ [(set (reg:SI 2)
+ (unspec:SI
+ [(match_operand:V16QI 0 "register_operand" "")
+ (reg:SI 0)
+ (match_operand:V16QI 1 "nonimmediate_operand" "")
+ (reg:SI 1)
+ (match_operand:SI 2 "const_0_to_255_operand" "")]
+ UNSPEC_PCMPESTR))
+ (set (reg:CCPCMPESTR FLAGS_REG)
+ (unspec:CCPCMPESTR
+ [(match_dup 0)
+ (reg:SI 0)
+ (match_dup 1)
+ (reg:SI 1)
+ (match_dup 2)]
+ UNSPEC_PCMPESTR))]
+ "TARGET_SSE4_2 && ix86_pcmpstrm_ok (insn)"
+ [(parallel
+ [(set (reg:V16QI 21)
+ (unspec:V16QI
+ [(match_dup 0)
+ (reg:SI 0)
+ (match_dup 1)
+ (reg:SI 1)
+ (match_dup 2)]
+ UNSPEC_PCMPESTR))
+ (set (reg:CCPCMPESTR FLAGS_REG)
+ (unspec:CCPCMPESTR
+ [(match_dup 0)
+ (reg:SI 0)
+ (match_dup 1)
+ (reg:SI 1)
+ (match_dup 2)]
+ UNSPEC_PCMPESTR))])]
+ "")
+
+(define_insn "sse4_2_pcmpistri"
+ [(set (reg:SI 2)
+ (unspec:SI
+ [(match_operand:V16QI 0 "register_operand" "x")
+ (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+ (match_operand:SI 2 "const_0_to_255_operand" "n")]
+ UNSPEC_PCMPISTR))
+ (set (reg:CCPCMPISTR FLAGS_REG)
+ (unspec:CCPCMPISTR
+ [(match_dup 0)
+ (match_dup 1)
+ (match_dup 2)]
+ UNSPEC_PCMPISTR))]
+ "TARGET_SSE4_2"
+ "pcmpistri\t{%2, %1, %0|%0, %1, %2}"
+ [(set_attr "type" "sselog1")
+ (set_attr "prefix_data16" "1")
+ (set_attr "prefix_extra" "1")
+ (set_attr "mode" "TI")])
+
+(define_insn "sse4_2_pcmpistrm"
+ [(set (reg:V16QI 21)
+ (unspec:V16QI
+ [(match_operand:V16QI 0 "register_operand" "x")
+ (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+ (match_operand:SI 2 "const_0_to_255_operand" "n")]
+ UNSPEC_PCMPISTR))
+ (set (reg:CCPCMPISTR FLAGS_REG)
+ (unspec:CCPCMPISTR
+ [(match_dup 0)
+ (match_dup 1)
+ (match_dup 2)]
+ UNSPEC_PCMPISTR))]
+ "TARGET_SSE4_2"
+ "pcmpistrm\t{%2, %1, %0|%0, %1, %2}"
+ [(set_attr "type" "sselog1")
+ (set_attr "prefix_data16" "1")
+ (set_attr "prefix_extra" "1")
+ (set_attr "mode" "TI")])
+
+(define_split
+ [(set (reg:SI 2)
+ (unspec:SI
+ [(match_operand:V16QI 0 "register_operand" "")
+ (match_operand:V16QI 1 "nonimmediate_operand" "")
+ (match_operand:SI 2 "const_0_to_255_operand" "")]
+ UNSPEC_PCMPISTR))
+ (set (reg:CCPCMPISTR FLAGS_REG)
+ (unspec:CCPCMPISTR
+ [(match_dup 0)
+ (match_dup 1)
+ (match_dup 2)]
+ UNSPEC_PCMPISTR))]
+ "TARGET_SSE4_2 && ix86_pcmpstrm_ok (insn)"
+ [(parallel
+ [(set (reg:V16QI 21)
+ (unspec:V16QI
+ [(match_dup 0)
+ (match_dup 1)
+ (match_dup 2)]
+ UNSPEC_PCMPISTR))
+ (set (reg:CCPCMPISTR FLAGS_REG)
+ (unspec:CCPCMPISTR
+ [(match_dup 0)
+ (match_dup 1)
+ (match_dup 2)]
+ UNSPEC_PCMPISTR))])]
+ "")
--- gcc/doc/extend.texi.nni 2007-05-29 09:29:01.000000000 -0700
+++ gcc/doc/extend.texi 2007-05-29 09:29:01.000000000 -0700
@@ -7509,6 +7509,54 @@ Generates the @code{pextrd} machine inst
Generates the @code{pextrq} machine instruction in 64bit mode.
@end table
+The following built-in functions are available when @option{-msse4.2} is
+used. All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int)
+int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int)
+v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int)
+int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int)
+__v2di __builtin_ia32_pcmpgtq (__v2di, __v2di)
+@end smallexample
+
+The following built-in functions are available when @option{-msse4.2} is
+used.
+
+@table @code
+unsigned int __builtin_ia32_crc32qi (unsigned int, unsigned char)
+Generates the @code{crc32b} machine instruction.
+unsigned int __builtin_ia32_crc32hi (unsigned int, unsigned short)
+Generates the @code{crc32w} machine instruction.
+unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int)
+Generates the @code{crc32l} machine instruction.
+unsigned long long __builtin_ia32_crc32di (unsigned int, unsigned long long)
+Generates the @code{crc32q} machine instruction in 64bit mode.
+@end table
+
+The following built-in functions are changed to generate new SSE4.2
+instructions when @option{-msse4.2} is used.
+
+@table @code
+int __builtin_popcount (unsigned int)
+Generates the @code{popcntl} machine instruction.
+int __builtin_popcountl (unsigned long)
+Generates the @code{popcntl} or @code{popcntq} machine instruction,
+depending on the size of @code{unsigned long}.
+int __builtin_popcountll (unsigned long long)
+Generates the @code{popcntq} machine instruction.
+@end table
+
The following built-in functions are available when @option{-msse4a} is used.
@smallexample
--- gcc/doc/invoke.texi.nni 2007-05-25 17:43:03.000000000 -0700
+++ gcc/doc/invoke.texi 2007-05-29 09:29:01.000000000 -0700
@@ -548,7 +548,7 @@ Objective-C and Objective-C++ Dialects}.
-mno-fp-ret-in-387 -msoft-float @gol
-mno-wide-multiply -mrtd -malign-double @gol
-mpreferred-stack-boundary=@var{num} -mcx16 -msahf @gol
--mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 @gol
+-mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
-msse4a -m3dnow -mpopcnt -mabm @gol
-mthreads -mno-align-stringops -minline-all-stringops @gol
-mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
@@ -10273,6 +10273,10 @@ preferred alignment to @option{-mpreferr
@itemx -mno-ssse3
@item -msse4.1
@itemx -mno-sse4.1
+@item -msse4.2
+@itemx -mno-sse4.2
+@item -msse4
+@itemx -mno-sse4
@item -msse4a
@item -mno-sse4a
@item -m3dnow