Patch #1 to add SSE5 support to the x86 GCC compiler

Michael Meissner michael.meissner@amd.com
Thu Sep 6 17:44:00 GMT 2007


OK, I have made only the minimal changes to round, rint, etc. in the current
source base: they now use TARGET_ROUND instead of TARGET_SSE4_1.  We can deal
with enhancements to the rounding functions in another patch.
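
As a rough illustration of the TARGET_ROUND idea, here is a standalone sketch
of a shared ISA mask.  The enum names and bit positions are invented for the
example; in the patch the real masks are OPTION_MASK_ISA_SSE4_1 and
OPTION_MASK_ISA_SSE5, tested against ix86_isa_flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions, chosen only for this sketch.  */
enum
{
  MASK_ISA_SSE4_1 = 1 << 0,
  MASK_ISA_SSE5 = 1 << 1,
  /* Either ISA provides the round/ptest instructions.  */
  MASK_ISA_ROUND = MASK_ISA_SSE4_1 | MASK_ISA_SSE5
};

/* Mirrors the shape of OPTION_ISA_ROUND: true if any bit of the
   combined mask is set in ISA_FLAGS.  */
bool
target_round_p (int isa_flags)
{
  return (isa_flags & MASK_ISA_ROUND) != 0;
}

/* Usage: each ISA on its own enables the shared instructions.  */
void
round_mask_example (void)
{
  assert (target_round_p (MASK_ISA_SSE4_1));
  assert (target_round_p (MASK_ISA_SSE5));
  assert (!target_round_p (0));
}
```

A pattern or builtin guarded this way becomes available under either -msse4.1
or -msse5 without listing both conditions at every use.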

Once again, this patch passes bootstrap and make check on x86_64 (for both
-m64 and -m32), and I'm running make check on my 32-bit system.

Any other comments from the x86 maintainers?

Coming up is patch #2 that adds the rest of the instructions and intrinsics.
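
For anyone who wants to try the switches, a small user-level sketch follows.
The fmadd function is the one quoted in the comment in ix86_sse5_valid_op_p;
have_sse5 is a hypothetical helper that only reports whether __SSE5__ was
defined at compile time.  Whether the multiply and add are actually fused
depends on -msse5, -mfused-madd and the optimization level, so treat this as
a test input rather than a guarantee.

```c
#include <assert.h>

/* Candidate for a fused multiply/add when built with -msse5 and
   -mfused-madd (the latter is on by default in this patch).  */
float
fmadd (float *a, float *b, float *c)
{
  return (*a * *b) + *c;
}

/* Hypothetical helper: 1 if the compiler defined __SSE5__, else 0.  */
int
have_sse5 (void)
{
#ifdef __SSE5__
  return 1;
#else
  return 0;
#endif
}

/* Usage: the result is the same with or without fusing for these
   exactly representable inputs.  */
void
fmadd_example (void)
{
  float a = 2.0f, b = 3.0f, c = 4.0f;
  assert (fmadd (&a, &b, &c) == 10.0f);
}
```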

-- 
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
michael.meissner@amd.com
-------------- next part --------------
2007-09-05  Michael Meissner  <michael.meissner@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>
	    Tony Linthicum  <tony.linthicum@amd.com>

	* config/i386/i386.h (TARGET_SSE5): New macro for SSE5.
	(TARGET_ROUND): New macro for the round/ptest instructions which
	are shared between SSE4.1 and SSE5.
	(OPTION_MASK_ISA_ROUND): Ditto.
	(OPTION_ISA_ROUND): Ditto.
	(TARGET_FUSED_MADD): New macro for -mfused-madd switch.
	(TARGET_CPU_CPP_BUILTINS): Add SSE5 support.

	* config/i386/i386.opt (-msse5): New switch for SSE5 support.
	(-mfused-madd): New switch to give users control over whether the
	compiler optimizes to use the multiply/add SSE5 instructions.
	(-msse5-strict-memory): Make SSE5 instructions strict about
	whether only one memory operand is allowed before register
	allocation.

	* config/i386/i386.c (m_AMD_MULTIPLE): Rename from
	m_ATHLON_K8_AMDFAM10, and change all uses.
	(enum pta_flags): Add PTA_SSE5.
	(override_options): Add SSE5 support.
	(bdesc_ptest): Change OPTION_MASK_ISA_SSE4_1 to
	OPTION_MASK_ISA_ROUND for instructions that are shared between
	SSE4.1 and SSE5.
	(bdesc_2arg): Ditto.
	(bdesc_sse_3arg): Ditto.
	(ix86_sse5_valid_op_p): New function to validate SSE5 3 and 4
	operand instructions.
	(ix86_handle_option): Turn off 3dnow if -msse5.
	(print_operand): Add 'Y' code to print the test for the SSE5
	comparison operators.
	(ix86_expand_sse_movcc): Add SSE5 support.

	* config/i386/i386-protos.h (ix86_sse5_valid_op_p): Add
	declaration.

	* config/i386/i386.md (UNSPEC_SSE5_INTRINSIC_P): Add new UNSPEC
	constant for SSE5 support.
	(UNSPEC_SSE5_INTRINSIC_S): Ditto.
	(UNSPEC_SSE5_INTRINSIC_UNS): Ditto.
	(type attribute): Add ssemuladd, sseiadd1, ssecvt1, sse4arg types.
	(unit attribute): Add support for ssemuladd, ssecvt1, sseiadd1, sse4arg
	types.
	(memory attribute): Ditto.
	(sse4_1_round<mode>2): Use TARGET_ROUND instead of TARGET_SSE4_1.
	Use SSE4_1_ROUND_* constants instead of hard coded numbers.
	(rint<mode>2): Use TARGET_ROUND instead of TARGET_SSE4_1.
	(floor<mode>2): Ditto.
	(ceil<mode>2): Ditto.
	(btrunc<mode>2): Ditto.
	(nearbyintdf2): Ditto.
	(nearbyintsf2): Ditto.
	(sse_setccsf): Disable if SSE5.
	(sse_setccdf): Ditto.
	(sse5_setcc<mode>): New support for SSE5 conditional move.
	(sse5_pcmov_<mode>): Ditto.

	* config/i386/sse.md (SSEMODEF4): New mode macro for SSE5.
	(SSEMODEF2P): Ditto.
	(ssemodesuffixf4): New mode attribute for SSE5.
	(ssemodesuffixf2s): Ditto.
	(ssemodesuffixf2c): Ditto.
	(ssescalarmode): Ditto.
	(sse5_fmadd<mode>4): Add SSE5 floating point multiply/add
	instructions.
	(sse5_fmsub<mode>4): Ditto.
	(sse5_fnmadd<mode>4): Ditto.
	(sse5_fnmsub<mode>4): Ditto.
	(sse5ip_fmadd<mode>4): Ditto.
	(sse5ip_fmsub<mode>4): Ditto.
	(sse5ip_fnmadd<mode>4): Ditto.
	(sse5ip_fnmsub<mode>4): Ditto.
	(sse5is_fmadd<mode>4): Ditto.
	(sse5is_fmsub<mode>4): Ditto.
	(sse5is_fnmadd<mode>4): Ditto.
	(sse5is_fnmsub<mode>4): Ditto.
	(sse4_1_roundpd): Use TARGET_ROUND instead of TARGET_SSE4_1.
	(sse4_1_roundps): Ditto.
	(sse4_1_roundsd): Ditto.
	(sse4_1_roundss): Ditto.
	(sse_maskcmpv4sf3): Disable if SSE5 so the SSE5 instruction will
	be generated.
	(sse_maskcmpsf3): Ditto.
	(sse_vmmaskcmpv4sf3): Ditto.
	(sse2_maskcmpv2df3): Ditto.
	(sse2_maskcmpdf3): Ditto.
	(sse2_vmmaskcmpv2df3): Ditto.
	(sse2_eq<mode>3): Ditto.
	(sse2_gt<mode>3): Ditto.
	(sse5_pcmov_<mode>): Add SSE5 support.

	* config/i386/predicates.md (sse5_comparison_float_operator): New
	predicate to match the comparison operators supported by the SSE5
	com instruction.
	(ix86_comparison_int_operator): New predicate to match just the
	signed int comparisons.
	(ix86_comparison_uns_operator): New predicate to match just the
	unsigned int comparisons.

	* doc/invoke.texi (-msse5): Add documentation.
	(-mfused-madd): Ditto.
	(-msse5-strict-memory): Ditto.
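
The memory-operand rule that ix86_sse5_valid_op_p applies under
-msse5-strict-memory can be sketched outside the compiler as below.  This is
a simplified model, not the function from the patch: it takes a precomputed
bit mask instead of rtx operands, it omits the CONST0_RTX case for pcmov, and
the name sse5_mem_mask_ok is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* MEM_MASK has bit I set when operand I is a memory operand; operand 0
   is the destination.  NUM is the operand count.  USES_OC0 is true for
   instructions that allow the alternate operand orderings.  */
bool
sse5_mem_mask_ok (unsigned int mem_mask, int num, bool uses_oc0)
{
  /* No memory operands at all is always acceptable.  */
  if (mem_mask == 0)
    return true;

  if (num == 4)
    {
      /* e.g. fmaddss: exactly one of the three sources may be memory.  */
      if (uses_oc0)
	return (mem_mask == (1u << 1)
		|| mem_mask == (1u << 2)
		|| mem_mask == (1u << 3));
      /* e.g. pmacsdd: only operand 2 may be memory.  */
      return mem_mask == (1u << 2);
    }

  if (num == 3)
    {
      /* e.g. protb: either source may be memory, but not both.  */
      if (uses_oc0)
	return mem_mask == (1u << 1) || mem_mask == (1u << 2);
      /* e.g. comeq: only operand 2 may be memory.  */
      return mem_mask == (1u << 2);
    }

  return false;
}

/* Usage: one memory source passes, two memory sources fail.  */
void
sse5_mem_mask_example (void)
{
  assert (sse5_mem_mask_ok (1u << 3, 4, true));
  assert (!sse5_mem_mask_ok ((1u << 1) | (1u << 2), 4, true));
}
```

Comparing the mask for equality against a single bit, rather than testing
whether the bit is set, is what rejects combinations with two memory
operands.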

*** gcc/config/i386/i386.h.~1~	2007-09-06 13:22:33.456889000 -0400
--- gcc/config/i386/i386.h	2007-08-31 14:16:20.143928000 -0400
*************** along with GCC; see the file COPYING3.  
*** 47,52 ****
--- 47,58 ----
  #define TARGET_SSE4_1	OPTION_ISA_SSE4_1
  #define TARGET_SSE4_2	OPTION_ISA_SSE4_2
  #define TARGET_SSE4A	OPTION_ISA_SSE4A
+ #define TARGET_SSE5	OPTION_ISA_SSE5
+ #define TARGET_ROUND	OPTION_ISA_ROUND
+ 
+ /* SSE5 and SSE4.1 define the same round instructions */
+ #define	OPTION_MASK_ISA_ROUND	(OPTION_MASK_ISA_SSE4_1 | OPTION_MASK_ISA_SSE5)
+ #define	OPTION_ISA_ROUND	((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)
  
  #include "config/vxworks-dummy.h"
  
*************** extern int x86_prefetch_sse;
*** 367,372 ****
--- 373,379 ----
  #define TARGET_PREFETCH_SSE	x86_prefetch_sse
  #define TARGET_SAHF		x86_sahf
  #define TARGET_RECIP		x86_recip
+ #define TARGET_FUSED_MADD	x86_fused_muladd
  
  #define ASSEMBLER_DIALECT	(ix86_asm_dialect)
  
*************** extern const char *host_detect_local_cpu
*** 580,585 ****
--- 587,594 ----
  	builtin_define ("__SSE4_2__");				\
        if (TARGET_SSE4A)						\
   	builtin_define ("__SSE4A__");		                \
+       if (TARGET_SSE5)						\
+ 	builtin_define ("__SSE5__");				\
        if (TARGET_SSE_MATH && TARGET_SSE)			\
  	builtin_define ("__SSE_MATH__");			\
        if (TARGET_SSE_MATH && TARGET_SSE2)			\
*** gcc/config/i386/i386.opt.~1~	2007-09-06 13:22:33.525958000 -0400
--- gcc/config/i386/i386.opt	2007-08-31 14:20:26.790374000 -0400
*************** msse4a
*** 244,249 ****
--- 244,253 ----
  Target Report Mask(ISA_SSE4A) Var(ix86_isa_flags) VarExists
  Support MMX, SSE, SSE2, SSE3 and SSE4A built-in functions and code generation
  
+ msse5
+ Target Report Mask(ISA_SSE5) Var(ix86_isa_flags) VarExists
+ Support SSE5 built-in functions and code generation
+ 
  ;; Instruction support
  
  mabm
*************** Support code generation of sahf instruct
*** 265,267 ****
--- 269,285 ----
  mrecip
  Target Report RejectNegative Var(x86_recip)
  Generate reciprocals instead of divss and sqrtss.
+ 
+ msse5-strict-memory
+ Target Report Var(x86_sse5_strict_memory)
+ Limit SSE5 instructions to a single memory operand internally before register
+ allocation.  This more closely matches the format of the hardware instructions
+ but it prevents some combinations from being discovered.  It is anticipated
+ that the need for this switch may disappear in the future as the compiler is
+ tuned.
+ 
+ mfused-madd
+ Target Report Var(x86_fused_muladd) Init(1)
+ Enable automatic generation of fused floating point multiply-add instructions
+ if the ISA supports such instructions.  The -mfused-madd option is on by
+ default.
*** gcc/config/i386/i386.c.~1~	2007-09-06 13:22:33.631063000 -0400
--- gcc/config/i386/i386.c	2007-09-06 11:13:37.022303000 -0400
*************** const struct processor_costs *ix86_cost 
*** 1030,1036 ****
  #define m_ATHLON  (1<<PROCESSOR_ATHLON)
  #define m_ATHLON_K8  (m_K8 | m_ATHLON)
  #define m_AMDFAM10  (1<<PROCESSOR_AMDFAM10)
! #define m_ATHLON_K8_AMDFAM10  (m_K8 | m_ATHLON | m_AMDFAM10)
  
  #define m_GENERIC32 (1<<PROCESSOR_GENERIC32)
  #define m_GENERIC64 (1<<PROCESSOR_GENERIC64)
--- 1030,1036 ----
  #define m_ATHLON  (1<<PROCESSOR_ATHLON)
  #define m_ATHLON_K8  (m_K8 | m_ATHLON)
  #define m_AMDFAM10  (1<<PROCESSOR_AMDFAM10)
! #define m_AMD_MULTIPLE  (m_K8 | m_ATHLON | m_AMDFAM10)
  
  #define m_GENERIC32 (1<<PROCESSOR_GENERIC32)
  #define m_GENERIC64 (1<<PROCESSOR_GENERIC64)
*************** unsigned int ix86_tune_features[X86_TUNE
*** 1045,1054 ****
       negatively, so enabling for Generic64 seems like good code size
       tradeoff.  We can't enable it for 32bit generic because it does not
       work well with PPro base chips.  */
!   m_386 | m_K6_GEODE | m_ATHLON_K8_AMDFAM10 | m_CORE2 | m_GENERIC64,
  
    /* X86_TUNE_PUSH_MEMORY */
!   m_386 | m_K6_GEODE | m_ATHLON_K8_AMDFAM10 | m_PENT4
    | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_ZERO_EXTEND_WITH_AND */
--- 1045,1054 ----
       negatively, so enabling for Generic64 seems like good code size
       tradeoff.  We can't enable it for 32bit generic because it does not
       work well with PPro base chips.  */
!   m_386 | m_K6_GEODE | m_AMD_MULTIPLE | m_CORE2 | m_GENERIC64,
  
    /* X86_TUNE_PUSH_MEMORY */
!   m_386 | m_K6_GEODE | m_AMD_MULTIPLE | m_PENT4
    | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_ZERO_EXTEND_WITH_AND */
*************** unsigned int ix86_tune_features[X86_TUNE
*** 1058,1067 ****
    m_386,
  
    /* X86_TUNE_UNROLL_STRLEN */
!   m_486 | m_PENT | m_PPRO | m_ATHLON_K8_AMDFAM10 | m_K6 | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_DEEP_BRANCH_PREDICTION */
!   m_PPRO | m_K6_GEODE | m_ATHLON_K8_AMDFAM10 | m_PENT4 | m_GENERIC,
  
    /* X86_TUNE_BRANCH_PREDICTION_HINTS: Branch hints were put in P4 based
       on simulation result. But after P4 was made, no performance benefit
--- 1058,1067 ----
    m_386,
  
    /* X86_TUNE_UNROLL_STRLEN */
!   m_486 | m_PENT | m_PPRO | m_AMD_MULTIPLE | m_K6 | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_DEEP_BRANCH_PREDICTION */
!   m_PPRO | m_K6_GEODE | m_AMD_MULTIPLE | m_PENT4 | m_GENERIC,
  
    /* X86_TUNE_BRANCH_PREDICTION_HINTS: Branch hints were put in P4 based
       on simulation result. But after P4 was made, no performance benefit
*************** unsigned int ix86_tune_features[X86_TUNE
*** 1078,1084 ****
  
    /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
       partial dependencies.  */
!   m_ATHLON_K8_AMDFAM10 | m_PPRO | m_PENT4 | m_NOCONA
    | m_CORE2 | m_GENERIC | m_GEODE /* m_386 | m_K6 */,
  
    /* X86_TUNE_PARTIAL_REG_STALL: We probably ought to watch for partial
--- 1078,1084 ----
  
    /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
       partial dependencies.  */
!   m_AMD_MULTIPLE | m_PPRO | m_PENT4 | m_NOCONA
    | m_CORE2 | m_GENERIC | m_GEODE /* m_386 | m_K6 */,
  
    /* X86_TUNE_PARTIAL_REG_STALL: We probably ought to watch for partial
*************** unsigned int ix86_tune_features[X86_TUNE
*** 1098,1104 ****
    m_386 | m_486 | m_K6_GEODE,
  
    /* X86_TUNE_USE_SIMODE_FIOP */
!   ~(m_PPRO | m_ATHLON_K8_AMDFAM10 | m_PENT | m_CORE2 | m_GENERIC),
  
    /* X86_TUNE_USE_MOV0 */
    m_K6,
--- 1098,1104 ----
    m_386 | m_486 | m_K6_GEODE,
  
    /* X86_TUNE_USE_SIMODE_FIOP */
!   ~(m_PPRO | m_AMD_MULTIPLE | m_PENT | m_CORE2 | m_GENERIC),
  
    /* X86_TUNE_USE_MOV0 */
    m_K6,
*************** unsigned int ix86_tune_features[X86_TUNE
*** 1119,1125 ****
    ~(m_PENT | m_PPRO),
  
    /* X86_TUNE_PROMOTE_QIMODE */
!   m_K6_GEODE | m_PENT | m_386 | m_486 | m_ATHLON_K8_AMDFAM10 | m_CORE2
    | m_GENERIC /* | m_PENT4 ? */,
  
    /* X86_TUNE_FAST_PREFIX */
--- 1119,1125 ----
    ~(m_PENT | m_PPRO),
  
    /* X86_TUNE_PROMOTE_QIMODE */
!   m_K6_GEODE | m_PENT | m_386 | m_486 | m_AMD_MULTIPLE | m_CORE2
    | m_GENERIC /* | m_PENT4 ? */,
  
    /* X86_TUNE_FAST_PREFIX */
*************** unsigned int ix86_tune_features[X86_TUNE
*** 1144,1169 ****
    m_PPRO,
  
    /* X86_TUNE_ADD_ESP_4: Enable if add/sub is preferred over 1/2 push/pop.  */
!   m_ATHLON_K8_AMDFAM10 | m_K6_GEODE | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_ADD_ESP_8 */
!   m_ATHLON_K8_AMDFAM10 | m_PPRO | m_K6_GEODE | m_386
    | m_486 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_SUB_ESP_4 */
!   m_ATHLON_K8_AMDFAM10 | m_PPRO | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_SUB_ESP_8 */
!   m_ATHLON_K8_AMDFAM10 | m_PPRO | m_386 | m_486
    | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred
       for DFmode copies */
!   ~(m_ATHLON_K8_AMDFAM10 | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2
      | m_GENERIC | m_GEODE),
  
    /* X86_TUNE_PARTIAL_REG_DEPENDENCY */
!   m_ATHLON_K8_AMDFAM10 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY: In the Generic model we have a
       conflict here in between PPro/Pentium4 based chips that thread 128bit
--- 1144,1169 ----
    m_PPRO,
  
    /* X86_TUNE_ADD_ESP_4: Enable if add/sub is preferred over 1/2 push/pop.  */
!   m_AMD_MULTIPLE | m_K6_GEODE | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_ADD_ESP_8 */
!   m_AMD_MULTIPLE | m_PPRO | m_K6_GEODE | m_386
    | m_486 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_SUB_ESP_4 */
!   m_AMD_MULTIPLE | m_PPRO | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_SUB_ESP_8 */
!   m_AMD_MULTIPLE | m_PPRO | m_386 | m_486
    | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred
       for DFmode copies */
!   ~(m_AMD_MULTIPLE | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2
      | m_GENERIC | m_GEODE),
  
    /* X86_TUNE_PARTIAL_REG_DEPENDENCY */
!   m_AMD_MULTIPLE | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY: In the Generic model we have a
       conflict here in between PPro/Pentium4 based chips that thread 128bit
*************** unsigned int ix86_tune_features[X86_TUNE
*** 1186,1198 ****
    m_ATHLON_K8,
  
    /* X86_TUNE_SSE_TYPELESS_STORES */
!   m_ATHLON_K8_AMDFAM10,
  
    /* X86_TUNE_SSE_LOAD0_BY_PXOR */
    m_PPRO | m_PENT4 | m_NOCONA,
  
    /* X86_TUNE_MEMORY_MISMATCH_STALL */
!   m_ATHLON_K8_AMDFAM10 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_PROLOGUE_USING_MOVE */
    m_ATHLON_K8 | m_PPRO | m_CORE2 | m_GENERIC,
--- 1186,1198 ----
    m_ATHLON_K8,
  
    /* X86_TUNE_SSE_TYPELESS_STORES */
!   m_AMD_MULTIPLE,
  
    /* X86_TUNE_SSE_LOAD0_BY_PXOR */
    m_PPRO | m_PENT4 | m_NOCONA,
  
    /* X86_TUNE_MEMORY_MISMATCH_STALL */
!   m_AMD_MULTIPLE | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_PROLOGUE_USING_MOVE */
    m_ATHLON_K8 | m_PPRO | m_CORE2 | m_GENERIC,
*************** unsigned int ix86_tune_features[X86_TUNE
*** 1204,1229 ****
    ~m_486,
  
    /* X86_TUNE_USE_FFREEP */
!   m_ATHLON_K8_AMDFAM10,
  
    /* X86_TUNE_INTER_UNIT_MOVES */
!   ~(m_ATHLON_K8_AMDFAM10 | m_GENERIC),
  
    /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
       than 4 branch instructions in the 16 byte window.  */
!   m_PPRO | m_ATHLON_K8_AMDFAM10 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_SCHEDULE */
!   m_PPRO | m_ATHLON_K8_AMDFAM10 | m_K6_GEODE | m_PENT | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_USE_BT */
!   m_ATHLON_K8_AMDFAM10,
  
    /* X86_TUNE_USE_INCDEC */
    ~(m_PENT4 | m_NOCONA | m_GENERIC),
  
    /* X86_TUNE_PAD_RETURNS */
!   m_ATHLON_K8_AMDFAM10 | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_EXT_80387_CONSTANTS */
    m_K6_GEODE | m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2 | m_GENERIC,
--- 1204,1229 ----
    ~m_486,
  
    /* X86_TUNE_USE_FFREEP */
!   m_AMD_MULTIPLE,
  
    /* X86_TUNE_INTER_UNIT_MOVES */
!   ~(m_AMD_MULTIPLE | m_GENERIC),
  
    /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
       than 4 branch instructions in the 16 byte window.  */
!   m_PPRO | m_AMD_MULTIPLE | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_SCHEDULE */
!   m_PPRO | m_AMD_MULTIPLE | m_K6_GEODE | m_PENT | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_USE_BT */
!   m_AMD_MULTIPLE,
  
    /* X86_TUNE_USE_INCDEC */
    ~(m_PENT4 | m_NOCONA | m_GENERIC),
  
    /* X86_TUNE_PAD_RETURNS */
!   m_AMD_MULTIPLE | m_CORE2 | m_GENERIC,
  
    /* X86_TUNE_EXT_80387_CONSTANTS */
    m_K6_GEODE | m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2 | m_GENERIC,
*************** unsigned int ix86_arch_features[X86_ARCH
*** 1279,1288 ****
  };
  
  static const unsigned int x86_accumulate_outgoing_args
!   = m_ATHLON_K8_AMDFAM10 | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2 | m_GENERIC;
  
  static const unsigned int x86_arch_always_fancy_math_387
!   = m_PENT | m_PPRO | m_ATHLON_K8_AMDFAM10 | m_PENT4
      | m_NOCONA | m_CORE2 | m_GENERIC;
  
  static enum stringop_alg stringop_alg = no_stringop;
--- 1279,1288 ----
  };
  
  static const unsigned int x86_accumulate_outgoing_args
!   = m_AMD_MULTIPLE | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2 | m_GENERIC;
  
  static const unsigned int x86_arch_always_fancy_math_387
!   = m_PENT | m_PPRO | m_AMD_MULTIPLE | m_PENT4
      | m_NOCONA | m_CORE2 | m_GENERIC;
  
  static enum stringop_alg stringop_alg = no_stringop;
*************** static int ix86_isa_flags_explicit;
*** 1620,1625 ****
--- 1620,1628 ----
  
  #define OPTION_MASK_ISA_SSE4A_UNSET OPTION_MASK_ISA_SSE4
  
+ #define OPTION_MASK_ISA_SSE5_UNSET \
+   (OPTION_MASK_ISA_3DNOW | OPTION_MASK_ISA_3DNOW_UNSET)
+ 
  /* Vectorization library interface and handlers.  */
  tree (*ix86_veclib_handler)(enum built_in_function, tree, tree) = NULL;
  static tree ix86_veclibabi_acml (enum built_in_function, tree, tree);
*************** ix86_handle_option (size_t code, const c
*** 1725,1730 ****
--- 1728,1742 ----
  	}
        return true;
  
+     case OPT_msse5:
+       ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE5;
+       if (!value)
+ 	{
+ 	  ix86_isa_flags &= ~OPTION_MASK_ISA_SSE5_UNSET;
+ 	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE5_UNSET;
+ 	}
+       return true;
+ 
      default:
        return true;
      }
*************** override_options (void)
*** 1795,1801 ****
        PTA_SSE4A = 1 << 12,
        PTA_NO_SAHF = 1 << 13,
        PTA_SSE4_1 = 1 << 14,
!       PTA_SSE4_2 = 1 << 15
      };
  
    static struct pta
--- 1807,1814 ----
        PTA_SSE4A = 1 << 12,
        PTA_NO_SAHF = 1 << 13,
        PTA_SSE4_1 = 1 << 14,
!       PTA_SSE4_2 = 1 << 15,
!       PTA_SSE5 = 1 << 16
      };
  
    static struct pta
*************** override_options (void)
*** 2088,2093 ****
--- 2101,2109 ----
  	if (processor_alias_table[i].flags & PTA_SSE4A
  	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4A))
  	  ix86_isa_flags |= OPTION_MASK_ISA_SSE4A;
+ 	if (processor_alias_table[i].flags & PTA_SSE5
+ 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE5))
+ 	  ix86_isa_flags |= OPTION_MASK_ISA_SSE5;
  
  	if (processor_alias_table[i].flags & PTA_ABM)
  	  x86_abm = true;
*************** override_options (void)
*** 2315,2320 ****
--- 2331,2340 ----
    if (!TARGET_80387)
      target_flags |= MASK_NO_FANCY_MATH_387;
  
+   /* Turn on SSE4A builtins for -msse5.  */
+   if (TARGET_SSE5)
+     ix86_isa_flags |= OPTION_MASK_ISA_SSE4A;
+ 
    /* Turn on SSE4.1 builtins for -msse4.2.  */
    if (TARGET_SSE4_2)
      ix86_isa_flags |= OPTION_MASK_ISA_SSE4_1;
*************** get_some_local_dynamic_name (void)
*** 8511,8516 ****
--- 8531,8537 ----
     X -- don't print any sort of PIC '@' suffix for a symbol.
     & -- print some in-use local-dynamic symbol name.
     H -- print a memory address offset by 8; used for sse high-parts
+    Y -- print condition for SSE5 com* instruction.
     + -- print a branch hint as 'cs' or 'ds' prefix
     ; -- print a semicolon (after prefixes due to bug in older gas).
   */
*************** print_operand (FILE *file, rtx x, int co
*** 8795,8800 ****
--- 8816,8874 ----
  	    return;
  	  }
  
+ 	case 'Y':
+ 	  switch (GET_CODE (x))
+ 	    {
+ 	    case NE:
+ 	      fputs ("neq", file);
+ 	      break;
+ 	    case EQ:
+ 	      fputs ("eq", file);
+ 	      break;
+ 	    case GE:
+ 	    case GEU:
+ 	      fputs (INTEGRAL_MODE_P (GET_MODE (x)) ? "ge" : "unlt", file);
+ 	      break;
+ 	    case GT:
+ 	    case GTU:
+ 	      fputs (INTEGRAL_MODE_P (GET_MODE (x)) ? "gt" : "unle", file);
+ 	      break;
+ 	    case LE:
+ 	    case LEU:
+ 	      fputs ("le", file);
+ 	      break;
+ 	    case LT:
+ 	    case LTU:
+ 	      fputs ("lt", file);
+ 	      break;
+ 	    case UNORDERED:
+ 	      fputs ("unord", file);
+ 	      break;
+ 	    case ORDERED:
+ 	      fputs ("ord", file);
+ 	      break;
+ 	    case UNEQ:
+ 	      fputs ("ueq", file);
+ 	      break;
+ 	    case UNGE:
+ 	      fputs ("nlt", file);
+ 	      break;
+ 	    case UNGT:
+ 	      fputs ("nle", file);
+ 	      break;
+ 	    case UNLE:
+ 	      fputs ("ule", file);
+ 	      break;
+ 	    case UNLT:
+ 	      fputs ("ult", file);
+ 	      break;
+ 	    case LTGT:
+ 	      fputs ("une", file);
+ 	      break;
+ 	    default:
+ 	      gcc_unreachable ();
+ 	    }
+ 	  return;
  	case ';':
  #if TARGET_MACHO
  	  fputs (" ; ", file);
*************** static const struct builtin_description 
*** 17024,17032 ****
  static const struct builtin_description bdesc_ptest[] =
  {
    /* SSE4.1 */
!   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestz128", IX86_BUILTIN_PTESTZ, EQ, 0 },
!   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestc128", IX86_BUILTIN_PTESTC, LTU, 0 },
!   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestnzc128", IX86_BUILTIN_PTESTNZC, GTU, 0 },
  };
  
  static const struct builtin_description bdesc_pcmpestr[] =
--- 17098,17106 ----
  static const struct builtin_description bdesc_ptest[] =
  {
    /* SSE4.1 */
!   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestz128", IX86_BUILTIN_PTESTZ, EQ, 0 },
!   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestc128", IX86_BUILTIN_PTESTC, LTU, 0 },
!   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestnzc128", IX86_BUILTIN_PTESTNZC, GTU, 0 },
  };
  
  static const struct builtin_description bdesc_pcmpestr[] =
*************** static const struct builtin_description 
*** 17076,17083 ****
    { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, 0 },
    { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendvb, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, 0 },
    { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
!   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundsd, 0, IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
!   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundss, 0, IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
  };
  
  static const struct builtin_description bdesc_2arg[] =
--- 17150,17157 ----
    { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, 0 },
    { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendvb, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, 0 },
    { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
!   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, 0, IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
!   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, 0, IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
  };
  
  static const struct builtin_description bdesc_2arg[] =
*************** ix86_init_mmx_sse_builtins (void)
*** 18287,18296 ****
    def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmovzxwq128", v2di_ftype_v8hi, IX86_BUILTIN_PMOVZXWQ128);
    def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmovzxdq128", v2di_ftype_v4si, IX86_BUILTIN_PMOVZXDQ128);
    def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmuldq128", v2di_ftype_v4si_v4si, IX86_BUILTIN_PMULDQ128);
!   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_roundpd", v2df_ftype_v2df_int, IX86_BUILTIN_ROUNDPD);
!   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_roundps", v4sf_ftype_v4sf_int, IX86_BUILTIN_ROUNDPS);
!   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_roundsd", v2df_ftype_v2df_v2df_int, IX86_BUILTIN_ROUNDSD);
!   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_roundss", v4sf_ftype_v4sf_v4sf_int, IX86_BUILTIN_ROUNDSS);
  
    /* SSE4.2. */
    ftype = build_function_type_list (unsigned_type_node,
--- 18361,18372 ----
    def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmovzxwq128", v2di_ftype_v8hi, IX86_BUILTIN_PMOVZXWQ128);
    def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmovzxdq128", v2di_ftype_v4si, IX86_BUILTIN_PMOVZXDQ128);
    def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmuldq128", v2di_ftype_v4si_v4si, IX86_BUILTIN_PMULDQ128);
! 
!   /* SSE4.1 and SSE5 */
!   def_builtin_const (OPTION_MASK_ISA_ROUND, "__builtin_ia32_roundpd", v2df_ftype_v2df_int, IX86_BUILTIN_ROUNDPD);
!   def_builtin_const (OPTION_MASK_ISA_ROUND, "__builtin_ia32_roundps", v4sf_ftype_v4sf_int, IX86_BUILTIN_ROUNDPS);
!   def_builtin_const (OPTION_MASK_ISA_ROUND, "__builtin_ia32_roundsd", v2df_ftype_v2df_v2df_int, IX86_BUILTIN_ROUNDSD);
!   def_builtin_const (OPTION_MASK_ISA_ROUND, "__builtin_ia32_roundss", v4sf_ftype_v4sf_v4sf_int, IX86_BUILTIN_ROUNDSS);
  
    /* SSE4.2. */
    ftype = build_function_type_list (unsigned_type_node,
*************** ix86_expand_lround (rtx op0, rtx op1)
*** 23144,23150 ****
  
    /* load nextafter (0.5, 0.0) */
    fmt = REAL_MODE_FORMAT (mode);
!   real_2expN (&half_minus_pred_half, -(fmt->p) - 1);
    REAL_ARITHMETIC (pred_half, MINUS_EXPR, dconsthalf, half_minus_pred_half);
  
    /* adj = copysign (0.5, op1) */
--- 23220,23226 ----
  
    /* load nextafter (0.5, 0.0) */
    fmt = REAL_MODE_FORMAT (mode);
!   real_2expN (&half_minus_pred_half, -(fmt->p) - 1, mode);
    REAL_ARITHMETIC (pred_half, MINUS_EXPR, dconsthalf, half_minus_pred_half);
  
    /* adj = copysign (0.5, op1) */
*************** ix86_expand_round (rtx operand0, rtx ope
*** 23555,23561 ****
  
    /* load nextafter (0.5, 0.0) */
    fmt = REAL_MODE_FORMAT (mode);
!   real_2expN (&half_minus_pred_half, -(fmt->p) - 1);
    REAL_ARITHMETIC (pred_half, MINUS_EXPR, dconsthalf, half_minus_pred_half);
  
    /* xa = xa + 0.5 */
--- 23631,23637 ----
  
    /* load nextafter (0.5, 0.0) */
    fmt = REAL_MODE_FORMAT (mode);
!   real_2expN (&half_minus_pred_half, -(fmt->p) - 1, mode);
    REAL_ARITHMETIC (pred_half, MINUS_EXPR, dconsthalf, half_minus_pred_half);
  
    /* xa = xa + 0.5 */
*************** ix86_expand_round (rtx operand0, rtx ope
*** 23577,23582 ****
--- 23653,23747 ----
  }
  
  
+ /* Check whether the operands of an SSE5 instruction are valid.
+    OPERANDS is the array of operands.
+    NUM is the number of operands.
+    USES_OC0 is true if the instruction uses OC0 and provides 4 variants.  */
+ bool ix86_sse5_valid_op_p (rtx operands[], rtx insn, int num, bool uses_oc0)
+ {
+   int mem_mask;
+   int mem_count;
+   int i;
+ 
+   /* The -msse5-strict-memory switch controls whether the SSE5 3/4 operand
+      instructions should be strict about having only one memory operand.  If
+      memory operands are strict, then the compiler won't optimize:
+ 
+ 	float fmadd (float *a, float *b, float *c) { return (*a * *b) + *c; }
+ 
+     or similar cases that are vectorized into using the fmaddss instruction.
+     This probably needs some tuning in the compiler.  */
+   if (!x86_sse5_strict_memory)
+     return true;
+     
+   /* Count the number of memory arguments */
+   mem_mask = 0;
+   mem_count = 0;
+   for (i = 0; i < num; i++)
+     {
+       enum machine_mode mode = GET_MODE (operands[i]);
+       if (register_operand (operands[i], mode))
+ 	;
+ 
+       else if (memory_operand (operands[i], mode))
+ 	{
+ 	  mem_mask |= (1 << i);
+ 	  mem_count++;
+ 	}
+ 
+       else
+ 	{
+ 	  rtx pattern = PATTERN (insn);
+ 
+ 	  /* allow 0 for pcmov */
+ 	  if (GET_CODE (pattern) != SET
+ 	      || GET_CODE (SET_SRC (pattern)) != IF_THEN_ELSE
+ 	      || i < 2
+ 	      || operands[i] != CONST0_RTX (mode))
+ 	    return false;
+ 	}
+     }
+ 
+   /* If there were no memory operations, allow the insn */
+   if (mem_mask == 0)
+     return true;
+ 
+   else if (num == 4)
+     {
+       /* formats (destination is the first argument), example fmaddss:
+ 	 xmm1, xmm1, xmm2, xmm3/mem
+ 	 xmm1, xmm1, xmm2/mem, xmm3
+ 	 xmm1, xmm2, xmm3/mem, xmm1
+ 	 xmm1, xmm2/mem, xmm3, xmm1 */
+       if (uses_oc0)
+ 	return ((mem_mask == (1 << 1))
+ 		|| (mem_mask == (1 << 2))
+ 		|| (mem_mask == (1 << 3)));
+ 
+       /* format, example pmacsdd:
+ 	 xmm1, xmm2, xmm3/mem, xmm1 */
+       else
+ 	return (mem_mask == (1 << 2));
+     }
+ 
+   else if (num == 3)
+     {
+       /* formats, example protb:
+ 	 xmm1, xmm2, xmm3/mem
+ 	 xmm1, xmm2/mem, xmm3 */
+       if (uses_oc0)
+ 	return ((mem_mask == (1 << 1)) || (mem_mask == (1 << 2)));
+ 
+       /* format, example comeq:
+ 	 xmm1, xmm2, xmm3/mem */
+       else
+ 	return (mem_mask == (1 << 2));
+     }
+ 
+   return false;
+ }
+ 
+ 
  /* Table of valid machine attributes.  */
  static const struct attribute_spec ix86_attribute_table[] =
  {
*** gcc/config/i386/i386-protos.h.~1~	2007-09-06 13:22:34.020190000 -0400
--- gcc/config/i386/i386-protos.h	2007-08-31 14:21:03.901226000 -0400
*************** extern void ix86_expand_vector_set (bool
*** 205,210 ****
--- 205,212 ----
  extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
  extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
  
+ extern bool ix86_sse5_valid_op_p (rtx [], rtx, int, bool);
+ 
  /* In winnt.c  */
  extern void i386_pe_unique_section (tree, int);
  extern void i386_pe_declare_function_type (FILE *, const char *, int);
*** gcc/config/i386/i386.md.~1~	2007-09-06 13:22:34.127296000 -0400
--- gcc/config/i386/i386.md	2007-09-06 11:22:40.215995000 -0400
***************
*** 176,181 ****
--- 176,186 ----
     (UNSPEC_CRC32		143)
     (UNSPEC_PCMPESTR		144)
     (UNSPEC_PCMPISTR		145)
+ 
+    ;; For SSE5
+    (UNSPEC_SSE5_INTRINSIC_P	150)
+    (UNSPEC_SSE5_INTRINSIC_S	151)
+    (UNSPEC_SSE5_INTRINSIC_UNS	152)
    ])
  
  (define_constants
***************
*** 232,239 ****
     push,pop,call,callv,leave,
     str,bitmanip,
     fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
!    sselog,sselog1,sseiadd,sseishft,sseimul,
!    sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,sseicvt,ssediv,sseins,
     mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
    (const_string "other"))
  
--- 237,245 ----
     push,pop,call,callv,leave,
     str,bitmanip,
     fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
!    sselog,sselog1,sseiadd,sseiadd1,sseishft,sseimul,
!    sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
!    ssemuladd,sse4arg,
     mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
    (const_string "other"))
  
***************
*** 246,253 ****
  (define_attr "unit" "integer,i387,sse,mmx,unknown"
    (cond [(eq_attr "type" "fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
  	   (const_string "i387")
! 	 (eq_attr "type" "sselog,sselog1,sseiadd,sseishft,sseimul,
! 			  sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,sseicvt,ssediv,sseins")
  	   (const_string "sse")
  	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
  	   (const_string "mmx")
--- 252,260 ----
  (define_attr "unit" "integer,i387,sse,mmx,unknown"
    (cond [(eq_attr "type" "fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
  	   (const_string "i387")
! 	 (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseimul,
! 			  sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
! 			  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
  	   (const_string "sse")
  	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
  	   (const_string "mmx")
***************
*** 447,457 ****
  		 "!alu1,negnot,ishift1,
  		   imov,imovx,icmp,test,bitmanip,
  		   fmov,fcmp,fsgn,
! 		   sse,ssemov,ssecmp,ssecomi,ssecvt,sseicvt,sselog1,
! 		   mmx,mmxmov,mmxcmp,mmxcvt")
  	      (match_operand 2 "memory_operand" ""))
  	   (const_string "load")
! 	 (and (eq_attr "type" "icmov")
  	      (match_operand 3 "memory_operand" ""))
  	   (const_string "load")
  	]
--- 454,464 ----
  		 "!alu1,negnot,ishift1,
  		   imov,imovx,icmp,test,bitmanip,
  		   fmov,fcmp,fsgn,
! 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
! 		   sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
  	      (match_operand 2 "memory_operand" ""))
  	   (const_string "load")
! 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
  	      (match_operand 3 "memory_operand" ""))
  	   (const_string "load")
  	]
***************
*** 7627,7632 ****
--- 7634,7642 ----
  		 (match_operand:SF 2 "nonimmediate_operand" "")))]
    "TARGET_80387 || TARGET_SSE_MATH"
    "")
+ 
+ ;; SSE5 scalar multiply/add instructions are defined in sse.md.
+ 
  
  ;; Divide instructions
  
***************
*** 13683,13689 ****
  	(match_operator:SF 1 "sse_comparison_operator"
  	  [(match_operand:SF 2 "register_operand" "0")
  	   (match_operand:SF 3 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE"
    "cmp%D1ss\t{%3, %0|%0, %3}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "SF")])
--- 13693,13699 ----
  	(match_operator:SF 1 "sse_comparison_operator"
  	  [(match_operand:SF 2 "register_operand" "0")
  	   (match_operand:SF 3 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE && !TARGET_SSE5"
    "cmp%D1ss\t{%3, %0|%0, %3}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "SF")])
***************
*** 13693,13702 ****
  	(match_operator:DF 1 "sse_comparison_operator"
  	  [(match_operand:DF 2 "register_operand" "0")
  	   (match_operand:DF 3 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE2"
    "cmp%D1sd\t{%3, %0|%0, %3}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "DF")])
  
  ;; Basic conditional jump instructions.
  ;; We ignore the overflow flag for signed branch instructions.
--- 13703,13723 ----
  	(match_operator:DF 1 "sse_comparison_operator"
  	  [(match_operand:DF 2 "register_operand" "0")
  	   (match_operand:DF 3 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE2 && !TARGET_SSE5"
    "cmp%D1sd\t{%3, %0|%0, %3}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "DF")])
+ 
+ (define_insn "*sse5_setcc<mode>"
+   [(set (match_operand:SSEMODEF 0 "register_operand" "=x")
+ 	(match_operator:SSEMODEF 1 "sse5_comparison_float_operator"
+ 	  [(match_operand:SSEMODEF 2 "register_operand" "x")
+ 	   (match_operand:SSEMODEF 3 "nonimmediate_operand" "xm")]))]
+   "TARGET_SSE5"
+   "com%Y1s<ssemodefsuffix>\t{%3, %2, %0|%0, %2, %3}"
+   [(set_attr "type" "sse4arg")
+    (set_attr "mode" "<MODE>")])
+ 
  
  ;; Basic conditional jump instructions.
  ;; We ignore the overflow flag for signed branch instructions.
***************
*** 17269,17275 ****
  	(unspec:SSEMODEF [(match_operand:SSEMODEF 1 "register_operand" "x")
  			  (match_operand:SI 2 "const_0_to_15_operand" "n")]
  		         UNSPEC_ROUND))]
!   "TARGET_SSE4_1"
    "rounds<ssemodefsuffix>\t{%2, %1, %0|%0, %1, %2}"
    [(set_attr "type" "ssecvt")
     (set_attr "prefix_extra" "1")
--- 17290,17296 ----
  	(unspec:SSEMODEF [(match_operand:SSEMODEF 1 "register_operand" "x")
  			  (match_operand:SI 2 "const_0_to_15_operand" "n")]
  		         UNSPEC_ROUND))]
!   "TARGET_ROUND"
    "rounds<ssemodefsuffix>\t{%2, %1, %0|%0, %1, %2}"
    [(set_attr "type" "ssecvt")
     (set_attr "prefix_extra" "1")
***************
*** 17294,17306 ****
      && flag_unsafe_math_optimizations)
     || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
         && !flag_trapping_math
!        && (TARGET_SSE4_1 || !optimize_size))"
  {
    if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
        && !flag_trapping_math
!       && (TARGET_SSE4_1 || !optimize_size))
      {
!       if (TARGET_SSE4_1)
  	emit_insn (gen_sse4_1_round<mode>2
  		   (operands[0], operands[1], GEN_INT (0x04)));
        else
--- 17315,17327 ----
      && flag_unsafe_math_optimizations)
     || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
         && !flag_trapping_math
!        && (TARGET_ROUND || !optimize_size))"
  {
    if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
        && !flag_trapping_math
!       && (TARGET_ROUND || !optimize_size))
      {
!       if (TARGET_ROUND)
  	emit_insn (gen_sse4_1_round<mode>2
  		   (operands[0], operands[1], GEN_INT (0x04)));
        else
***************
*** 17541,17553 ****
      && flag_unsafe_math_optimizations && !optimize_size)
     || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
         && !flag_trapping_math
!        && (TARGET_SSE4_1 || !optimize_size))"
  {
    if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
        && !flag_trapping_math
!       && (TARGET_SSE4_1 || !optimize_size))
      {
!       if (TARGET_SSE4_1)
  	emit_insn (gen_sse4_1_round<mode>2
  		   (operands[0], operands[1], GEN_INT (0x01)));
        else if (TARGET_64BIT || (<MODE>mode != DFmode))
--- 17562,17574 ----
      && flag_unsafe_math_optimizations && !optimize_size)
     || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
         && !flag_trapping_math
!        && (TARGET_ROUND || !optimize_size))"
  {
    if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
        && !flag_trapping_math
!       && (TARGET_ROUND || !optimize_size))
      {
!       if (TARGET_ROUND)
  	emit_insn (gen_sse4_1_round<mode>2
  		   (operands[0], operands[1], GEN_INT (0x01)));
        else if (TARGET_64BIT || (<MODE>mode != DFmode))
***************
*** 17806,17818 ****
      && flag_unsafe_math_optimizations && !optimize_size)
     || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
         && !flag_trapping_math
!        && (TARGET_SSE4_1 || !optimize_size))"
  {
    if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
        && !flag_trapping_math
!       && (TARGET_SSE4_1 || !optimize_size))
      {
!       if (TARGET_SSE4_1)
  	emit_insn (gen_sse4_1_round<mode>2
  		   (operands[0], operands[1], GEN_INT (0x02)));
        else if (TARGET_64BIT || (<MODE>mode != DFmode))
--- 17827,17839 ----
      && flag_unsafe_math_optimizations && !optimize_size)
     || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
         && !flag_trapping_math
!        && (TARGET_ROUND || !optimize_size))"
  {
    if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
        && !flag_trapping_math
!       && (TARGET_ROUND || !optimize_size))
      {
!       if (TARGET_ROUND)
  	emit_insn (gen_sse4_1_round<mode>2
  		   (operands[0], operands[1], GEN_INT (0x02)));
        else if (TARGET_64BIT || (<MODE>mode != DFmode))
***************
*** 18069,18081 ****
      && flag_unsafe_math_optimizations && !optimize_size)
     || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
         && !flag_trapping_math
!        && (TARGET_SSE4_1 || !optimize_size))"
  {
    if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
        && !flag_trapping_math
!       && (TARGET_SSE4_1 || !optimize_size))
      {
!       if (TARGET_SSE4_1)
  	emit_insn (gen_sse4_1_round<mode>2
  		   (operands[0], operands[1], GEN_INT (0x03)));
        else if (TARGET_64BIT || (<MODE>mode != DFmode))
--- 18090,18102 ----
      && flag_unsafe_math_optimizations && !optimize_size)
     || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
         && !flag_trapping_math
!        && (TARGET_ROUND || !optimize_size))"
  {
    if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
        && !flag_trapping_math
!       && (TARGET_ROUND || !optimize_size))
      {
!       if (TARGET_ROUND)
  	emit_insn (gen_sse4_1_round<mode>2
  		   (operands[0], operands[1], GEN_INT (0x03)));
        else if (TARGET_64BIT || (<MODE>mode != DFmode))
***************
*** 19317,19322 ****
--- 19338,19354 ----
    [(set_attr "type" "fcmov")
     (set_attr "mode" "XF")])
  
+ ;; SSE5 conditional move
+ (define_insn "*sse5_pcmov_<mode>"
+   [(set (match_operand:SSEMODEF 0 "register_operand" "=x,x,x,x")
+ 	(if_then_else:SSEMODEF 
+ 	  (match_operand:SSEMODEF 1 "nonimmediate_operand" "xm,x,0,0")
+ 	  (match_operand:SSEMODEF 2 "nonimmediate_operand" "0,0,x,xm")
+ 	  (match_operand:SSEMODEF 3 "vector_move_operand" "x,xm,xm,x")))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "pcmov\t{%1, %3, %2, %0|%0, %2, %3, %1}"
+   [(set_attr "type" "sse4arg")])
+ 
  ;; These versions of the min/max patterns are intentionally ignorant of
  ;; their behavior wrt -0.0 and NaN (via the commutative operand mark).
  ;; Since both the tree-level MAX_EXPR and the rtl-level SMAX operator
*** gcc/config/i386/sse.md.~1~	2007-09-06 13:22:34.623792000 -0400
--- gcc/config/i386/sse.md	2007-08-31 15:07:37.117558000 -0400
***************
*** 32,41 ****
--- 32,52 ----
  (define_mode_iterator SSEMODE14 [V16QI V4SI])
  (define_mode_iterator SSEMODE124 [V16QI V8HI V4SI])
  (define_mode_iterator SSEMODE248 [V8HI V4SI V2DI])
+ (define_mode_iterator SSEMODE1248 [V16QI V8HI V4SI V2DI])
+ (define_mode_iterator SSEMODEF4 [SF DF V4SF V2DF])
+ (define_mode_iterator SSEMODEF2P [V4SF V2DF])
  
  ;; Mapping from integer vector mode to mnemonic suffix
  (define_mode_attr ssevecsize [(V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")])
  
+ ;; Mapping of the sse5 suffix
+ (define_mode_attr ssemodesuffixf4 [(SF "ss") (DF "sd") (V4SF "ps") (V2DF "pd")])
+ (define_mode_attr ssemodesuffixf2s [(SF "ss") (DF "sd") (V4SF "ss") (V2DF "sd")])
+ (define_mode_attr ssemodesuffixf2c [(V4SF "s") (V2DF "d")])
+ 
+ ;; Mapping of vector modes back to the scalar modes
+ (define_mode_attr ssescalarmode [(V4SF "SF") (V2DF "DF")])
+ 
  ;; Patterns whose name begins with "sse{,2,3}_" are invoked by intrinsics.
  
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
***************
*** 834,840 ****
  	(match_operator:V4SF 3 "sse_comparison_operator"
  		[(match_operand:V4SF 1 "register_operand" "0")
  		 (match_operand:V4SF 2 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE"
    "cmp%D3ps\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "V4SF")])
--- 845,851 ----
  	(match_operator:V4SF 3 "sse_comparison_operator"
  		[(match_operand:V4SF 1 "register_operand" "0")
  		 (match_operand:V4SF 2 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE && !TARGET_SSE5"
    "cmp%D3ps\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "V4SF")])
***************
*** 844,850 ****
  	(match_operator:SF 3 "sse_comparison_operator"
  		[(match_operand:SF 1 "register_operand" "0")
  		 (match_operand:SF 2 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE"
    "cmp%D3ss\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "SF")])
--- 855,861 ----
  	(match_operator:SF 3 "sse_comparison_operator"
  		[(match_operand:SF 1 "register_operand" "0")
  		 (match_operand:SF 2 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE && !TARGET_SSE5"
    "cmp%D3ss\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "SF")])
***************
*** 857,863 ****
  		 (match_operand:V4SF 2 "register_operand" "x")])
  	 (match_dup 1)
  	 (const_int 1)))]
!   "TARGET_SSE"
    "cmp%D3ss\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "SF")])
--- 868,874 ----
  		 (match_operand:V4SF 2 "register_operand" "x")])
  	 (match_dup 1)
  	 (const_int 1)))]
!   "TARGET_SSE && !TARGET_SSE5"
    "cmp%D3ss\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "SF")])
***************
*** 1571,1576 ****
--- 1582,1739 ----
  
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  ;;
+ ;; SSE5 floating point multiply/accumulate instructions.  This includes the
+ ;; scalar versions of the instructions as well as the vector versions.
+ ;;
+ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+ 
+ (define_insn "*sse5_fmadd<mode>4"
+   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x,x")
+ 	(plus:SSEMODEF4 (mult:SSEMODEF4 (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%0,0,x,xm")
+ 					  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,xm,x"))
+ 			 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,0,0")))]
+   "TARGET_SSE5 && TARGET_FUSED_MADD
+    && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<MODE>")])
+ 
+ (define_insn "*sse5_fmsub<mode>4"
+   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x,x")
+ 	(minus:SSEMODEF4 (mult:SSEMODEF4 (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%0,0,x,xm")
+ 					   (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,xm,x"))
+ 			  (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,0,0")))]
+   "TARGET_SSE5 && TARGET_FUSED_MADD
+    && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<MODE>")])
+ 
+ ;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b)
+ ;; Note operands are out of order to simplify call to ix86_sse5_valid_op_p
+ (define_insn "*sse5_fnmadd1<mode>4"
+   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x,x")
+ 	(minus:SSEMODEF4 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,0,0")
+ 			 (mult:SSEMODEF4 (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%0,0,x,xm")
+ 					 (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,xm,x"))))]
+   "TARGET_SSE5 && TARGET_FUSED_MADD
+    && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<MODE>")])
+ 
+ ;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c
+ (define_insn "*sse5_fnmsub<mode>4"
+   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x,x")
+ 	(minus:SSEMODEF4 (mult:SSEMODEF4 (neg:SSEMODEF4 (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%0,0,x,xm"))
+ 					 (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,xm,x"))
+ 			 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,0,0")))]
+   "TARGET_SSE5 && TARGET_FUSED_MADD
+    && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<MODE>")])
+ 
+ ;; The same instructions using an UNSPEC to allow the intrinsic to be used
+ ;; even if the user used -mno-fused-madd
+ ;; Parallel instructions
+ (define_insn "sse5ip_fmadd<mode>4"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x,x,x")
+ 	(unspec:SSEMODEF2P [(plus:SSEMODEF2P (mult:SSEMODEF2P (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%0,0,x,xm")
+ 							      (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm,xm,x"))
+ 					     (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x,0,0"))]
+ 			   UNSPEC_SSE5_INTRINSIC_P))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<MODE>")])
+ 
+ (define_insn "sse5ip_fmsub<mode>4"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x,x,x")
+ 	(unspec:SSEMODEF2P [(minus:SSEMODEF2P (mult:SSEMODEF2P (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%0,0,x,xm")
+ 							       (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm,xm,x"))
+ 					      (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x,0,0"))]
+ 			   UNSPEC_SSE5_INTRINSIC_P))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<MODE>")])
+ 
+ ;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b)
+ ;; Note operands are out of order to simplify call to ix86_sse5_valid_op_p
+ (define_insn "sse5ip_fnmadd<mode>4"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x,x,x")
+ 	(unspec:SSEMODEF2P [(minus:SSEMODEF2P (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x,0,0")
+ 					      (mult:SSEMODEF2P (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%0,0,x,xm")
+ 							       (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm,xm,x")))]
+ 			   UNSPEC_SSE5_INTRINSIC_P))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<MODE>")])
+ 
+ ;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c
+ (define_insn "sse5ip_fnmsub<mode>4"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x,x,x")
+ 	(unspec:SSEMODEF2P [(minus:SSEMODEF2P (mult:SSEMODEF2P (neg:SSEMODEF2P (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%0,0,x,xm"))
+ 							       (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm,xm,x"))
+ 					      (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x,0,0"))]
+ 			   UNSPEC_SSE5_INTRINSIC_P))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<MODE>")])
+ 
+ ;; Scalar instructions
+ (define_insn "sse5is_fmadd<mode>4"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x,x,x")
+ 	(unspec:SSEMODEF2P [(plus:SSEMODEF2P (mult:SSEMODEF2P (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%0,0,x,xm")
+ 							      (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm,xm,x"))
+ 					     (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x,0,0"))]
+ 			   UNSPEC_SSE5_INTRINSIC_S))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<ssescalarmode>")])
+ 
+ (define_insn "sse5is_fmsub<mode>4"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x,x,x")
+ 	(unspec:SSEMODEF2P [(minus:SSEMODEF2P (mult:SSEMODEF2P (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%0,0,x,xm")
+ 							       (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm,xm,x"))
+ 					      (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x,0,0"))]
+ 			   UNSPEC_SSE5_INTRINSIC_S))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<ssescalarmode>")])
+ 
+ ;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b)
+ ;; Note operands are out of order to simplify call to ix86_sse5_valid_op_p
+ (define_insn "sse5is_fnmadd<mode>4"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x,x,x")
+ 	(unspec:SSEMODEF2P [(minus:SSEMODEF2P (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x,0,0")
+ 					      (mult:SSEMODEF2P (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%0,0,x,xm")
+ 							       (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm,xm,x")))]
+ 			   UNSPEC_SSE5_INTRINSIC_S))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<ssescalarmode>")])
+ 
+ ;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c
+ (define_insn "sse5is_fnmsub<mode>4"
+   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x,x,x")
+ 	(unspec:SSEMODEF2P [(minus:SSEMODEF2P (mult:SSEMODEF2P (neg:SSEMODEF2P (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%0,0,x,xm"))
+ 							       (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm,xm,x"))
+ 					      (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x,0,0"))]
+ 			   UNSPEC_SSE5_INTRINSIC_S))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "fnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   [(set_attr "type" "ssemuladd")
+    (set_attr "mode" "<ssescalarmode>")])
+ 
+ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+ ;;
  ;; Parallel double-precision floating point arithmetic
  ;;
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
***************
*** 1875,1881 ****
  	(match_operator:V2DF 3 "sse_comparison_operator"
  		[(match_operand:V2DF 1 "register_operand" "0")
  		 (match_operand:V2DF 2 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE2"
    "cmp%D3pd\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "V2DF")])
--- 2038,2044 ----
  	(match_operator:V2DF 3 "sse_comparison_operator"
  		[(match_operand:V2DF 1 "register_operand" "0")
  		 (match_operand:V2DF 2 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE2 && !TARGET_SSE5"
    "cmp%D3pd\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "V2DF")])
***************
*** 1885,1891 ****
  	(match_operator:DF 3 "sse_comparison_operator"
  		[(match_operand:DF 1 "register_operand" "0")
  		 (match_operand:DF 2 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE2"
    "cmp%D3sd\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "DF")])
--- 2048,2054 ----
  	(match_operator:DF 3 "sse_comparison_operator"
  		[(match_operand:DF 1 "register_operand" "0")
  		 (match_operand:DF 2 "nonimmediate_operand" "xm")]))]
!   "TARGET_SSE2 && !TARGET_SSE5"
    "cmp%D3sd\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "DF")])
***************
*** 1898,1904 ****
  		 (match_operand:V2DF 2 "nonimmediate_operand" "xm")])
  	  (match_dup 1)
  	  (const_int 1)))]
!   "TARGET_SSE2"
    "cmp%D3sd\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "DF")])
--- 2061,2067 ----
  		 (match_operand:V2DF 2 "nonimmediate_operand" "xm")])
  	  (match_dup 1)
  	  (const_int 1)))]
!   "TARGET_SSE2 && !TARGET_SSE5"
    "cmp%D3sd\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "mode" "DF")])
***************
*** 3694,3700 ****
  	(eq:SSEMODE124
  	  (match_operand:SSEMODE124 1 "nonimmediate_operand" "%0")
  	  (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm")))]
!   "TARGET_SSE2 && ix86_binary_operator_ok (EQ, <MODE>mode, operands)"
    "pcmpeq<ssevecsize>\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "prefix_data16" "1")
--- 3857,3863 ----
  	(eq:SSEMODE124
  	  (match_operand:SSEMODE124 1 "nonimmediate_operand" "%0")
  	  (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm")))]
!   "TARGET_SSE2 && !TARGET_SSE5 && ix86_binary_operator_ok (EQ, <MODE>mode, operands)"
    "pcmpeq<ssevecsize>\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "prefix_data16" "1")
***************
*** 3716,3722 ****
  	(gt:SSEMODE124
  	  (match_operand:SSEMODE124 1 "register_operand" "0")
  	  (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm")))]
!   "TARGET_SSE2"
    "pcmpgt<ssevecsize>\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "prefix_data16" "1")
--- 3879,3885 ----
  	(gt:SSEMODE124
  	  (match_operand:SSEMODE124 1 "register_operand" "0")
  	  (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm")))]
!   "TARGET_SSE2 && !TARGET_SSE5"
    "pcmpgt<ssevecsize>\t{%2, %0|%0, %2}"
    [(set_attr "type" "ssecmp")
     (set_attr "prefix_data16" "1")
***************
*** 6590,6596 ****
  	(unspec:V2DF [(match_operand:V2DF 1 "nonimmediate_operand" "xm")
  		      (match_operand:SI 2 "const_0_to_15_operand" "n")]
  		     UNSPEC_ROUND))]
!   "TARGET_SSE4_1"
    "roundpd\t{%2, %1, %0|%0, %1, %2}"
    [(set_attr "type" "ssecvt")
     (set_attr "prefix_extra" "1")
--- 6753,6759 ----
  	(unspec:V2DF [(match_operand:V2DF 1 "nonimmediate_operand" "xm")
  		      (match_operand:SI 2 "const_0_to_15_operand" "n")]
  		     UNSPEC_ROUND))]
!   "TARGET_ROUND"
    "roundpd\t{%2, %1, %0|%0, %1, %2}"
    [(set_attr "type" "ssecvt")
     (set_attr "prefix_extra" "1")
***************
*** 6601,6607 ****
  	(unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm")
  		      (match_operand:SI 2 "const_0_to_15_operand" "n")]
  		     UNSPEC_ROUND))]
!   "TARGET_SSE4_1"
    "roundps\t{%2, %1, %0|%0, %1, %2}"
    [(set_attr "type" "ssecvt")
     (set_attr "prefix_extra" "1")
--- 6764,6770 ----
  	(unspec:V4SF [(match_operand:V4SF 1 "nonimmediate_operand" "xm")
  		      (match_operand:SI 2 "const_0_to_15_operand" "n")]
  		     UNSPEC_ROUND))]
!   "TARGET_ROUND"
    "roundps\t{%2, %1, %0|%0, %1, %2}"
    [(set_attr "type" "ssecvt")
     (set_attr "prefix_extra" "1")
***************
*** 6615,6621 ****
  		       UNSPEC_ROUND)
  	  (match_operand:V2DF 1 "register_operand" "0")
  	  (const_int 1)))]
!   "TARGET_SSE4_1"
    "roundsd\t{%3, %2, %0|%0, %2, %3}"
    [(set_attr "type" "ssecvt")
     (set_attr "prefix_extra" "1")
--- 6778,6784 ----
  		       UNSPEC_ROUND)
  	  (match_operand:V2DF 1 "register_operand" "0")
  	  (const_int 1)))]
!   "TARGET_ROUND"
    "roundsd\t{%3, %2, %0|%0, %2, %3}"
    [(set_attr "type" "ssecvt")
     (set_attr "prefix_extra" "1")
***************
*** 6629,6635 ****
  		       UNSPEC_ROUND)
  	  (match_operand:V4SF 1 "register_operand" "0")
  	  (const_int 1)))]
!   "TARGET_SSE4_1"
    "roundss\t{%3, %2, %0|%0, %2, %3}"
    [(set_attr "type" "ssecvt")
     (set_attr "prefix_extra" "1")
--- 6792,6798 ----
  		       UNSPEC_ROUND)
  	  (match_operand:V4SF 1 "register_operand" "0")
  	  (const_int 1)))]
!   "TARGET_ROUND"
    "roundss\t{%3, %2, %0|%0, %2, %3}"
    [(set_attr "type" "ssecvt")
     (set_attr "prefix_extra" "1")
***************
*** 6877,6879 ****
--- 7040,7060 ----
     (set_attr "prefix_extra" "1")
     (set_attr "memory" "none,load,none,load")
     (set_attr "mode" "TI")])
+ 
+ ;; SSE5 parallel XMM conditional moves
+ (define_insn "sse5_pcmov_<mode>"
+   [(set (match_operand:SSEMODE 0 "register_operand" "=x,x,x,x,x,x")
+ 	(if_then_else:SSEMODE 
+ 	  (match_operand:SSEMODE 3 "register_operand" "0,0,xm,xm,0,0")
+ 	  (match_operand:SSEMODE 1 "vector_move_operand" "x,xm,0,x,C,x")
+ 	  (match_operand:SSEMODE 2 "vector_move_operand" "xm,x,x,0,x,C")))]
+   "TARGET_SSE5 && ix86_sse5_valid_op_p (operands, insn, 4, true)"
+   "@
+    pcmov\t{%3, %2, %1, %0|%3, %1, %2, %0}
+    pcmov\t{%3, %2, %1, %0|%3, %1, %2, %0}
+    pcmov\t{%3, %2, %1, %0|%3, %1, %2, %0}
+    pcmov\t{%3, %2, %1, %0|%3, %1, %2, %0}
+    andps\t{%2, %0|%0, %2}
+    andnps\t{%1, %0|%0, %1}"
+   [(set_attr "type" "sse4arg")])
+ 
*** gcc/config/i386/predicates.md.~1~	2007-09-06 13:22:34.810884000 -0400
--- gcc/config/i386/predicates.md	2007-08-31 15:06:34.165961000 -0400
***************
*** 903,908 ****
--- 903,920 ----
  (define_special_predicate "sse_comparison_operator"
    (match_code "eq,lt,le,unordered,ne,unge,ungt,ordered"))
  
+ ;; Return 1 if OP is a comparison operator that can be issued by SSE5 predicate
+ ;; generation instructions.
+ (define_predicate "sse5_comparison_float_operator"
+   (and (match_test "TARGET_SSE5")
+        (match_code "ne,eq,ge,gt,le,lt,unordered,ordered,uneq,unge,ungt,unle,unlt,ltgt")))
+ 
+ (define_predicate "ix86_comparison_int_operator"
+   (match_code "ne,eq,ge,gt,le,lt"))
+ 
+ (define_predicate "ix86_comparison_uns_operator"
+   (match_code "ne,eq,geu,gtu,leu,ltu"))
+ 
  ;; Return 1 if OP is a valid comparison operator in valid mode.
  (define_predicate "ix86_comparison_operator"
    (match_operand 0 "comparison_operator")
*** gcc/doc/invoke.texi.~1~	2007-09-06 13:22:34.924884000 -0400
--- gcc/doc/invoke.texi	2007-09-06 11:13:26.112149000 -0400
*************** Objective-C and Objective-C++ Dialects}.
*** 553,566 ****
  -mno-wide-multiply  -mrtd  -malign-double @gol
  -mpreferred-stack-boundary=@var{num} -mcx16 -msahf -mrecip @gol
  -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
! -msse4a -m3dnow -mpopcnt -mabm @gol
  -mthreads  -mno-align-stringops  -minline-all-stringops @gol
  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
  -m96bit-long-double  -mregparm=@var{num}  -msseregparm @gol
  -mveclibabi=@var{type} -mpc32 -mpc64 -mpc80 -mstackrealign @gol
  -momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
  -mcmodel=@var{code-model} @gol
! -m32  -m64 -mlarge-data-threshold=@var{num}}
  
  @emph{IA-64 Options}
  @gccoptlist{-mbig-endian  -mlittle-endian  -mgnu-as  -mgnu-ld  -mno-pic @gol
--- 553,567 ----
  -mno-wide-multiply  -mrtd  -malign-double @gol
  -mpreferred-stack-boundary=@var{num} -mcx16 -msahf -mrecip @gol
  -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
! -msse4a -m3dnow -mpopcnt -mabm -msse5 @gol
  -mthreads  -mno-align-stringops  -minline-all-stringops @gol
  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
  -m96bit-long-double  -mregparm=@var{num}  -msseregparm @gol
  -mveclibabi=@var{type} -mpc32 -mpc64 -mpc80 -mstackrealign @gol
  -momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
  -mcmodel=@var{code-model} @gol
! -m32  -m64 -mlarge-data-threshold=@var{num} @gol
! -mfused-madd -mno-fused-madd -msse5-strict-memory}
  
  @emph{IA-64 Options}
  @gccoptlist{-mbig-endian  -mlittle-endian  -mgnu-as  -mgnu-ld  -mno-pic @gol
*************** preferred alignment to @option{-mpreferr
*** 10435,10440 ****
--- 10436,10443 ----
  @itemx -mno-sse4
  @item -msse4a
  @item -mno-sse4a
+ @item -msse5
+ @itemx -mno-sse5
  @item -m3dnow
  @itemx -mno-3dnow
  @item -mpopcnt
*************** preferred alignment to @option{-mpreferr
*** 10448,10454 ****
  @opindex m3dnow
  @opindex mno-3dnow
  These switches enable or disable the use of instructions in the MMX,
! SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4A, ABM or 3DNow! extended
  instruction sets.
  These extensions are also available as built-in functions: see
  @ref{X86 Built-in Functions}, for details of the functions enabled and
--- 10451,10457 ----
  @opindex m3dnow
  @opindex mno-3dnow
  These switches enable or disable the use of instructions in the MMX,
! SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4A, SSE5, ABM or 3DNow! extended
  instruction sets.
  These extensions are also available as built-in functions: see
  @ref{X86 Built-in Functions}, for details of the functions enabled and
*************** is legal depends on the operating system
*** 10570,10575 ****
--- 10573,10593 ----
  segment to cover the entire TLS area.
  
  For systems that use GNU libc, the default is on.
+ 
+ @item -mfused-madd
+ @itemx -mno-fused-madd
+ @opindex mfused-madd
+ Enable automatic generation of fused floating point multiply-add instructions
+ if the ISA supports such instructions.  The @option{-mfused-madd} option is on
+ by default.
+ 
+ @item -msse5-strict-memory
+ @opindex msse5-strict-memory
+ Limit SSE5 instructions to a single memory operand internally before register
+ allocation.  This more closely matches the format of the hardware instructions
+ but it prevents some combinations from being discovered.  It is anticipated
+ that the need for this switch may disappear in the future as the compiler is
+ tuned.
  @end table
  
  These @samp{-m} switches are supported in addition to the above
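
For reviewers who want something concrete to compile, here is a minimal C sketch of the source shapes that the new *sse5_fmadd<mode>4 and *sse5_fnmadd1<mode>4 patterns are written to match.  The function names madd and nmadd are made up for illustration and are not part of the patch.  Whether the compiler actually combines the multiply and the add depends on the optimization level and on -msse5 and -mfused-madd being in effect, so treat the instruction names in the comments as an expectation drawn from the pattern templates, not as verified output.

```c
/* a * b + c has the (plus (mult a b) c) shape of *sse5_fmadd<mode>4.
   For DFmode the template suffix is "sd", so fmaddsd is the expected
   mnemonic (an assumption read off the pattern, not checked output).  */
double madd(double a, double b, double c)
{
    return a * b + c;
}

/* c - a * b has the (minus c (mult a b)) shape of *sse5_fnmadd1<mode>4,
   the canonical form the patch comment gives for (- (a * b) + c).  */
double nmadd(double a, double b, double c)
{
    return c - a * b;
}
```

Compiling this once with -mfused-madd and once with -mno-fused-madd and comparing the assembly should show whether the combined patterns were used.  A fused operation rounds once instead of twice, so results can differ in the last bit for inputs whose product is not exactly representable.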

