This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



PATCH: Add FMA4 128-bit and 256-bit support for upcoming AMD Orochi processor.


Hi,
This patch adds FMA4 128-bit and 256-bit support for the upcoming AMD Orochi processor.
This patch has been pre-reviewed by Honza. 
Bootstrap passes and all the 128-bit and 256-bit FMA4 tests pass under the AMD Orochi simulator.
This patch has to be applied on top of the SSE5 removal omissions patch.
Ok for check-in?
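
For reference, here is a minimal usage sketch of the new intrinsics
(illustrative only, not part of the patch; assumes the patch is applied and
the file is compiled with -mfma4, which also defines __FMA4__):

    #include <x86intrin.h>

    /* 128-bit: computes a*b + c, generating vfmaddps.  */
    __m128 mul_add_ps (__m128 a, __m128 b, __m128 c)
    {
      return _mm_macc_ps (a, b, c);
    }

    /* 256-bit: computes a*b - c, generating vfmsubpd.  */
    __m256d mul_sub_pd (__m256d a, __m256d b, __m256d c)
    {
      return _mm256_msub_pd (a, b, c);
    }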

Thanks,
Harsha

	* gcc.target/i386/fma4-check.h
	* gcc.target/i386/fma4-fma.c
	* gcc.target/i386/fma4-maccXX.c
	* gcc.target/i386/fma4-msubXX.c
	* gcc.target/i386/fma4-nmaccXX.c
	* gcc.target/i386/fma4-nmsubXX.c
	* gcc.target/i386/fma4-vector.c
	* gcc.target/i386/fma4-256-maccXX.c
	* gcc.target/i386/fma4-256-msubXX.c
	* gcc.target/i386/fma4-256-nmaccXX.c
	* gcc.target/i386/fma4-256-nmsubXX.c
	* gcc.target/i386/fma4-256-vector.c
	* gcc.target/i386/funcspec-2.c: New files.

	* gcc.target/i386/funcspec-4.c: Test error conditions
	related to FMA4.

	* gcc.target/i386/funcspec-5.c
	* gcc.target/i386/funcspec-6.c
	* gcc.target/i386/funcspec-8.c: Add FMA4.
	
	* gcc.target/i386/funcspec-9.c: New file.

	* gcc.target/i386/i386.exp: Add check_effective_target_fma4.

	* gcc.target/i386/isa-10.c
	* gcc.target/i386/isa-11.c
	* gcc.target/i386/isa-12.c
	* gcc.target/i386/isa-13.c
	* gcc.target/i386/isa-2.c
	* gcc.target/i386/isa-3.c
	* gcc.target/i386/isa-4.c
	* gcc.target/i386/isa-7.c
	* gcc.target/i386/isa-8.c
	* gcc.target/i386/isa-9.c: New files.
	
	* gcc.target/i386/isa-14.c
	* gcc.target/i386/isa-1.c
	* gcc.target/i386/isa-5.c
	* gcc.target/i386/isa-6.c: Add FMA4.

	* gcc.target/i386/sse-12.c
	* gcc.target/i386/sse-13.c
	* gcc.target/i386/sse-14.c
	* gcc.target/i386/sse-22.c: New files.

	* g++.dg/other/i386-2.C
	* g++.dg/other/i386-3.C
	* g++.dg/other/i386-5.C
	* g++.dg/other/i386-6.C: Add -mfma4 in dg-options.

	* config.gcc (i[34567]86-*-*): Include fma4intrin.h.
	(x86_64-*-*): Ditto.

	* doc/invoke.texi (-mfma4): Add documentation.

	* doc/extend.texi (x86 intrinsics): Add FMA4 intrinsics.
	
	* config/i386/fma4intrin.h: New file, provide common x86 compiler
	intrinsics for FMA4.

	* config/i386/cpuid.h (bit_FMA4): Define FMA4 bit.

	* config/i386/x86intrin.h: Fix typo, SSE4A instead of SSE4a.
	Add FMA4 check and include fma4intrin.h.
	
	* config/i386/i386-c.c (ix86_target_macros_internal): Check
	ISA_FLAG for FMA4.
	
	* config/i386/i386.h (TARGET_FMA4): New macro for FMA4.

	* config/i386/i386.md (UNSPEC_FMA4_INTRINSIC): Add new UNSPEC
	constant for FMA4 support.
	(UNSPEC_FMA4_FMADDSUB): Ditto.
	(UNSPEC_FMA4_FMSUBADD): Ditto.
	
	* config/i386/i386.opt (-mfma4): New switch for FMA4 support.
	
	* config/i386/i386-protos.h (ix86_fma4_valid_op_p): Add
	declaration.
	(ix86_expand_fma4_multiple_memory): Ditto.

	* config/i386/i386.c (OPTION_MASK_ISA_FMA4_SET): New.
	(OPTION_MASK_ISA_FMA4_UNSET): New.	
	(OPTION_MASK_ISA_SSE4A_UNSET): Change definition to
	depend on FMA4.
	(OPTION_MASK_ISA_AVX_UNSET): Change definition to
	depend on FMA4.
	(ix86_handle_option): Handle -mfma4.
	(isa_opts): Handle -mfma4.
	(enum pta_flags): Add PTA_FMA4.
	(override_options): Add FMA4 support.	
	(IX86_BUILTIN_VFMADDSS): New for FMA4 intrinsic.
	(IX86_BUILTIN_VFMADDSD): Ditto.
	(IX86_BUILTIN_VFMADDPS): Ditto.
	(IX86_BUILTIN_VFMADDPD): Ditto.
	(IX86_BUILTIN_VFMSUBSS): Ditto.
	(IX86_BUILTIN_VFMSUBSD): Ditto.
	(IX86_BUILTIN_VFMSUBPS): Ditto.
	(IX86_BUILTIN_VFMSUBPD): Ditto.
	(IX86_BUILTIN_VFMADDSUBPS): Ditto.
	(IX86_BUILTIN_VFMADDSUBPD): Ditto.
	(IX86_BUILTIN_VFMSUBADDPS): Ditto.
	(IX86_BUILTIN_VFMSUBADDPD): Ditto.
	(IX86_BUILTIN_VFNMADDSS): Ditto.
	(IX86_BUILTIN_VFNMADDSD): Ditto.
	(IX86_BUILTIN_VFNMADDPS): Ditto.
	(IX86_BUILTIN_VFNMADDPD): Ditto.
	(IX86_BUILTIN_VFNMSUBSS): Ditto.
	(IX86_BUILTIN_VFNMSUBSD): Ditto.
	(IX86_BUILTIN_VFNMSUBPS): Ditto.
	(IX86_BUILTIN_VFNMSUBPD): Ditto.
	(IX86_BUILTIN_VFMADDPS256): Ditto.
	(IX86_BUILTIN_VFMADDPD256): Ditto.
	(IX86_BUILTIN_VFMSUBPS256): Ditto.
	(IX86_BUILTIN_VFMSUBPD256): Ditto.
	(IX86_BUILTIN_VFMADDSUBPS256): Ditto.
	(IX86_BUILTIN_VFMADDSUBPD256): Ditto.
	(IX86_BUILTIN_VFMSUBADDPS256): Ditto.
	(IX86_BUILTIN_VFMSUBADDPD256): Ditto.
	(IX86_BUILTIN_VFNMADDPS256): Ditto.
	(IX86_BUILTIN_VFNMADDPD256): Ditto.
	(IX86_BUILTIN_VFNMSUBPS256): Ditto.
	(IX86_BUILTIN_VFNMSUBPD256): Ditto.
	(enum multi_arg_type): New enum for describing the various FMA4
	intrinsic argument types.
	(bdesc_multi_arg): New table for FMA4 intrinsics.
	(ix86_init_mmx_sse_builtins): Add FMA4 intrinsic support.
	(ix86_expand_multi_arg_builtin): New function for creating FMA4
	intrinsics.
	(ix86_expand_builtin): Add FMA4 intrinsic support.
	(ix86_fma4_valid_op_p): New function to validate FMA4 3 and 4
	operand instructions.
	(ix86_expand_fma4_multiple_memory): New function to split the
	second memory reference from FMA4 instructions.

	* config/i386/sse.md (FMA4MODEF4): New mode iterator for FMA4.
	(fma4modesuffixf4): New mode attribute for FMA4.
	(ssemodesuffixf2s): Ditto.

	(fma4_fmadd<mode>4): Add FMA4 floating point multiply/add
	instructions.
	(fma4_fmsub<mode>4): Ditto.
	(fma4_fnmadd<mode>4): Ditto.
	(fma4_fnmsub<mode>4): Ditto.
	(fma4_vmfmadd<mode>4): Ditto.
	(fma4_vmfmsub<mode>4): Ditto.
	(fma4_vmfnmadd<mode>4): Ditto.
	(fma4_vmfnmsub<mode>4): Ditto.
	(fma4_fmadd<mode>4256): Ditto.
	(fma4_fmsub<mode>4256): Ditto.
	(fma4_fnmadd<mode>4256): Ditto.
	(fma4_fnmsub<mode>4256): Ditto.
	(fma4_fmaddsubv8sf4): Ditto.
	(fma4_fmaddsubv4sf4): Ditto.
	(fma4_fmaddsubv4df4): Ditto.
	(fma4_fmaddsubv2df4): Ditto.
	(fma4_fmsubaddv8sf4): Ditto.
	(fma4_fmsubaddv4sf4): Ditto.
	(fma4_fmsubaddv4df4): Ditto.
	(fma4_fmsubaddv2df4): Ditto.
	(fma4i_fmadd<mode>4): Add FMA4 floating point multiply/add
	instructions for intrinsics.
	(fma4i_fmsub<mode>4): Ditto.
	(fma4i_fnmadd<mode>4): Ditto.
	(fma4i_fnmsub<mode>4): Ditto.
	(fma4i_vmfmadd<mode>4): Ditto.
	(fma4i_vmfmsub<mode>4): Ditto.
	(fma4i_vmfnmadd<mode>4): Ditto.
	(fma4i_vmfnmsub<mode>4): Ditto.
	(fma4i_fmadd<mode>4256): Ditto.
	(fma4i_fmsub<mode>4256): Ditto.
	(fma4i_fnmadd<mode>4256): Ditto.
	(fma4i_fnmsub<mode>4256): Ditto.
	(fma4i_fmaddsubv8sf4): Ditto.
	(fma4i_fmaddsubv4sf4): Ditto.
	(fma4i_fmaddsubv4df4): Ditto.
	(fma4i_fmaddsubv2df4): Ditto.
	(fma4i_fmsubaddv8sf4): Ditto.
	(fma4i_fmsubaddv4sf4): Ditto.
	(fma4i_fmsubaddv4df4): Ditto.
	(fma4i_fmsubaddv2df4): Ditto.

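For context on the i386.c and sse.md changes below: the hardware only allows
one memory operand per FMA4 instruction, but the insn patterns deliberately
accept two before register allocation (splitting them later via
ix86_expand_fma4_multiple_memory) so that combine can fuse code like the
following into a single vfmaddss (this sketch mirrors the example in the
i386.c comments):

    float fmadd (float *a, float *b, float *c)
    {
      /* With -mfma4 -O2 the multiply and the add can be combined into
         vfmaddss even though both reference memory.  */
      return (*a * *b) + *c;
    }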

diff -upNw gcc-xop-fma4/gcc/config.gcc gcc-xop/gcc/config.gcc
--- gcc-xop-fma4/gcc/config.gcc	2009-09-21 18:45:47.000000000 -0500
+++ gcc-xop/gcc/config.gcc	2009-09-21 23:27:00.000000000 -0500
@@ -286,8 +286,9 @@ i[34567]86-*-*)
 	cxx_target_objs="i386-c.o"
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h wmmintrin.h immintrin.h
-		       x86intrin.h avxintrin.h ia32intrin.h cross-stdarg.h"
+		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
+		       immintrin.h x86intrin.h avxintrin.h 
+		       ia32intrin.h cross-stdarg.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -295,8 +296,9 @@ x86_64-*-*)
 	cxx_target_objs="i386-c.o"
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h wmmintrin.h immintrin.h
-		       x86intrin.h avxintrin.h ia32intrin.h cross-stdarg.h"
+		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
+		       immintrin.h x86intrin.h avxintrin.h 
+		       ia32intrin.h cross-stdarg.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
diff -upNw gcc-xop-fma4/gcc/doc/extend.texi gcc-xop/gcc/doc/extend.texi
--- gcc-xop-fma4/gcc/doc/extend.texi	2009-09-21 19:01:27.000000000 -0500
+++ gcc-xop/gcc/doc/extend.texi	2009-09-17 15:34:16.000000000 -0500
@@ -3168,6 +3168,11 @@ Enable/disable the generation of the sse
 @cindex @code{target("sse4a")} attribute
 Enable/disable the generation of the SSE4A instructions.
 
+@item fma4
+@itemx no-fma4
+@cindex @code{target("fma4")} attribute
+Enable/disable the generation of the FMA4 instructions.
+
 @item ssse3
 @itemx no-ssse3
 @cindex @code{target("ssse3")} attribute
@@ -8888,5 +8893,43 @@ v2di __builtin_ia32_insertq (v2di, v2di)
 v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int)
 @end smallexample
 
+The following built-in functions are available when @option{-mfma4} is used.
+All of them generate the machine instruction that is part of the name.
+
+@smallexample
+v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf)
+v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df)
+v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf)
+v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf)
+v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf)
+@end smallexample
+
 The following built-in functions are available when @option{-m3dnow} is used.
 All of them generate the machine instruction that is part of the name.
diff -upNw gcc-xop-fma4/gcc/doc/invoke.texi gcc-xop/gcc/doc/invoke.texi
--- gcc-xop-fma4/gcc/doc/invoke.texi	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/doc/invoke.texi	2009-09-21 10:10:55.000000000 -0500
@@ -591,7 +591,7 @@ Objective-C and Objective-C++ Dialects}.
 -mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
 -maes -mpclmul @gol
--msse4a -m3dnow -mpopcnt -mabm @gol
+-msse4a -m3dnow -mpopcnt -mabm -mfma4 @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -11716,6 +11716,8 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-pclmul
 @itemx -msse4a
 @itemx -mno-sse4a
+@itemx -mfma4
+@itemx -mno-fma4
 @itemx -m3dnow
 @itemx -mno-3dnow
 @itemx -mpopcnt
@@ -11729,7 +11731,7 @@ preferred alignment to @option{-mpreferr
 @opindex m3dnow
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
-SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, SSE4A, ABM or
+SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, SSE4A, FMA4, ABM or
 3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
diff -upNw gcc-xop-fma4/gcc/config/i386/cpuid.h gcc-xop/gcc/config/i386/cpuid.h
--- gcc-xop-fma4/gcc/config/i386/cpuid.h	2009-09-21 18:58:58.000000000 -0500
+++ gcc-xop/gcc/config/i386/cpuid.h	2009-09-17 12:20:06.000000000 -0500
@@ -48,6 +48,7 @@
 /* %ecx */
 #define bit_LAHF_LM	(1 << 0)
 #define bit_SSE4a	(1 << 6)
+#define bit_FMA4	(1 << 16)
 
 /* %edx */
 #define bit_LM		(1 << 29)
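
With bit_FMA4 in place, callers can detect FMA4 at run time; a minimal
sketch, assuming GCC's existing __get_cpuid helper from <cpuid.h> (not part
of the patch):

    #include <cpuid.h>

    /* Return nonzero if CPUID function 0x80000001 reports FMA4 in %ecx.  */
    static int
    has_fma4 (void)
    {
      unsigned int eax, ebx, ecx, edx;
      if (!__get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx))
        return 0;
      return (ecx & bit_FMA4) != 0;
    }
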
diff -upNw gcc-xop-fma4/gcc/config/i386/fma4intrin.h gcc-xop/gcc/config/i386/fma4intrin.h
--- gcc-xop-fma4/gcc/config/i386/fma4intrin.h	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/config/i386/fma4intrin.h	2009-09-22 09:43:14.000000000 -0500
@@ -0,0 +1,243 @@
+/* Copyright (C) 2007, 2008, 2009 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _X86INTRIN_H_INCLUDED
+# error "Never use <fma4intrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _FMA4INTRIN_H_INCLUDED
+#define _FMA4INTRIN_H_INCLUDED
+
+#ifndef __FMA4__
+# error "FMA4 instruction set not enabled"
+#else
+
+/* We need definitions from the SSE4A, SSE3, SSE2 and SSE header files.  */
+#include <ammintrin.h>
+
+/* Internal data types for implementing the intrinsics.  */
+typedef float __v8sf __attribute__ ((__vector_size__ (32)));
+typedef double __v4df __attribute__ ((__vector_size__ (32)));
+
+typedef float __m256 __attribute__ ((__vector_size__ (32),
+				     __may_alias__));
+typedef double __m256d __attribute__ ((__vector_size__ (32),
+				       __may_alias__));
+
+/* 128b Floating point multiply/add type instructions.  */
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_macc_ps (__m128 __A, __m128 __B, __m128 __C)
+{
+  return (__m128) __builtin_ia32_vfmaddps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_macc_pd (__m128d __A, __m128d __B, __m128d __C)
+{
+  return (__m128d) __builtin_ia32_vfmaddpd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_macc_ss (__m128 __A, __m128 __B, __m128 __C)
+{
+  return (__m128) __builtin_ia32_vfmaddss ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_macc_sd (__m128d __A, __m128d __B, __m128d __C)
+{
+  return (__m128d) __builtin_ia32_vfmaddsd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_msub_ps (__m128 __A, __m128 __B, __m128 __C)
+{
+  return (__m128) __builtin_ia32_vfmsubps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_msub_pd (__m128d __A, __m128d __B, __m128d __C)
+{
+  return (__m128d) __builtin_ia32_vfmsubpd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_msub_ss (__m128 __A, __m128 __B, __m128 __C)
+{
+  return (__m128) __builtin_ia32_vfmsubss ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_msub_sd (__m128d __A, __m128d __B, __m128d __C)
+{
+  return (__m128d) __builtin_ia32_vfmsubsd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_nmacc_ps (__m128 __A, __m128 __B, __m128 __C)
+{
+  return (__m128) __builtin_ia32_vfnmaddps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_nmacc_pd (__m128d __A, __m128d __B, __m128d __C)
+{
+  return (__m128d) __builtin_ia32_vfnmaddpd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_nmacc_ss (__m128 __A, __m128 __B, __m128 __C)
+{
+  return (__m128) __builtin_ia32_vfnmaddss ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_nmacc_sd (__m128d __A, __m128d __B, __m128d __C)
+{
+  return (__m128d) __builtin_ia32_vfnmaddsd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_nmsub_ps (__m128 __A, __m128 __B, __m128 __C)
+{
+  return (__m128) __builtin_ia32_vfnmsubps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_nmsub_pd (__m128d __A, __m128d __B, __m128d __C)
+{
+  return (__m128d) __builtin_ia32_vfnmsubpd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_nmsub_ss (__m128 __A, __m128 __B, __m128 __C)
+{
+  return (__m128) __builtin_ia32_vfnmsubss ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_nmsub_sd (__m128d __A, __m128d __B, __m128d __C)
+{
+  return (__m128d) __builtin_ia32_vfnmsubsd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maddsub_ps (__m128 __A, __m128 __B, __m128 __C)
+{
+  return (__m128) __builtin_ia32_vfmaddsubps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maddsub_pd (__m128d __A, __m128d __B, __m128d __C)
+{
+  return (__m128d) __builtin_ia32_vfmaddsubpd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_msubadd_ps (__m128 __A, __m128 __B, __m128 __C)
+{
+  return (__m128) __builtin_ia32_vfmsubaddps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_msubadd_pd (__m128d __A, __m128d __B, __m128d __C)
+{
+  return (__m128d) __builtin_ia32_vfmsubaddpd ((__v2df)__A, (__v2df)__B, (__v2df)__C);
+}
+
+/* 256b Floating point multiply/add type instructions.  */
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_macc_ps (__m256 __A, __m256 __B, __m256 __C)
+{
+  return (__m256) __builtin_ia32_vfmaddps256 ((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
+}
+
+extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_macc_pd (__m256d __A, __m256d __B, __m256d __C)
+{
+  return (__m256d) __builtin_ia32_vfmaddpd256 ((__v4df)__A, (__v4df)__B, (__v4df)__C);
+}
+
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_msub_ps (__m256 __A, __m256 __B, __m256 __C)
+{
+  return (__m256) __builtin_ia32_vfmsubps256 ((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
+}
+
+extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_msub_pd (__m256d __A, __m256d __B, __m256d __C)
+{
+  return (__m256d) __builtin_ia32_vfmsubpd256 ((__v4df)__A, (__v4df)__B, (__v4df)__C);
+}
+
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_nmacc_ps (__m256 __A, __m256 __B, __m256 __C)
+{
+  return (__m256) __builtin_ia32_vfnmaddps256 ((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
+}
+
+extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_nmacc_pd (__m256d __A, __m256d __B, __m256d __C)
+{
+  return (__m256d) __builtin_ia32_vfnmaddpd256 ((__v4df)__A, (__v4df)__B, (__v4df)__C);
+}
+
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_nmsub_ps (__m256 __A, __m256 __B, __m256 __C)
+{
+  return (__m256) __builtin_ia32_vfnmsubps256 ((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
+}
+
+extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_nmsub_pd (__m256d __A, __m256d __B, __m256d __C)
+{
+  return (__m256d) __builtin_ia32_vfnmsubpd256 ((__v4df)__A, (__v4df)__B, (__v4df)__C);
+}
+
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maddsub_ps (__m256 __A, __m256 __B, __m256 __C)
+{
+  return (__m256) __builtin_ia32_vfmaddsubps256 ((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
+}
+
+extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_maddsub_pd (__m256d __A, __m256d __B, __m256d __C)
+{
+  return (__m256d) __builtin_ia32_vfmaddsubpd256 ((__v4df)__A, (__v4df)__B, (__v4df)__C);
+}
+
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_msubadd_ps (__m256 __A, __m256 __B, __m256 __C)
+{
+  return (__m256) __builtin_ia32_vfmsubaddps256 ((__v8sf)__A, (__v8sf)__B, (__v8sf)__C);
+}
+
+extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_msubadd_pd (__m256d __A, __m256d __B, __m256d __C)
+{
+  return (__m256d) __builtin_ia32_vfmsubaddpd256 ((__v4df)__A, (__v4df)__B, (__v4df)__C);
+}
+
+#endif
+
+#endif
diff -upNw gcc-xop-fma4/gcc/config/i386/i386.c gcc-xop/gcc/config/i386/i386.c
--- gcc-xop-fma4/gcc/config/i386/i386.c	2009-09-21 18:20:31.000000000 -0500
+++ gcc-xop/gcc/config/i386/i386.c	2009-09-22 11:14:17.000000000 -0500
@@ -1954,6 +1954,9 @@ static int ix86_isa_flags_explicit;
 
 #define OPTION_MASK_ISA_SSE4A_SET \
   (OPTION_MASK_ISA_SSE4A | OPTION_MASK_ISA_SSE3_SET)
+#define OPTION_MASK_ISA_FMA4_SET \
+  (OPTION_MASK_ISA_FMA4 | OPTION_MASK_ISA_SSE4A_SET \
+   | OPTION_MASK_ISA_AVX_SET)
 
 /* AES and PCLMUL need SSE2 because they use xmm registers */
 #define OPTION_MASK_ISA_AES_SET \
@@ -1994,7 +1997,8 @@ static int ix86_isa_flags_explicit;
 #define OPTION_MASK_ISA_SSE4_2_UNSET \
   (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_AVX_UNSET )
 #define OPTION_MASK_ISA_AVX_UNSET \
-  (OPTION_MASK_ISA_AVX | OPTION_MASK_ISA_FMA_UNSET)
+  (OPTION_MASK_ISA_AVX | OPTION_MASK_ISA_FMA_UNSET \
+   | OPTION_MASK_ISA_FMA4_UNSET)
 #define OPTION_MASK_ISA_FMA_UNSET OPTION_MASK_ISA_FMA
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
@@ -2002,7 +2006,10 @@ static int ix86_isa_flags_explicit;
 #define OPTION_MASK_ISA_SSE4_UNSET OPTION_MASK_ISA_SSE4_1_UNSET
 
 #define OPTION_MASK_ISA_SSE4A_UNSET \
-  (OPTION_MASK_ISA_SSE4A)
+  (OPTION_MASK_ISA_SSE4A | OPTION_MASK_ISA_FMA4_UNSET)
+
+#define OPTION_MASK_ISA_FMA4_UNSET OPTION_MASK_ISA_FMA4
+
 #define OPTION_MASK_ISA_AES_UNSET OPTION_MASK_ISA_AES
 #define OPTION_MASK_ISA_PCLMUL_UNSET OPTION_MASK_ISA_PCLMUL
 #define OPTION_MASK_ISA_ABM_UNSET OPTION_MASK_ISA_ABM
@@ -2236,6 +2243,19 @@ ix86_handle_option (size_t code, const c
 	}
       return true;
 
+    case OPT_mfma4:
+      if (value)
+	{
+	  ix86_isa_flags |= OPTION_MASK_ISA_FMA4_SET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_FMA4_SET;
+	}
+      else
+	{
+	  ix86_isa_flags &= ~OPTION_MASK_ISA_FMA4_UNSET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_FMA4_UNSET;
+	}
+      return true;
+
     case OPT_mabm:
       if (value)
 	{
@@ -2363,6 +2383,7 @@ ix86_target_string (int isa, int flags, 
   static struct ix86_target_opts isa_opts[] =
   {
     { "-m64",		OPTION_MASK_ISA_64BIT },
+    { "-mfma4",		OPTION_MASK_ISA_FMA4 },
     { "-msse4a",	OPTION_MASK_ISA_SSE4A },
     { "-msse4.2",	OPTION_MASK_ISA_SSE4_2 },
     { "-msse4.1",	OPTION_MASK_ISA_SSE4_1 },
@@ -2592,7 +2613,8 @@ override_options (bool main_args_p)
       PTA_PCLMUL = 1 << 17,
       PTA_AVX = 1 << 18,
       PTA_FMA = 1 << 19,
-      PTA_MOVBE = 1 << 20
+      PTA_MOVBE = 1 << 20,
+      PTA_FMA4 = 1 << 21
     };
 
   static struct pta
@@ -2935,6 +2957,9 @@ override_options (bool main_args_p)
 	if (processor_alias_table[i].flags & PTA_SSE4A
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4A))
 	  ix86_isa_flags |= OPTION_MASK_ISA_SSE4A;
+	if (processor_alias_table[i].flags & PTA_FMA4
+	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_FMA4))
+	  ix86_isa_flags |= OPTION_MASK_ISA_FMA4;
 	if (processor_alias_table[i].flags & PTA_ABM
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_ABM))
 	  ix86_isa_flags |= OPTION_MASK_ISA_ABM;
@@ -3618,6 +3643,7 @@ ix86_valid_target_attribute_inner_p (tre
     IX86_ATTR_ISA ("sse4.2",	OPT_msse4_2),
     IX86_ATTR_ISA ("sse4a",	OPT_msse4a),
     IX86_ATTR_ISA ("ssse3",	OPT_mssse3),
+    IX86_ATTR_ISA ("fma4",	OPT_mfma4),
 
     /* string options */
     IX86_ATTR_STR ("arch=",	IX86_FUNCTION_SPECIFIC_ARCH),
@@ -20552,6 +20578,39 @@ enum ix86_builtins
 
   IX86_BUILTIN_CVTUDQ2PS,
 
+  /* FMA4 instructions.  */
+  IX86_BUILTIN_VFMADDSS,
+  IX86_BUILTIN_VFMADDSD,
+  IX86_BUILTIN_VFMADDPS,
+  IX86_BUILTIN_VFMADDPD,
+  IX86_BUILTIN_VFMSUBSS,
+  IX86_BUILTIN_VFMSUBSD,
+  IX86_BUILTIN_VFMSUBPS,
+  IX86_BUILTIN_VFMSUBPD,
+  IX86_BUILTIN_VFMADDSUBPS,
+  IX86_BUILTIN_VFMADDSUBPD,
+  IX86_BUILTIN_VFMSUBADDPS,
+  IX86_BUILTIN_VFMSUBADDPD,
+  IX86_BUILTIN_VFNMADDSS,
+  IX86_BUILTIN_VFNMADDSD,
+  IX86_BUILTIN_VFNMADDPS,
+  IX86_BUILTIN_VFNMADDPD,
+  IX86_BUILTIN_VFNMSUBSS,
+  IX86_BUILTIN_VFNMSUBSD,
+  IX86_BUILTIN_VFNMSUBPS,
+  IX86_BUILTIN_VFNMSUBPD,
+  IX86_BUILTIN_VFMADDPS256,
+  IX86_BUILTIN_VFMADDPD256,
+  IX86_BUILTIN_VFMSUBPS256,
+  IX86_BUILTIN_VFMSUBPD256,
+  IX86_BUILTIN_VFMADDSUBPS256,
+  IX86_BUILTIN_VFMADDSUBPD256,
+  IX86_BUILTIN_VFMSUBADDPS256,
+  IX86_BUILTIN_VFMSUBADDPD256,
+  IX86_BUILTIN_VFNMADDPS256,
+  IX86_BUILTIN_VFNMADDPD256,
+  IX86_BUILTIN_VFNMSUBPS256,
+  IX86_BUILTIN_VFNMSUBPD256,
   IX86_BUILTIN_MAX
 };
 
@@ -21625,6 +21684,56 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_movmskps256, "__builtin_ia32_movmskps256", IX86_BUILTIN_MOVMSKPS256, UNKNOWN, (int) INT_FTYPE_V8SF },
 };
 
+/* FMA4.  */
+enum multi_arg_type {
+  MULTI_ARG_UNKNOWN,
+  MULTI_ARG_3_SF,
+  MULTI_ARG_3_DF,
+  MULTI_ARG_3_SF2,
+  MULTI_ARG_3_DF2
+};
+
+static const struct builtin_description bdesc_multi_arg[] =
+{
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfmaddv4sf4,     "__builtin_ia32_vfmaddss",    IX86_BUILTIN_VFMADDSS,    UNKNOWN,      (int)MULTI_ARG_3_SF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfmaddv2df4,     "__builtin_ia32_vfmaddsd",    IX86_BUILTIN_VFMADDSD,    UNKNOWN,      (int)MULTI_ARG_3_DF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmaddv4sf4,       "__builtin_ia32_vfmaddps",    IX86_BUILTIN_VFMADDPS,    UNKNOWN,      (int)MULTI_ARG_3_SF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmaddv2df4,       "__builtin_ia32_vfmaddpd",    IX86_BUILTIN_VFMADDPD,    UNKNOWN,      (int)MULTI_ARG_3_DF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfmsubv4sf4,     "__builtin_ia32_vfmsubss",    IX86_BUILTIN_VFMSUBSS,    UNKNOWN,      (int)MULTI_ARG_3_SF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfmsubv2df4,     "__builtin_ia32_vfmsubsd",    IX86_BUILTIN_VFMSUBSD,    UNKNOWN,      (int)MULTI_ARG_3_DF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubv4sf4,       "__builtin_ia32_vfmsubps",    IX86_BUILTIN_VFMSUBPS,    UNKNOWN,      (int)MULTI_ARG_3_SF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubv2df4,       "__builtin_ia32_vfmsubpd",    IX86_BUILTIN_VFMSUBPD,    UNKNOWN,      (int)MULTI_ARG_3_DF },
+    
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfnmaddv4sf4,    "__builtin_ia32_vfnmaddss",   IX86_BUILTIN_VFNMADDSS,   UNKNOWN,      (int)MULTI_ARG_3_SF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfnmaddv2df4,    "__builtin_ia32_vfnmaddsd",   IX86_BUILTIN_VFNMADDSD,   UNKNOWN,      (int)MULTI_ARG_3_DF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmaddv4sf4,      "__builtin_ia32_vfnmaddps",   IX86_BUILTIN_VFNMADDPS,   UNKNOWN,      (int)MULTI_ARG_3_SF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmaddv2df4,      "__builtin_ia32_vfnmaddpd",   IX86_BUILTIN_VFNMADDPD,   UNKNOWN,      (int)MULTI_ARG_3_DF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfnmsubv4sf4,    "__builtin_ia32_vfnmsubss",   IX86_BUILTIN_VFNMSUBSS,   UNKNOWN,      (int)MULTI_ARG_3_SF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfnmsubv2df4,    "__builtin_ia32_vfnmsubsd",   IX86_BUILTIN_VFNMSUBSD,   UNKNOWN,      (int)MULTI_ARG_3_DF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmsubv4sf4,      "__builtin_ia32_vfnmsubps",   IX86_BUILTIN_VFNMSUBPS,   UNKNOWN,      (int)MULTI_ARG_3_SF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmsubv2df4,      "__builtin_ia32_vfnmsubpd",   IX86_BUILTIN_VFNMSUBPD,   UNKNOWN,      (int)MULTI_ARG_3_DF },
+
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmaddsubv4sf4,	   "__builtin_ia32_vfmaddsubps", IX86_BUILTIN_VFMADDSUBPS,    UNKNOWN,      (int)MULTI_ARG_3_SF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmaddsubv2df4,	   "__builtin_ia32_vfmaddsubpd", IX86_BUILTIN_VFMADDSUBPD,    UNKNOWN,      (int)MULTI_ARG_3_DF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubaddv4sf4,	   "__builtin_ia32_vfmsubaddps", IX86_BUILTIN_VFMSUBADDPS,    UNKNOWN,      (int)MULTI_ARG_3_SF },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubaddv2df4,	   "__builtin_ia32_vfmsubaddpd", IX86_BUILTIN_VFMSUBADDPD,    UNKNOWN,      (int)MULTI_ARG_3_DF },
+
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmaddv8sf4256,       "__builtin_ia32_vfmaddps256",    IX86_BUILTIN_VFMADDPS256,    UNKNOWN,      (int)MULTI_ARG_3_SF2 },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmaddv4df4256,       "__builtin_ia32_vfmaddpd256",    IX86_BUILTIN_VFMADDPD256,    UNKNOWN,      (int)MULTI_ARG_3_DF2 },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubv8sf4256,       "__builtin_ia32_vfmsubps256",    IX86_BUILTIN_VFMSUBPS256,    UNKNOWN,      (int)MULTI_ARG_3_SF2 },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubv4df4256,       "__builtin_ia32_vfmsubpd256",    IX86_BUILTIN_VFMSUBPD256,    UNKNOWN,      (int)MULTI_ARG_3_DF2 },
+  
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmaddv8sf4256,      "__builtin_ia32_vfnmaddps256",   IX86_BUILTIN_VFNMADDPS256,   UNKNOWN,      (int)MULTI_ARG_3_SF2 },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmaddv4df4256,      "__builtin_ia32_vfnmaddpd256",   IX86_BUILTIN_VFNMADDPD256,   UNKNOWN,      (int)MULTI_ARG_3_DF2 },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmsubv8sf4256,      "__builtin_ia32_vfnmsubps256",   IX86_BUILTIN_VFNMSUBPS256,   UNKNOWN,      (int)MULTI_ARG_3_SF2 },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fnmsubv4df4256,      "__builtin_ia32_vfnmsubpd256",   IX86_BUILTIN_VFNMSUBPD256,   UNKNOWN,      (int)MULTI_ARG_3_DF2 },
+
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmaddsubv8sf4,	   "__builtin_ia32_vfmaddsubps256", IX86_BUILTIN_VFMADDSUBPS256,    UNKNOWN,      (int)MULTI_ARG_3_SF2 },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmaddsubv4df4,	   "__builtin_ia32_vfmaddsubpd256", IX86_BUILTIN_VFMADDSUBPD256,    UNKNOWN,      (int)MULTI_ARG_3_DF2 },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubaddv8sf4,	   "__builtin_ia32_vfmsubaddps256", IX86_BUILTIN_VFMSUBADDPS256,    UNKNOWN,      (int)MULTI_ARG_3_SF2 },
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubaddv4df4,	   "__builtin_ia32_vfmsubaddpd256", IX86_BUILTIN_VFMSUBADDPD256,    UNKNOWN,      (int)MULTI_ARG_3_DF2 }
+  
+};
 
 /* Set up all the MMX/SSE builtins, even builtins for instructions that are not
    in the current target ISA to allow the user to compile particular modules
@@ -23058,6 +23167,29 @@ ix86_init_mmx_sse_builtins (void)
 				    intQI_type_node,
 				    integer_type_node, NULL_TREE);
   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_vec_set_v16qi", ftype, IX86_BUILTIN_VEC_SET_V16QI);
+  /* Add FMA4 multi-argument builtins.  */
+  for (i = 0, d = bdesc_multi_arg; i < ARRAY_SIZE (bdesc_multi_arg); i++, d++)
+    {
+      tree mtype = NULL_TREE;
+
+      if (d->name == 0)
+	continue;
+
+      switch ((enum multi_arg_type)d->flag)
+	{
+	case MULTI_ARG_3_SF:     mtype = v4sf_ftype_v4sf_v4sf_v4sf; 	break;
+	case MULTI_ARG_3_DF:     mtype = v2df_ftype_v2df_v2df_v2df; 	break;
+	case MULTI_ARG_3_SF2:    mtype = v8sf_ftype_v8sf_v8sf_v8sf; 	break;
+	case MULTI_ARG_3_DF2:    mtype = v4df_ftype_v4df_v4df_v4df; 	break;
+
+	case MULTI_ARG_UNKNOWN:
+	default:
+	  gcc_unreachable ();
+	}
+
+      if (mtype)
+	def_builtin_const (d->mask, d->name, mtype, d->code);
+    }
 }
 
 /* Internal method for ix86_init_builtins.  */
@@ -23230,6 +23362,122 @@ ix86_expand_binop_builtin (enum insn_cod
   return target;
 }
 
+/* Subroutine of ix86_expand_builtin to take care of 2-4 argument insns.  */
+
+static rtx
+ix86_expand_multi_arg_builtin (enum insn_code icode, tree exp, rtx target,
+			       enum multi_arg_type m_type,
+			       enum rtx_code sub_code)
+{
+  rtx pat;
+  int i;
+  int nargs;
+  bool comparison_p = false;
+  bool tf_p = false;
+  bool last_arg_constant = false;
+  int num_memory = 0;
+  struct {
+    rtx op;
+    enum machine_mode mode;
+  } args[4];
+
+  enum machine_mode tmode = insn_data[icode].operand[0].mode;
+
+  switch (m_type)
+    {
+    case MULTI_ARG_3_SF:
+    case MULTI_ARG_3_DF:
+    case MULTI_ARG_3_SF2:
+    case MULTI_ARG_3_DF2:
+      nargs = 3;
+      break;
+
+    case MULTI_ARG_UNKNOWN:
+    default:
+      gcc_unreachable ();
+    }
+
+  if (optimize || !target
+      || GET_MODE (target) != tmode
+      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+    target = gen_reg_rtx (tmode);
+
+  gcc_assert (nargs <= 4);
+
+  for (i = 0; i < nargs; i++)
+    {
+      tree arg = CALL_EXPR_ARG (exp, i);
+      rtx op = expand_normal (arg);
+      int adjust = (comparison_p) ? 1 : 0;
+      enum machine_mode mode = insn_data[icode].operand[i+adjust+1].mode;
+
+      if (last_arg_constant && i == nargs-1)
+	{
+	  if (!CONST_INT_P (op))
+	    {
+	      error ("last argument must be an immediate");
+	      return gen_reg_rtx (tmode);
+	    }
+	}
+      else
+	{
+	  if (VECTOR_MODE_P (mode))
+	    op = safe_vector_operand (op, mode);
+
+	  /* If we aren't optimizing, only allow one memory operand to be
+	     generated.  */
+	  if (memory_operand (op, mode))
+	    num_memory++;
+
+	  gcc_assert (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode);
+
+	  if (optimize
+	      || ! (*insn_data[icode].operand[i+adjust+1].predicate) (op, mode)
+	      || num_memory > 1)
+	    op = force_reg (mode, op);
+	}
+
+      args[i].op = op;
+      args[i].mode = mode;
+    }
+
+  switch (nargs)
+    {
+    case 1:
+      pat = GEN_FCN (icode) (target, args[0].op);
+      break;
+
+    case 2:
+      if (tf_p)
+	pat = GEN_FCN (icode) (target, args[0].op, args[1].op,
+			       GEN_INT ((int)sub_code));
+      else if (! comparison_p)
+	pat = GEN_FCN (icode) (target, args[0].op, args[1].op);
+      else
+	{
+	  rtx cmp_op = gen_rtx_fmt_ee (sub_code, GET_MODE (target),
+				       args[0].op,
+				       args[1].op);
+
+	  pat = GEN_FCN (icode) (target, cmp_op, args[0].op, args[1].op);
+	}
+      break;
+
+    case 3:
+      pat = GEN_FCN (icode) (target, args[0].op, args[1].op, args[2].op);
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  if (! pat)
+    return 0;
+
+  emit_insn (pat);
+  return target;
+}
+
 /* Subroutine of ix86_expand_args_builtin to take care of scalar unop
    insns with vec_merge.  */
 
@@ -24499,6 +24747,12 @@ ix86_expand_builtin (tree exp, rtx targe
     if (d->code == fcode)
       return ix86_expand_sse_pcmpistr (d, exp, target);
 
+  for (i = 0, d = bdesc_multi_arg; i < ARRAY_SIZE (bdesc_multi_arg); i++, d++)
+    if (d->code == fcode)
+      return ix86_expand_multi_arg_builtin (d->icode, exp, target,
+					    (enum multi_arg_type)d->flag,
+					    d->comparison);
+
   gcc_unreachable ();
 }
 
@@ -28881,6 +29135,200 @@ ix86_expand_round (rtx operand0, rtx ope
   emit_move_insn (operand0, res);
 }
 
+/* Validate whether an FMA4 instruction is valid or not.
+   OPERANDS is the array of operands.
+   NUM is the number of operands.
+   USES_OC0 is true if the instruction uses OC0 and provides 4 variants.
+   NUM_MEMORY is the maximum number of memory operands to accept.
+   A NUM_MEMORY less than zero is a special case to allow the last
+   operand of an instruction to be a memory operand.
+   When COMMUTATIVE is set, operands 1 and 2 can be swapped.  */
+
+bool
+ix86_fma4_valid_op_p (rtx operands[], rtx insn ATTRIBUTE_UNUSED, int num,
+		      bool uses_oc0, int num_memory, bool commutative)
+{
+  int mem_mask;
+  int mem_count;
+  int i;
+
+  /* Count the number of memory arguments */
+  mem_mask = 0;
+  mem_count = 0;
+  for (i = 0; i < num; i++)
+    {
+      enum machine_mode mode = GET_MODE (operands[i]);
+      if (register_operand (operands[i], mode))
+	;
+
+      else if (memory_operand (operands[i], mode))
+	{
+	  mem_mask |= (1 << i);
+	  mem_count++;
+	}
+
+      else
+	{
+	  rtx pattern = PATTERN (insn);
+
+	  /* allow 0 for pcmov */
+	  if (GET_CODE (pattern) != SET
+	      || GET_CODE (SET_SRC (pattern)) != IF_THEN_ELSE
+	      || i < 2
+	      || operands[i] != CONST0_RTX (mode))
+	    return false;
+	}
+    }
+
+  /* Special case pmacsdq{l,h} where we allow the 3rd argument to be
+     a memory operation.  */
+  if (num_memory < 0)
+    {
+      num_memory = -num_memory;
+      if ((mem_mask & (1 << (num-1))) != 0)
+	{
+	  mem_mask &= ~(1 << (num-1));
+	  mem_count--;
+	}
+    }
+
+  /* If there were no memory operations, allow the insn */
+  if (mem_mask == 0)
+    return true;
+
+  /* Do not allow the destination register to be a memory operand.  */
+  else if (mem_mask & (1 << 0))
+    return false;
+
+  /* If there are too many memory operations, disallow the instruction.  While
+     the hardware only allows 1 memory reference, before register allocation
+     we sometimes allow two memory operations for some insns, in order to allow
+     code like the following to be optimized:
+
+	float fmadd (float *a, float *b, float *c) { return (*a * *b) + *c; }
+
+    or similar cases that are vectorized into using the vfmaddss
+    instruction.  */
+  else if (mem_count > num_memory)
+    return false;
+
+  /* Don't allow more than one memory operation if not optimizing.  */
+  else if (mem_count > 1 && !optimize)
+    return false;
+
+  else if (num == 4 && mem_count == 1)
+    {
+      /* formats (destination is the first argument), example vfmaddss:
+	 xmm1, xmm1, xmm2, xmm3/mem
+	 xmm1, xmm1, xmm2/mem, xmm3
+	 xmm1, xmm2, xmm3/mem, xmm1
+	 xmm1, xmm2/mem, xmm3, xmm1 */
+      if (uses_oc0)
+	return ((mem_mask == (1 << 1))
+		|| (mem_mask == (1 << 2))
+		|| (mem_mask == (1 << 3)));
+
+      /* format, example vpmacsdd:
+	 xmm1, xmm2, xmm3/mem, xmm1 */
+      if (commutative)
+	return (mem_mask == (1 << 2) || mem_mask == (1 << 1));
+      else
+	return (mem_mask == (1 << 2));
+    }
+
+  else if (num == 4 && num_memory == 2)
+    {
+      /* If there are two memory operations, we can load one of the memory ops
+	 into the destination register.  This is for optimizing the
+	 multiply/add ops, which the combiner has optimized both the multiply
+	 and the add insns to have a memory operation.  We have to be careful
+	 that the destination doesn't overlap with the inputs.  */
+      rtx op0 = operands[0];
+
+      if (reg_mentioned_p (op0, operands[1])
+	  || reg_mentioned_p (op0, operands[2])
+	  || reg_mentioned_p (op0, operands[3]))
+	return false;
+
+      /* formats (destination is the first argument), example vfmaddss:
+	 xmm1, xmm1, xmm2, xmm3/mem
+	 xmm1, xmm1, xmm2/mem, xmm3
+	 xmm1, xmm2, xmm3/mem, xmm1
+	 xmm1, xmm2/mem, xmm3, xmm1
+
+         For the oc0 case, we will load either operands[1] or operands[3] into
+         operands[0], so any combination of 2 memory operands is ok.  */
+      if (uses_oc0)
+	return true;
+
+      /* format, example vpmacsdd:
+	 xmm1, xmm2, xmm3/mem, xmm1
+
+         For the integer multiply/add instructions be more restrictive and
+         require operands[2] and operands[3] to be the memory operands.  */
+      if (commutative)
+	return (mem_mask == ((1 << 1) | (1 << 3)) || mem_mask == ((1 << 2) | (1 << 3)));
+      else
+	return (mem_mask == ((1 << 2) | (1 << 3)));
+    }
+
+  else if (num == 3 && num_memory == 1)
+    {
+      /* formats, example vprotb:
+	 xmm1, xmm2, xmm3/mem
+	 xmm1, xmm2/mem, xmm3 */
+      if (uses_oc0)
+	return ((mem_mask == (1 << 1)) || (mem_mask == (1 << 2)));
+
+      /* format, example vpcomeq:
+	 xmm1, xmm2, xmm3/mem */
+      else
+	return (mem_mask == (1 << 2));
+    }
+
+  else
+    gcc_unreachable ();
+
+  return false;
+}
+
+
+/* Fix up an FMA4 instruction that has 2 memory input references into a form
+   the hardware will allow, by using the destination register to load one of
+   the memory operands.  Presently this is used by the multiply/add routines
+   to allow 2 memory references.  */
+
+void
+ix86_expand_fma4_multiple_memory (rtx operands[],
+				  int num,
+				  enum machine_mode mode)
+{
+  rtx op0 = operands[0];
+  if (num != 4
+      || memory_operand (op0, mode)
+      || reg_mentioned_p (op0, operands[1])
+      || reg_mentioned_p (op0, operands[2])
+      || reg_mentioned_p (op0, operands[3]))
+    gcc_unreachable ();
+
+  /* For 2 memory operands, pick either operands[1] or operands[3] to move into
+     the destination register.  */
+  if (memory_operand (operands[1], mode))
+    {
+      emit_move_insn (op0, operands[1]);
+      operands[1] = op0;
+    }
+  else if (memory_operand (operands[3], mode))
+    {
+      emit_move_insn (op0, operands[3]);
+      operands[3] = op0;
+    }
+  else
+    gcc_unreachable ();
+
+  return;
+}
+
 /* Table of valid machine attributes.  */
 static const struct attribute_spec ix86_attribute_table[] =
 {
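
As exercised by the new funcspec-* tests, the "fma4" IX86_ATTR_ISA entry
above also lets the FMA4 builtins be enabled per function via the target
attribute, without -mfma4 on the command line; an illustrative sketch (not
from the patch):

    typedef float v4sf __attribute__ ((vector_size (16)));

    /* FMA4 code generation is enabled for this function only.  */
    __attribute__ ((target ("fma4")))
    v4sf macc (v4sf a, v4sf b, v4sf c)
    {
      return __builtin_ia32_vfmaddps (a, b, c);
    }
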
diff -upNw gcc-xop-fma4/gcc/config/i386/i386-c.c gcc-xop/gcc/config/i386/i386-c.c
--- gcc-xop-fma4/gcc/config/i386/i386-c.c	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/config/i386/i386-c.c	2009-09-17 12:20:06.000000000 -0500
@@ -230,6 +230,8 @@ ix86_target_macros_internal (int isa_fla
     def_or_undef (parse_in, "__FMA__");
   if (isa_flag & OPTION_MASK_ISA_SSE4A)
     def_or_undef (parse_in, "__SSE4A__");
+  if (isa_flag & OPTION_MASK_ISA_FMA4)
+    def_or_undef (parse_in, "__FMA4__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE))
     def_or_undef (parse_in, "__SSE_MATH__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE2))
diff -upNw gcc-xop-fma4/gcc/config/i386/i386.h gcc-xop/gcc/config/i386/i386.h
--- gcc-xop-fma4/gcc/config/i386/i386.h	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/config/i386/i386.h	2009-09-17 14:40:46.000000000 -0500
@@ -54,6 +54,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define TARGET_AVX	OPTION_ISA_AVX
 #define TARGET_FMA	OPTION_ISA_FMA
 #define TARGET_SSE4A	OPTION_ISA_SSE4A
+#define TARGET_FMA4	OPTION_ISA_FMA4
 #define TARGET_ROUND	OPTION_ISA_ROUND
 #define TARGET_ABM	OPTION_ISA_ABM
 #define TARGET_POPCNT	OPTION_ISA_POPCNT
@@ -65,8 +66,8 @@ see the files COPYING3 and COPYING.RUNTI
 #define TARGET_CMPXCHG16B OPTION_ISA_CX16
 
 
-/* SSE4.1 define round instructions */
-#define	OPTION_MASK_ISA_ROUND	(OPTION_MASK_ISA_SSE4_1)
+/* SSE4.1 defines round instructions */
+#define	OPTION_MASK_ISA_ROUND	OPTION_MASK_ISA_SSE4_1
 #define	OPTION_ISA_ROUND	((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)
 
 #include "config/vxworks-dummy.h"
@@ -1356,6 +1357,10 @@ enum reg_class
   (TARGET_AVX && ((MODE) == V4SFmode || (MODE) == V2DFmode \
 		  || (MODE) == V8SFmode || (MODE) == V4DFmode))
 
+#define FMA4_VEC_FLOAT_MODE_P(MODE) \
+  (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \
+		  || (MODE) == V8SFmode || (MODE) == V4DFmode))
+
 #define MMX_REG_P(XOP) (REG_P (XOP) && MMX_REGNO_P (REGNO (XOP)))
 #define MMX_REGNO_P(N) IN_RANGE ((N), FIRST_MMX_REG, LAST_MMX_REG)
 
diff -upNw gcc-xop-fma4/gcc/config/i386/i386.md gcc-xop/gcc/config/i386/i386.md
--- gcc-xop-fma4/gcc/config/i386/i386.md	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/config/i386/i386.md	2009-09-17 12:20:06.000000000 -0500
@@ -195,6 +195,10 @@
    (UNSPEC_PCMPESTR		144)
    (UNSPEC_PCMPISTR		145)
 
+   ; For FMA4 support
+   (UNSPEC_FMA4_INTRINSIC	150)
+   (UNSPEC_FMA4_FMADDSUB	151)
+   (UNSPEC_FMA4_FMSUBADD	152)
    ; For AES support
    (UNSPEC_AESENC		159)
    (UNSPEC_AESENCLAST		160)
diff -upNw gcc-xop-fma4/gcc/config/i386/i386.opt gcc-xop/gcc/config/i386/i386.opt
--- gcc-xop-fma4/gcc/config/i386/i386.opt	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/config/i386/i386.opt	2009-09-17 14:56:08.000000000 -0500
@@ -310,6 +310,10 @@ msse4a
 Target Report Mask(ISA_SSE4A) Var(ix86_isa_flags) VarExists Save
 Support MMX, SSE, SSE2, SSE3 and SSE4A built-in functions and code generation
 
+mfma4
+Target Report Mask(ISA_FMA4) Var(ix86_isa_flags) VarExists Save
+Support FMA4 built-in functions and code generation 
+
 mabm
 Target Report Mask(ISA_ABM) Var(ix86_isa_flags) VarExists Save
 Support code generation of Advanced Bit Manipulation (ABM) instructions.
diff -upNw gcc-xop-fma4/gcc/config/i386/i386-protos.h gcc-xop/gcc/config/i386/i386-protos.h
--- gcc-xop-fma4/gcc/config/i386/i386-protos.h	2009-09-21 18:20:31.000000000 -0500
+++ gcc-xop/gcc/config/i386/i386-protos.h	2009-09-17 12:22:14.000000000 -0500
@@ -214,6 +214,9 @@ extern void ix86_expand_vector_set (bool
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
+extern bool ix86_fma4_valid_op_p (rtx [], rtx, int, bool, int, bool);
+extern void ix86_expand_fma4_multiple_memory (rtx [], int, enum machine_mode);
+
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
 extern void ix86_register_pragmas (void);
diff -upNw gcc-xop-fma4/gcc/config/i386/sse.md gcc-xop/gcc/config/i386/sse.md
--- gcc-xop-fma4/gcc/config/i386/sse.md	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/config/i386/sse.md	2009-09-17 17:04:54.000000000 -0500
@@ -49,6 +49,7 @@
 (define_mode_iterator SSEMODE248 [V8HI V4SI V2DI])
 (define_mode_iterator SSEMODE1248 [V16QI V8HI V4SI V2DI])
 (define_mode_iterator SSEMODEF4 [SF DF V4SF V2DF])
+(define_mode_iterator FMA4MODEF4 [V8SF V4DF])
 (define_mode_iterator SSEMODEF2P [V4SF V2DF])
 
 (define_mode_iterator AVX256MODEF2P [V8SF V4DF])
@@ -74,6 +75,11 @@
 ;; Mapping from integer vector mode to mnemonic suffix
 (define_mode_attr ssevecsize [(V16QI "b") (V8HI "w") (V4SI "d") (V2DI "q")])
 
+;; Mapping of the fma4 suffix
+(define_mode_attr fma4modesuffixf4 [(V8SF "ps") (V4DF "pd")])
+(define_mode_attr ssemodesuffixf2s [(SF "ss") (DF "sd")
+				    (V4SF "ss") (V2DF "sd")])
+
 ;; Mapping of the avx suffix
 (define_mode_attr ssemodesuffixf4 [(SF "ss") (DF "sd")
 				   (V4SF "ps") (V2DF "pd")])
@@ -1661,6 +1667,936 @@
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
+;; FMA4 floating point multiply/accumulate instructions.  This includes the
+;; scalar versions of the instructions as well as the vector versions.
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+;; In order to match (*a * *b) + *c, particularly when vectorizing, allow
+;; combine to generate a multiply/add with two memory references.  We then
+;; split this insn into a load of the destination register from one of the
+;; memory operands and the multiply/add itself.  If we don't manage to split
+;; the insn, reload will generate the appropriate moves.  This is needed
+;; because combine has already folded one of the memory references into both
+;; the multiply and add insns, and it cannot generate a new pseudo.  I.e.:
+;;	(set (reg1) (mem (addr1)))
+;;	(set (reg2) (mult (reg1) (mem (addr2))))
+;;	(set (reg3) (plus (reg2) (mem (addr3))))
+
+(define_insn "fma4_fmadd<mode>4256"
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+	(plus:FMA4MODEF4
+	 (mult:FMA4MODEF4
+	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Split fmadd with two memory operands into a load and the fmadd.
+(define_split
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
+	(plus:FMA4MODEF4
+	 (mult:FMA4MODEF4
+	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
+  "TARGET_FMA4
+   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && !reg_mentioned_p (operands[0], operands[1])
+   && !reg_mentioned_p (operands[0], operands[2])
+   && !reg_mentioned_p (operands[0], operands[3])"
+  [(const_int 0)]
+{
+  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  emit_insn (gen_fma4_fmadd<mode>4256 (operands[0], operands[1],
+				    operands[2], operands[3]));
+  DONE;
+})
+
+;; Floating multiply and subtract
+;; Allow two memory operands the same as fmadd
+(define_insn "fma4_fmsub<mode>4256"
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+	(minus:FMA4MODEF4
+	 (mult:FMA4MODEF4
+	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Split fmsub with two memory operands into a load and the fmsub.
+(define_split
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
+	(minus:FMA4MODEF4
+	 (mult:FMA4MODEF4
+	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
+  "TARGET_FMA4
+   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && !reg_mentioned_p (operands[0], operands[1])
+   && !reg_mentioned_p (operands[0], operands[2])
+   && !reg_mentioned_p (operands[0], operands[3])"
+  [(const_int 0)]
+{
+  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  emit_insn (gen_fma4_fmsub<mode>4256 (operands[0], operands[1],
+				    operands[2], operands[3]));
+  DONE;
+})
+
+;; Floating point negative multiply and add
+;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b)
+;; Note operands are out of order to simplify call to ix86_fma4_valid_op_p
+;; Allow two memory operands to help in optimizing.
+(define_insn "fma4_fnmadd<mode>4256"
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+	(minus:FMA4MODEF4
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")
+	 (mult:FMA4MODEF4
+	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Split fnmadd with two memory operands into a load and the fnmadd.
+(define_split
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
+	(minus:FMA4MODEF4
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")
+	 (mult:FMA4MODEF4
+	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))))]
+  "TARGET_FMA4
+   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && !reg_mentioned_p (operands[0], operands[1])
+   && !reg_mentioned_p (operands[0], operands[2])
+   && !reg_mentioned_p (operands[0], operands[3])"
+  [(const_int 0)]
+{
+  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  emit_insn (gen_fma4_fnmadd<mode>4256 (operands[0], operands[1],
+				     operands[2], operands[3]));
+  DONE;
+})
+
+;; Floating point negative multiply and subtract
+;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c
+;; Allow 2 memory operands to help with optimization
+(define_insn "fma4_fnmsub<mode>4256"
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
+	(minus:FMA4MODEF4
+	 (mult:FMA4MODEF4
+	  (neg:FMA4MODEF4
+	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x"))
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Split fnmsub with two memory operands into a load and the fmsub.
+(define_split
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
+	(minus:FMA4MODEF4
+	 (mult:FMA4MODEF4
+	  (neg:FMA4MODEF4
+	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" ""))
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
+  "TARGET_FMA4
+   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)
+   && !reg_mentioned_p (operands[0], operands[1])
+   && !reg_mentioned_p (operands[0], operands[2])
+   && !reg_mentioned_p (operands[0], operands[3])"
+  [(const_int 0)]
+{
+  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  emit_insn (gen_fma4_fnmsub<mode>4256 (operands[0], operands[1],
+				        operands[2], operands[3]));
+  DONE;
+})
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(define_insn "fma4_fmadd<mode>4"
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+	(plus:SSEMODEF4
+	 (mult:SSEMODEF4
+	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Split fmadd with two memory operands into a load and the fmadd.
+(define_split
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "")
+	(plus:SSEMODEF4
+	 (mult:SSEMODEF4
+	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
+  "TARGET_FMA4
+   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && !reg_mentioned_p (operands[0], operands[1])
+   && !reg_mentioned_p (operands[0], operands[2])
+   && !reg_mentioned_p (operands[0], operands[3])"
+  [(const_int 0)]
+{
+  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  emit_insn (gen_fma4_fmadd<mode>4 (operands[0], operands[1],
+				    operands[2], operands[3]));
+  DONE;
+})
+
+;; For the scalar operations, use operand1 for the upper words that aren't
+;; modified, so restrict the forms that are generated.
+;; Scalar version of fmadd
+(define_insn "fma4_vmfmadd<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(vec_merge:SSEMODEF2P
+	 (plus:SSEMODEF2P
+	  (mult:SSEMODEF2P
+	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
+	 (match_dup 0)
+	 (const_int 1)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Floating multiply and subtract
+;; Allow two memory operands the same as fmadd
+(define_insn "fma4_fmsub<mode>4"
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+	(minus:SSEMODEF4
+	 (mult:SSEMODEF4
+	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Split fmsub with two memory operands into a load and the fmsub.
+(define_split
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "")
+	(minus:SSEMODEF4
+	 (mult:SSEMODEF4
+	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
+  "TARGET_FMA4
+   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && !reg_mentioned_p (operands[0], operands[1])
+   && !reg_mentioned_p (operands[0], operands[2])
+   && !reg_mentioned_p (operands[0], operands[3])"
+  [(const_int 0)]
+{
+  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  emit_insn (gen_fma4_fmsub<mode>4 (operands[0], operands[1],
+				    operands[2], operands[3]));
+  DONE;
+})
+
+;; For the scalar operations, use operand1 for the upper words that aren't
+;; modified, so restrict the forms that are generated.
+;; Scalar version of fmsub
+(define_insn "fma4_vmfmsub<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(vec_merge:SSEMODEF2P
+	 (minus:SSEMODEF2P
+	  (mult:SSEMODEF2P
+	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
+	 (match_dup 0)
+	 (const_int 1)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Floating point negative multiply and add
+;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b)
+;; Note operands are out of order to simplify call to ix86_fma4_valid_op_p
+;; Allow two memory operands to help in optimizing.
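+;; (i.e. the RTL below is (minus op3 (mult op1 op2)); the addend is
+;; operand 3 but appears first in the pattern.)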
+(define_insn "fma4_fnmadd<mode>4"
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+	(minus:SSEMODEF4
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")
+	 (mult:SSEMODEF4
+	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Split fnmadd with two memory operands into a load and the fnmadd.
+(define_split
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "")
+	(minus:SSEMODEF4
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")
+	 (mult:SSEMODEF4
+	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))))]
+  "TARGET_FMA4
+   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && !reg_mentioned_p (operands[0], operands[1])
+   && !reg_mentioned_p (operands[0], operands[2])
+   && !reg_mentioned_p (operands[0], operands[3])"
+  [(const_int 0)]
+{
+  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  emit_insn (gen_fma4_fnmadd<mode>4 (operands[0], operands[1],
+				     operands[2], operands[3]));
+  DONE;
+})
+
+;; For the scalar operations, use operand1 for the upper words that aren't
+;; modified, so restrict the forms that are generated.
+;; Scalar version of fnmadd
+(define_insn "fma4_vmfnmadd<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(vec_merge:SSEMODEF2P
+	 (minus:SSEMODEF2P
+	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
+	  (mult:SSEMODEF2P
+	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))
+	 (match_dup 0)
+	 (const_int 1)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Floating point negative multiply and subtract
+;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c
+;; Allow two memory operands to help with optimization
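+;; (i.e. the RTL below matches (minus (mult (neg op1) op2) op3).)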
+(define_insn "fma4_fnmsub<mode>4"
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
+	(minus:SSEMODEF4
+	 (mult:SSEMODEF4
+	  (neg:SSEMODEF4
+	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x"))
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; Split fnmsub with two memory operands into a load and the fnmsub.
+(define_split
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "")
+	(minus:SSEMODEF4
+	 (mult:SSEMODEF4
+	  (neg:SSEMODEF4
+	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" ""))
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
+  "TARGET_FMA4
+   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)
+   && !reg_mentioned_p (operands[0], operands[1])
+   && !reg_mentioned_p (operands[0], operands[2])
+   && !reg_mentioned_p (operands[0], operands[3])"
+  [(const_int 0)]
+{
+  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  emit_insn (gen_fma4_fnmsub<mode>4 (operands[0], operands[1],
+				     operands[2], operands[3]));
+  DONE;
+})
+
+;; For the scalar operations, use operand1 for the upper words that aren't
+;; modified, so restrict the forms that are generated.
+;; Scalar version of fnmsub
+(define_insn "fma4_vmfnmsub<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(vec_merge:SSEMODEF2P
+	 (minus:SSEMODEF2P
+	  (mult:SSEMODEF2P
+	   (neg:SSEMODEF2P
+	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x"))
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
+	 (match_dup 0)
+	 (const_int 1)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_insn "fma4i_fmadd<mode>4256"
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
+	(unspec:FMA4MODEF4
+	 [(plus:FMA4MODEF4
+	   (mult:FMA4MODEF4
+	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fma4i_fmsub<mode>4256"
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
+	(unspec:FMA4MODEF4
+	 [(minus:FMA4MODEF4
+	   (mult:FMA4MODEF4
+	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fma4i_fnmadd<mode>4256"
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
+	(unspec:FMA4MODEF4
+	 [(minus:FMA4MODEF4
+	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
+	   (mult:FMA4MODEF4
+	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm")))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fma4i_fnmsub<mode>4256"
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
+	(unspec:FMA4MODEF4
+	 [(minus:FMA4MODEF4
+	   (mult:FMA4MODEF4
+	    (neg:FMA4MODEF4
+	     (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x"))
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_insn "fma4i_fmadd<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(unspec:SSEMODEF2P
+	 [(plus:SSEMODEF2P
+	   (mult:SSEMODEF2P
+	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fma4i_fmsub<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(unspec:SSEMODEF2P
+	 [(minus:SSEMODEF2P
+	   (mult:SSEMODEF2P
+	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fma4i_fnmadd<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(unspec:SSEMODEF2P
+	 [(minus:SSEMODEF2P
+	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
+	   (mult:SSEMODEF2P
+	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "fma4i_fnmsub<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(unspec:SSEMODEF2P
+	 [(minus:SSEMODEF2P
+	   (mult:SSEMODEF2P
+	    (neg:SSEMODEF2P
+	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x"))
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; For the scalar operations, use operand1 for the upper words that aren't
+;; modified, so restrict the forms that are accepted.
+(define_insn "fma4i_vmfmadd<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(unspec:SSEMODEF2P
+	 [(vec_merge:SSEMODEF2P
+	   (plus:SSEMODEF2P
+	    (mult:SSEMODEF2P
+	     (match_operand:SSEMODEF2P 1 "register_operand" "x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
+	   (match_dup 0)
+	   (const_int 1))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<ssescalarmode>")])
+
+(define_insn "fma4i_vmfmsub<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(unspec:SSEMODEF2P
+	 [(vec_merge:SSEMODEF2P
+	   (minus:SSEMODEF2P
+	    (mult:SSEMODEF2P
+	     (match_operand:SSEMODEF2P 1 "register_operand" "x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
+	   (match_dup 0)
+	   (const_int 1))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<ssescalarmode>")])
+
+(define_insn "fma4i_vmfnmadd<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(unspec:SSEMODEF2P
+	 [(vec_merge:SSEMODEF2P
+	   (minus:SSEMODEF2P
+	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
+	    (mult:SSEMODEF2P
+	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))
+	   (match_dup 0)
+	   (const_int 1))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<ssescalarmode>")])
+
+(define_insn "fma4i_vmfnmsub<mode>4"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
+	(unspec:SSEMODEF2P
+	 [(vec_merge:SSEMODEF2P
+	   (minus:SSEMODEF2P
+	    (mult:SSEMODEF2P
+	     (neg:SSEMODEF2P
+	      (match_operand:SSEMODEF2P 1 "register_operand" "x,x"))
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
+	   (match_dup 0)
+	   (const_int 1))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<ssescalarmode>")])
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; FMA4 Parallel floating point multiply addsub and subadd operations
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
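+;; In the vec_merge patterns below, a set bit in the mask selects the
+;; (plus ...) element and a clear bit the (minus ...) element.  fmaddsub
+;; adds in the odd elements (masks 170 = 0xaa, 10, 10, 2); fmsubadd adds
+;; in the even elements (masks 85 = 0x55, 5, 5, 1).
+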
+(define_insn "fma4_fmaddsubv8sf4"
+  [(set (match_operand:V8SF 0 "register_operand" "=x,x")
+	(vec_merge:V8SF
+	  (plus:V8SF
+	    (mult:V8SF
+	      (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
+	  (minus:V8SF
+	    (mult:V8SF
+	      (match_dup 1)
+	      (match_dup 2))
+	    (match_dup 3))
+	  (const_int 170)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V8SF")])
+
+(define_insn "fma4_fmaddsubv4df4"
+  [(set (match_operand:V4DF 0 "register_operand" "=x,x")
+	(vec_merge:V4DF
+	  (plus:V4DF
+	    (mult:V4DF
+	      (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
+	  (minus:V4DF
+	    (mult:V4DF
+	      (match_dup 1)
+	      (match_dup 2))
+	    (match_dup 3))
+	  (const_int 10)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V4DF")])
+
+(define_insn "fma4_fmaddsubv4sf4"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+	(vec_merge:V4SF
+	  (plus:V4SF
+	    (mult:V4SF
+	      (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
+	  (minus:V4SF
+	    (mult:V4SF
+	      (match_dup 1)
+	      (match_dup 2))
+	    (match_dup 3))
+	  (const_int 10)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V4SF")])
+
+(define_insn "fma4_fmaddsubv2df4"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+	(vec_merge:V2DF
+	  (plus:V2DF
+	    (mult:V2DF
+	      (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
+	  (minus:V2DF
+	    (mult:V2DF
+	      (match_dup 1)
+	      (match_dup 2))
+	    (match_dup 3))
+	  (const_int 2)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V2DF")])
+
+(define_insn "fma4_fmsubaddv8sf4"
+  [(set (match_operand:V8SF 0 "register_operand" "=x,x")
+	(vec_merge:V8SF
+	  (plus:V8SF
+	    (mult:V8SF
+	      (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
+	  (minus:V8SF
+	    (mult:V8SF
+	      (match_dup 1)
+	      (match_dup 2))
+	    (match_dup 3))
+	  (const_int 85)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V8SF")])
+
+(define_insn "fma4_fmsubaddv4df4"
+  [(set (match_operand:V4DF 0 "register_operand" "=x,x")
+	(vec_merge:V4DF
+	  (plus:V4DF
+	    (mult:V4DF
+	      (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
+	  (minus:V4DF
+	    (mult:V4DF
+	      (match_dup 1)
+	      (match_dup 2))
+	    (match_dup 3))
+	  (const_int 5)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V4DF")])
+
+(define_insn "fma4_fmsubaddv4sf4"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+	(vec_merge:V4SF
+	  (plus:V4SF
+	    (mult:V4SF
+	      (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
+	  (minus:V4SF
+	    (mult:V4SF
+	      (match_dup 1)
+	      (match_dup 2))
+	    (match_dup 3))
+	  (const_int 5)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V4SF")])
+
+(define_insn "fma4_fmsubaddv2df4"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+	(vec_merge:V2DF
+	  (plus:V2DF
+	    (mult:V2DF
+	      (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
+	  (minus:V2DF
+	    (mult:V2DF
+	      (match_dup 1)
+	      (match_dup 2))
+	    (match_dup 3))
+	  (const_int 1)))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V2DF")])
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_insn "fma4i_fmaddsubv8sf4"
+  [(set (match_operand:V8SF 0 "register_operand" "=x,x")
+	(unspec:V8SF
+	 [(vec_merge:V8SF
+	   (plus:V8SF
+	     (mult:V8SF
+	       (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
+	   (minus:V8SF
+	     (mult:V8SF
+	       (match_dup 1)
+	       (match_dup 2))
+	     (match_dup 3))
+	   (const_int 170))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V8SF")])
+
+(define_insn "fma4i_fmaddsubv4df4"
+  [(set (match_operand:V4DF 0 "register_operand" "=x,x")
+	(unspec:V4DF
+	 [(vec_merge:V4DF
+	   (plus:V4DF
+	     (mult:V4DF
+	       (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
+	   (minus:V4DF
+	     (mult:V4DF
+	       (match_dup 1)
+	       (match_dup 2))
+	     (match_dup 3))
+	   (const_int 10))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V4DF")])
+
+(define_insn "fma4i_fmaddsubv4sf4"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+	(unspec:V4SF
+	 [(vec_merge:V4SF
+	   (plus:V4SF
+	     (mult:V4SF
+	       (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
+	   (minus:V4SF
+	     (mult:V4SF
+	       (match_dup 1)
+	       (match_dup 2))
+	     (match_dup 3))
+	   (const_int 10))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V4SF")])
+
+(define_insn "fma4i_fmaddsubv2df4"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+	(unspec:V2DF
+	 [(vec_merge:V2DF
+	   (plus:V2DF
+	     (mult:V2DF
+	       (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
+	   (minus:V2DF
+	     (mult:V2DF
+	       (match_dup 1)
+	       (match_dup 2))
+	     (match_dup 3))
+	   (const_int 2))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V2DF")])
+
+(define_insn "fma4i_fmsubaddv8sf4"
+  [(set (match_operand:V8SF 0 "register_operand" "=x,x")
+	(unspec:V8SF
+	 [(vec_merge:V8SF
+	   (plus:V8SF
+	     (mult:V8SF
+	       (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
+	   (minus:V8SF
+	     (mult:V8SF
+	       (match_dup 1)
+	       (match_dup 2))
+	     (match_dup 3))
+	   (const_int 85))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V8SF")])
+
+(define_insn "fma4i_fmsubaddv4df4"
+  [(set (match_operand:V4DF 0 "register_operand" "=x,x")
+	(unspec:V4DF
+	 [(vec_merge:V4DF
+	   (plus:V4DF
+	     (mult:V4DF
+	       (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
+	   (minus:V4DF
+	     (mult:V4DF
+	       (match_dup 1)
+	       (match_dup 2))
+	     (match_dup 3))
+	   (const_int 5))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V4DF")])
+
+(define_insn "fma4i_fmsubaddv4sf4"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+	(unspec:V4SF
+	 [(vec_merge:V4SF
+	   (plus:V4SF
+	     (mult:V4SF
+	       (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
+	   (minus:V4SF
+	     (mult:V4SF
+	       (match_dup 1)
+	       (match_dup 2))
+	     (match_dup 3))
+	   (const_int 5))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V4SF")])
+
+(define_insn "fma4i_fmsubaddv2df4"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+	(unspec:V2DF
+	 [(vec_merge:V2DF
+	   (plus:V2DF
+	     (mult:V2DF
+	       (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
+	   (minus:V2DF
+	     (mult:V2DF
+	       (match_dup 1)
+	       (match_dup 2))
+	     (match_dup 3))
+	   (const_int 1))]
+	 UNSPEC_FMA4_INTRINSIC))]
+  "TARGET_FMA4
+   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "V2DF")])
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
 ;; Parallel single-precision floating point conversion operations
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
diff -upNw gcc-xop-fma4/gcc/config/i386/x86intrin.h gcc-xop/gcc/config/i386/x86intrin.h
--- gcc-xop-fma4/gcc/config/i386/x86intrin.h	2009-09-21 18:59:49.000000000 -0500
+++ gcc-xop/gcc/config/i386/x86intrin.h	2009-09-21 16:36:29.000000000 -0500
@@ -46,7 +46,7 @@
 #include <tmmintrin.h>
 #endif
 
-#ifdef __SSE4a__
+#ifdef __SSE4A__
 #include <ammintrin.h>
 #endif
 
@@ -54,6 +54,10 @@
 #include <smmintrin.h>
 #endif
 
+#ifdef __FMA4__
+#include <fma4intrin.h>
+#endif
+
 #if defined (__AES__) || defined (__PCLMUL__)
 #include <wmmintrin.h>
 #endif
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-256-maccXX.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-256-maccXX.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-256-maccXX.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-256-maccXX.c	2009-09-22 13:10:35.000000000 -0500
@@ -0,0 +1,99 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma4 } */
+/* { dg-options "-O2 -mfma4" } */
+
+#include "fma4-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 20
+
+union
+{
+  __m256 x[NUM];
+  float f[NUM * 8];
+  __m256d y[NUM];
+  double d[NUM * 4];
+} dst, res, src1, src2, src3;
+
+
+/* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+   intermediate product is not rounded; only the addition is rounded.  */
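+/* The exact-equality checks below are safe here: the inputs are small
+   integers, so every product and sum is exactly representable and the
+   fused result matches the separately rounded result.  */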
+
+static void
+init_maccps ()
+{
+  int i;
+  for (i = 0; i < NUM * 8; i++)
+    {
+      src1.f[i] = i;
+      src2.f[i] = i + 10;
+      src3.f[i] = i + 20;
+    }
+}
+
+static void
+init_maccpd ()
+{
+  int i;
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1.d[i] = i;
+      src2.d[i] = i + 10;
+      src3.d[i] = i + 20;
+    }
+}
+
+static int
+check_maccps ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 8; i = i + 8)
+    for (j = 0; j < 8; j++)
+      {
+	res.f[i + j] = (src1.f[i + j] * src2.f[i + j]) + src3.f[i + j];
+	if (dst.f[i + j] != res.f[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static int
+check_maccpd ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    for (j = 0; j < 4; j++)
+      {
+	res.d[i + j] = (src1.d[i + j] * src2.d[i + j]) + src3.d[i + j];
+	if (dst.d[i + j] != res.d[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static void
+fma4_test (void)
+{
+  int i;
+
+  init_maccps ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm256_macc_ps (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_maccps ()) 
+    abort ();
+
+  init_maccpd ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm256_macc_pd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_maccpd ()) 
+    abort ();  
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-256-msubXX.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-256-msubXX.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-256-msubXX.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-256-msubXX.c	2009-09-22 13:10:46.000000000 -0500
@@ -0,0 +1,96 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma4 } */
+/* { dg-options "-O2 -mfma4" } */
+
+#include "fma4-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 20
+
+union
+{
+  __m256 x[NUM];
+  float f[NUM * 8];
+  __m256d y[NUM];
+  double d[NUM * 4];
+} dst, res, src1, src2, src3;
+
+/* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+   intermediate product is not rounded; only the addition is rounded.  */
+
+static void
+init_msubps ()
+{
+  int i;
+  for (i = 0; i < NUM * 8; i++)
+    {
+      src1.f[i] = i;
+      src2.f[i] = i + 10;
+      src3.f[i] = i + 20;
+    }
+}
+
+static void
+init_msubpd ()
+{
+  int i;
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1.d[i] = i;
+      src2.d[i] = i + 10;
+      src3.d[i] = i + 20;
+    }
+}
+
+static int
+check_msubps ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 8; i = i + 8)
+    for (j = 0; j < 8; j++)
+      {
+	res.f[i + j] = (src1.f[i + j] * src2.f[i + j]) - src3.f[i + j];
+	if (dst.f[i + j] != res.f[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static int
+check_msubpd ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    for (j = 0; j < 4; j++)
+      {
+	res.d[i + j] = (src1.d[i + j] * src2.d[i + j]) - src3.d[i + j];
+	if (dst.d[i + j] != res.d[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static void
+fma4_test (void)
+{
+  int i;
+
+  init_msubps ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm256_msub_ps (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_msubps ()) 
+    abort ();
+
+  init_msubpd ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm256_msub_pd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_msubpd ()) 
+    abort ();
+  
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-256-nmaccXX.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-256-nmaccXX.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-256-nmaccXX.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-256-nmaccXX.c	2009-09-22 13:11:01.000000000 -0500
@@ -0,0 +1,96 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma4 } */
+/* { dg-options "-O2 -mfma4" } */
+
+#include "fma4-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 20
+
+union
+{
+  __m256 x[NUM];
+  float f[NUM * 8];
+  __m256d y[NUM];
+  double d[NUM * 4];
+} dst, res, src1, src2, src3;
+
+/* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+   intermediate product is not rounded; only the addition is rounded.  */
+
+static void
+init_nmaccps ()
+{
+  int i;
+  for (i = 0; i < NUM * 8; i++)
+    {
+      src1.f[i] = i;
+      src2.f[i] = i + 10;
+      src3.f[i] = i + 20;
+    }
+}
+
+static void
+init_nmaccpd ()
+{
+  int i;
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1.d[i] = i;
+      src2.d[i] = i + 10;
+      src3.d[i] = i + 20;
+    }
+}
+
+static int
+check_nmaccps ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 8; i = i + 8)
+    for (j = 0; j < 8; j++)
+      {
+	res.f[i + j] = - (src1.f[i + j] * src2.f[i + j]) + src3.f[i + j];
+	if (dst.f[i + j] != res.f[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static int
+check_nmaccpd ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    for (j = 0; j < 4; j++)
+      {
+	res.d[i + j] = - (src1.d[i + j] * src2.d[i + j]) + src3.d[i + j];
+	if (dst.d[i + j] != res.d[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static void
+fma4_test (void)
+{
+  int i;
+
+  init_nmaccps ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm256_nmacc_ps (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_nmaccps ()) 
+    abort ();
+  
+  init_nmaccpd ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm256_nmacc_pd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_nmaccpd ()) 
+    abort ();
+
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-256-nmsubXX.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-256-nmsubXX.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-256-nmsubXX.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-256-nmsubXX.c	2009-09-22 13:11:57.000000000 -0500
@@ -0,0 +1,96 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma4 } */
+/* { dg-options "-O2 -mfma4" } */
+
+#include "fma4-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 20
+
+union
+{
+  __m256 x[NUM];
+  float f[NUM * 8];
+  __m256d y[NUM];
+  double d[NUM * 4];
+} dst, res, src1, src2, src3;
+
+/* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+   intermediate product is not rounded; only the addition is rounded.  */
+
+static void
+init_nmsubps ()
+{
+  int i;
+  for (i = 0; i < NUM * 8; i++)
+    {
+      src1.f[i] = i;
+      src2.f[i] = i + 10;
+      src3.f[i] = i + 20;
+    }
+}
+
+static void
+init_nmsubpd ()
+{
+  int i;
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1.d[i] = i;
+      src2.d[i] = i + 10;
+      src3.d[i] = i + 20;
+    }
+}
+
+static int
+check_nmsubps ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 8; i = i + 8)
+    for (j = 0; j < 8; j++)
+      {
+	res.f[i + j] = - (src1.f[i + j] * src2.f[i + j]) - src3.f[i + j];
+	if (dst.f[i + j] != res.f[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static int
+check_nmsubpd ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    for (j = 0; j < 4; j++)
+      {
+	res.d[i + j] = - (src1.d[i + j] * src2.d[i + j]) - src3.d[i + j];
+	if (dst.d[i + j] != res.d[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static void
+fma4_test (void)
+{
+  int i;
+
+  init_nmsubps ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm256_nmsub_ps (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_nmsubps ())
+    abort ();
+
+  init_nmsubpd ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm256_nmsub_pd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_nmsubpd ())
+    abort ();
+
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-256-vector.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-256-vector.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-256-vector.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-256-vector.c	2009-09-17 12:22:31.000000000 -0500
@@ -0,0 +1,93 @@
+/* Test that the compiler properly optimizes vector floating point multiply
+   and add instructions into vfmaddps on FMA4 systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mfma4 -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef float     __m256  __attribute__ ((__vector_size__ (32), __may_alias__));
+typedef double    __m256d __attribute__ ((__vector_size__ (32), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m256 f_align;
+  __m256d d_align;
+  float f[SIZE];
+  double d[SIZE];
+} a, b, c, d;
+
+void
+flt_mul_add (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.f[i] = (b.f[i] * c.f[i]) + d.f[i];
+}
+
+void
+dbl_mul_add (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.d[i] = (b.d[i] * c.d[i]) + d.d[i];
+}
+
+void
+flt_mul_sub (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.f[i] = (b.f[i] * c.f[i]) - d.f[i];
+}
+
+void
+dbl_mul_sub (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.d[i] = (b.d[i] * c.d[i]) - d.d[i];
+}
+
+void
+flt_neg_mul_add (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.f[i] = (-(b.f[i] * c.f[i])) + d.f[i];
+}
+
+void
+dbl_neg_mul_add (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.d[i] = (-(b.d[i] * c.d[i])) + d.d[i];
+}
+
+int main ()
+{
+  flt_mul_add ();
+  flt_mul_sub ();
+  flt_neg_mul_add ();
+
+  dbl_mul_add ();
+  dbl_mul_sub ();
+  dbl_neg_mul_add ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vfmaddps" } } */
+/* { dg-final { scan-assembler "vfmaddpd" } } */
+/* { dg-final { scan-assembler "vfmsubps" } } */
+/* { dg-final { scan-assembler "vfmsubpd" } } */
+/* { dg-final { scan-assembler "vfnmaddps" } } */
+/* { dg-final { scan-assembler "vfnmaddpd" } } */
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-check.h gcc-xop/gcc/testsuite/gcc.target/i386/fma4-check.h
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-check.h	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-check.h	2009-09-17 12:22:31.000000000 -0500
@@ -0,0 +1,20 @@
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void fma4_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run FMA4 test only if host has FMA4 support.  */
+  if (ecx & bit_FMA4)
+    fma4_test ();
+
+  exit (0);
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-fma.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-fma.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-fma.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-fma.c	2009-09-17 12:22:31.000000000 -0500
@@ -0,0 +1,83 @@
+/* Test that the compiler properly optimizes floating point multiply
+   and add instructions into vfmaddss, vfmsubss, vfnmaddss,
+   vfnmsubss on FMA4 systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mfma4" } */
+
+extern void exit (int);
+
+float
+flt_mul_add (float a, float b, float c)
+{
+  return (a * b) + c;
+}
+
+double
+dbl_mul_add (double a, double b, double c)
+{
+  return (a * b) + c;
+}
+
+float
+flt_mul_sub (float a, float b, float c)
+{
+  return (a * b) - c;
+}
+
+double
+dbl_mul_sub (double a, double b, double c)
+{
+  return (a * b) - c;
+}
+
+float
+flt_neg_mul_add (float a, float b, float c)
+{
+  return (-(a * b)) + c;
+}
+
+double
+dbl_neg_mul_add (double a, double b, double c)
+{
+  return (-(a * b)) + c;
+}
+
+float
+flt_neg_mul_sub (float a, float b, float c)
+{
+  return (-(a * b)) - c;
+}
+
+double
+dbl_neg_mul_sub (double a, double b, double c)
+{
+  return (-(a * b)) - c;
+}
+
+float  f[10] = { 2, 3, 4 };
+double d[10] = { 2, 3, 4 };
+
+int main ()
+{
+  f[3] = flt_mul_add (f[0], f[1], f[2]);
+  f[4] = flt_mul_sub (f[0], f[1], f[2]);
+  f[5] = flt_neg_mul_add (f[0], f[1], f[2]);
+  f[6] = flt_neg_mul_sub (f[0], f[1], f[2]);
+
+  d[3] = dbl_mul_add (d[0], d[1], d[2]);
+  d[4] = dbl_mul_sub (d[0], d[1], d[2]);
+  d[5] = dbl_neg_mul_add (d[0], d[1], d[2]);
+  d[6] = dbl_neg_mul_sub (d[0], d[1], d[2]);
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vfmaddss" } } */
+/* { dg-final { scan-assembler "vfmaddsd" } } */
+/* { dg-final { scan-assembler "vfmsubss" } } */
+/* { dg-final { scan-assembler "vfmsubsd" } } */
+/* { dg-final { scan-assembler "vfnmaddss" } } */
+/* { dg-final { scan-assembler "vfnmaddsd" } } */
+/* { dg-final { scan-assembler "vfnmsubss" } } */
+/* { dg-final { scan-assembler "vfnmsubsd" } } */
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-maccXX.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-maccXX.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-maccXX.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-maccXX.c	2009-09-22 13:09:48.000000000 -0500
@@ -0,0 +1,136 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma4 } */
+/* { dg-options "-O0 -mfma4" } */
+
+#include "fma4-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 20
+
+union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+  __m128d y[NUM];
+  double d[NUM * 2];
+} dst, res, src1, src2, src3;
+
+
+/* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+   intermediate product is not rounded; only the addition is rounded.  */
+
+static void
+init_maccps ()
+{
+  int i;
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1.f[i] = i;
+      src2.f[i] = i + 10;
+      src3.f[i] = i + 20;
+    }
+}
+
+static void
+init_maccpd ()
+{
+  int i;
+  for (i = 0; i < NUM * 2; i++)
+    {
+      src1.d[i] = i;
+      src2.d[i] = i + 10;
+      src3.d[i] = i + 20;
+    }
+}
+
+static int
+check_maccps ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    for (j = 0; j < 4; j++)
+      {
+	res.f[i + j] = (src1.f[i + j] * src2.f[i + j]) + src3.f[i + j];
+	if (dst.f[i + j] != res.f[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static int
+check_maccpd ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 2; i = i + 2)
+    for (j = 0; j < 2; j++)
+      {
+	res.d[i + j] = (src1.d[i + j] * src2.d[i + j]) + src3.d[i + j];
+	if (dst.d[i + j] != res.d[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+
+static int
+check_maccss ()
+{
+  int i, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    {
+      res.f[i] = (src1.f[i] * src2.f[i]) + src3.f[i];
+      if (dst.f[i] != res.f[i]) 
+	check_fails++;
+    }	
+  return check_fails;
+}
+
+static int
+check_maccsd ()
+{
+  int i, check_fails = 0;
+  for (i = 0; i < NUM * 2; i = i + 2)
+    {
+      res.d[i] = (src1.d[i] * src2.d[i]) + src3.d[i];
+      if (dst.d[i] != res.d[i]) 
+	check_fails++;
+    }
+  return check_fails;
+}
+
+static void
+fma4_test (void)
+{
+  int i;
+
+  init_maccps ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_macc_ps (src1.x[i], src2.x[i], src3.x[i]);
+
+  if (check_maccps ()) 
+    abort ();
+
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_macc_ss (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_maccss ()) 
+    abort ();
+
+  init_maccpd ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm_macc_pd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_maccpd ()) 
+    abort ();
+
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm_macc_sd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_maccsd ()) 
+    abort ();
+
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-msubXX.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-msubXX.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-msubXX.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-msubXX.c	2009-09-22 13:10:00.000000000 -0500
@@ -0,0 +1,134 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma4 } */
+/* { dg-options "-O0 -mfma4" } */
+
+#include "fma4-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 20
+
+union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+  __m128d y[NUM];
+  double d[NUM * 2];
+} dst, res, src1, src2, src3;
+
+/* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+   intermediate product is not rounded; only the addition is rounded.  */
+
+static void
+init_msubps ()
+{
+  int i;
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1.f[i] = i;
+      src2.f[i] = i + 10;
+      src3.f[i] = i + 20;
+    }
+}
+
+static void
+init_msubpd ()
+{
+  int i;
+  for (i = 0; i < NUM * 2; i++)
+    {
+      src1.d[i] = i;
+      src2.d[i] = i + 10;
+      src3.d[i] = i + 20;
+    }
+}
+
+static int
+check_msubps ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    for (j = 0; j < 4; j++)
+      {
+	res.f[i + j] = (src1.f[i + j] * src2.f[i + j]) - src3.f[i + j];
+	if (dst.f[i + j] != res.f[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static int
+check_msubpd ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 2; i = i + 2)
+    for (j = 0; j < 2; j++)
+      {
+	res.d[i + j] = (src1.d[i + j] * src2.d[i + j]) - src3.d[i + j];
+	if (dst.d[i + j] != res.d[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+
+static int
+check_msubss ()
+{
+  int i, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    {
+      res.f[i] = (src1.f[i] * src2.f[i]) - src3.f[i];
+      if (dst.f[i] != res.f[i]) 
+	check_fails++;
+    }	
+  return check_fails;
+}
+
+static int
+check_msubsd ()
+{
+  int i, check_fails = 0;
+  for (i = 0; i < NUM * 2; i = i + 2)
+    {
+      res.d[i] = (src1.d[i] * src2.d[i]) - src3.d[i];
+      if (dst.d[i] != res.d[i]) 
+	check_fails++;
+    }
+  return check_fails;
+}
+
+static void
+fma4_test (void)
+{
+  int i;
+
+  init_msubps ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_msub_ps (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_msubps ()) 
+    abort ();
+
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_msub_ss (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_msubss ()) 
+    abort ();
+
+  init_msubpd ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm_msub_pd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_msubpd ()) 
+    abort ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm_msub_sd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_msubsd ()) 
+    abort ();
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-nmaccXX.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-nmaccXX.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-nmaccXX.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-nmaccXX.c	2009-09-22 13:10:10.000000000 -0500
@@ -0,0 +1,137 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma4 } */
+/* { dg-options "-O0 -mfma4" } */
+
+#include "fma4-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 20
+
+union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+  __m128d y[NUM];
+  double d[NUM * 2];
+} dst, res, src1, src2, src3;
+
+/* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+   intermediate product is not rounded; only the addition is rounded.  */
+
+static void
+init_nmaccps ()
+{
+  int i;
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1.f[i] = i;
+      src2.f[i] = i + 10;
+      src3.f[i] = i + 20;
+    }
+}
+
+static void
+init_nmaccpd ()
+{
+  int i;
+  for (i = 0; i < NUM * 2; i++)
+    {
+      src1.d[i] = i;
+      src2.d[i] = i + 10;
+      src3.d[i] = i + 20;
+    }
+}
+
+static int
+check_nmaccps ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    for (j = 0; j < 4; j++)
+      {
+	res.f[i + j] = - (src1.f[i + j] * src2.f[i + j]) + src3.f[i + j];
+	if (dst.f[i + j] != res.f[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static int
+check_nmaccpd ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 2; i = i + 2)
+    for (j = 0; j < 2; j++)
+      {
+	res.d[i + j] = - (src1.d[i + j] * src2.d[i + j]) + src3.d[i + j];
+	if (dst.d[i + j] != res.d[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+
+static int
+check_nmaccss ()
+{
+  int i, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    {
+      res.f[i] = - (src1.f[i] * src2.f[i]) + src3.f[i];
+      if (dst.f[i] != res.f[i]) 
+	check_fails++;
+    }	
+  return check_fails;
+}
+
+static int
+check_nmaccsd ()
+{
+  int i, check_fails = 0;
+  for (i = 0; i < NUM * 2; i = i + 2)
+    {
+      res.d[i] = - (src1.d[i] * src2.d[i]) + src3.d[i];
+      if (dst.d[i] != res.d[i]) 
+	check_fails++;
+    }
+  return check_fails;
+}
+
+static void
+fma4_test (void)
+{
+  int i;
+
+  init_nmaccps ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_nmacc_ps (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_nmaccps ()) 
+    abort ();
+  
+
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_nmacc_ss (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_nmaccss ()) 
+    abort ();
+
+  init_nmaccpd ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm_nmacc_pd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_nmaccpd ()) 
+    abort ();
+  
+
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm_nmacc_sd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_nmaccsd ()) 
+    abort ();
+
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-nmsubXX.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-nmsubXX.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-nmsubXX.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-nmsubXX.c	2009-09-22 13:10:21.000000000 -0500
@@ -0,0 +1,137 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma4 } */
+/* { dg-options "-O0 -mfma4" } */
+
+#include "fma4-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 20
+
+union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+  __m128d y[NUM];
+  double d[NUM * 2];
+} dst, res, src1, src2, src3;
+
+/* Note that in the macc*, msub*, nmacc* and nmsub* instructions, the
+   intermediate product is not rounded; only the addition is rounded.  */
+
+static void
+init_nmsubps ()
+{
+  int i;
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1.f[i] = i;
+      src2.f[i] = i + 10;
+      src3.f[i] = i + 20;
+    }
+}
+
+static void
+init_nmsubpd ()
+{
+  int i;
+  for (i = 0; i < NUM * 2; i++)
+    {
+      src1.d[i] = i;
+      src2.d[i] = i + 10;
+      src3.d[i] = i + 20;
+    }
+}
+
+static int
+check_nmsubps ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    for (j = 0; j < 4; j++)
+      {
+	res.f[i + j] = - (src1.f[i + j] * src2.f[i + j]) - src3.f[i + j];
+	if (dst.f[i + j] != res.f[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+static int
+check_nmsubpd ()
+{
+  int i, j, check_fails = 0;
+  for (i = 0; i < NUM * 2; i = i + 2)
+    for (j = 0; j < 2; j++)
+      {
+	res.d[i + j] = - (src1.d[i + j] * src2.d[i + j]) - src3.d[i + j];
+	if (dst.d[i + j] != res.d[i + j]) 
+	  check_fails++;
+      }
+  return check_fails;
+}
+
+
+static int
+check_nmsubss ()
+{
+  int i, check_fails = 0;
+  for (i = 0; i < NUM * 4; i = i + 4)
+    {
+      res.f[i] = - (src1.f[i] * src2.f[i]) - src3.f[i];
+      if (dst.f[i] != res.f[i]) 
+	check_fails++;
+    }	
+  return check_fails;
+}
+
+static int
+check_nmsubsd ()
+{
+  int i, check_fails = 0;
+  for (i = 0; i < NUM * 2; i = i + 2)
+    {
+      res.d[i] = - (src1.d[i] * src2.d[i]) - src3.d[i];
+      if (dst.d[i] != res.d[i]) 
+	check_fails++;
+    }
+  return check_fails;
+}
+
+static void
+fma4_test (void)
+{
+  int i;
+
+  init_nmsubps ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_nmsub_ps (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_nmsubps ())
+    abort ();
+  
+
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_nmsub_ss (src1.x[i], src2.x[i], src3.x[i]);
+  
+  if (check_nmsubss ())
+    abort ();
+
+  init_nmsubpd ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm_nmsub_pd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_nmsubpd ())
+    abort ();
+  
+
+  for (i = 0; i < NUM; i++)
+    dst.y[i] = _mm_nmsub_sd (src1.y[i], src2.y[i], src3.y[i]);
+  
+  if (check_nmsubsd ())
+    abort ();
+
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-vector.c gcc-xop/gcc/testsuite/gcc.target/i386/fma4-vector.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/fma4-vector.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/fma4-vector.c	2009-09-17 12:22:31.000000000 -0500
@@ -0,0 +1,93 @@
+/* Test that the compiler properly optimizes vector floating point multiply
+   and add instructions into vfmaddps on FMA4 systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mfma4 -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef float     __m128  __attribute__ ((__vector_size__ (16), __may_alias__));
+typedef double    __m128d __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128 f_align;
+  __m128d d_align;
+  float f[SIZE];
+  double d[SIZE];
+} a, b, c, d;
+
+void
+flt_mul_add (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.f[i] = (b.f[i] * c.f[i]) + d.f[i];
+}
+
+void
+dbl_mul_add (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.d[i] = (b.d[i] * c.d[i]) + d.d[i];
+}
+
+void
+flt_mul_sub (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.f[i] = (b.f[i] * c.f[i]) - d.f[i];
+}
+
+void
+dbl_mul_sub (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.d[i] = (b.d[i] * c.d[i]) - d.d[i];
+}
+
+void
+flt_neg_mul_add (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.f[i] = (-(b.f[i] * c.f[i])) + d.f[i];
+}
+
+void
+dbl_neg_mul_add (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.d[i] = (-(b.d[i] * c.d[i])) + d.d[i];
+}
+
+int main ()
+{
+  flt_mul_add ();
+  flt_mul_sub ();
+  flt_neg_mul_add ();
+
+  dbl_mul_add ();
+  dbl_mul_sub ();
+  dbl_neg_mul_add ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vfmaddps" } } */
+/* { dg-final { scan-assembler "vfmaddpd" } } */
+/* { dg-final { scan-assembler "vfmsubps" } } */
+/* { dg-final { scan-assembler "vfmsubpd" } } */
+/* { dg-final { scan-assembler "vfnmaddps" } } */
+/* { dg-final { scan-assembler "vfnmaddpd" } } */
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-2.c gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-2.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-2.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-2.c	2009-09-17 12:22:31.000000000 -0500
@@ -0,0 +1,99 @@
+/* Test whether using target specific options, we can generate FMA4 code.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -march=k8" } */
+
+extern void exit (int);
+
+#define FMA4_ATTR __attribute__((__target__("fma4")))
+extern float  flt_mul_add     (float a, float b, float c) FMA4_ATTR;
+extern float  flt_mul_sub     (float a, float b, float c) FMA4_ATTR;
+extern float  flt_neg_mul_add (float a, float b, float c) FMA4_ATTR;
+extern float  flt_neg_mul_sub (float a, float b, float c) FMA4_ATTR;
+
+extern double dbl_mul_add     (double a, double b, double c) FMA4_ATTR;
+extern double dbl_mul_sub     (double a, double b, double c) FMA4_ATTR;
+extern double dbl_neg_mul_add (double a, double b, double c) FMA4_ATTR;
+extern double dbl_neg_mul_sub (double a, double b, double c) FMA4_ATTR;
+
+float
+flt_mul_add (float a, float b, float c)
+{
+  return (a * b) + c;
+}
+
+double
+dbl_mul_add (double a, double b, double c)
+{
+  return (a * b) + c;
+}
+
+float
+flt_mul_sub (float a, float b, float c)
+{
+  return (a * b) - c;
+}
+
+double
+dbl_mul_sub (double a, double b, double c)
+{
+  return (a * b) - c;
+}
+
+float
+flt_neg_mul_add (float a, float b, float c)
+{
+  return (-(a * b)) + c;
+}
+
+double
+dbl_neg_mul_add (double a, double b, double c)
+{
+  return (-(a * b)) + c;
+}
+
+float
+flt_neg_mul_sub (float a, float b, float c)
+{
+  return (-(a * b)) - c;
+}
+
+double
+dbl_neg_mul_sub (double a, double b, double c)
+{
+  return (-(a * b)) - c;
+}
+
+float  f[10] = { 2, 3, 4 };
+double d[10] = { 2, 3, 4 };
+
+int main ()
+{
+  f[3] = flt_mul_add (f[0], f[1], f[2]);
+  f[4] = flt_mul_sub (f[0], f[1], f[2]);
+  f[5] = flt_neg_mul_add (f[0], f[1], f[2]);
+  f[6] = flt_neg_mul_sub (f[0], f[1], f[2]);
+
+  d[3] = dbl_mul_add (d[0], d[1], d[2]);
+  d[4] = dbl_mul_sub (d[0], d[1], d[2]);
+  d[5] = dbl_neg_mul_add (d[0], d[1], d[2]);
+  d[6] = dbl_neg_mul_sub (d[0], d[1], d[2]);
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vfmaddss" } } */
+/* { dg-final { scan-assembler "vfmaddsd" } } */
+/* { dg-final { scan-assembler "vfmsubss" } } */
+/* { dg-final { scan-assembler "vfmsubsd" } } */
+/* { dg-final { scan-assembler "vfnmaddss" } } */
+/* { dg-final { scan-assembler "vfnmaddsd" } } */
+/* { dg-final { scan-assembler "vfnmsubss" } } */
+/* { dg-final { scan-assembler "vfnmsubsd" } } */
+/* { dg-final { scan-assembler "call\t(.*)flt_mul_add" } } */
+/* { dg-final { scan-assembler "call\t(.*)flt_mul_sub" } } */
+/* { dg-final { scan-assembler "call\t(.*)flt_neg_mul_add" } } */
+/* { dg-final { scan-assembler "call\t(.*)flt_neg_mul_sub" } } */
+/* { dg-final { scan-assembler "call\t(.*)dbl_mul_add" } } */
+/* { dg-final { scan-assembler "call\t(.*)dbl_mul_sub" } } */
+/* { dg-final { scan-assembler "call\t(.*)dbl_neg_mul_add" } } */
+/* { dg-final { scan-assembler "call\t(.*)dbl_neg_mul_sub" } } */
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-4.c gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-4.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-4.c	2009-09-21 19:03:30.000000000 -0500
+++ gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-4.c	2009-09-17 12:22:31.000000000 -0500
@@ -1,6 +1,9 @@
 /* Test some error conditions with function specific options.  */
 /* { dg-do compile } */
 
+/* There is no "fma400" switch; the attribute should draw an "unknown" error.  */
+extern void error1 (void) __attribute__((__target__("fma400"))); /* { dg-error "unknown" } */
+
 /* Multiple arch switches */
 extern void error2 (void) __attribute__((__target__("arch=core2,arch=k8"))); /* { dg-error "already specified" } */
 
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-5.c gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-5.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-5.c	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-5.c	2009-09-17 12:59:01.000000000 -0500
@@ -16,6 +16,7 @@ extern void test_sse4 (void)			__attribu
 extern void test_sse4_1 (void)			__attribute__((__target__("sse4.1")));
 extern void test_sse4_2 (void)			__attribute__((__target__("sse4.2")));
 extern void test_sse4a (void)			__attribute__((__target__("sse4a")));
+extern void test_fma4 (void)			__attribute__((__target__("fma4")));
 extern void test_ssse3 (void)			__attribute__((__target__("ssse3")));
 
 extern void test_no_abm (void)			__attribute__((__target__("no-abm")));
@@ -31,6 +32,7 @@ extern void test_no_sse4 (void)			__attr
 extern void test_no_sse4_1 (void)		__attribute__((__target__("no-sse4.1")));
 extern void test_no_sse4_2 (void)		__attribute__((__target__("no-sse4.2")));
 extern void test_no_sse4a (void)		__attribute__((__target__("no-sse4a")));
+extern void test_no_fma4 (void)			__attribute__((__target__("no-fma4")));
 extern void test_no_ssse3 (void)		__attribute__((__target__("no-ssse3")));
 
 extern void test_arch_i386 (void)		__attribute__((__target__("arch=i386")));
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-6.c gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-6.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-6.c	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-6.c	2009-09-17 13:01:38.000000000 -0500
@@ -16,6 +16,7 @@ extern void test_sse4 (void)			__attribu
 extern void test_sse4_1 (void)			__attribute__((__target__("sse4.1")));
 extern void test_sse4_2 (void)			__attribute__((__target__("sse4.2")));
 extern void test_sse4a (void)			__attribute__((__target__("sse4a")));
+extern void test_fma4 (void)			__attribute__((__target__("fma4")));
 extern void test_ssse3 (void)			__attribute__((__target__("ssse3")));
 
 extern void test_no_abm (void)			__attribute__((__target__("no-abm")));
@@ -31,6 +32,7 @@ extern void test_no_sse4 (void)			__attr
 extern void test_no_sse4_1 (void)		__attribute__((__target__("no-sse4.1")));
 extern void test_no_sse4_2 (void)		__attribute__((__target__("no-sse4.2")));
 extern void test_no_sse4a (void)		__attribute__((__target__("no-sse4a")));
+extern void test_no_fma4 (void)			__attribute__((__target__("no-fma4")));
 extern void test_no_ssse3 (void)		__attribute__((__target__("no-ssse3")));
 
 extern void test_arch_nocona (void)		__attribute__((__target__("arch=nocona")));
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-8.c gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-8.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-8.c	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-8.c	2009-09-17 13:07:50.000000000 -0500
@@ -104,6 +104,25 @@ generic_insertq (__m128i a, __m128i b)
   return __builtin_ia32_insertq (a, b);			/* { dg-error "needs isa option" } */
 }
 
+#ifdef __FMA4__
+#error "-mfma4 should not be set for this test"
+#endif
+
+__m128d fma4_fmaddpd (__m128d a, __m128d b, __m128d c) __attribute__((__target__("fma4")));
+__m128d generic_fmaddpd (__m128d a, __m128d b, __m128d c);
+
+__m128d
+fma4_fmaddpd  (__m128d a, __m128d b, __m128d c)
+{
+  return __builtin_ia32_vfmaddpd (a, b, c);
+}
+
+__m128d
+generic_fmaddpd  (__m128d a, __m128d b, __m128d c)
+{
+  return __builtin_ia32_vfmaddpd (a, b, c);		/* { dg-error "needs isa option" } */
+}
+
 #ifdef __AES__
 #error "-maes should not be set for this test"
 #endif
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-9.c gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-9.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/funcspec-9.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/funcspec-9.c	2009-09-17 13:12:46.000000000 -0500
@@ -0,0 +1,36 @@
+/* Test whether we can generate FMA4 code using target-specific options.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=k8 -mfpmath=sse -msse2" } */
+
+extern void exit (int);
+
+#ifdef __FMA4__
+#warning "__FMA4__ should not be defined before #pragma GCC target."
+#endif
+
+#pragma GCC push_options
+#pragma GCC target ("fma4")
+
+#ifndef __FMA4__
+#warning "__FMA4__ should have be defined after #pragma GCC target."
+#endif
+
+float
+flt_mul_add (float a, float b, float c)
+{
+  return (a * b) + c;
+}
+
+#pragma GCC pop_options
+#ifdef __FMA4__
+#warning "__FMA4__ should not be defined after #pragma GCC pop target."
+#endif
+
+double
+dbl_mul_add (double a, double b, double c)
+{
+  return (a * b) + c;
+}
+
+/* { dg-final { scan-assembler "vfmaddss" } } */
+/* { dg-final { scan-assembler "addsd" } } */
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/i386.exp gcc-xop/gcc/testsuite/gcc.target/i386/i386.exp
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/i386.exp	2009-09-21 18:56:55.000000000 -0500
+++ gcc-xop/gcc/testsuite/gcc.target/i386/i386.exp	2009-09-17 12:22:39.000000000 -0500
@@ -120,6 +120,20 @@ proc check_effective_target_sse4a { } {
     } "-O2 -msse4a" ]
 }
 
+# Return 1 if fma4 instructions can be compiled.
+proc check_effective_target_fma4 { } {
+    return [check_no_compiler_messages fma4 object {
+        typedef float __m128 __attribute__ ((__vector_size__ (16)));
+        typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+        __m128 _mm_macc_ps (__m128 __A, __m128 __B, __m128 __C)
+        {
+            return (__m128) __builtin_ia32_vfmaddps ((__v4sf)__A,
+                                                     (__v4sf)__B,
+                                                     (__v4sf)__C);
+        }
+    } "-O2 -mfma4" ]
+}
+
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
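
A runtime test would pair this effective-target check with the usual DejaGnu
guard, so that it is skipped when the assembler or simulator cannot handle
FMA4.  A minimal sketch of such a guard (the fma4-*.c tests in this patch
follow this shape):

    /* { dg-do run } */
    /* { dg-require-effective-target fma4 } */
    /* { dg-options "-O2 -mfma4" } */
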
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-10.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-10.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-10.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-10.c	2009-09-22 16:23:58.000000000 -0500
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-march=x86-64 -mfma4 -mno-sse4" } */
+
+extern void abort (void);
+
+int
+main ()
+{
+#if !defined __SSE__
+  abort ();
+#endif
+#if !defined __SSE2__
+  abort ();
+#endif
+#if !defined __SSE3__
+  abort ();
+#endif
+#if !defined __SSSE3__
+  abort ();
+#endif
+#if defined __SSE4_1__
+  abort ();
+#endif
+#if defined __SSE4_2__
+  abort ();
+#endif
+#if !defined __SSE4A__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
+  return 0;
+}
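
The expectations in isa-10.c follow from the set/unset masks described in the
ChangeLog: -mfma4 also turns on what FMA4 depends on (SSE4A and AVX, hence
__SSE4A__ surviving -mno-sse4), while turning an ISA off also clears
everything that depends on it (here -mno-sse4 drops SSE4.2, which drops AVX,
which drops FMA4).  An illustrative model of the mechanism, with made-up mask
values rather than the real OPTION_MASK_ISA_* constants:

    /* Hypothetical, simplified model of the i386.c option masks.  */
    #define ISA_SSE4A       (1u << 0)
    #define ISA_AVX         (1u << 1)
    #define ISA_FMA4        (1u << 2)

    #define ISA_FMA4_SET    (ISA_FMA4 | ISA_SSE4A | ISA_AVX)  /* enabling pulls in deps  */
    #define ISA_AVX_UNSET   (ISA_AVX | ISA_FMA4)              /* disabling drops dependents */

    unsigned enable_fma4 (unsigned isa) { return isa | ISA_FMA4_SET; }
    unsigned disable_avx (unsigned isa) { return isa & ~ISA_AVX_UNSET; }
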
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-11.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-11.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-11.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-11.c	2009-09-22 16:15:06.000000000 -0500
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-march=x86-64 -mfma4 -mno-ssse3" } */
+
+extern void abort (void);
+
+int
+main ()
+{
+#if !defined __SSE__
+  abort ();
+#endif
+#if !defined __SSE2__
+  abort ();
+#endif
+#if !defined __SSE3__
+  abort ();
+#endif
+#if defined __SSSE3__
+  abort ();
+#endif
+#if defined __SSE4_1__
+  abort ();
+#endif
+#if defined __SSE4_2__
+  abort ();
+#endif
+#if !defined __SSE4A__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
+  return 0;
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-12.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-12.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-12.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-12.c	2009-09-17 12:22:39.000000000 -0500
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-march=x86-64 -mfma4 -mno-sse3" } */
+
+extern void abort (void);
+
+int
+main ()
+{
+#if !defined __SSE__
+  abort ();
+#endif
+#if !defined __SSE2__
+  abort ();
+#endif
+#if defined __SSE3__
+  abort ();
+#endif
+#if defined __SSSE3__
+  abort ();
+#endif
+#if defined __SSE4_1__
+  abort ();
+#endif
+#if defined __SSE4_2__
+  abort ();
+#endif
+#if defined __SSE4A__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
+  return 0;
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-13.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-13.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-13.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-13.c	2009-09-17 12:22:39.000000000 -0500
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-march=x86-64 -mfma4 -mno-sse2" } */
+
+extern void abort (void);
+
+int
+main ()
+{
+#if !defined __SSE__
+  abort ();
+#endif
+#if defined __SSE2__
+  abort ();
+#endif
+#if defined __SSE3__
+  abort ();
+#endif
+#if defined __SSSE3__
+  abort ();
+#endif
+#if defined __SSE4_1__
+  abort ();
+#endif
+#if defined __SSE4_2__
+  abort ();
+#endif
+#if defined __SSE4A__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
+  return 0;
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-14.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-14.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-14.c	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-14.c	2009-09-17 12:55:16.000000000 -0500
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-march=x86-64 -msse4a -mno-sse" } */
+/* { dg-options "-march=x86-64 -msse4a -mfma4 -mno-sse" } */
 
 extern void abort (void);
 
@@ -27,5 +27,8 @@ main ()
 #if defined __SSE4A__
   abort ();
 #endif
+#if defined __FMA4__
+  abort ();
+#endif
   return 0;
 }
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-1.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-1.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-1.c	2009-09-21 19:05:13.000000000 -0500
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-1.c	2009-09-22 11:02:12.000000000 -0500
@@ -27,5 +27,11 @@ main ()
 #if defined __SSE4A__
   abort ();
 #endif
+#if defined __AVX__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
   return 0;
 }
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-2.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-2.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-2.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-2.c	2009-09-22 10:55:17.000000000 -0500
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-march=x86-64 -msse4 -mfma4" } */
+
+extern void abort (void);
+
+int
+main ()
+{
+#if !defined __SSE__
+  abort ();
+#endif
+#if !defined __SSE2__
+  abort ();
+#endif
+#if !defined __SSE3__
+  abort ();
+#endif
+#if !defined __SSSE3__
+  abort ();
+#endif
+#if !defined __SSE4_1__
+  abort ();
+#endif
+#if !defined __SSE4_2__
+  abort ();
+#endif
+#if !defined __SSE4A__
+  abort ();
+#endif
+#if !defined __AVX__
+  abort ();
+#endif
+#if !defined __FMA4__
+  abort ();
+#endif
+  return 0;
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-3.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-3.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-3.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-3.c	2009-09-22 10:56:51.000000000 -0500
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-march=x86-64 -msse4 -mfma4 -msse4a" } */
+
+extern void abort (void);
+
+int
+main ()
+{
+#if !defined __SSE__
+  abort ();
+#endif
+#if !defined __SSE2__
+  abort ();
+#endif
+#if !defined __SSE3__
+  abort ();
+#endif
+#if !defined __SSSE3__
+  abort ();
+#endif
+#if !defined __SSE4_1__
+  abort ();
+#endif
+#if !defined __SSE4_2__
+  abort ();
+#endif
+#if !defined __SSE4A__
+  abort ();
+#endif
+#if !defined __AVX__
+  abort ();
+#endif
+#if !defined __FMA4__
+  abort ();
+#endif
+  return 0;
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-4.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-4.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-4.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-4.c	2009-09-22 13:45:05.000000000 -0500
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-march=core2 -mfma4 -mno-sse4" } */
+
+extern void abort (void);
+
+int
+main ()
+{
+#if !defined __SSE__
+  abort ();
+#endif
+#if !defined __SSE2__
+  abort ();
+#endif
+#if !defined __SSE3__
+  abort ();
+#endif
+#if !defined __SSSE3__
+  abort ();
+#endif
+#if defined __SSE4_1__
+  abort ();
+#endif
+#if defined __SSE4_2__
+  abort ();
+#endif
+#if !defined __SSE4A__
+  abort ();
+#endif
+#if defined __AVX__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
+  return 0;
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-5.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-5.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-5.c	2009-09-21 19:06:38.000000000 -0500
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-5.c	2009-09-22 10:58:47.000000000 -0500
@@ -27,5 +27,11 @@ main ()
 #if !defined __SSE4A__
   abort ();
 #endif
+#if defined __AVX__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
   return 0;
 }
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-6.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-6.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-6.c	2009-09-21 19:05:53.000000000 -0500
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-6.c	2009-09-22 10:59:14.000000000 -0500
@@ -28,5 +28,11 @@ main ()
 #if !defined __SSE4A__
   abort ();
 #endif
+#if defined __AVX__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
   return 0;
 }
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-7.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-7.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-7.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-7.c	2009-09-22 16:27:30.000000000 -0500
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-march=amdfam10 -mfma4 -mno-sse4" } */
+
+extern void abort (void);
+
+int
+main ()
+{
+#if !defined __SSE__
+  abort ();
+#endif
+#if !defined __SSE2__
+  abort ();
+#endif
+#if !defined __SSE3__
+  abort ();
+#endif
+#if !defined __SSSE3__
+  abort ();
+#endif
+#if defined __SSE4_1__
+  abort ();
+#endif
+#if defined __SSE4_2__
+  abort ();
+#endif
+#if !defined __SSE4A__
+  abort ();
+#endif
+#if defined __AVX__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
+  return 0;
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-8.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-8.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-8.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-8.c	2009-09-22 15:50:22.000000000 -0500
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-march=amdfam10 -mfma4 -mno-sse4a" } */
+
+extern void abort (void);
+
+int
+main ()
+{
+#if !defined __SSE__
+  abort ();
+#endif
+#if !defined __SSE2__
+  abort ();
+#endif
+#if !defined __SSE3__
+  abort ();
+#endif
+#if !defined __SSSE3__
+  abort ();
+#endif
+#if !defined __SSE4_1__
+  abort ();
+#endif
+#if !defined __SSE4_2__
+  abort ();
+#endif
+#if defined __SSE4A__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
+  return 0;
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-9.c gcc-xop/gcc/testsuite/gcc.target/i386/isa-9.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/isa-9.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/isa-9.c	2009-09-17 12:22:39.000000000 -0500
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-march=amdfam10 -mno-fma4" } */
+
+extern void abort (void);
+
+int
+main ()
+{
+#if !defined __SSE__
+  abort ();
+#endif
+#if !defined __SSE2__
+  abort ();
+#endif
+#if !defined __SSE3__
+  abort ();
+#endif
+#if defined __SSSE3__
+  abort ();
+#endif
+#if defined __SSE4_1__
+  abort ();
+#endif
+#if defined __SSE4_2__
+  abort ();
+#endif
+#if !defined __SSE4A__
+  abort ();
+#endif
+#if defined __FMA4__
+  abort ();
+#endif
+  return 0;
+}
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/sse-12.c gcc-xop/gcc/testsuite/gcc.target/i386/sse-12.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/sse-12.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/sse-12.c	2009-09-17 12:22:40.000000000 -0500
@@ -0,0 +1,8 @@
+/* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, fma4intrin.h, mm3dnow.h and
+   mm_malloc.h are usable with -O -std=c89 -pedantic-errors.  */
+/* { dg-do compile } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -maes -mpclmul" } */
+
+#include <x86intrin.h>
+
+int dummy;
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/sse-13.c gcc-xop/gcc/testsuite/gcc.target/i386/sse-13.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/sse-13.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/sse-13.c	2009-09-17 12:22:40.000000000 -0500
@@ -0,0 +1,128 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -maes -mpclmul" } */
+
+#include <mm_malloc.h>
+
+/* Test that the intrinsics compile with optimization.  All of them are
+   defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h and mm3dnow.h
+   that reference the proper builtin functions.  Defining away "extern" and
+   "__inline" results in all of them being compiled as proper functions.  */
+
+#define extern
+#define __inline
+
+/* The following intrinsics require immediate arguments.  */
+
+/* ammintrin.h */
+#define __builtin_ia32_extrqi(X, I, L)  __builtin_ia32_extrqi(X, 1, 1)
+#define __builtin_ia32_insertqi(X, Y, I, L) __builtin_ia32_insertqi(X, Y, 1, 1)
+
+/* immintrin.h */
+#define __builtin_ia32_blendpd256(X, Y, M) __builtin_ia32_blendpd256(X, Y, 1)
+#define __builtin_ia32_blendps256(X, Y, M) __builtin_ia32_blendps256(X, Y, 1)
+#define __builtin_ia32_dpps256(X, Y, M) __builtin_ia32_dpps256(X, Y, 1)
+#define __builtin_ia32_shufpd256(X, Y, M) __builtin_ia32_shufpd256(X, Y, 1)
+#define __builtin_ia32_shufps256(X, Y, M) __builtin_ia32_shufps256(X, Y, 1)
+#define __builtin_ia32_cmpsd(X, Y, O) __builtin_ia32_cmpsd(X, Y, 1)
+#define __builtin_ia32_cmpss(X, Y, O) __builtin_ia32_cmpss(X, Y, 1)
+#define __builtin_ia32_cmppd(X, Y, O) __builtin_ia32_cmppd(X, Y, 1)
+#define __builtin_ia32_cmpps(X, Y, O) __builtin_ia32_cmpps(X, Y, 1)
+#define __builtin_ia32_cmppd256(X, Y, O) __builtin_ia32_cmppd256(X, Y, 1)
+#define __builtin_ia32_cmpps256(X, Y, O) __builtin_ia32_cmpps256(X, Y, 1)
+#define __builtin_ia32_vextractf128_pd256(X, N) __builtin_ia32_vextractf128_pd256(X, 1)
+#define __builtin_ia32_vextractf128_ps256(X, N) __builtin_ia32_vextractf128_ps256(X, 1)
+#define __builtin_ia32_vextractf128_si256(X, N) __builtin_ia32_vextractf128_si256(X, 1)
+#define __builtin_ia32_vpermilpd(X, N) __builtin_ia32_vpermilpd(X, 1)
+#define __builtin_ia32_vpermilpd256(X, N) __builtin_ia32_vpermilpd256(X, 1)
+#define __builtin_ia32_vpermilps(X, N) __builtin_ia32_vpermilps(X, 1)
+#define __builtin_ia32_vpermilps256(X, N) __builtin_ia32_vpermilps256(X, 1)
+#define __builtin_ia32_vpermil2pd(X, Y, C, I) __builtin_ia32_vpermil2pd(X, Y, C, 1)
+#define __builtin_ia32_vpermil2pd256(X, Y, C, I) __builtin_ia32_vpermil2pd256(X, Y, C, 1)
+#define __builtin_ia32_vpermil2ps(X, Y, C, I) __builtin_ia32_vpermil2ps(X, Y, C, 1)
+#define __builtin_ia32_vpermil2ps256(X, Y, C, I) __builtin_ia32_vpermil2ps256(X, Y, C, 1)
+#define __builtin_ia32_vperm2f128_pd256(X, Y, C) __builtin_ia32_vperm2f128_pd256(X, Y, 1)
+#define __builtin_ia32_vperm2f128_ps256(X, Y, C) __builtin_ia32_vperm2f128_ps256(X, Y, 1)
+#define __builtin_ia32_vperm2f128_si256(X, Y, C) __builtin_ia32_vperm2f128_si256(X, Y, 1)
+#define __builtin_ia32_vinsertf128_pd256(X, Y, C) __builtin_ia32_vinsertf128_pd256(X, Y, 1)
+#define __builtin_ia32_vinsertf128_ps256(X, Y, C) __builtin_ia32_vinsertf128_ps256(X, Y, 1)
+#define __builtin_ia32_vinsertf128_si256(X, Y, C) __builtin_ia32_vinsertf128_si256(X, Y, 1)
+#define __builtin_ia32_roundpd256(V, M) __builtin_ia32_roundpd256(V, 1)
+#define __builtin_ia32_roundps256(V, M) __builtin_ia32_roundps256(V, 1)
+
+/* wmmintrin.h */
+#define __builtin_ia32_aeskeygenassist128(X, C) __builtin_ia32_aeskeygenassist128(X, 1)
+#define __builtin_ia32_pclmulqdq128(X, Y, I) __builtin_ia32_pclmulqdq128(X, Y, 1)
+
+/* smmintrin.h */
+#define __builtin_ia32_roundpd(V, M) __builtin_ia32_roundpd(V, 1)
+#define __builtin_ia32_roundsd(D, V, M) __builtin_ia32_roundsd(D, V, 1)
+#define __builtin_ia32_roundps(V, M) __builtin_ia32_roundps(V, 1)
+#define __builtin_ia32_roundss(D, V, M) __builtin_ia32_roundss(D, V, 1)
+
+#define __builtin_ia32_pblendw128(X, Y, M) __builtin_ia32_pblendw128 (X, Y, 1)
+#define __builtin_ia32_blendps(X, Y, M) __builtin_ia32_blendps(X, Y, 1)
+#define __builtin_ia32_blendpd(X, Y, M) __builtin_ia32_blendpd(X, Y, 1)
+#define __builtin_ia32_dpps(X, Y, M) __builtin_ia32_dpps(X, Y, 1)
+#define __builtin_ia32_dppd(X, Y, M) __builtin_ia32_dppd(X, Y, 1)
+#define __builtin_ia32_insertps128(D, S, N) __builtin_ia32_insertps128(D, S, 1)
+#define __builtin_ia32_vec_ext_v4sf(X, N) __builtin_ia32_vec_ext_v4sf(X, 1)
+#define __builtin_ia32_vec_set_v16qi(D, S, N) __builtin_ia32_vec_set_v16qi(D, S, 1)
+#define __builtin_ia32_vec_set_v4si(D, S, N) __builtin_ia32_vec_set_v4si(D, S, 1)
+#define __builtin_ia32_vec_set_v2di(D, S, N) __builtin_ia32_vec_set_v2di(D, S, 1)
+#define __builtin_ia32_vec_ext_v16qi(X, N) __builtin_ia32_vec_ext_v16qi(X, 1)
+#define __builtin_ia32_vec_ext_v4si(X, N) __builtin_ia32_vec_ext_v4si(X, 1)
+#define __builtin_ia32_vec_ext_v2di(X, N) __builtin_ia32_vec_ext_v2di(X, 1)
+#define __builtin_ia32_mpsadbw128(X, Y, M) __builtin_ia32_mpsadbw128(X, Y, 1)
+#define __builtin_ia32_pcmpistrm128(X, Y, M) \
+  __builtin_ia32_pcmpistrm128(X, Y, 1)
+#define __builtin_ia32_pcmpistri128(X, Y, M) \
+  __builtin_ia32_pcmpistri128(X, Y, 1)
+#define __builtin_ia32_pcmpestrm128(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestrm128(X, LX, Y, LY, 1)
+#define __builtin_ia32_pcmpestri128(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestri128(X, LX, Y, LY, 1)
+#define __builtin_ia32_pcmpistria128(X, Y, M) \
+  __builtin_ia32_pcmpistria128(X, Y, 1)
+#define __builtin_ia32_pcmpistric128(X, Y, M) \
+  __builtin_ia32_pcmpistric128(X, Y, 1)
+#define __builtin_ia32_pcmpistrio128(X, Y, M) \
+  __builtin_ia32_pcmpistrio128(X, Y, 1)
+#define __builtin_ia32_pcmpistris128(X, Y, M) \
+  __builtin_ia32_pcmpistris128(X, Y, 1)
+#define __builtin_ia32_pcmpistriz128(X, Y, M) \
+  __builtin_ia32_pcmpistriz128(X, Y, 1)
+#define __builtin_ia32_pcmpestria128(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestria128(X, LX, Y, LY, 1)
+#define __builtin_ia32_pcmpestric128(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestric128(X, LX, Y, LY, 1)
+#define __builtin_ia32_pcmpestrio128(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestrio128(X, LX, Y, LY, 1)
+#define __builtin_ia32_pcmpestris128(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestris128(X, LX, Y, LY, 1)
+#define __builtin_ia32_pcmpestriz128(X, LX, Y, LY, M) \
+  __builtin_ia32_pcmpestriz128(X, LX, Y, LY, 1)
+
+/* tmmintrin.h */
+#define __builtin_ia32_palignr128(X, Y, N) __builtin_ia32_palignr128(X, Y, 8)
+#define __builtin_ia32_palignr(X, Y, N) __builtin_ia32_palignr(X, Y, 8)
+
+/* emmintrin.h */
+#define __builtin_ia32_psrldqi128(A, B) __builtin_ia32_psrldqi128(A, 8)
+#define __builtin_ia32_pslldqi128(A, B) __builtin_ia32_pslldqi128(A, 8)
+#define __builtin_ia32_pshufhw(A, N) __builtin_ia32_pshufhw(A, 0)
+#define __builtin_ia32_pshuflw(A, N) __builtin_ia32_pshuflw(A, 0)
+#define __builtin_ia32_pshufd(A, N) __builtin_ia32_pshufd(A, 0)
+#define __builtin_ia32_vec_set_v8hi(A, D, N) \
+  __builtin_ia32_vec_set_v8hi(A, D, 0)
+#define __builtin_ia32_vec_ext_v8hi(A, N) __builtin_ia32_vec_ext_v8hi(A, 0)
+#define __builtin_ia32_shufpd(A, B, N) __builtin_ia32_shufpd(A, B, 0)
+
+/* xmmintrin.h */
+#define __builtin_prefetch(P, A, I) __builtin_prefetch(P, A, _MM_HINT_NTA)
+#define __builtin_ia32_pshufw(A, N) __builtin_ia32_pshufw(A, 0)
+#define __builtin_ia32_vec_set_v4hi(A, D, N) \
+  __builtin_ia32_vec_set_v4hi(A, D, 0)
+#define __builtin_ia32_vec_ext_v4hi(A, N) __builtin_ia32_vec_ext_v4hi(A, 0)
+#define __builtin_ia32_shufps(A, B, N) __builtin_ia32_shufps(A, B, 0)
+
+#include <x86intrin.h>
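
The "#define extern" / "#define __inline" pair at the top is what gives this
test its teeth: the intrinsic headers declare every intrinsic as an
extern __inline function, and defining both keywords away leaves plain
out-of-line definitions, so every intrinsic body must really compile and
expand its builtin.  A minimal illustration of the preprocessor effect:

    #define extern
    #define __inline

    extern __inline int twice (int x) { return 2 * x; }
    /* After preprocessing this is simply
         int twice (int x) { return 2 * x; }
       an ordinary function that must be fully compiled and assembled.  */
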
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/sse-14.c gcc-xop/gcc/testsuite/gcc.target/i386/sse-14.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/sse-14.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/sse-14.c	2009-09-17 12:22:40.000000000 -0500
@@ -0,0 +1,157 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -msse4a -maes -mpclmul" } */
+
+#include <mm_malloc.h>
+
+/* Test that the intrinsics compile without optimization.  All of them are
+   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h and mm3dnow.h
+   that reference the proper builtin functions.  Defining away "extern" and
+   "__inline" results in all of them being compiled as proper functions.  */
+
+#define extern
+#define __inline
+
+#include <x86intrin.h>
+
+#define _CONCAT(x,y) x ## y
+
+#define test_1(func, type, op1_type, imm)				\
+  type _CONCAT(_,func) (op1_type A, int const I)			\
+  { return func (A, imm); }
+
+#define test_1x(func, type, op1_type, imm1, imm2)			\
+  type _CONCAT(_,func) (op1_type A, int const I, int const L)		\
+  { return func (A, imm1, imm2); }
+
+#define test_2(func, type, op1_type, op2_type, imm)			\
+  type _CONCAT(_,func) (op1_type A, op2_type B, int const I)		\
+  { return func (A, B, imm); }
+
+#define test_2x(func, type, op1_type, op2_type, imm1, imm2)		\
+  type _CONCAT(_,func) (op1_type A, op2_type B, int const I, int const L) \
+  { return func (A, B, imm1, imm2); }
+
+#define test_3(func, type, op1_type, op2_type, op3_type, imm)		\
+  type _CONCAT(_,func) (op1_type A, op2_type B,				\
+			op3_type C, int const I)			\
+  { return func (A, B, C, imm); }
+
+#define test_4(func, type, op1_type, op2_type, op3_type, op4_type, imm)	\
+  type _CONCAT(_,func) (op1_type A, op2_type B,				\
+			op3_type C, op4_type D, int const I)		\
+  { return func (A, B, C, D, imm); }
+
+
+/* The following intrinsics require immediate arguments.  They
+   are defined as macros for non-optimized compilations.  */
+
+/* ammintrin.h */
+test_1x (_mm_extracti_si64, __m128i, __m128i, 1, 1)
+test_2x (_mm_inserti_si64, __m128i, __m128i, __m128i, 1, 1)
+
+/* immintrin.h */
+test_2 (_mm256_blend_pd, __m256d, __m256d, __m256d, 1)
+test_2 (_mm256_blend_ps, __m256, __m256, __m256, 1)
+test_2 (_mm256_dp_ps, __m256, __m256, __m256, 1)
+test_2 (_mm256_shuffle_pd, __m256d, __m256d, __m256d, 1)
+test_2 (_mm256_shuffle_ps, __m256, __m256, __m256, 1)
+test_2 (_mm_cmp_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_cmp_ss, __m128, __m128, __m128, 1)
+test_2 (_mm_cmp_pd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_cmp_ps, __m128, __m128, __m128, 1)
+test_2 (_mm256_cmp_pd, __m256d, __m256d, __m256d, 1)
+test_2 (_mm256_cmp_ps, __m256, __m256, __m256, 1)
+test_1 (_mm256_extractf128_pd, __m128d, __m256d, 1)
+test_1 (_mm256_extractf128_ps, __m128, __m256, 1)
+test_1 (_mm256_extractf128_si256, __m128i, __m256i, 1)
+test_1 (_mm256_extract_epi8, int, __m256i, 20)
+test_1 (_mm256_extract_epi16, int, __m256i, 13)
+test_1 (_mm256_extract_epi32, int, __m256i, 6)
+#ifdef __x86_64__
+test_1 (_mm256_extract_epi64, long long, __m256i, 2)
+#endif
+test_1 (_mm_permute_pd, __m128d, __m128d, 1)
+test_1 (_mm256_permute_pd, __m256d, __m256d, 1)
+test_1 (_mm_permute_ps, __m128, __m128, 1)
+test_1 (_mm256_permute_ps, __m256, __m256, 1)
+test_2 (_mm256_permute2f128_pd, __m256d, __m256d, __m256d, 1)
+test_2 (_mm256_permute2f128_ps, __m256, __m256, __m256, 1)
+test_2 (_mm256_permute2f128_si256, __m256i, __m256i, __m256i, 1)
+test_2 (_mm256_insertf128_pd, __m256d, __m256d, __m128d, 1)
+test_2 (_mm256_insertf128_ps, __m256, __m256, __m128, 1)
+test_2 (_mm256_insertf128_si256, __m256i, __m256i, __m128i, 1)
+test_2 (_mm256_insert_epi8, __m256i, __m256i, int, 30)
+test_2 (_mm256_insert_epi16, __m256i, __m256i, int, 7)
+test_2 (_mm256_insert_epi32, __m256i, __m256i, int, 3)
+#ifdef __x86_64__
+test_2 (_mm256_insert_epi64, __m256i, __m256i, long long, 1)
+#endif
+test_1 (_mm256_round_pd, __m256d, __m256d, 1)
+test_1 (_mm256_round_ps, __m256, __m256, 1)
+
+/* wmmintrin.h */
+test_1 (_mm_aeskeygenassist_si128, __m128i, __m128i, 1)
+test_2 (_mm_clmulepi64_si128, __m128i, __m128i, __m128i, 1)
+
+/* smmintrin.h */
+test_1 (_mm_round_pd, __m128d, __m128d, 1)
+test_1 (_mm_round_ps, __m128, __m128, 1)
+test_2 (_mm_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_round_ss, __m128, __m128, __m128, 1)
+
+test_2 (_mm_blend_epi16, __m128i, __m128i, __m128i, 1)
+test_2 (_mm_blend_ps, __m128, __m128, __m128, 1)
+test_2 (_mm_blend_pd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_dp_ps, __m128, __m128, __m128, 1)
+test_2 (_mm_dp_pd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_insert_ps, __m128, __m128, __m128, 1)
+test_1 (_mm_extract_ps, int, __m128, 1)
+test_2 (_mm_insert_epi8, __m128i, __m128i, int, 1)
+test_2 (_mm_insert_epi32, __m128i, __m128i, int, 1)
+#ifdef __x86_64__
+test_2 (_mm_insert_epi64, __m128i, __m128i, long long, 1)
+#endif
+test_1 (_mm_extract_epi8, int, __m128i, 1)
+test_1 (_mm_extract_epi32, int, __m128i, 1)
+#ifdef __x86_64__
+test_1 (_mm_extract_epi64, long long, __m128i, 1)
+#endif
+test_2 (_mm_mpsadbw_epu8, __m128i, __m128i, __m128i, 1)
+test_2 (_mm_cmpistrm, __m128i, __m128i, __m128i, 1)
+test_2 (_mm_cmpistri, int, __m128i, __m128i, 1)
+test_4 (_mm_cmpestrm, __m128i, __m128i, int, __m128i, int, 1)
+test_4 (_mm_cmpestri, int, __m128i, int, __m128i, int, 1)
+test_2 (_mm_cmpistra, int, __m128i, __m128i, 1)
+test_2 (_mm_cmpistrc, int, __m128i, __m128i, 1)
+test_2 (_mm_cmpistro, int, __m128i, __m128i, 1)
+test_2 (_mm_cmpistrs, int, __m128i, __m128i, 1)
+test_2 (_mm_cmpistrz, int, __m128i, __m128i, 1)
+test_4 (_mm_cmpestra, int, __m128i, int, __m128i, int, 1)
+test_4 (_mm_cmpestrc, int, __m128i, int, __m128i, int, 1)
+test_4 (_mm_cmpestro, int, __m128i, int, __m128i, int, 1)
+test_4 (_mm_cmpestrs, int, __m128i, int, __m128i, int, 1)
+test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
+
+/* tmmintrin.h */
+test_2 (_mm_alignr_epi8, __m128i, __m128i, __m128i, 1)
+test_2 (_mm_alignr_pi8, __m64, __m64, __m64, 1)
+
+/* emmintrin.h */
+test_2 (_mm_shuffle_pd, __m128d, __m128d, __m128d, 1)
+test_1 (_mm_srli_si128, __m128i, __m128i, 1)
+test_1 (_mm_slli_si128, __m128i, __m128i, 1)
+test_1 (_mm_extract_epi16, int, __m128i, 1)
+test_2 (_mm_insert_epi16, __m128i, __m128i, int, 1)
+test_1 (_mm_shufflehi_epi16, __m128i, __m128i, 1)
+test_1 (_mm_shufflelo_epi16, __m128i, __m128i, 1)
+test_1 (_mm_shuffle_epi32, __m128i, __m128i, 1)
+
+/* xmmintrin.h */
+test_2 (_mm_shuffle_ps, __m128, __m128, __m128, 1)
+test_1 (_mm_extract_pi16, int, __m64, 1)
+test_1 (_m_pextrw, int, __m64, 1)
+test_2 (_mm_insert_pi16, __m64, __m64, int, 1)
+test_2 (_m_pinsrw, __m64, __m64, int, 1)
+test_1 (_mm_shuffle_pi16, __m64, __m64, 1)
+test_1 (_m_pshufw, __m64, __m64, 1)
+test_1 (_mm_prefetch, void, void *, _MM_HINT_NTA)
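
The test_N wrappers exist because these intrinsics demand literal immediates:
at -O0 a function parameter is not a compile-time constant, so each wrapper
takes the argument positions but passes the literal straight through.
Expanding one instance by hand, test_1 (_mm_round_pd, __m128d, __m128d, 1)
becomes roughly:

    __m128d __mm_round_pd (__m128d A, int const I)
    { return _mm_round_pd (A, 1); }   /* the literal 1, not I, is passed */
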
diff -upNw gcc-xop-fma4/gcc/testsuite/gcc.target/i386/sse-22.c gcc-xop/gcc/testsuite/gcc.target/i386/sse-22.c
--- gcc-xop-fma4/gcc/testsuite/gcc.target/i386/sse-22.c	1969-12-31 18:00:00.000000000 -0600
+++ gcc-xop/gcc/testsuite/gcc.target/i386/sse-22.c	2009-09-17 12:22:40.000000000 -0500
@@ -0,0 +1,161 @@
+/* Same as sse-14, except converted to use #pragma GCC target.  */
+/* { dg-do compile } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration" } */
+
+#include <mm_malloc.h>
+
+/* Test that the intrinsics compile without optimization.  All of them are
+   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h and mm3dnow.h
+   that reference the proper builtin functions.  Defining away "extern" and
+   "__inline" results in all of them being compiled as proper functions.  */
+
+#define extern
+#define __inline
+
+#define _CONCAT(x,y) x ## y
+
+#define test_1(func, type, op1_type, imm)				\
+  type _CONCAT(_,func) (op1_type A, int const I)			\
+  { return func (A, imm); }
+
+#define test_1x(func, type, op1_type, imm1, imm2)			\
+  type _CONCAT(_,func) (op1_type A, int const I, int const L)		\
+  { return func (A, imm1, imm2); }
+
+#define test_2(func, type, op1_type, op2_type, imm)			\
+  type _CONCAT(_,func) (op1_type A, op2_type B, int const I)		\
+  { return func (A, B, imm); }
+
+#define test_2x(func, type, op1_type, op2_type, imm1, imm2)		\
+  type _CONCAT(_,func) (op1_type A, op2_type B, int const I, int const L) \
+  { return func (A, B, imm1, imm2); }
+
+#define test_4(func, type, op1_type, op2_type, op3_type, op4_type, imm)	\
+  type _CONCAT(_,func) (op1_type A, op2_type B,				\
+			op3_type C, op4_type D, int const I)		\
+  { return func (A, B, C, D, imm); }
+
+
+#ifndef DIFFERENT_PRAGMAS
+#pragma GCC target ("mmx,3dnow,sse,sse2,sse3,ssse3,sse4.1,sse4.2,sse4a,aes,pclmul")
+#endif
+
+/* The following intrinsics require immediate arguments.  They
+   are defined as macros for non-optimized compilations.  */
+
+/* mmintrin.h (MMX).  */
+#ifdef DIFFERENT_PRAGMAS
+#pragma GCC target ("mmx")
+#endif
+#include <mmintrin.h>
+
+/* mm3dnow.h (3DNOW).  */
+#ifdef DIFFERENT_PRAGMAS
+#pragma GCC target ("3dnow")
+#endif
+#include <mm3dnow.h>
+
+/* xmmintrin.h (SSE).  */
+#ifdef DIFFERENT_PRAGMAS
+#pragma GCC target ("sse")
+#endif
+#include <xmmintrin.h>
+test_2 (_mm_shuffle_ps, __m128, __m128, __m128, 1)
+test_1 (_mm_extract_pi16, int, __m64, 1)
+test_1 (_m_pextrw, int, __m64, 1)
+test_2 (_mm_insert_pi16, __m64, __m64, int, 1)
+test_2 (_m_pinsrw, __m64, __m64, int, 1)
+test_1 (_mm_shuffle_pi16, __m64, __m64, 1)
+test_1 (_m_pshufw, __m64, __m64, 1)
+test_1 (_mm_prefetch, void, void *, _MM_HINT_NTA)
+
+/* emmintrin.h (SSE2).  */
+#ifdef DIFFERENT_PRAGMAS
+#pragma GCC target ("sse2")
+#endif
+#include <emmintrin.h>
+test_2 (_mm_shuffle_pd, __m128d, __m128d, __m128d, 1)
+test_1 (_mm_srli_si128, __m128i, __m128i, 1)
+test_1 (_mm_slli_si128, __m128i, __m128i, 1)
+test_1 (_mm_extract_epi16, int, __m128i, 1)
+test_2 (_mm_insert_epi16, __m128i, __m128i, int, 1)
+test_1 (_mm_shufflehi_epi16, __m128i, __m128i, 1)
+test_1 (_mm_shufflelo_epi16, __m128i, __m128i, 1)
+test_1 (_mm_shuffle_epi32, __m128i, __m128i, 1)
+
+/* pmmintrin.h (SSE3).  */
+#ifdef DIFFERENT_PRAGMAS
+#pragma GCC target ("sse3")
+#endif
+#include <pmmintrin.h>
+
+/* tmmintrin.h (SSSE3).  */
+#ifdef DIFFERENT_PRAGMAS
+#pragma GCC target ("ssse3")
+#endif
+#include <tmmintrin.h>
+test_2 (_mm_alignr_epi8, __m128i, __m128i, __m128i, 1)
+test_2 (_mm_alignr_pi8, __m64, __m64, __m64, 1)
+
+/* ammintrin.h (SSE4A).  */
+#ifdef DIFFERENT_PRAGMAS
+#pragma GCC target ("sse4a")
+#endif
+#include <ammintrin.h>
+test_1x (_mm_extracti_si64, __m128i, __m128i, 1, 1)
+test_2x (_mm_inserti_si64, __m128i, __m128i, __m128i, 1, 1)
+
+/* smmintrin.h (SSE4.1).  */
+/* nmmintrin.h (SSE4.2).  */
+/* Note: nmmintrin.h includes smmintrin.h, and smmintrin.h checks the
+   relevant #ifdef itself, so just set the target option to SSE4.2.  */
+#ifdef DIFFERENT_PRAGMAS
+#pragma GCC target ("sse4.2")
+#endif
+#include <nmmintrin.h>
+test_2 (_mm_blend_epi16, __m128i, __m128i, __m128i, 1)
+test_2 (_mm_blend_ps, __m128, __m128, __m128, 1)
+test_2 (_mm_blend_pd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_dp_ps, __m128, __m128, __m128, 1)
+test_2 (_mm_dp_pd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_insert_ps, __m128, __m128, __m128, 1)
+test_1 (_mm_extract_ps, int, __m128, 1)
+test_2 (_mm_insert_epi8, __m128i, __m128i, int, 1)
+test_2 (_mm_insert_epi32, __m128i, __m128i, int, 1)
+#ifdef __x86_64__
+test_2 (_mm_insert_epi64, __m128i, __m128i, long long, 1)
+#endif
+test_1 (_mm_extract_epi8, int, __m128i, 1)
+test_1 (_mm_extract_epi32, int, __m128i, 1)
+#ifdef __x86_64__
+test_1 (_mm_extract_epi64, long long, __m128i, 1)
+#endif
+test_2 (_mm_mpsadbw_epu8, __m128i, __m128i, __m128i, 1)
+test_2 (_mm_cmpistrm, __m128i, __m128i, __m128i, 1)
+test_2 (_mm_cmpistri, int, __m128i, __m128i, 1)
+test_4 (_mm_cmpestrm, __m128i, __m128i, int, __m128i, int, 1)
+test_4 (_mm_cmpestri, int, __m128i, int, __m128i, int, 1)
+test_2 (_mm_cmpistra, int, __m128i, __m128i, 1)
+test_2 (_mm_cmpistrc, int, __m128i, __m128i, 1)
+test_2 (_mm_cmpistro, int, __m128i, __m128i, 1)
+test_2 (_mm_cmpistrs, int, __m128i, __m128i, 1)
+test_2 (_mm_cmpistrz, int, __m128i, __m128i, 1)
+test_4 (_mm_cmpestra, int, __m128i, int, __m128i, int, 1)
+test_4 (_mm_cmpestrc, int, __m128i, int, __m128i, int, 1)
+test_4 (_mm_cmpestro, int, __m128i, int, __m128i, int, 1)
+test_4 (_mm_cmpestrs, int, __m128i, int, __m128i, int, 1)
+test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
+
+/* wmmintrin.h (AES/PCLMUL).  */
+#ifdef DIFFERENT_PRAGMAS
+#pragma GCC target ("aes,pclmul")
+#endif
+#include <wmmintrin.h>
+test_1 (_mm_aeskeygenassist_si128, __m128i, __m128i, 1)
+test_2 (_mm_clmulepi64_si128, __m128i, __m128i, __m128i, 1)
+
+/* smmintrin.h (SSE4.1).  */
+test_1 (_mm_round_pd, __m128d, __m128d, 1)
+test_1 (_mm_round_ps, __m128, __m128, 1)
+test_2 (_mm_round_sd, __m128d, __m128d, __m128d, 1)
+test_2 (_mm_round_ss, __m128, __m128, __m128, 1)
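
sse-22.c is written so it can be exercised both ways: without
DIFFERENT_PRAGMAS a single combined #pragma GCC target covers every header,
with it each header gets only its own minimal target pragma.  A hypothetical
companion test (not part of this patch) could reuse the file to cover the
second mode:

    /* { dg-do compile } */
    /* { dg-options "-O0 -Werror-implicit-function-declaration -DDIFFERENT_PRAGMAS" } */
    #include "sse-22.c"
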
diff -upNw gcc-xop-fma4/gcc/testsuite/g++.dg/other/i386-2.C gcc-xop/gcc/testsuite/g++.dg/other/i386-2.C
--- gcc-xop-fma4/gcc/testsuite/g++.dg/other/i386-2.C	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/testsuite/g++.dg/other/i386-2.C	2009-09-17 12:33:11.000000000 -0500
@@ -1,7 +1,7 @@
-/* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h and mm_malloc.h are
-   usable with -O -pedantic-errors.  */
+/* Test that {,x,e,p,t,s,w,a,i}mmintrin.h, fma4intrin.h, mm3dnow.h and
+   mm_malloc.h are usable with -O -pedantic-errors.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -mavx -msse4a -maes -mpclmul" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -mavx -msse4a -mfma4 -maes -mpclmul" } */
 
 #include <x86intrin.h>
 
diff -upNw gcc-xop-fma4/gcc/testsuite/g++.dg/other/i386-3.C gcc-xop/gcc/testsuite/g++.dg/other/i386-3.C
--- gcc-xop-fma4/gcc/testsuite/g++.dg/other/i386-3.C	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/testsuite/g++.dg/other/i386-3.C	2009-09-17 12:34:29.000000000 -0500
@@ -1,6 +1,6 @@
-/* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h and mm_malloc.h are
-   usable with -O -fkeep-inline-functions.  */
+/* Test that {,x,e,p,t,s,w,a,i}mmintrin.h, fma4intrin.h, mm3dnow.h and
+    mm_malloc.h are usable with -O -fkeep-inline-functions.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -mavx -msse4a -maes -mpclmul" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -mavx -msse4a -mfma4 -maes -mpclmul" } */
 
 #include <x86intrin.h>
diff -upNw gcc-xop-fma4/gcc/testsuite/g++.dg/other/i386-5.C gcc-xop/gcc/testsuite/g++.dg/other/i386-5.C
--- gcc-xop-fma4/gcc/testsuite/g++.dg/other/i386-5.C	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/testsuite/g++.dg/other/i386-5.C	2009-09-17 12:35:49.000000000 -0500
@@ -1,6 +1,6 @@
-/* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h and mm_malloc.h are
-   usable with -O -fkeep-inline-functions.  */
+/* Test that {,x,e,p,t,s,w,a,i}mmintrin.h, fma4intrin.h, mm3dnow.h and
+   mm_malloc.h are usable with -O -fkeep-inline-functions.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -mavx -msse4a -maes -mpclmul" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -mavx -msse4a -mfma4 -maes -mpclmul" } */
 
 #include <x86intrin.h>
diff -upNw gcc-xop-fma4/gcc/testsuite/g++.dg/other/i386-6.C gcc-xop/gcc/testsuite/g++.dg/other/i386-6.C
--- gcc-xop-fma4/gcc/testsuite/g++.dg/other/i386-6.C	2009-09-21 18:20:30.000000000 -0500
+++ gcc-xop/gcc/testsuite/g++.dg/other/i386-6.C	2009-09-17 12:36:32.000000000 -0500
@@ -1,6 +1,6 @@
-/* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h and mm_malloc.h are
-   usable with -O -pedantic-errors.  */
+/* Test that {,x,e,p,t,s,w,a,i}mmintrin.h, fma4intrin.h, mm3dnow.h and
+   mm_malloc.h are usable with -O -pedantic-errors.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -mavx -msse4a -maes -mpclmul" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -mavx -msse4a -mfma4 -maes -mpclmul" } */
 
 #include <x86intrin.h>

