This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
MMX documentation
- To: <gcc-patches at gcc dot gnu dot org>
- Subject: MMX documentation
- From: Bernd Schmidt <bernds at redhat dot com>
- Date: Sat, 29 Sep 2001 17:31:58 +0100 (BST)
Here's a new patch that adds both generic documentation for vector extensions,
as well as specific docs for -mmmx/-msse/-m3dnow for i386. It also fixes a
typing problem I discovered while going through it.
I used the original version of the docs I sent last week, since one of the
reasons not do so (__attribute__((mode)) being undocumented) existed only
in my imagination, and no one else seemed overly concerned about the ugliness
of this interface.
The documentation for the builtins is somewhat brief; if someone wants to
program with these extensions they probably need to have an instruction set
reference manual anyway so I thought it pointless (and too much work :-) to
document precisely what all the instructions do.
Bootstrapped on i686-linux & checked in.
Bernd
* config/i386/i386.c (init_mmx_sse_builtins): Fix type of storelps and
storehps builtins.
* doc/extend.texi (Vector Extensions): New node.
* doc/invoke.texi (Machine Dependent Options): Add documentation for
i386 -mmmx, -msse, -m3dnow.
Index: config/i386/i386.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/config/i386/i386.c,v
retrieving revision 1.309
diff -c -p -r1.309 i386.c
*** i386.c 2001/09/25 14:59:35 1.309
--- config/i386/i386.c 2001/09/29 15:13:06
*************** ix86_init_mmx_sse_builtins ()
*** 10534,10543 ****
tree_cons (NULL_TREE, V4SF_type_node,
tree_cons (NULL_TREE, pv2si_type_node,
endlink)));
! tree v4sf_ftype_pv2si_v4sf
! = build_function_type (V4SF_type_node,
! tree_cons (NULL_TREE, V4SF_type_node,
! tree_cons (NULL_TREE, pv2si_type_node,
endlink)));
tree void_ftype_pfloat_v4sf
= build_function_type (void_type_node,
--- 10534,10543 ----
tree_cons (NULL_TREE, V4SF_type_node,
tree_cons (NULL_TREE, pv2si_type_node,
endlink)));
! tree void_ftype_pv2si_v4sf
! = build_function_type (void_type_node,
! tree_cons (NULL_TREE, pv2si_type_node,
! tree_cons (NULL_TREE, V4SF_type_node,
endlink)));
tree void_ftype_pfloat_v4sf
= build_function_type (void_type_node,
*************** ix86_init_mmx_sse_builtins ()
*** 10665,10673 ****
for (i = 0, d = bdesc_comi; i < sizeof (bdesc_comi) / sizeof *d; i++, d++)
def_builtin (d->mask, d->name, int_ftype_v4sf_v4sf, d->code);
! def_builtin (MASK_SSE, "__builtin_ia32_packsswb", v8qi_ftype_v4hi_v4hi, IX86_BUILTIN_PACKSSWB);
! def_builtin (MASK_SSE, "__builtin_ia32_packssdw", v4hi_ftype_v2si_v2si, IX86_BUILTIN_PACKSSDW);
! def_builtin (MASK_SSE, "__builtin_ia32_packuswb", v8qi_ftype_v4hi_v4hi, IX86_BUILTIN_PACKUSWB);
def_builtin (MASK_SSE, "__builtin_ia32_cvtpi2ps", v4sf_ftype_v4sf_v2si, IX86_BUILTIN_CVTPI2PS);
def_builtin (MASK_SSE, "__builtin_ia32_cvtps2pi", v2si_ftype_v4sf, IX86_BUILTIN_CVTPS2PI);
--- 10665,10673 ----
for (i = 0, d = bdesc_comi; i < sizeof (bdesc_comi) / sizeof *d; i++, d++)
def_builtin (d->mask, d->name, int_ftype_v4sf_v4sf, d->code);
! def_builtin (MASK_MMX, "__builtin_ia32_packsswb", v8qi_ftype_v4hi_v4hi, IX86_BUILTIN_PACKSSWB);
! def_builtin (MASK_MMX, "__builtin_ia32_packssdw", v4hi_ftype_v2si_v2si, IX86_BUILTIN_PACKSSDW);
! def_builtin (MASK_MMX, "__builtin_ia32_packuswb", v8qi_ftype_v4hi_v4hi, IX86_BUILTIN_PACKUSWB);
def_builtin (MASK_SSE, "__builtin_ia32_cvtpi2ps", v4sf_ftype_v4sf_v2si, IX86_BUILTIN_CVTPI2PS);
def_builtin (MASK_SSE, "__builtin_ia32_cvtps2pi", v2si_ftype_v4sf, IX86_BUILTIN_CVTPS2PI);
*************** ix86_init_mmx_sse_builtins ()
*** 10690,10697 ****
def_builtin (MASK_SSE, "__builtin_ia32_loadhps", v4sf_ftype_v4sf_pv2si, IX86_BUILTIN_LOADHPS);
def_builtin (MASK_SSE, "__builtin_ia32_loadlps", v4sf_ftype_v4sf_pv2si, IX86_BUILTIN_LOADLPS);
! def_builtin (MASK_SSE, "__builtin_ia32_storehps", v4sf_ftype_pv2si_v4sf, IX86_BUILTIN_STOREHPS);
! def_builtin (MASK_SSE, "__builtin_ia32_storelps", v4sf_ftype_pv2si_v4sf, IX86_BUILTIN_STORELPS);
def_builtin (MASK_SSE, "__builtin_ia32_movmskps", int_ftype_v4sf, IX86_BUILTIN_MOVMSKPS);
def_builtin (MASK_SSE, "__builtin_ia32_pmovmskb", int_ftype_v8qi, IX86_BUILTIN_PMOVMSKB);
--- 10690,10697 ----
def_builtin (MASK_SSE, "__builtin_ia32_loadhps", v4sf_ftype_v4sf_pv2si, IX86_BUILTIN_LOADHPS);
def_builtin (MASK_SSE, "__builtin_ia32_loadlps", v4sf_ftype_v4sf_pv2si, IX86_BUILTIN_LOADLPS);
! def_builtin (MASK_SSE, "__builtin_ia32_storehps", void_ftype_pv2si_v4sf, IX86_BUILTIN_STOREHPS);
! def_builtin (MASK_SSE, "__builtin_ia32_storelps", void_ftype_pv2si_v4sf, IX86_BUILTIN_STORELPS);
def_builtin (MASK_SSE, "__builtin_ia32_movmskps", int_ftype_v4sf, IX86_BUILTIN_MOVMSKPS);
def_builtin (MASK_SSE, "__builtin_ia32_pmovmskb", int_ftype_v8qi, IX86_BUILTIN_PMOVMSKB);
Index: doc/extend.texi
===================================================================
RCS file: /cvs/gcc/egcs/gcc/doc/extend.texi,v
retrieving revision 1.25
diff -c -p -r1.25 extend.texi
*** extend.texi 2001/09/21 01:27:06 1.25
--- doc/extend.texi 2001/09/29 15:14:07
*************** extensions, accepted by GCC in C89 mode
*** 430,435 ****
--- 430,436 ----
* Function Names:: Printable strings which are the name of the current
function.
* Return Address:: Getting the return or frame address of a function.
+ * Vector Extensions:: Using vector instructions through built-in functions.
* Other Builtins:: Other built-in functions.
* Pragmas:: Pragmas accepted by GCC.
@end menu
*************** extensions, accepted by GCC in C89 mode
*** 483,488 ****
--- 484,490 ----
* Function Names:: Printable strings which are the name of the current
function.
* Return Address:: Getting the return or frame address of a function.
+ * Vector Extensions:: Using vector instructions through built-in functions.
* Other Builtins:: Other built-in functions.
* Pragmas:: Pragmas accepted by GCC.
@end menu
*************** the first frame pointer is properly init
*** 4146,4151 ****
--- 4148,4222 ----
This function should only be used with a non-zero argument for debugging
purposes.
@end deftypefn
+
+ @node Vector Extensions
+ @section Using vector instructions through built-in functions
+
+ On some targets, the instruction set contains SIMD vector instructions that
+ operate on multiple values contained in one large register at the same time.
+ For example, on the i386 the MMX, 3Dnow! and SSE extensions can be used
+ this way.
+
+ The first step in using these extensions is to provide the necessary data
+ types. This should be done using an appropriate @code{typedef}:
+
+ @example
+ typedef int v4si __attribute__ ((mode(V4SI)));
+ @end example
+
+ The base type @code{int} is effectively ignored by the compiler, the
+ actual properties of the new type @code{v4si} are defined by the
+ @code{__attribute__}. It defines the machine mode to be used; for vector
+ types these have the form @code{VnB}; @code{n} should be the number of
+ elements in the vector, and @code{B} should be the base mode of the
+ individual elements. The following can be used as base modes:
+
+ @table @code
+ @item QI
+ An integer that is as wide as the smallest addressable unit, usually 8 bits.
+ @item HI
+ An integer, twice as wide as a QI mode integer, usually 16 bits.
+ @item SI
+ An integer, four times as wide as a QI mode integer, usually 32 bits.
+ @item DI
+ An integer, eight times as wide as a QI mode integer, usually 64 bits.
+ @item SF
+ A floating point value, as wide as a SI mode integer, usually 32 bits.
+ @item DF
+ A floating point value, as wide as a DI mode integer, usually 64 bits.
+ @end table
+
+ Not all base types or combinations are always valid; which modes can be used
+ is determined by the target machine. For example, if targetting the i386 MMX
+ extensions, only @code{V8QI}, @code{V4HI} and @code{V2SI} are allowed modes.
+
+ There are no @code{V1xx} vector modes - they would be identical to the
+ corresponding base mode.
+
+ There is no distinction between signed and unsigned vector modes. This
+ distinction is made by the operations that perform on the vectors, not
+ by the data type.
+
+ The types defined in this manner are somewhat special, they cannot be
+ used with most normal C operations (i.e., a vector addition can @emph{not}
+ be represented by a normal addition of two vector type variables). You
+ can declare only variables and use them in function calls and returns, as
+ well as in assignments and some casts. It is possible to cast from one
+ vector type to another, provided they are of the same size (in fact, you
+ can also cast vectors to and from other datatypes of the same size).
+
+ A port that supports vector operations provides a set of built-in functions
+ that can be used to operate on vectors. For example, a function to add two
+ vectors and multiply the result by a third could look like this:
+
+ @example
+ v4si f (v4si a, v4si b, v4si c)
+ @{
+ v4si tmp = __builtin_addv4si (a, b);
+ return __builtin_mulv4si (tmp, c);
+ @}
+
+ @end example
@node Other Builtins
@section Other built-in functions provided by GCC
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/egcs/gcc/doc/invoke.texi,v
retrieving revision 1.52
diff -c -p -r1.52 invoke.texi
*** invoke.texi 2001/09/11 16:49:56 1.52
--- doc/invoke.texi 2001/09/29 15:14:47
*************** in the following sections.
*** 471,476 ****
--- 471,477 ----
-mno-fp-ret-in-387 -msoft-float -msvr3-shlib @gol
-mno-wide-multiply -mrtd -malign-double @gol
-mpreferred-stack-boundary=@var{num} @gol
+ -mmmx -msse -m3dnow @gol
-mthreads -mno-align-stringops -minline-all-stringops @gol
-mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
-m96bit-long-double -mregparm=@var{num} -momit-leaf-frame-pointer}
*************** This extra alignment does consume extra
*** 7599,7604 ****
--- 7600,7982 ----
to stack space usage, such as embedded systems and operating system kernels,
may want to reduce the preferred alignment to
@option{-mpreferred-stack-boundary=2}.
+
+ @item -mmmx
+ @itemx -mno-mmx
+ @item -msse
+ @itemx -mno-sse
+ @item -m3dnow
+ @itemx -mno-3dnow
+ @opindex mmmx
+ @opindex mno-mmx
+ @opindex msse
+ @opindex mno-sse
+ @opindex m3dnow
+ @opindex mno-3dnow
+ These switches enable or disable the use of built-in functions that allow
+ direct access to the MMX, SSE and 3Dnow extensions of the instruction set.
+
+ The following machine modes are available for use with MMX builtins
+ (@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32 bit integers,
+ @code{V4HI} for a vector of four 16 bit integers, and @code{V8QI} for a
+ vector of eight 8 bit integers. Some of the builtins operate on MMX
+ registers as a whole 64 bit entity, these use @code{DI} as their mode.
+
+ If 3Dnow extensions are enabled, @code{V2SF} is used as a mode for a vector
+ of two 32 bit floating point values.
+
+ If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32 bit
+ floating point values. Some instructions use a vector of four 32 bit
+ integers, these use @code{V4SI}. Finally, some instructions operate on an
+ entire vector register, interpreting it as a 128 bit integer, these use mode
+ @code{TI}.
+
+ The following builtins are made available by @option{-mmmx}:
+ @table @code
+ @item v8qi __builtin_ia32_paddb (v8qi, v8qi)
+ Generates the @code{paddb} machine instruction.
+ @item v4hi __builtin_ia32_paddw (v4hi, v4hi)
+ Generates the @code{paddw} machine instruction.
+ @item v2si __builtin_ia32_paddd (v2si, v2si)
+ Generates the @code{paddd} machine instruction.
+ @item v8qi __builtin_ia32_psubb (v8qi, v8qi)
+ Generates the @code{psubb} machine instruction.
+ @item v4hi __builtin_ia32_psubw (v4hi, v4hi)
+ Generates the @code{psubw} machine instruction.
+ @item v2si __builtin_ia32_psubd (v2si, v2si)
+ Generates the @code{psubd} machine instruction.
+
+ @item v8qi __builtin_ia32_paddsb (v8qi, v8qi)
+ Generates the @code{paddsb} machine instruction.
+ @item v4hi __builtin_ia32_paddsw (v4hi, v4hi)
+ Generates the @code{paddsw} machine instruction.
+ @item v8qi __builtin_ia32_psubsb (v8qi, v8qi)
+ Generates the @code{psubsb} machine instruction.
+ @item v4hi __builtin_ia32_psubsw (v4hi, v4hi)
+ Generates the @code{psubsw} machine instruction.
+
+ @item v8qi __builtin_ia32_paddusb (v8qi, v8qi)
+ Generates the @code{paddusb} machine instruction.
+ @item v4hi __builtin_ia32_paddusw (v4hi, v4hi)
+ Generates the @code{paddusw} machine instruction.
+ @item v8qi __builtin_ia32_psubusb (v8qi, v8qi)
+ Generates the @code{psubusb} machine instruction.
+ @item v4hi __builtin_ia32_psubusw (v4hi, v4hi)
+ Generates the @code{psubusw} machine instruction.
+
+ @item v4hi __builtin_ia32_pmullw (v4hi, v4hi)
+ Generates the @code{pmullw} machine instruction.
+ @item v4hi __builtin_ia32_pmulhw (v4hi, v4hi)
+ Generates the @code{pmulhw} machine instruction.
+
+ @item di __builtin_ia32_pand (di, di)
+ Generates the @code{pand} machine instruction.
+ @item di __builtin_ia32_pandn (di,di)
+ Generates the @code{pandn} machine instruction.
+ @item di __builtin_ia32_por (di, di)
+ Generates the @code{por} machine instruction.
+ @item di __builtin_ia32_pxor (di, di)
+ Generates the @code{pxor} machine instruction.
+
+ @item v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi)
+ Generates the @code{pcmpeqb} machine instruction.
+ @item v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi)
+ Generates the @code{pcmpeqw} machine instruction.
+ @item v2si __builtin_ia32_pcmpeqd (v2si, v2si)
+ Generates the @code{pcmpeqd} machine instruction.
+ @item v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi)
+ Generates the @code{pcmpgtb} machine instruction.
+ @item v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi)
+ Generates the @code{pcmpgtw} machine instruction.
+ @item v2si __builtin_ia32_pcmpgtd (v2si, v2si)
+ Generates the @code{pcmpgtd} machine instruction.
+
+ @item v8qi __builtin_ia32_punpckhbw (v8qi, v8qi)
+ Generates the @code{punpckhbw} machine instruction.
+ @item v4hi __builtin_ia32_punpckhwd (v4hi, v4hi)
+ Generates the @code{punpckhwd} machine instruction.
+ @item v2si __builtin_ia32_punpckhdq (v2si, v2si)
+ Generates the @code{punpckhdq} machine instruction.
+ @item v8qi __builtin_ia32_punpcklbw (v8qi, v8qi)
+ Generates the @code{punpcklbw} machine instruction.
+ @item v4hi __builtin_ia32_punpcklwd (v4hi, v4hi)
+ Generates the @code{punpcklwd} machine instruction.
+ @item v2si __builtin_ia32_punpckldq (v2si, v2si)
+ Generates the @code{punpckldq} machine instruction.
+
+ @item v8qi __builtin_ia32_packsswb (v4hi, v4hi)
+ Generates the @code{packsswb} machine instruction.
+ @item v4hi __builtin_ia32_packssdw (v2si, v2si)
+ Generates the @code{packssdw} machine instruction.
+ @item v8qi __builtin_ia32_packuswb (v4hi, v4hi)
+ Generates the @code{packuswb} machine instruction.
+
+ @end table
+
+ The following builtins are made available either with @option{-msse}, or
+ with a combination of @option{-m3dnow} and @option{-march=athlon}.
+ @table @code
+
+ @item v4hi __builtin_ia32_pmulhuw (v4hi, v4hi)
+ Generates the @code{pmulhuw} machine instruction.
+
+ @item v8qi __builtin_ia32_pavgb (v8qi, v8qi)
+ Generates the @code{pavgb} machine instruction.
+ @item v4hi __builtin_ia32_pavgw (v4hi, v4hi)
+ Generates the @code{pavgw} machine instruction.
+ @item v4hi __builtin_ia32_psadbw (v8qi, v8qi)
+ Generates the @code{psadbw} machine instruction.
+
+ @item v8qi __builtin_ia32_pmaxub (v8qi, v8qi)
+ Generates the @code{pmaxub} machine instruction.
+ @item v4hi __builtin_ia32_pmaxsw (v4hi, v4hi)
+ Generates the @code{pmaxsw} machine instruction.
+ @item v8qi __builtin_ia32_pminub (v8qi, v8qi)
+ Generates the @code{pminub} machine instruction.
+ @item v4hi __builtin_ia32_pminsw (v4hi, v4hi)
+ Generates the @code{pminsw} machine instruction.
+
+ @item int __builtin_ia32_pextrw (v4hi, int)
+ Generates the @code{pextrw} machine instruction.
+ @item v4hi __builtin_ia32_pinsrw (v4hi, int, int)
+ Generates the @code{pinsrw} machine instruction.
+
+ @item int __builtin_ia32_pmovmskb (v8qi)
+ Generates the @code{pmovmskb} machine instruction.
+ @item void __builtin_ia32_maskmovq (v8qi, v8qi, char *)
+ Generates the @code{maskmovq} machine instruction.
+ @item void __buitlin_ia32_movntq (di *, di)
+ Generates the @code{movntq} machine instruction.
+ @item void __buitlin_ia32_sfence (void)
+ Generates the @code{sfence} machine instruction.
+ @item void __builtin_ia32_prefetch (char *, int selector)
+ Generates a prefetch machine instruction, depending on the value of
+ selector. If @code{selector} is 0, it generates @code{prefetchnta}; for
+ a value of 1, it generates @code{prefetcht0}; for a value of 2, it generates
+ @code{prefetcht1}; and for a value of 3 it generates @code{prefetcht2}.
+
+ @end table
+
+ The following builtins are available when @option{-msse} is used.
+
+ @table @code
+ @item int __buitlin_ia32_comieq (v4sf, v4sf)
+ Generates the @code{comiss} machine instruction and performs an equality
+ comparison. The return value is the truth value of that comparison.
+ @item int __buitlin_ia32_comineq (v4sf, v4sf)
+ Generates the @code{comiss} machine instruction and performs an inequality
+ comparison. The return value is the truth value of that comparison.
+ @item int __buitlin_ia32_comilt (v4sf, v4sf)
+ Generates the @code{comiss} machine instruction and performs a ``less than''
+ comparison. The return value is the truth value of that comparison.
+ @item int __buitlin_ia32_comile (v4sf, v4sf)
+ Generates the @code{comiss} machine instruction and performs a ``less or
+ equal'' comparison. The return value is the truth value of that comparison.
+ @item int __buitlin_ia32_comigt (v4sf, v4sf)
+ Generates the @code{comiss} machine instruction and performs a ``greater than''
+ comparison. The return value is the truth value of that comparison.
+ @item int __buitlin_ia32_comige (v4sf, v4sf)
+ Generates the @code{comiss} machine instruction and performs a ``greater or
+ equal'' comparison. The return value is the truth value of that comparison.
+
+ @item int __buitlin_ia32_ucomieq (v4sf, v4sf)
+ Generates the @code{ucomiss} machine instruction and performs an equality
+ comparison. The return value is the truth value of that comparison.
+ @item int __buitlin_ia32_ucomineq (v4sf, v4sf)
+ Generates the @code{ucomiss} machine instruction and performs an inequality
+ comparison. The return value is the truth value of that comparison.
+ @item int __buitlin_ia32_ucomilt (v4sf, v4sf)
+ Generates the @code{ucomiss} machine instruction and performs a ``less than''
+ comparison. The return value is the truth value of that comparison.
+ @item int __buitlin_ia32_ucomile (v4sf, v4sf)
+ Generates the @code{ucomiss} machine instruction and performs a ``less or
+ equal'' comparison. The return value is the truth value of that comparison.
+ @item int __buitlin_ia32_ucomigt (v4sf, v4sf)
+ Generates the @code{ucomiss} machine instruction and performs a ``greater than''
+ comparison. The return value is the truth value of that comparison.
+ @item int __buitlin_ia32_ucomige (v4sf, v4sf)
+ Generates the @code{ucomiss} machine instruction and performs a ``greater or
+ equal'' comparison. The return value is the truth value of that comparison.
+
+ @item v4sf __buitlin_ia32_addps (v4sf, v4sf)
+ Generates the @code{addps} machine instruction.
+ @item v4sf __buitlin_ia32_addss (v4sf, v4sf)
+ Generates the @code{addss} machine instruction.
+ @item v4sf __buitlin_ia32_subps (v4sf, v4sf)
+ Generates the @code{subps} machine instruction.
+ @item v4sf __buitlin_ia32_subss (v4sf, v4sf)
+ Generates the @code{subss} machine instruction.
+ @item v4sf __buitlin_ia32_mulps (v4sf, v4sf)
+ Generates the @code{mulps} machine instruction.
+ @item v4sf __buitlin_ia32_mulss (v4sf, v4sf)
+ Generates the @code{mulss} machine instruction.
+ @item v4sf __buitlin_ia32_divps (v4sf, v4sf)
+ Generates the @code{divps} machine instruction.
+ @item v4sf __buitlin_ia32_divss (v4sf, v4sf)
+ Generates the @code{divss} machine instruction.
+
+ @item v4si __buitlin_ia32_cmpeqps (v4sf, v4sf)
+ Generates the @code{cmpeqps} machine instruction.
+ @item v4si __buitlin_ia32_cmplts (v4sf, v4sf)
+ Generates the @code{cmpltps} machine instruction.
+ @item v4si __buitlin_ia32_cmpleps (v4sf, v4sf)
+ Generates the @code{cmpleps} machine instruction.
+ @item v4si __buitlin_ia32_cmpgtps (v4sf, v4sf)
+ Generates the @code{cmpgtps} machine instruction.
+ @item v4si __buitlin_ia32_cmpgeps (v4sf, v4sf)
+ Generates the @code{cmpgeps} machine instruction.
+ @item v4si __buitlin_ia32_cmpunordps (v4sf, v4sf)
+ Generates the @code{cmpunodps} machine instruction.
+ @item v4si __buitlin_ia32_cmpneqps (v4sf, v4sf)
+ Generates the @code{cmpeqps} machine instruction.
+ @item v4si __buitlin_ia32_cmpnltps (v4sf, v4sf)
+ Generates the @code{cmpltps} machine instruction.
+ @item v4si __buitlin_ia32_cmpnleps (v4sf, v4sf)
+ Generates the @code{cmpleps} machine instruction.
+ @item v4si __buitlin_ia32_cmpngtps (v4sf, v4sf)
+ Generates the @code{cmpgtps} machine instruction.
+ @item v4si __buitlin_ia32_cmpngeps (v4sf, v4sf)
+ Generates the @code{cmpgeps} machine instruction.
+ @item v4si __buitlin_ia32_cmpordps (v4sf, v4sf)
+ Generates the @code{cmpunodps} machine instruction.
+
+ @item v4si __buitlin_ia32_cmpeqss (v4sf, v4sf)
+ Generates the @code{cmpeqss} machine instruction.
+ @item v4si __buitlin_ia32_cmpltss (v4sf, v4sf)
+ Generates the @code{cmpltss} machine instruction.
+ @item v4si __buitlin_ia32_cmpless (v4sf, v4sf)
+ Generates the @code{cmpless} machine instruction.
+ @item v4si __buitlin_ia32_cmpgtss (v4sf, v4sf)
+ Generates the @code{cmpgtss} machine instruction.
+ @item v4si __buitlin_ia32_cmpgess (v4sf, v4sf)
+ Generates the @code{cmpgess} machine instruction.
+ @item v4si __buitlin_ia32_cmpunordss (v4sf, v4sf)
+ Generates the @code{cmpunodss} machine instruction.
+ @item v4si __buitlin_ia32_cmpneqss (v4sf, v4sf)
+ Generates the @code{cmpeqss} machine instruction.
+ @item v4si __buitlin_ia32_cmpnlts (v4sf, v4sf)
+ Generates the @code{cmpltss} machine instruction.
+ @item v4si __buitlin_ia32_cmpnless (v4sf, v4sf)
+ Generates the @code{cmpless} machine instruction.
+ @item v4si __buitlin_ia32_cmpngtss (v4sf, v4sf)
+ Generates the @code{cmpgtss} machine instruction.
+ @item v4si __buitlin_ia32_cmpngess (v4sf, v4sf)
+ Generates the @code{cmpgess} machine instruction.
+ @item v4si __buitlin_ia32_cmpordss (v4sf, v4sf)
+ Generates the @code{cmpunodss} machine instruction.
+
+ @item v4sf __buitlin_ia32_maxps (v4sf, v4sf)
+ Generates the @code{maxps} machine instruction.
+ @item v4sf __buitlin_ia32_maxsss (v4sf, v4sf)
+ Generates the @code{maxss} machine instruction.
+ @item v4sf __buitlin_ia32_minps (v4sf, v4sf)
+ Generates the @code{minps} machine instruction.
+ @item v4sf __buitlin_ia32_minsss (v4sf, v4sf)
+ Generates the @code{minss} machine instruction.
+
+ @item ti __buitlin_ia32_andps (ti, ti)
+ Generates the @code{andps} machine instruction.
+ @item ti __buitlin_ia32_andnps (ti, ti)
+ Generates the @code{andnps} machine instruction.
+ @item ti __buitlin_ia32_orps (ti, ti)
+ Generates the @code{orps} machine instruction.
+ @item ti __buitlin_ia32_xorps (ti, ti)
+ Generates the @code{xorps} machine instruction.
+
+ @item v4sf __buitlin_ia32_movps (v4sf, v4sf)
+ Generates the @code{movps} machine instruction.
+ @item v4sf __buitlin_ia32_movhlps (v4sf, v4sf)
+ Generates the @code{movhlps} machine instruction.
+ @item v4sf __buitlin_ia32_movlhps (v4sf, v4sf)
+ Generates the @code{movlhps} machine instruction.
+ @item v4sf __buitlin_ia32_unpckhps (v4sf, v4sf)
+ Generates the @code{unpckhps} machine instruction.
+ @item v4sf __buitlin_ia32_unpcklps (v4sf, v4sf)
+ Generates the @code{unpcklps} machine instruction.
+
+ @item v4sf __buitlin_ia32_cvtpi2ps (v4sf, v2si)
+ Generates the @code{cvtpi2ps} machine instruction.
+ @item v2si __buitlin_ia32_cvtps2pi (v4sf)
+ Generates the @code{cvtps2pi} machine instruction.
+ @item v4sf __buitlin_ia32_cvtsi2ss (v4sf, int)
+ Generates the @code{cvtsi2ss} machine instruction.
+ @item int __buitlin_ia32_cvtss2si (v4sf)
+ Generates the @code{cvtsi2ss} machine instruction.
+ @item v2si __buitlin_ia32_cvttps2pi (v4sf)
+ Generates the @code{cvttps2pi} machine instruction.
+ @item int __buitlin_ia32_cvttss2si (v4sf)
+ Generates the @code{cvttsi2ss} machine instruction.
+
+ @item v4sf __buitlin_ia32_rcpps (v4sf)
+ Generates the @code{rcpps} machine instruction.
+ @item v4sf __buitlin_ia32_rsqrtps (v4sf)
+ Generates the @code{rsqrtps} machine instruction.
+ @item v4sf __buitlin_ia32_sqrtps (v4sf)
+ Generates the @code{sqrtps} machine instruction.
+ @item v4sf __buitlin_ia32_rcpss (v4sf)
+ Generates the @code{rcpss} machine instruction.
+ @item v4sf __buitlin_ia32_rsqrtss (v4sf)
+ Generates the @code{rsqrtss} machine instruction.
+ @item v4sf __buitlin_ia32_sqrtss (v4sf)
+ Generates the @code{sqrtss} machine instruction.
+
+ @item v4sf __buitlin_ia32_shufps (v4sf, v4sf, int)
+ Generates the @code{shufps} machine instruction.
+
+ @item v4sf __buitlin_ia32_loadaps (float *)
+ Generates the @code{movaps} machine instruction as a load from memory.
+ @item void __buitlin_ia32_storeaps (float *, v4sf)
+ Generates the @code{movaps} machine instruction as a store to memory.
+ @item v4sf __buitlin_ia32_loadups (float *)
+ Generates the @code{movups} machine instruction as a load from memory.
+ @item void __buitlin_ia32_storeups (float *, v4sf)
+ Generates the @code{movups} machine instruction as a store to memory.
+ @item v4sf __buitlin_ia32_loadsss (float *)
+ Generates the @code{movss} machine instruction as a load from memory.
+ @item void __buitlin_ia32_storess (float *, v4sf)
+ Generates the @code{movss} machine instruction as a store to memory.
+
+ @item v4sf __buitlin_ia32_loadhps (v4sf, v2si *)
+ Generates the @code{movhps} machine instruction as a load from memory.
+ @item v4sf __buitlin_ia32_loadlps (v4sf, v2si *)
+ Generates the @code{movlps} machine instruction as a load from memory
+ @item void __buitlin_ia32_storehps (v4sf, v2si *)
+ Generates the @code{movhps} machine instruction as a store to memory.
+ @item void __buitlin_ia32_storelps (v4sf, v2si *)
+ Generates the @code{movlps} machine instruction as a store to memory.
+
+ @item void __buitlin_ia32_movntps (float *, v4sf)
+ Generates the @code{movntps} machine instruction.
+ @item int __buitlin_ia32_movmskps (v4sf)
+ Generates the @code{movntps} machine instruction.
+
+ @item void __buitlin_ia32_storeps1 (float *, v4sf)
+ Generates the @code{movaps} machine instruction as a store to memory.
+ Before storing, the value is modified with a @code{shufps} instruction
+ so that the lowest of the four floating point elements is replicated
+ across the entire vector that is stored.
+ @item void __buitlin_ia32_storerps (float *, v4sf)
+ Generates the @code{movaps} machine instruction as a store to memory.
+ Before storing, the value is modified with a @code{shufps} instruction
+ so that the order of the four floating point elements in the vector is
+ reversed.
+ @item v4sf __buitlin_ia32_loadps1 (float *)
+ Generates a @code{movss} machine instruction to load a floating point
+ value from memory, and a @code{shufps} instruction to replicate the
+ loaded value across all four elements of the result vector.
+ @item v4sf __buitlin_ia32_loadrps (float *)
+ Generates a @code{movaps} machine instruction to load a vector from
+ memory, and a @code{shufps} instruction to reverse the order of the
+ four floating point elements in the result vector.
+ @item v4sf __builtin_ia32_setps (float, float, float, float)
+ Constructs a vector from four single floating point values. The return
+ value is equal to the value that would result from storing the four
+ arguments into consecutive memory locations and then executing a
+ @code{movaps} to load the vector from memory.
+ @item v4sf __builtin_ia32_setps1 (float)
+ Constructs a vector from a single floating point value by replicating
+ it across all four elements of the result vector.
+ @end table
@item -mpush-args
@itemx -mno-push-args