The gcc command line option -msse2 seems to implicitly turn on the -mfpmath=sse option. The man pages describe -msse2 as only enabling the use of SSE2 intrinsics, not automatic generation of SSE2 instructions. This bug is very dangerous since it breaks runtime detection of SSE2 support. We use the -msse2 option to generate SSE2 code only in areas where we have written intrinsics, so that a single binary is platform safe.

A simple example is:

gcc -msse2 test.c

<file test.c>
int main(void)
{
    int i = 1;
    float temp2 = 1.0 * (i*i); // Allow for an optimal sse2 int to float conversion
    return 0;
}

<test.s> output
        .file   "test.c"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        andl    $-16, %esp
        movl    $0, %eax
        addl    $15, %eax
        addl    $15, %eax
        shrl    $4, %eax
        sall    $4, %eax
        subl    %eax, %esp
        movl    $1, -4(%ebp)
        movl    -4(%ebp), %eax
        imull   -4(%ebp), %eax
        cvtsi2sd        %eax, %xmm0
        cvtsd2ss        %xmm0, %xmm0
        movss   %xmm0, -8(%ebp)
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.0"
<end>

Note that the movss, cvtsi2sd and cvtsd2ss instructions are used, which should not happen. When this is compiled and run on a Pentium 3 machine, a SIGILL is generated.

<Additional Information>

<file test.i>
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "test.c"
int main(void)
{
    int i = 1;
    float temp2 = 1.0 * (i*i);
    return 0;
}
<end file>

Reading specs from /usr/lib/gcc/i486-slackware-linux/3.4.0/specs
Configured with: ../gcc-3.4.0/configure --prefix=/usr --enable-shared --enable-threads=posix --enable-__cxa_atexit --disable-checking --with-gnu-ld --verbose --target=i486-slackware-linux --host=i486-slackware-linux
Thread model: posix
gcc version 3.4.0
 /usr/libexec/gcc/i486-slackware-linux/3.4.0/cc1 -E -quiet -v test.c -msse2 -mtune=i486 -o test.i
ignoring nonexistent directory "/usr/lib/gcc/i486-slackware-linux/3.4.0/../../../../i486-slackware-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc/i486-slackware-linux/3.4.0/include
 /usr/include
End of search list.
 /usr/libexec/gcc/i486-slackware-linux/3.4.0/cc1 -fpreprocessed test.i -quiet -dumpbase test.c -msse2 -mtune=i486 -auxbase test -version -o test.s
GNU C version 3.4.0 (i486-slackware-linux)
        compiled by GNU C version 3.4.0.
GGC heuristics: --param ggc-min-expand=64 --param ggc-min-heapsize=64518
 /usr/lib/gcc/i486-slackware-linux/3.4.0/../../../../i486-slackware-linux/bin/as -V -Qy -o test.o test.s
GNU assembler version 2.15.90.0.3 (i486-slackware-linux) using BFD version 2.15.90.0.3 20040415
 /usr/libexec/gcc/i486-slackware-linux/3.4.0/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc/i486-slackware-linux/3.4.0/../../../crt1.o /usr/lib/gcc/i486-slackware-linux/3.4.0/../../../crti.o /usr/lib/gcc/i486-slackware-linux/3.4.0/crtbegin.o -L/usr/lib/gcc/i486-slackware-linux/3.4.0 -L/usr/lib/gcc/i486-slackware-linux/3.4.0 -L/usr/lib/gcc/i486-slackware-linux/3.4.0/../../../../i486-slackware-linux/lib -L/usr/lib/gcc/i486-slackware-linux/3.4.0/../../.. test.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/gcc/i486-slackware-linux/3.4.0/crtend.o /usr/lib/gcc/i486-slackware-linux/3.4.0/../../../crtn.o

Also tested and found under gcc-3.3.4.
Read the documentation: "-msse2 Support MMX, SSE and SSE2 built-in functions and code generation". Note the "and code generation", which means the compiler can generate SSE2 instructions at any time, sorry.
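One way to see what "and code generation" means for a given source file is to ask the compiler which SSE2-related macros it predefines for that translation unit. The sketch below assumes a GCC recent enough to predefine `__SSE2__` (SSE2 instructions are permitted) and `__SSE2_MATH__` (scalar floating point is done in SSE2); whether 3.4 defines both is not confirmed here. The function names are made up for illustration.

```c
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with SSE2 code
   generation permitted (for example via -msse2 or a suitable -march). */
int tu_allows_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler reports that scalar floating point math
   is being done with SSE2 (the -mfpmath=sse case). */
int tu_uses_sse2_math(void)
{
#ifdef __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print both flags for the current build. */
void report_sse2_flags(void)
{
    printf("SSE2 instructions allowed: %d\n", tu_allows_sse2());
    printf("SSE2 used for scalar FP math: %d\n", tu_uses_sse2_math());
}
```

Note that these macros only report what the compiler is allowed or configured to do; inspecting the generated assembly, as in the original report, is still the way to see what was actually emitted.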
Three responses to this:

1. Just using -msse does not trigger the same behavior with SSE1 floating point calls in regular code.

2. Even if (1) did behave this way: the man page section under -mfpmath states that the -mfpmath option must be passed explicitly for normal floating point code to be replaced with SSE code. Even if you explicitly pass -mfpmath=387 -msse2, it will still generate SSE2 for non-intrinsics code.

3. If your description is correct, how are you supposed to generate binaries that are processor safe on x86? We ship a Linux program that has sections optimized (using intrinsics) that are protected by runtime CPUID detection.

The man pages that I have for GCC state for -msse2:

    These switches enable or disable the use of built-in functions that allow direct access to the MMX, SSE, SSE2, SSE3 and 3Dnow extensions of the instruction set.

    ------> To have SSE/SSE2 instructions generated automatically from floating-point code, see -mfpmath=sse.

<man page section on -mfpmath>

-mfpmath=unit
    Generate floating point arithmetics for selected unit unit. The choices for unit are:

    387
        Use the standard 387 floating point coprocessor present majority of chips and emulated otherwise. Code compiled with this option will run almost everywhere. The temporary results are computed in 80bit precision instead of precision specified by the type resulting in slightly different results compared to most of other chips. See -ffloat-store for more detailed description.

        This is the default choice for i386 compiler.

    sse
        Use scalar floating point instructions present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips, in the AMD line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE instruction set supports only single precision arithmetics, thus the double and extended precision arithmetics is still done using 387. Later version, present only in Pentium4 and the future AMD x86-64 chips supports double precision arithmetics too.

        For i387 you need to use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For x86-64 compiler, these extensions are enabled by default.

Thank You.

Aron Rosenberg
http://www.sightspeed.com
Confirmed that it is still present in 3.4.1
With current TARGET_SSE_MATH work, mainline gcc produces:

main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        andl    $-16, %esp
        movl    $0, %eax
        addl    $15, %eax
        addl    $15, %eax
        shrl    $4, %eax
        sall    $4, %eax
        subl    %eax, %esp
        movl    $1, -8(%ebp)
        movl    -8(%ebp), %eax
        imull   -8(%ebp), %eax
        pushl   %eax
        fildl   (%esp)
        leal    4(%esp), %esp
        fstps   -4(%ebp)
        movl    $0, %eax
        leave
        ret

However, -mfpmath just tells the compiler which instruction set is preferred for floating point math. It is -msse, -mmmx etc. that tell the compiler which instructions it can use, independently of the -mfpmath setting. For example, the cvttss2si insn will be generated when -msse is specified, no matter what -mfpmath setting you use.

-mmmx, -msse and -msse2 are treated the same way as -march=pentium3, etc. You cannot run code compiled with -march=pentium4 on an i586.

However, this part of the documentation should be fixed:

    For i387 you need to use `-march=CPU-TYPE', `-msse' or `-msse2' switches to enable SSE extensions and make this

I believe it should read:

    For i386 compiler, you need to use `-march=CPU-TYPE', `-msse' or `-msse2' switches to enable SSE extensions and make this

Uros.
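Since -msse2 applies to a whole translation unit, one common way to keep a single binary safe on older processors is to confine SSE2 to specific functions and choose between them at run time. The following is only a sketch under assumptions: it relies on the `target` function attribute and on `__builtin_cpu_supports`, which exist in GCC releases much newer than the 3.4 series discussed here, and the function names (`add_generic`, `add_sse2`, `add_vectors`) are invented for illustration. With an older compiler, the equivalent approach would be to put the intrinsics code in its own source file, compile only that file with -msse2, and do the CPUID check by hand.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <emmintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Portable fallback. It only stays free of SSE2 instructions if this
   file is built without -msse2 (or an -march that implies it). */
static void add_generic(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

#ifdef HAVE_X86_DISPATCH
/* SSE2 code generation is enabled for this one function only
   (assumes a compiler that supports the target attribute). */
static __attribute__((target("sse2")))
void add_sse2(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
    for (; i < n; i++)          /* leftover element when n is odd */
        out[i] = a[i] + b[i];
}

/* Runtime check, standing in for a hand-written CPUID test. */
static int cpu_has_sse2(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
}
#endif

/* Dispatcher: the only entry point callers use. */
void add_vectors(const double *a, const double *b, double *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
#ifdef HAVE_X86_DISPATCH
    if (cpu_has_sse2()) {
        add_sse2(a, b, out, n);
        return;
    }
#endif
    add_generic(a, b, out, n);
}
```

Callers would use only `add_vectors(a, b, out, n)`. The important design point is that nothing outside the SSE2-enabled function may be compiled with SSE2 permitted, otherwise the compiler is free to emit SSE2 instructions there as described above.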
A documentation patch is waiting for review: http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01895.html

I guess that documentation patches don't qualify for the 'patch' keyword. However, this bug should be marked as INVALID.
CVSROOT:        /cvs/gcc
Module name:    gcc
Changes by:     uros@gcc.gnu.org        2005-01-05 09:55:57

Modified files:
        gcc            : ChangeLog
        gcc/doc        : invoke.texi

Log message:
        * doc/invoke.texi (Intel 386 and AMD x86-64 Options): Replace
        i387 with 'i386 compiler' in -mfpmath=sse option.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7028&r2=2.7029
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/doc/invoke.texi.diff?cvsroot=gcc&r1=1.563&r2=1.564