Bug 16584 - -msse2 also enabling -mfpmath=sse option causing illegal instruction errors
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 3.4.0
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2004-07-16 02:46 UTC by sightspeed
Modified: 2005-07-23 22:49 UTC

Description sightspeed 2004-07-16 02:46:56 UTC
The gcc command line option -msse2 seems to implicitly turn on the
-mfpmath=sse option. The man pages describe -msse2 as only enabling the use
of SSE2 intrinsics, not the automatic use of SSE2 instructions.

This bug is very dangerous since it breaks runtime detection of SSE2
support. We use the -msse2 option to generate SSE2 code only in areas where we
have written intrinsics, so that a single binary is safe on every platform.

A simple example is:
gcc -msse2 test.c

<file test.c>
int main(void)
{
        int i = 1;
        float temp2 = 1.0 * (i*i); // Allow for an optimal sse2 int to float conversion

        return 0;
}

<test.s> output

        .file   "test.c"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        andl    $-16, %esp
        movl    $0, %eax
        addl    $15, %eax
        addl    $15, %eax
        shrl    $4, %eax
        sall    $4, %eax
        subl    %eax, %esp
        movl    $1, -4(%ebp)
        movl    -4(%ebp), %eax
        imull   -4(%ebp), %eax
        cvtsi2sd        %eax, %xmm0
        cvtsd2ss        %xmm0, %xmm0
        movss   %xmm0, -8(%ebp)
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4.0"

<end>
Note that the movss, cvtsi2sd and cvtsd2ss instructions are used, which should
not happen.
When this is compiled and run on a Pentium 3 machine, a SIGILL is generated
(the Pentium 3 supports SSE but not SSE2, and cvtsi2sd and cvtsd2ss are SSE2
instructions).



<Additional Information>
<file test.i>
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "test.c"
int main(void)
{
 int i = 1;
 float temp2 = 1.0 * (i*i);

 return 0;
}
<end file>



Reading specs from /usr/lib/gcc/i486-slackware-linux/3.4.0/specs
Configured with: ../gcc-3.4.0/configure --prefix=/usr --enable-shared --enable-threads=posix --enable-__cxa_atexit --disable-checking --with-gnu-ld --verbose --target=i486-slackware-linux --host=i486-slackware-linux
Thread model: posix
gcc version 3.4.0
 /usr/libexec/gcc/i486-slackware-linux/3.4.0/cc1 -E -quiet -v test.c -msse2 -mtune=i486 -o test.i
ignoring nonexistent directory "/usr/lib/gcc/i486-slackware-linux/3.4.0/../../../../i486-slackware-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc/i486-slackware-linux/3.4.0/include
 /usr/include
End of search list.
 /usr/libexec/gcc/i486-slackware-linux/3.4.0/cc1 -fpreprocessed test.i -quiet -dumpbase test.c -msse2 -mtune=i486 -auxbase test -version -o test.s
GNU C version 3.4.0 (i486-slackware-linux)
        compiled by GNU C version 3.4.0.
GGC heuristics: --param ggc-min-expand=64 --param ggc-min-heapsize=64518
 /usr/lib/gcc/i486-slackware-linux/3.4.0/../../../../i486-slackware-linux/bin/as -V -Qy -o test.o test.s
GNU assembler version 2.15.90.0.3 (i486-slackware-linux) using BFD version 2.15.90.0.3 20040415
 /usr/libexec/gcc/i486-slackware-linux/3.4.0/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc/i486-slackware-linux/3.4.0/../../../crt1.o /usr/lib/gcc/i486-slackware-linux/3.4.0/../../../crti.o /usr/lib/gcc/i486-slackware-linux/3.4.0/crtbegin.o -L/usr/lib/gcc/i486-slackware-linux/3.4.0 -L/usr/lib/gcc/i486-slackware-linux/3.4.0 -L/usr/lib/gcc/i486-slackware-linux/3.4.0/../../../../i486-slackware-linux/lib -L/usr/lib/gcc/i486-slackware-linux/3.4.0/../../.. test.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/gcc/i486-slackware-linux/3.4.0/crtend.o /usr/lib/gcc/i486-slackware-linux/3.4.0/../../../crtn.o

Also tested and found under gcc-3.3.4
Comment 1 Andrew Pinski 2004-07-16 06:32:06 UTC
Read the documentation: "  -msse2                    Support MMX, SSE and SSE2 built-in functions and code
generation". Note the "and code generation", which means it can generate SSE2 instructions at any time,
sorry.
Comment 2 sightspeed 2004-07-16 17:11:36 UTC
Three responses to this:

1. Just using -msse does not trigger the same behavior with SSE1 floating point
calls in regular code.

2. Even if (1) did behave this way: the man page section under -mfpmath states
that the -mfpmath option must be passed explicitly for normal floating point
code to be replaced with SSE code. Even if you explicitly pass
-mfpmath=387 -msse2, it will still generate SSE2 for non-intrinsics code.

3. If your description is correct, how are you supposed to generate binaries
that are processor safe on x86? We ship a Linux program that has sections
optimized (using intrinsics) that are protected by runtime CPUID detection.
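
For readers unfamiliar with the runtime CPUID detection mentioned in (3), here is a minimal sketch of such a check. It assumes the <cpuid.h> helper header that ships with newer GCC releases (and Clang), which is not part of the 3.4 toolchain discussed in this report; the function name cpu_has_sse2 is made up for illustration.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* compiler-provided header: __get_cpuid, bit_SSE2 */
#endif

/* Hypothetical helper: returns 1 if the running CPU reports SSE2, else 0. */
int cpu_has_sse2(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1 returns the feature flags; SSE2 is reported in EDX. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;    /* leaf 1 not available on this CPU */
    return (edx & bit_SSE2) != 0;
#else
    return 0;        /* not an x86 target, so no SSE2 */
#endif
}
```

The check itself must live in code that is compiled without -msse2, otherwise the compiler is free to emit SSE2 instructions before the check has even run.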

The man page that I have for GCC states for -msse2:

         These switches enable or disable the use of built-in
           functions that allow direct access to the MMX, SSE,
           SSE2, SSE3 and 3Dnow extensions of the instruction
           set.

  ------>  To have SSE/SSE2 instructions generated automatically
           from floating-point code, see -mfpmath=sse.



<man page section on -mfpmath >
-mfpmath=unit
           Generate floating point arithmetics for selected unit
           unit.  The choices for unit are:

           387 Use the standard 387 floating point coprocessor
               present majority of chips and emulated otherwise.
               Code compiled with this option will run almost
               everywhere.  The temporary results are computed in
               80bit precision instead of precision specified by
               the type resulting in slightly different results
               compared to most of other chips. See -ffloat-store
               for more detailed description.

               This is the default choice for i386 compiler.

           sse Use scalar floating point instructions present in
               the SSE instruction set.  This instruction set is
               supported by Pentium3 and newer chips, in the AMD
               line by Athlon-4, Athlon-xp and Athlon-mp chips.
               The earlier version of SSE instruction set supports
               only single precision arithmetics, thus the
               double and extended precision arithmetics is still
               done using 387.  Later version, present only in
               Pentium4 and the future AMD x86-64 chips supports
               double precision arithmetics too.

               For i387 you need to use -march=cpu-type, -msse or
               -msse2 switches to enable SSE extensions and make
               this option effective.  For x86-64 compiler, these
               extensions are enabled by default.




Thank You.
Aron Rosenberg
http://www.sightspeed.com

Comment 3 sightspeed 2004-07-30 17:46:52 UTC
Confirmed that it is still present in 3.4.1
Comment 4 Uroš Bizjak 2004-12-24 14:29:27 UTC
With current TARGET_SSE_MATH work, mainline gcc produces:

main:
      pushl  %ebp
      movl   %esp, %ebp
      subl   $24, %esp
      andl   $-16, %esp
      movl   $0, %eax
      addl   $15, %eax
      addl   $15, %eax
      shrl   $4, %eax
      sall   $4, %eax
      subl   %eax, %esp
      movl   $1, -8(%ebp)
      movl   -8(%ebp), %eax
      imull  -8(%ebp), %eax
      pushl  %eax
      fildl  (%esp)
      leal   4(%esp), %esp
      fstps  -4(%ebp)
      movl   $0, %eax
      leave
      ret

However, -mfpmath only tells the compiler which instruction set is preferred
for floating point math. It is -msse, -mmmx, etc. that tell the compiler which
instructions it may use, independently of the -mfpmath setting. For example, the
cvttss2si insn will be generated when -msse is specified, no matter what
-mfpmath setting you use.
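
The distinction between "which instructions may be used" and "which unit is preferred for floating point math" can be inspected from inside a translation unit. The sketch below assumes the predefined macros __SSE2__ and __SSE2_MATH__ that recent GCC releases provide; whether 3.4 defines both has not been checked here, and the function names are made up for illustration.

```c
#include <stdio.h>

/* 1 if the compiler may emit SSE2 instructions in this file
   (-msse2, or a -march value that implies it). */
int sse2_codegen_enabled(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* 1 if scalar floating point math is additionally set to use SSE2
   (-mfpmath=sse together with SSE2 being enabled). */
int sse2_fpmath_enabled(void)
{
#ifdef __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Print both settings so they can be compared across flag combinations. */
void print_sse2_settings(void)
{
    printf("SSE2 instructions allowed: %d\n", sse2_codegen_enabled());
    printf("SSE2 used for FP math:     %d\n", sse2_fpmath_enabled());
}
```

Building the same file with different combinations of -msse2 and -mfpmath and calling print_sse2_settings() shows which of the two settings each flag changes.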

-mmmx, -msse and -msse2 are treated the same way as -march=pentium3, etc. You
cannot run code compiled with -march=pentium4 on an i586.
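
Since -msse2 applies to the whole translation unit, a binary that must also run on older CPUs has to keep SSE2 code generation confined to the routines that are only reached after a runtime check. The following is a sketch of that idea, not the resolution adopted in this report. It assumes two features of later GCC releases that do not exist in 3.4: the target("sse2") function attribute and __builtin_cpu_supports. With 3.4 the closest equivalent would be to put the intrinsics in a separate source file and pass -msse2 for that file only. All function names are hypothetical.

```c
#include <stddef.h>

#if defined(__i386__) || defined(__x86_64__)
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Only this function is compiled with SSE2 enabled. */
__attribute__((target("sse2")))
static void add_sse2(double *dst, const double *a, const double *b, size_t n)
{
    size_t i = 0;

    /* Two doubles per iteration using unaligned loads and stores. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(dst + i, _mm_add_pd(va, vb));
    }
    /* Leftover element when n is odd. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
#endif

/* Portable fallback, built with the baseline flags of the file. */
static void add_generic(double *dst, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Dispatcher: takes the SSE2 path only if the running CPU reports SSE2. */
void add_arrays(double *dst, const double *a, const double *b, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
    if (__builtin_cpu_supports("sse2")) {
        add_sse2(dst, a, b, n);
        return;
    }
#endif
    add_generic(dst, a, b, n);
}
```

The file containing the dispatcher is built without -msse2, so ordinary floating point code in it stays on the 387 for a 32-bit target.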

However, this part of documentation should be fixed:

          For i387 you need to use `-march=CPU-TYPE', `-msse' or
          `-msse2' switches to enable SSE extensions and make this
 
I believe it should read:

          For i386 compiler, you need to use `-march=CPU-TYPE', `-msse' or
          `-msse2' switches to enable SSE extensions and make this
 
Uros.
Comment 5 Uroš Bizjak 2004-12-28 15:33:06 UTC
A documentation patch is waiting for review:
http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01895.html

I guess that documentation patches don't qualify for the 'patch' keyword. However,
this bug should be marked as INVALID.
Comment 6 Uroš Bizjak 2005-01-05 09:59:37 UTC
CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	uros@gcc.gnu.org	2005-01-05 09:55:57

Modified files:
	gcc            : ChangeLog 
	gcc/doc        : invoke.texi 

Log message:
	* doc/invoke.texi (Intel 386 and AMD x86-64 Options):
	Replace i387 with 'i386 compiler' in -mfpmath=sse option.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7028&r2=2.7029
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/doc/invoke.texi.diff?cvsroot=gcc&r1=1.563&r2=1.564