-mfpmath=387 should enable x387 intrinsics on m64

Jiri Hladky hladky.jiri@googlemail.com
Thu Jun 12 00:48:00 GMT 2008

Hi Tim and all,

thanks for the hints! I have found following
- for in-lined trig functions it's required to use  -funsafe-math-optimizations
- in-lined fsin function is limited to -2^63 to +2^63 range. See
- it's recommended to use  FPREM instruction to reduce the value to
acceptable limits
> The libm for x86-64 supports only SSE functions
- this is not true. objdump -d /usr/lib64/libm.a | grep -B5 -A20
__sinl shows that there is 387 function which is using fsin and fprem1
to compute sin. (See bellow assembly code).
- gcc -c -g -m64 -Wall -mfpmath=387 -funsafe-math-optimizations -o
sin_387_fast_math.o sin.c
will use function fsin directly to compute sin
objdump -d sin_387_fast_math.o
  11:   dd 45 e8                fldl   0xffffffffffffffe8(%rbp)
  14:   d9 fe                   fsin
  16:   dd 5d f8                fstpl  0xfffffffffffffff8(%rbp)

I'm curious why it's not calling 387 implementation of sin from
/usr/lib64/libm.a (see assembly code below - I mean the version with

I have compiled sin.c also as 32-bit application - this one is using
387 implementation of sin from /usr/lib/libm.a (which is identical to
387 implementation of sin from /usr/lib64/libm.a)

The difference can be seen for 2^63 and bigger inputs:
fsin returns back input value: 9223372036854775808 (2^63)
387 implementation of sin from /usr/lib/libm.a (using fprem1) returns
0.987373 (0x1.f988fda35014cp-1)

So the question is, can we force gcc to use 387 implementation of sin
from /usr/lib64/libm.a ? And if not, why is 387 implementation of sin
(using fprem1) included in /usr/lib64/libm.a?

Thanks a lot!

objdump -d /usr/lib64/libm.a | grep -B2 -A20 __sinl
s_sinl.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <__sinl>:
   0:   db 6c 24 08             fldt   0x8(%rsp)
   4:   d9 fe                   fsin
   6:   df e0                   fnstsw %ax
   8:   a9 00 04 00 00          test   $0x400,%eax
   d:   75 01                   jne    10 <__sinl+0x10>
   f:   c3                      retq
  10:   d9 eb                   fldpi
  12:   d8 c0                   fadd   %st(0),%st
  14:   d9 c9                   fxch   %st(1)
  16:   d9 f5                   fprem1
  18:   df e0                   fnstsw %ax
  1a:   a9 00 04 00 00          test   $0x400,%eax
  1f:   75 f5                   jne    16 <__sinl+0x16>
  21:   dd d9                   fstp   %st(1)
  23:   d9 fe                   fsin
  25:   c3                      retq

PS: 387 implementation of sin from /usr/lib/libm.a has obviously some
numerical problems:

          387 (fprem1 && fsin)      fsin       sse        real value
(Mathematica N[Sin[x],8] )
2^63     0.98737328           2^63       0.99993038     0.99993038
2^62    -0.70713293    -0.70713293  -0.70292244    -0.70292244
2^50     0.49639777     0.49639777   0.49639652     0.49639652
2^45     0.71849132     0.71849132   0.71849129     0.71849129

It shows clearly that
-upto 2^63 (fprem1 && fsin) = fsin (fprem1 is only used when fsin
returns it's argument instead of sin)
-sse version is accurate
-fsin is producing wrong values for when argument is big number

What is shocking that even Mathematica (I have only rather old version
5.0) is producing wrong results (actually it's calling fsin from
libm.a) when N[Sin[2^63]] is used. N[Sin[2^63],8] is working fine and
producing valid result. I will test some newer version of Mathematica

On Wed, Jun 11, 2008 at 1:28 AM, Tim Prince <TimothyPrince@sbcglobal.net> wrote:
> Jiri Hladky wrote:
>> Hi all,
>> I have tried to compare sse version of sin() to 387 version of sin() on
>> -core duo (Intel(R) Core(TM)2 CPU         T7400  @ 2.16GHz)
>> -AMD Athlon(tm) 64 X2 Dual Core Processor 4800+
>> To my surprise
>> gcc -c -g -m64 -Wall -mfpmath=387 -o sin_387.o sin.c
>> is still using sse version of sin() from libm.a (and not calling fsin as I
>> would expect)
>> When I use -mfpmath=387 -ffast-math
>> gcc -c -g -m64 -Wall -mfpmath=387 -ffast-math -o sin_387_fast_math.o sin.c
>> then I will get 387 version of sin()!!!
>> This is puzzling me.
>> For me it  seems like bug - I would expect -mfpmath=387 to use always 387
>> version of sin. And why -ffast-math enables 387 is also unclear to me!?
>> Testcase is attached. Run it with
>> make clean && make run
>> ************Results on Athlon***************************************
>> time sin_387 0.5
>> Input: 0x1p-1, Output: 0x1.44e3aefd8ba93p-15
>> 27.99user 0.04system 0:28.36elapsed 98%CPU (0avgtext+0avgdata
>> 0maxresident)k
>> 0inputs+0outputs (0major+93minor)pagefaults 0swaps
>> time sin_sse 0.5
>> Input: 0x1p-1, Output: 0x1.44e3aefd8ba93p-15
>> 28.70user 0.06system 0:29.30elapsed 98%CPU (0avgtext+0avgdata
>> 0maxresident)k
>> 0inputs+0outputs (0major+93minor)pagefaults 0swaps
>> time sin_387_fast_math 0.5
>> Input: 0x1p-1, Output: 0x1.44e3aefe00c04p-15
>> 82.72user 0.08system 1:23.84elapsed 98%CPU (0avgtext+0avgdata
>> 0maxresident)k
>> 0inputs+0outputs (0major+92minor)pagefaults 0swaps
>> **************************************************************************
>> diff sin_387.obj sin_sse.obj
>> =>files are same!!!
>> grep fsin *obj* =>Hits only sin_387_fast_math*obj*
>> I have tried with following two versions: of gcc
>> gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)
>> gcc version 4.2.1 (SUSE Linux)
> Redirecting to gcc-help, where you are more likely to get this answered.  I
> suppose it's something like this:  You don't get in-lined trig functions
> unless you set -ffast-math, possibly on the assumption that a library
> function could take more precautions about corner cases, or because the
> default is to require ERRNO setting.  For example, fsin instruction returns
> the argument rather than its sin when the magnitude of the value is too
> large.  The libm for x86-64 supports only SSE functions, as that is the
> default decreed by the ABI.

More information about the Gcc-help mailing list