-mfpmath=387 should enable x387 intrinsics on m64
Jiri Hladky
hladky.jiri@googlemail.com
Thu Jun 12 00:48:00 GMT 2008
Hi Tim and all,
thanks for the hints! I have found following
- for in-lined trig functions it's required to use -funsafe-math-optimizations
- in-lined fsin function is limited to -2^63 to +2^63 range. See
http://www.website.masmforum.com/tutorials/fptute/fpuchap10.htm
- it's recommended to use FPREM instruction to reduce the value to
acceptable limits
> The libm for x86-64 supports only SSE functions
- this is not true. objdump -d /usr/lib64/libm.a | grep -B5 -A20
__sinl shows that there is 387 function which is using fsin and fprem1
to compute sin. (See bellow assembly code).
- gcc -c -g -m64 -Wall -mfpmath=387 -funsafe-math-optimizations -o
sin_387_fast_math.o sin.c
will use function fsin directly to compute sin
objdump -d sin_387_fast_math.o
11: dd 45 e8 fldl 0xffffffffffffffe8(%rbp)
14: d9 fe fsin
16: dd 5d f8 fstpl 0xfffffffffffffff8(%rbp)
I'm curious why it's not calling 387 implementation of sin from
/usr/lib64/libm.a (see assembly code below - I mean the version with
fprem1)
I have compiled sin.c also as 32-bit application - this one is using
387 implementation of sin from /usr/lib/libm.a (which is identical to
387 implementation of sin from /usr/lib64/libm.a)
The difference can be seen for 2^63 and bigger inputs:
fsin returns back input value: 9223372036854775808 (2^63)
387 implementation of sin from /usr/lib/libm.a (using fprem1) returns
0.987373 (0x1.f988fda35014cp-1)
So the question is, can we force gcc to use 387 implementation of sin
from /usr/lib64/libm.a ? And if not, why is 387 implementation of sin
(using fprem1) included in /usr/lib64/libm.a?
Thanks a lot!
Jiri
objdump -d /usr/lib64/libm.a | grep -B2 -A20 __sinl
s_sinl.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <__sinl>:
0: db 6c 24 08 fldt 0x8(%rsp)
4: d9 fe fsin
6: df e0 fnstsw %ax
8: a9 00 04 00 00 test $0x400,%eax
d: 75 01 jne 10 <__sinl+0x10>
f: c3 retq
10: d9 eb fldpi
12: d8 c0 fadd %st(0),%st
14: d9 c9 fxch %st(1)
16: d9 f5 fprem1
18: df e0 fnstsw %ax
1a: a9 00 04 00 00 test $0x400,%eax
1f: 75 f5 jne 16 <__sinl+0x16>
21: dd d9 fstp %st(1)
23: d9 fe fsin
25: c3 retq
PS: 387 implementation of sin from /usr/lib/libm.a has obviously some
numerical problems:
387 (fprem1 && fsin) fsin sse real value
(Mathematica N[Sin[x],8] )
2^63 0.98737328 2^63 0.99993038 0.99993038
2^62 -0.70713293 -0.70713293 -0.70292244 -0.70292244
2^50 0.49639777 0.49639777 0.49639652 0.49639652
2^45 0.71849132 0.71849132 0.71849129 0.71849129
It shows clearly that
-upto 2^63 (fprem1 && fsin) = fsin (fprem1 is only used when fsin
returns it's argument instead of sin)
-sse version is accurate
-fsin is producing wrong values for when argument is big number
What is shocking that even Mathematica (I have only rather old version
5.0) is producing wrong results (actually it's calling fsin from
libm.a) when N[Sin[2^63]] is used. N[Sin[2^63],8] is working fine and
producing valid result. I will test some newer version of Mathematica
later.
On Wed, Jun 11, 2008 at 1:28 AM, Tim Prince <TimothyPrince@sbcglobal.net> wrote:
> Jiri Hladky wrote:
>>
>> Hi all,
>>
>> I have tried to compare sse version of sin() to 387 version of sin() on
>> -core duo (Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz)
>> -AMD Athlon(tm) 64 X2 Dual Core Processor 4800+
>>
>> To my surprise
>> gcc -c -g -m64 -Wall -mfpmath=387 -o sin_387.o sin.c
>> is still using sse version of sin() from libm.a (and not calling fsin as I
>> would expect)
>>
>> When I use -mfpmath=387 -ffast-math
>> gcc -c -g -m64 -Wall -mfpmath=387 -ffast-math -o sin_387_fast_math.o sin.c
>> then I will get 387 version of sin()!!!
>>
>> This is puzzling me.
>> For me it seems like bug - I would expect -mfpmath=387 to use always 387
>> version of sin. And why -ffast-math enables 387 is also unclear to me!?
>>
>> Testcase is attached. Run it with
>>
>> make clean && make run
>> ************Results on Athlon***************************************
>> time sin_387 0.5
>> Input: 0x1p-1, Output: 0x1.44e3aefd8ba93p-15
>> 27.99user 0.04system 0:28.36elapsed 98%CPU (0avgtext+0avgdata
>> 0maxresident)k
>> 0inputs+0outputs (0major+93minor)pagefaults 0swaps
>> time sin_sse 0.5
>> Input: 0x1p-1, Output: 0x1.44e3aefd8ba93p-15
>> 28.70user 0.06system 0:29.30elapsed 98%CPU (0avgtext+0avgdata
>> 0maxresident)k
>> 0inputs+0outputs (0major+93minor)pagefaults 0swaps
>> time sin_387_fast_math 0.5
>> Input: 0x1p-1, Output: 0x1.44e3aefe00c04p-15
>> 82.72user 0.08system 1:23.84elapsed 98%CPU (0avgtext+0avgdata
>> 0maxresident)k
>> 0inputs+0outputs (0major+92minor)pagefaults 0swaps
>> **************************************************************************
>> diff sin_387.obj sin_sse.obj
>> =>files are same!!!
>>
>> grep fsin *obj* =>Hits only sin_387_fast_math*obj*
>>
>> I have tried with following two versions: of gcc
>> gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)
>> gcc version 4.2.1 (SUSE Linux)
>>
>>
>
> Redirecting to gcc-help, where you are more likely to get this answered. I
> suppose it's something like this: You don't get in-lined trig functions
> unless you set -ffast-math, possibly on the assumption that a library
> function could take more precautions about corner cases, or because the
> default is to require ERRNO setting. For example, fsin instruction returns
> the argument rather than its sin when the magnitude of the value is too
> large. The libm for x86-64 supports only SSE functions, as that is the
> default decreed by the ABI.
>
More information about the Gcc-help
mailing list