39228 – [4.3/4.4 Regression] 387 optimised __builtin_isinf() gives incorrect result

Bug 39228 - [4.3/4.4 Regression] 387 optimised __builtin_isinf() gives incorrect result

Summary: [4.3/4.4 Regression] 387 optimised __builtin_isinf() gives incorrect result

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.3.2

Importance:	P3 normal
Target Milestone:	4.3.4
Assignee:	Uroš Bizjak

URL:	http://gcc.gnu.org/ml/gcc-patches/200...
Keywords:	patch

Depends on:
Blocks:

Reported:	2009-02-18 08:31 UTC by Stewart Smith
Modified:	2009-02-20 10:41 UTC (History)
CC List:	2 users (show)

See Also:
Host:	i486-linux-gnu
Target:	i486-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:	2009-02-18 09:47:25

Attachments
patch (683 bytes, patch) 2009-02-18 10:33 UTC, Uroš Bizjak	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Stewart Smith 2009-02-18 08:31:28 UTC

#include <stdio.h>
#include <math.h>

int main()
{
        double a= 10.0;
        double b= 1e+308;
        printf("%d %d %d\n", isinf(a*b), __builtin_isinf(a*b), __isinf(a*b));
        return 0;
}

mtaylor@drizzle-dev:~$ gcc -o test test.c 
mtaylor@drizzle-dev:~$ ./test
0 0 1
mtaylor@drizzle-dev:~$ gcc -o test test.c -std=c99
mtaylor@drizzle-dev:~$ ./test
1 0 1
mtaylor@drizzle-dev:~$ gcc -o test test.c   -mfpmath=sse -march=pentium4
mtaylor@drizzle-dev:~$ ./test
1 1 1
mtaylor@drizzle-dev:~$ g++ -o test test.c 
mtaylor@drizzle-dev:~$ ./test
1 0 1


Originally I found the simple isinf() case to be different on x86 than x86-64, ppc32 and sparc (32 and 64).

After more research, I found that x86-64 uses the sse instructions to do it (and using sse is the only way for __builtin_isinf() to produce correct results). For the g++ built version, it calls __isinf() instead of inlining (and as can be seen, the __isinf() version is always correct).

Specifically, it's because the optimised 387 code is doing the math in double extended precision inside the FPU. 10.0*1e308 fits in 80bits but not in 64bit. Any code that forces it to be stored and loaded gets the correct result too. e.g.

mtaylor@drizzle-dev:~$ cat test-simple.c
#include <stdio.h>
#include <math.h>

int main()
{
        double a= 10.0;
        double b= 1e+308;
	volatile	double c= a*b;
        printf("%d\n", isinf(c));
        return 0;
}
mtaylor@drizzle-dev:~$ gcc -o test-simple test-simple.c 
mtaylor@drizzle-dev:~$ ./test-simple 
1

With this code you can easily see the load and store:
 8048407:       dc 0d 18 85 04 08       fmull  0x8048518
 804840d:       dd 5d f0                fstpl  -0x10(%ebp)
 8048410:       dd 45 f0                fldl   -0x10(%ebp)
 8048413:       d9 e5                   fxam   

While if you remove volatile, the load and store doesn't happen (at least on -O3, on -O0 it hasn't been optimised away):
 8048407:       dc 0d 18 85 04 08       fmull  0x8048518
 804840d:       c7 44 24 04 10 85 04    movl   $0x8048510,0x4(%esp)
 8048414:       08 
 8048415:       c7 04 24 01 00 00 00    movl   $0x1,(%esp)
 804841c:       d9 e5                   fxam   


This is also a regression from 4.2.4 as it just calls isinf() and doesn't expand the 387 code inline. My guess is the 387 optimisation was added in 4.3.

Recommended fix: store and load in the 387 version so to operate on same precision as elsewhere.

Comment 1 Uroš Bizjak 2009-02-18 09:47:25 UTC

Looking into it.

Comment 2 Uroš Bizjak 2009-02-18 10:33:01 UTC

Created attachment 17323 [details]
patch

Patch currently in testing.

Comment 3 Stewart Smith 2009-02-18 11:20:12 UTC

To implement a work around for us, I'm proposing the patch below.
- the tmp2 being volatile was for who knows what reason (old code)
- The check should (on c99 systems or those with the right compile options enabled, which we do have) detect that we calculate at 80bits but store at 64.
- the volatile temp should ensure a write to memory (and subsequent read)
- where the check is false, the whole code path should be optimised out.

Perhaps it should also be surrounded by an ifdef for the specific gcc version (3.4)?

Thoughts are very welcome.

=== modified file 'drizzled/function/math/round.cc'
--- drizzled/function/math/round.cc	2008-12-16 06:21:46 +0000
+++ drizzled/function/math/round.cc	2009-02-18 09:59:40 +0000
@@ -111,14 +111,22 @@
     tmp2 is here to avoid return the value with 80 bit precision
     This will fix that the test round(0.1,1) = round(0.1,1) is true
   */
-  volatile double tmp2;
+  double tmp2;
 
   tmp=(abs_dec < array_elements(log_10) ?
        log_10[abs_dec] : pow(10.0,(double) abs_dec));
 
+  double value_times_tmp= value * tmp;
+
+  if(sizeof(double) < sizeof(double_t))
+  {
+    volatile double t= value_times_tmp;
+    value_times_tmp= t;
+  }
+
   if (dec_negative && isinf(tmp))
     tmp2= 0;
-  else if (!dec_negative && isinf(value * tmp))
+  else if (!dec_negative && isinf(value_times_tmp))
     tmp2= value;
   else if (truncate)
   {

Comment 4 Uroš Bizjak 2009-02-18 14:15:04 UTC

Patch at http://gcc.gnu.org/ml/gcc-patches/2009-02/msg00825.html

Comment 5 uros 2009-02-19 10:51:18 UTC

Subject: Bug 39228

Author: uros
Date: Thu Feb 19 10:51:04 2009
New Revision: 144293

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144293
Log:
	PR target/39228
	* config/i386/i386.md (isinfxf2): Split from isinf<mode>2.
	(UNSPEC_FXAM_MEM): New unspec.
	(fxam<mode>2_i387_with_temp): New insn and split pattern.
	(isinf<mode>2): Use MODEF mode iterator.  Force operand[1] through
	memory using fxam<mode>2_i387_with_temp to remove excess precision.

testsuite/ChangeLog:

	PR target/39228
	* gcc.c-torture/execute/pr39228.c: New test.


Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr39228.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog

Comment 6 uros 2009-02-19 12:44:53 UTC

Subject: Bug 39228

Author: uros
Date: Thu Feb 19 12:44:40 2009
New Revision: 144295

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144295
Log:
	PR target/39228
	* config/i386/i386.md (isinfxf2): Split from isinf<mode>2.
	(UNSPEC_FXAM_MEM): New unspec.
	(fxam<mode>2_i387_with_temp): New insn and split pattern.
	(isinf<mode>2): Use MODEF mode iterator.  Force operand[1] through
	memory using fxam<mode>2_i387_with_temp to remove excess precision.

testsuite/ChangeLog:

	PR target/39228
	* gcc.c-torture/execute/pr39228.c: New test.


Added:
    branches/gcc-4_3-branch/gcc/testsuite/gcc.c-torture/execute/pr39228.c
Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/config/i386/i386.md
    branches/gcc-4_3-branch/gcc/testsuite/ChangeLog

Comment 7 Uroš Bizjak 2009-02-19 12:58:12 UTC

Fixed.

Comment 8 Kazumoto Kojima 2009-02-20 10:41:29 UTC

> Added:
>     trunk/gcc/testsuite/gcc.c-torture/execute/pr39228.c

SH requires -mieee option for the newly added pr39228.c test.
I guess that alpha is on the same boat.  Also it'll fail on
the targets like SPU which have no inf/nan supports.  We need
gcc.c-torture/execute/pr39228.x

if { [istarget "alpha*-*-*"] || [istarget "sh*-*-*"] } {
	# alpha and SH require -mieee for this test.
	set additional_flags "-mieee"
}
if [istarget "spu-*-*"] {
	# No Inf/NaN support on SPU.
	return 1
}

return 0

or something like that, doesn't we?