Bug 37489 - const_true_rtx returned for float compare
Summary: const_true_rtx returned for float compare
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.4.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: ssemmx, wrong-code
: 37969 (view as bug list)
Depends on:
Blocks:
 
Reported: 2008-09-11 23:35 UTC by raksit
Modified: 2008-11-06 14:03 UTC (History)
5 users (show)

See Also:
Host:
Target: x86_64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
new test: gcc/testsuite/g++.dg/opt/cse3.C (556 bytes, text/plain)
2008-09-12 18:52 UTC, raksit
Details

Note You need to log in before you can comment on or make changes to this bug.
Description raksit 2008-09-11 23:35:22 UTC
Consider the c++ code:
------
class StatVal {
 public:
  StatVal(double ev, double va)
    : m(ev),
      v(va) {}
  StatVal(const StatVal& other)
    : m(other.m),
      v(other.v) {}
  StatVal& operator*=(const StatVal& other) {
    double A = m == 0 ? 1.0 : v / (m * m);
    double B = other.m == 0 ? 1.0 : other.v / (other.m * other.m);
    m = m * other.m;
    v = m * m * (A + B);
    return *this;
  }
  double m;
  double v;
};

extern "C" void abort (void);
const StatVal two_dot_three(2, 0.3);

int main(int argc, char **argv) {
  StatVal product3(two_dot_three);
  product3 *= two_dot_three;
  if (product3.v > 2.5)
    abort();
}
------

In the above code, product3.v should be 2.4, and the program aborts if this value is greater than 2.5.

Compiled with the trunk gcc, the program aborts if options "-O1 -fno-guess-branch-probability -fcse-follow-jumps -fgcse -frerun-cse-after-loop" are used. Lets call these "baseOptions".
On the gcc-4.3 branch, this program aborts if compiled with those options, and also if compiled with simply "-O2".

Lets look at interesting snippets of assembley code generated with the trunk compiler:
(1) First, with using "baseOptions -fno-if-conversion -fno-rerun-cse-after-loop" (the test program passes with these options):
        ...snip...
        ucomisd %xmm3, %xmm0
        jne     .L12
        .p2align 4,,3
        .p2align 3
        jp      .L12
        movsd   .LC3(%rip), %xmm0
        jmp     .L7
.L12:
        movapd  %xmm2, %xmm0
.L7:
        mulsd   %xmm1, %xmm1
        ...

(2) Second, with using "baseOptions -fno-rerun-cse-after-loop" (the test program passes with these options):
        ...snip...
        cmpneqsd        %xmm3, %xmm0
        movapd  %xmm2, %xmm3
        andpd   %xmm0, %xmm3
        movsd   .LC3(%rip), %xmm4
        andnpd  %xmm4, %xmm0
        orpd    %xmm3, %xmm0
        ...

Comparing (1) and (2), the if-conversion gets rid of a few branches by converting:
if (condX) x=a; else x = b;
into:
maskX = condX ? 0xfff.. : 0;  // cmpneqsd %xmm3, %xmm0
x1 = a & maskX;               // andpd    %xmm0, %xmm3
x2 = b & ~maskX;              // andnpd   %xmm4, %xmm0
x = x1 | x2;                  // orpd     %xmm3, %xmm0

(3) Lastly, with using "baseOptions" (the test program fails now):
        ...snip...
        cmpneqsd        %xmm3, %xmm0
        movapd  %xmm2, %xmm3
        andpd   %xmm0, %xmm3
        movapd  %xmm3, %xmm0
        movsd   .LC3(%rip), %xmm3
        orpd    %xmm0, %xmm3
        ...

What has happened is that the "cse2" phase has deleted the "andnpd" instruction.
We will have to look at the (.cse2) dump file to figure out why:

---- snip ----
(insn 69 15 70 3 /home/raksit/bug-test.C:17 (set (reg:DF 77)
        (mem/u/c/i:DF (symbol_ref/u:DI ("*.LC3") [flags 0x2]) [0 S8 A64])) 103 {*movdf_integer_rex64} (expr_list:REG_EQUAL (const_double:DF 1.0e+0 [0x0.8p+1])
        (nil)))

(insn 70 69 71 3 /home/raksit/bug-test.C:17 (set (reg:DF 78)
        (ne:DF (reg:DF 61 [ D.2927 ])
            (reg:DF 68))) 615 {*sse_setccdf} (expr_list:REG_EQUAL (const_int 1 [0x1])
        (nil)))

(insn 71 70 73 3 /home/raksit/bug-test.C:17 (set (reg:DF 79)
        (and:DF (reg/v:DF 60 [ B ])
            (reg:DF 78))) 1277 {*anddf3} (nil))

(insn 73 71 31 3 /home/raksit/bug-test.C:17 (set (reg/v:DF 58 [ B.25 ])
        (ior:DF (reg:DF 77)
            (reg:DF 79))) 1278 {*iordf3} (nil))
--------------

The interesting part is:
(insn 70 69 71 3 /home/raksit/bug-test.C:17 (set (reg:DF 78)
        (ne:DF (reg:DF 61 [ D.2927 ])
            (reg:DF 68))) 615 {*sse_setccdf} (expr_list:REG_EQUAL (const_int 1 [0x1])
        (nil)))

This instruction corresponds to:
maskX = condX ? 0xfff.. : 0;  // cmpneqsd %xmm3, %xmm0

Gcc is able to figure out that condX evaluates to true at compile-time -- and this is conveyed by the "REG_EQUAL (const_int 1 [0x1])" note on the instruction.
This note which says that maskX is equal to "const_int 1", is added by cse.c:fold_rtx(). It is this REG_EQUAL note that ultimately results in the CSE phase deleting the andnpd instruction (because, given the generated code sequence, maskX should be folded into 0xffff..., not "const_int 1").

The problem is in cse.c:fold_rtx(), when it folds the given floating-point-mode RTX into a true/false value. The code in fold_rtx() checks the FLOAT_STORE_FLAG_VALUE macro to find the correct representation of "true" in floating-point modes. But if this macro is not defined (its not for the i386 target), it uses const_true_rtx, which is equal to "const_int 1".

This is different behavior from something closely related in simplify-rtx.c.
In simplify-rtx.c:simplify_relational_operation():
---- snip ----
  tem = simplify_const_relational_operation (code, cmp_mode, op0, op1);
  if (tem)
    {
      if (SCALAR_FLOAT_MODE_P (mode))
        {
          if (tem == const0_rtx)
            return CONST0_RTX (mode);
#ifdef FLOAT_STORE_FLAG_VALUE
          {
            REAL_VALUE_TYPE val;
            val = FLOAT_STORE_FLAG_VALUE (mode);
            return CONST_DOUBLE_FROM_REAL_VALUE (val, mode);
          }
#else
          return NULL_RTX;
#endif
        }
--------------

Above, if simplify_const_relational_operation() can simplify the given rel-op expression to a compile-time true/false, the returned value tem may be const_true_rtx/const0_rtx.
When its const_true_rtx and this expression is floating-point mode, the FLOAT_STORE_FLAG_VALUE macro is used to return the correct floating-point representation of true. And if the macro is undefined, instead of returning const_true_rtx, NULL_RTX is returned (i.e., the given expression couldn't be simplified).

One fix for the bug described above is to make cse.c:fold_rtx() behave the same way as simplify-rtx.c, i.e., when a floating-point-mode expression can be folded into "true", but the FLOAT_STORE_FLAG_VALUE is undefined, give up on folding instead of returning const_true_rtx as the folded expression.
Comment 1 Andrew Pinski 2008-09-11 23:53:23 UTC
This is a target bug really since it is SSE which needs to set FLOAT_STORE_FLAG_VALUE correctly.
Comment 2 H.J. Lu 2008-09-12 00:45:45 UTC
(In reply to comment #1)
> This is a target bug really since it is SSE which needs to set
> FLOAT_STORE_FLAG_VALUE correctly.
> 

How can you define FLOAT_STORE_FLAG_VALUE only for float/double
when SSE math is used?
Comment 3 raksit 2008-09-12 01:08:21 UTC
(In reply to comment #1)
> This is a target bug really since it is SSE which needs to set
> FLOAT_STORE_FLAG_VALUE correctly.
> 

In that case, I think simplify-rtx.c:simplify_relational_operation() is being overly conservative; and its behavior should be changed to match that of cse.c:fold_rtx() in this regard.
Comment 4 H.J. Lu 2008-09-12 17:53:10 UTC
We can't define FLOAT_STORE_FLAG_VALUE for SSE since we can't
represent 0xfffffff as a valid FP value. This patch makes
fold_rtx to match simplify_relational_operation:

--- ./cse.c.foo	2008-09-08 10:46:09.000000000 -0700
+++ ./cse.c	2008-09-12 10:46:18.000000000 -0700
@@ -3217,14 +3217,16 @@ fold_rtx (rtx x, rtx insn)
 	  rtx true_rtx = const_true_rtx, false_rtx = const0_rtx;
 	  enum machine_mode mode_arg1;
 
-#ifdef FLOAT_STORE_FLAG_VALUE
 	  if (SCALAR_FLOAT_MODE_P (mode))
 	    {
+#ifdef FLOAT_STORE_FLAG_VALUE
 	      true_rtx = (CONST_DOUBLE_FROM_REAL_VALUE
 			  (FLOAT_STORE_FLAG_VALUE (mode), mode));
+#else
+	      true_rtx = NULL_RTX;
+#endif
 	      false_rtx = CONST0_RTX (mode);
 	    }
-#endif
 
 	  code = find_comparison_args (code, &folded_arg0, &folded_arg1,
 				       &mode_arg0, &mode_arg1);
@@ -3332,8 +3334,17 @@ fold_rtx (rtx x, rtx insn)
 						  const_arg1))
 			      || (REG_P (folded_arg1)
 				  && (REG_QTY (REGNO (folded_arg1)) == ent->comparison_qty))))
-			return (comparison_dominates_p (ent->comparison_code, code)
-				? true_rtx : false_rtx);
+			{
+			  if (comparison_dominates_p (ent->comparison_code, code))
+			    {
+			      if (true_rtx)
+				return true_rtx;
+			      else
+				break;
+			    }
+			  else
+			    return false_rtx;
+			}
 		    }
 		}
 	    }
Comment 5 raksit 2008-09-12 18:52:20 UTC
Created attachment 16307 [details]
new test: gcc/testsuite/g++.dg/opt/cse3.C
Comment 6 raksit 2008-09-12 18:54:25 UTC
(In reply to comment #4)
> We can't define FLOAT_STORE_FLAG_VALUE for SSE since we can't
> represent 0xfffffff as a valid FP value. This patch makes
> fold_rtx to match simplify_relational_operation:
> 
> --- ./cse.c.foo 2008-09-08 10:46:09.000000000 -0700
> +++ ./cse.c     2008-09-12 10:46:18.000000000 -0700
> @@ -3217,14 +3217,16 @@ fold_rtx (rtx x, rtx insn)
>           rtx true_rtx = const_true_rtx, false_rtx = const0_rtx;
>           enum machine_mode mode_arg1;
> 
> -#ifdef FLOAT_STORE_FLAG_VALUE
>           if (SCALAR_FLOAT_MODE_P (mode))
>             {
> +#ifdef FLOAT_STORE_FLAG_VALUE
>               true_rtx = (CONST_DOUBLE_FROM_REAL_VALUE
>                           (FLOAT_STORE_FLAG_VALUE (mode), mode));
> +#else
> +             true_rtx = NULL_RTX;
> +#endif
>               false_rtx = CONST0_RTX (mode);
>             }
> -#endif
> 
>           code = find_comparison_args (code, &folded_arg0, &folded_arg1,
>                                        &mode_arg0, &mode_arg1);
> @@ -3332,8 +3334,17 @@ fold_rtx (rtx x, rtx insn)
>                                                   const_arg1))
>                               || (REG_P (folded_arg1)
>                                   && (REG_QTY (REGNO (folded_arg1)) ==
> ent->comparison_qty))))
> -                       return (comparison_dominates_p (ent->comparison_code,
> code)
> -                               ? true_rtx : false_rtx);
> +                       {
> +                         if (comparison_dominates_p (ent->comparison_code,
> code))
> +                           {
> +                             if (true_rtx)
> +                               return true_rtx;
> +                             else
> +                               break;
> +                           }
> +                         else
> +                           return false_rtx;
> +                       }
>                     }
>                 }
>             }
> 

I had a similar patch in mind. In case it helps, I have attached a testcase that you can use for the submission. You might also want to fix the gcc-4.3 branch.

-raksit
Comment 7 H.J. Lu 2008-09-12 19:40:45 UTC
A patch is posted at

http://gcc.gnu.org/ml/gcc-patches/2008-09/msg00984.html
Comment 8 hjl@gcc.gnu.org 2008-09-13 15:48:57 UTC
Subject: Bug 37489

Author: hjl
Date: Sat Sep 13 15:47:41 2008
New Revision: 140344

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140344
Log:
gcc/

2008-09-13  H.J. Lu  <hongjiu.lu@intel.com>

	PR rtl-optimization/37489
	* cse.c (fold_rtx): Don't return const_true_rtx for float
	compare if FLOAT_STORE_FLAG_VALUE is undefined.

gcc/testsuite/

2008-09-13  Raksit Ashok <raksit@google.com>

	PR rtl-optimization/37489
	* g++.dg/opt/cse3.C: New.

Added:
    trunk/gcc/testsuite/g++.dg/opt/cse3.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cse.c
    trunk/gcc/testsuite/ChangeLog

Comment 9 H.J. Lu 2008-09-13 16:00:32 UTC
Fixed.
Comment 10 Richard Biener 2008-11-06 14:03:31 UTC
*** Bug 37969 has been marked as a duplicate of this bug. ***
Comment 11 Richard Biener 2008-11-06 15:07:11 UTC
Subject: Bug 37489

Author: rguenth
Date: Thu Nov  6 15:05:44 2008
New Revision: 141645

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141645
Log:
2008-11-06  Richard Guenther  <rguenther@suse.de>
 
 	Backport from mainline:
 	2008-09-13  H.J. Lu  <hongjiu.lu@intel.com>
 
 	PR rtl-optimization/37489
 	* cse.c (fold_rtx): Don't return const_true_rtx for float
 	compare if FLOAT_STORE_FLAG_VALUE is undefined.

 	* g++.dg/opt/cse3.C: New testcase.
	* gcc.dg/torture/pr37969.c: New testcase.

Added:
    branches/gcc-4_3-branch/gcc/testsuite/g++.dg/opt/cse3.C
    branches/gcc-4_3-branch/gcc/testsuite/gcc.dg/torture/pr37969.c
Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/cse.c
    branches/gcc-4_3-branch/gcc/testsuite/ChangeLog