91645 – Missed optimization with sqrt(x*x)

Bug 91645 - Missed optimization with sqrt(x*x)

Summary: Missed optimization with sqrt(x*x)

Status:	ASSIGNED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	9.2.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Jakub Jelinek

URL:
Keywords:	missed-optimization

Duplicates (1):	103559 (view as bug list)
Depends on:
Blocks:	VRP
	Show dependency tree / graph

Reported:	2019-09-03 14:39 UTC by Nikita Lisitsa
Modified:	2024-07-21 22:44 UTC (History)
CC List:	11 users (show)

See Also:	107591
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2019-09-04 00:00:00

Attachments
gcc13-pr91645.patch (2.86 KB, patch) 2023-03-30 13:30 UTC, Jakub Jelinek	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Nikita Lisitsa 2019-09-03 14:39:11 UTC

Based on a discussion on stackoverflow: https://stackoverflow.com/questions/57673825/how-to-force-gcc-to-assume-that-a-floating-point-expression-is-non-negative.

With gcc-trunk and '-std=c++17 -O3', the function 

float test (float x) 
{
    return std::sqrt(x*x);
}

produces the following assembly:

test(float):
        mulss   xmm0, xmm0
        pxor    xmm2, xmm2
        ucomiss xmm2, xmm0
        movaps  xmm1, xmm0
        sqrtss  xmm1, xmm1
        ja      .L8
        movaps  xmm0, xmm1
        ret
.L8:
        sub     rsp, 24
        movss   DWORD PTR [rsp+12], xmm1
        call    sqrtf
        movss   xmm1, DWORD PTR [rsp+12]
        add     rsp, 24
        movaps  xmm0, xmm1
        ret


As far as I can tell, it calls sqrtf, unless the argument to sqrt is >= 0, to check for negatives/NaN's and set the appropriate errno. The behavior is reasonable, as expected.

Adding '-fno-math-errno', '-ffast-math', or '-ffinite-math-only' removes all the clutter and compiles the same code into the neat

test(float):
        mulss   xmm0, xmm0
        sqrtss  xmm0, xmm0
        ret


Now, the problem is that GCC doesn't seem to optimize away the call to sqrtf based on some surrounding code. As an example, it would be neat to have this (or something similar) to get compiled into the same mulss-sqrtss-ret:

float test (float x) 
{
    float y = x*x;
    if (y >= 0.f)
        return std::sqrt(y);
    __builtin_unreachable();
}

If I understand it correctly, the 'y >= 0.f' excludes 'y' being NaN and 'y' being negative (though this is excluded by 'y = x*x'), so there is no need to check if the argument to `std::sqrt` is any bad, enabling to just do 'sqrtss' and return.

Furthemore, adding e.g. '#pragma GCC optimize ("no-math-errno")' before the 'test' function doesn't lead to optimizing it either, though I'm not sure whether this is expected to work and/or requires a separate bugtracker issue.

Comment 1 Richard Biener 2019-09-04 09:50:50 UTC

Confirmed.  We have no value-range analysis for floating-point, plus
call-DCE emits an unordered >= while your test is using an ordered compare.

  <bb 2> [local count: 1073741824]:
  y_4 = x_3(D) * x_3(D);
  if (y_4 >= 0.0)
    goto <bb 3>; [26.75%]
  else
    goto <bb 6>; [73.25%]

  <bb 3> [local count: 287225938]:
  if (y_4 u>= 0.0)
    goto <bb 4>; [99.95%]
  else
    goto <bb 5>; [0.05%]

  <bb 4> [local count: 287082327]:
  _7 = .SQRT (y_4);
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 143611]:
  _10 = __builtin_sqrtf (y_4);

and jump threading doesn't know that if y >= 0.0 is true then
isgreaterequal (y, 0.0) is as well.  OTOH I'm not sure that's
actually true since IIRC isgreaterequal (y, 0.0) will return
true if isunordered (y, 0.0) while for y >= 0.0 the result
is unspecified?

So maybe you simply have to adjust your precondition appropriately
like

  y = x * x;
  if (isless (y, 0.0))
    __builtin_unreachable ();
  return std::sqrt (y);

that way it works for me?

Comment 2 Nikita Lisitsa 2019-09-04 11:30:26 UTC

If by 'isless(y, 0.0)' you mean 'y < 0.f', then no, it doesn't change anything, it produces the same 'ucomiss ... call sqrtf' boilerplate. May I have misunderstood you?

By the way, what about '#pragma GCC optimize ("no-math-errno")'? Is it supposed to work? Should I issue another bug on that matter?

Comment 3 Marc Glisse 2019-09-04 18:45:50 UTC

(In reply to Richard Biener from comment #1)
> for y >= 0.0 the result is unspecified?

NaN >= 0.0 is false, but even if it was unspecified, the implication would still be true.

(In reply to Nikita Lisitsa from comment #2)
> If by 'isless(y, 0.0)' you mean 'y < 0.f',

He is talking about the standard macro isless, defined in math.h.

Comment 4 rguenther@suse.de 2019-09-05 07:38:25 UTC

On Wed, 4 Sep 2019, lisyarus at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645
> 
> --- Comment #2 from Nikita Lisitsa <lisyarus at gmail dot com> ---
> If by 'isless(y, 0.0)' you mean 'y < 0.f', then no, it doesn't change anything,
> it produces the same 'ucomiss ... call sqrtf' boilerplate. May I have
> misunderstood you?

I meant isless literally, it's from math.h

> By the way, what about '#pragma GCC optimize ("no-math-errno")'? Is it supposed
> to work? Should I issue another bug on that matter?

It should work if the #pragma is before the function start (those pragmas 
only work on function granularity)

Comment 5 Nikita Lisitsa 2019-09-05 09:07:57 UTC

Oh, thank you a lot! Indeed, this version compiles to just mulss & sqrtss

float test (float x)
{
    float y = x*x;
    if (std::isless(y, 0.f))
        __builtin_unreachable();
    return std::sqrt(y);
}

Yet, I still don't quite understand what is happening here. Is it because the standard '<' operator is still subject to FE_* ?

Concerning pragmas, the code

#pragma GCC optimize ("no-math-errno")
float test (float x)
{
    return std::sqrt(x*x);
}

produces the following assembly

std::sqrt(float):
        pxor    xmm2, xmm2
        movaps  xmm1, xmm0
        ucomiss xmm2, xmm0
        sqrtss  xmm1, xmm1
        ja      .L8
        movaps  xmm0, xmm1
        ret
.L8:
        sub     rsp, 24
        movss   DWORD PTR [rsp+12], xmm1
        call    sqrtf
        movss   xmm1, DWORD PTR [rsp+12]
        add     rsp, 24
        movaps  xmm0, xmm1
        ret
test(float):
        mulss   xmm0, xmm0
        jmp     std::sqrt(float)

So, the only notable difference is that now 'std::sqrt(float)' is not inlined, but is tail-called instead. Thus, the pragma seems not to work?

Comment 6 rguenther@suse.de 2019-09-05 09:19:26 UTC

On Thu, 5 Sep 2019, lisyarus at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645
> 
> --- Comment #5 from Nikita Lisitsa <lisyarus at gmail dot com> ---
> Oh, thank you a lot! Indeed, this version compiles to just mulss & sqrtss
> 
> float test (float x)
> {
>     float y = x*x;
>     if (std::isless(y, 0.f))
>         __builtin_unreachable();
>     return std::sqrt(y);
> }
> 
> Yet, I still don't quite understand what is happening here. Is it because the
> standard '<' operator is still subject to FE_* ?

It's probably because our jump-threading is imperfect if I interpret
Marcs comment correctly.  When seeing LE_EXPR we only consider

      if (FLOAT_TYPE_P (TREE_TYPE (op0)))
        {
          build_and_record_new_cond (ORDERED_EXPR, op0, op1, p);
        }

so we know it's ORDERED but we don't register it's known to be UNLE as 
well.

> Concerning pragmas, the code
> 
> #pragma GCC optimize ("no-math-errno")
> float test (float x)
> {
>     return std::sqrt(x*x);
> }
> 
> produces the following assembly
> 
> std::sqrt(float):
>         pxor    xmm2, xmm2
>         movaps  xmm1, xmm0
>         ucomiss xmm2, xmm0
>         sqrtss  xmm1, xmm1
>         ja      .L8
>         movaps  xmm0, xmm1
>         ret
> .L8:
>         sub     rsp, 24
>         movss   DWORD PTR [rsp+12], xmm1
>         call    sqrtf
>         movss   xmm1, DWORD PTR [rsp+12]
>         add     rsp, 24
>         movaps  xmm0, xmm1
>         ret
> test(float):
>         mulss   xmm0, xmm0
>         jmp     std::sqrt(float)
> 
> So, the only notable difference is that now 'std::sqrt(float)' is not inlined,
> but is tail-called instead. Thus, the pragma seems not to work?

True, there may already be a bug about this.  The issue is that
whether functions set errno or not is decided globally and this
info isn't changed between functions according to pragmas.

Comment 7 Aldy Hernandez 2022-09-03 16:08:48 UTC

> Now, the problem is that GCC doesn't seem to optimize away the call to sqrtf
> based on some surrounding code. As an example, it would be neat to have this
> (or something similar) to get compiled into the same mulss-sqrtss-ret:
> 
> float test (float x) 
> {
>     float y = x*x;
>     if (y >= 0.f)
>         return std::sqrt(y);
>     __builtin_unreachable();
> }
> 
> If I understand it correctly, the 'y >= 0.f' excludes 'y' being NaN and 'y'
> being negative (though this is excluded by 'y = x*x'), so there is no need
> to check if the argument to `std::sqrt` is any bad, enabling to just do
> 'sqrtss' and return.

I'm not an FP expert, but I think we have enough information to do this right now.  The evrp dump now has:

=========== BB 2 ============
Imports: y_2  
Exports: y_2  
    <bb 2> :
    y_2 = x_1(D) * x_1(D);
    if (y_2 >= 0.0)
      goto <bb 3>; [INV]
    else
      goto <bb 4>; [INV]

2->3  (T) y_2 : 	[frange] float [0.0,  Inf] !NAN 
2->4  (F) y_2 : 	[frange] float [ -Inf, 0.0] 


=========== BB 3 ============
y_2	[frange] float [0.0,  Inf] !NAN 
    <bb 3> :
    _6 = __builtin_sqrtf (y_2);
    return _6;

Which means that y_2 is known to be [0.0, Inf] excluding a NAN.

What needs to happen for the call to __builtin_sqrtf to be optimized to sqrtss?

Comment 8 Jeffrey A. Law 2022-09-03 17:33:08 UTC

First, there's magic bits which turn a standard sqrt call into something like

if (exceptional condition)
  call libm's sqrt
else
  use hardware sqrt

The primary goal is to get errno set properly for those exceptional conditions.  That code (tree-ssa-math-opts.cc?) can be updated to query the range to determine if the exceptional conditions can happen.  Essentially that eliminates the need for -fno-math-errno.

That may be enough for targets with real sqrt instructions.  More work is likely needed for targets that use an estimator + correction steps.

Comment 9 Aldy Hernandez 2023-03-29 17:40:57 UTC

It looks like what we want for this test is actually !isgreaterequal() not isless(), since we want to exclude the possibility of a NAN.  Like this:

float test (float x)
{
  if (!__builtin_isgreaterequal (x, 0.0))
    __builtin_unreachable();
  return sqrtf (x);
}

After VRP1 removes the unreachable, the range for x_1 is correctly exported as >= -0.0 without a NAN:

Global Exported (via unreachable): x_1(D) = [frange] float [-0.0 (-0x0.0p+0), +Inf]

We end up with this:

float test (float x)
{
  float _4;

  <bb 2> [local count: 1073741824]:
  _4 = sqrtf (x_1(D));
  return _4;

}

which then CDCE expands with the unnecessary checking code:

  <bb 2> [local count: 1073741824]:
  DCE_COND_LB.2_5 = x_1(D);
  DCE_COND_LB_TEST.3_6 = DCE_COND_LB.2_5 u>= 0.0;
  if (DCE_COND_LB_TEST.3_6 != 0)
    goto <bb 5>; [99.95%]
  else
    goto <bb 4>; [0.05%]

  <bb 5> [local count: 1073204960]:
  _4 = .SQRT (x_1(D));
  goto <bb 3>; [100.00%]

  <bb 4> [local count: 536864]:
  _7 = sqrtf (x_1(D));

So the CDCE pass needs to be enhanced to use the ranger, instead of heuristics, to determine that the argument to sqrtf is neither negative nor NAN.

In this particular case, we could use global ranges for free, but there's no reason we can't use an actual on-demand ranger for more complex scenarios.

Just a guess here, but use_internal_fn() in CDCE shrink wraps the call to sqrt into a check with appropriate dispatch.  We could emit the .SQRT call directly if the range of x_1 is not NAN and not negative.

Comment 10 Aldy Hernandez 2023-03-29 17:46:09 UTC

*** Bug 103559 has been marked as a duplicate of this bug. ***

Comment 11 Jakub Jelinek 2023-03-30 11:50:28 UTC

(In reply to Aldy Hernandez from comment #9)
> It looks like what we want for this test is actually !isgreaterequal() not
> isless(), since we want to exclude the possibility of a NAN.  Like this:
> 
> float test (float x)
> {
>   if (!__builtin_isgreaterequal (x, 0.0))
>     __builtin_unreachable();
>   return sqrtf (x);
> }

No, what we emit for sqrtf is effectively:
  if (!__builtin_isless (x, 0.0))
    asm ("sqrtss %0, %0" : "+x" (x));
  else
    x = sqrtf (x); // The library version
  return x;
So, the
  if (__builtin_isless (x, 0.0)
    __builtin_unreachable ();
is all we need.
We should either ask in tree-call-cdce.cc before trying to create this whether
x range is [-0.0, +Inf] NAN or some subset thereof, in that case just use
the .SQRT internal call, or if it is [-Inf, nextafterf (-0.0, -Inf)] or subset thereof,
then use the library call only.
Or try to fold it in vrp2 or when, but in that case we don't know that easily whether
the comparison can be optimized away completely (it still raises exceptions on sNaN).

Comment 12 Jakub Jelinek 2023-03-30 13:03:52 UTC

WIP which still doesn't work:
--- gcc/value-query.cc.jj	2023-03-23 15:25:47.069740988 +0100
+++ gcc/value-query.cc	2023-03-30 14:56:58.809298424 +0200
@@ -230,9 +230,11 @@ range_query::get_tree_range (vrange &r,
     default:
       break;
     }
-  if (BINARY_CLASS_P (expr))
+  if (BINARY_CLASS_P (expr) || COMPARISON_CLASS_P (expr))
     {
-      range_op_handler op (TREE_CODE (expr), type);
+      range_op_handler op (TREE_CODE (expr),
+			   BINARY_CLASS_P (expr) ? type
+			   : TREE_TYPE (TREE_OPERAND (expr, 0)));
       if (op)
 	{
 	  Value_Range r0 (TREE_TYPE (TREE_OPERAND (expr, 0)));
--- gcc/tree-call-cdce.cc.jj	2023-01-02 09:32:45.940944935 +0100
+++ gcc/tree-call-cdce.cc	2023-03-30 14:54:25.248544702 +0200
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.
 #include "builtins.h"
 #include "internal-fn.h"
 #include "tree-dfa.h"
+#include "gimple-range.h"
 

 
 /* This pass serves two closely-related purposes:
@@ -425,12 +426,9 @@ comparison_code_if_no_nans (tree_code co
    null tree.  */
 
 static void
-gen_one_condition (tree arg, int lbub,
-                   enum tree_code tcode,
-                   const char *temp_name1,
-		   const char *temp_name2,
-		   vec<gimple *> conds,
-                   unsigned *nconds)
+gen_one_condition (tree arg, int lbub, enum tree_code tcode,
+		   const char *temp_name1, const char *temp_name2,
+		   vec<gimple *> conds, unsigned *nconds, gimple *stmt)
 {
   if (!HONOR_NANS (arg))
     tcode = comparison_code_if_no_nans (tcode);
@@ -451,10 +449,24 @@ gen_one_condition (tree arg, int lbub,
   gimple_assign_set_lhs (stmt1, tempn);
 
   tempc = create_tmp_var (boolean_type_node, temp_name2);
-  stmt2 = gimple_build_assign (tempc,
-                               fold_build2 (tcode,
-					    boolean_type_node,
-					    tempn, lbub_real_cst));
+  tree tcond = build2 (tcode, boolean_type_node, arg, lbub_real_cst);
+  int_range_max r;
+  range_query *q = get_range_query (cfun);
+  if (q == get_global_range_query ())
+    q = enable_ranger (cfun);
+  /* Ask the ranger whether it knows the condition will be always false or
+     always true.  */
+  if (!q->range_of_expr (r, tcond, stmt) || r.undefined_p ())
+    tcond = NULL_TREE;
+  else if (r.upper_bound () == 0)
+    tcond = boolean_false_node;
+  else if (r.lower_bound () == 1)
+    tcond = boolean_true_node;
+  else
+    tcond = NULL_TREE;
+  if (!tcond)
+    tcond = fold_build2 (tcode, boolean_type_node, tempn, lbub_real_cst);
+  stmt2 = gimple_build_assign (tempc, tcond);
   tempcn = make_ssa_name (tempc, stmt2);
   gimple_assign_set_lhs (stmt2, tempcn);
 
@@ -475,16 +487,15 @@ gen_one_condition (tree arg, int lbub,
    for lower bound check, one for upper bound check.  */
 
 static void
-gen_conditions_for_domain (tree arg, inp_domain domain,
-			   vec<gimple *> conds,
-                           unsigned *nconds)
+gen_conditions_for_domain (tree arg, inp_domain domain, vec<gimple *> conds,
+			   unsigned *nconds, gimple *stmt)
 {
   if (domain.has_lb)
     gen_one_condition (arg, domain.lb,
                        (domain.is_lb_inclusive
                         ? UNGE_EXPR : UNGT_EXPR),
                        "DCE_COND_LB", "DCE_COND_LB_TEST",
-                       conds, nconds);
+		       conds, nconds, stmt);
 
   if (domain.has_ub)
     {
@@ -496,7 +507,7 @@ gen_conditions_for_domain (tree arg, inp
                          (domain.is_ub_inclusive
                           ? UNLE_EXPR : UNLT_EXPR),
                          "DCE_COND_UB", "DCE_COND_UB_TEST",
-                         conds, nconds);
+			 conds, nconds, stmt);
     }
 }
 
@@ -518,9 +529,8 @@ gen_conditions_for_domain (tree arg, inp
    and *NCONDS is the number of logical conditions.  */
 
 static void
-gen_conditions_for_pow_cst_base (tree base, tree expn,
-				 vec<gimple *> conds,
-                                 unsigned *nconds)
+gen_conditions_for_pow_cst_base (tree base, tree expn, vec<gimple *> conds,
+				 unsigned *nconds, gimple *stmt)
 {
   inp_domain exp_domain;
   /* Validate the range of the base constant to make
@@ -532,11 +542,9 @@ gen_conditions_for_pow_cst_base (tree ba
   real_from_integer (&mv, TYPE_MODE (TREE_TYPE (base)), 256, UNSIGNED);
   gcc_assert (!real_less (&mv, &bcv));
 
-  exp_domain = get_domain (0, false, false,
-                           127, true, false);
+  exp_domain = get_domain (0, false, false, 127, true, false);
 
-  gen_conditions_for_domain (expn, exp_domain,
-                             conds, nconds);
+  gen_conditions_for_domain (expn, exp_domain, conds, nconds, stmt);
 }
 
 /* Generate error condition code for pow calls with
@@ -554,9 +562,8 @@ gen_conditions_for_pow_cst_base (tree ba
    conditions.  */
 
 static void
-gen_conditions_for_pow_int_base (tree base, tree expn,
-				 vec<gimple *> conds,
-                                 unsigned *nconds)
+gen_conditions_for_pow_int_base (tree base, tree expn, vec<gimple *> conds,
+				 unsigned *nconds, gimple *stmt)
 {
   gimple *base_def;
   tree base_val0;
@@ -600,11 +607,9 @@ gen_conditions_for_pow_int_base (tree ba
   /* Generate condition in reverse order -- first
      the condition for the exp argument.  */
 
-  exp_domain = get_domain (0, false, false,
-                           max_exp, true, true);
+  exp_domain = get_domain (0, false, false, max_exp, true, true);
 
-  gen_conditions_for_domain (expn, exp_domain,
-                             conds, nconds);
+  gen_conditions_for_domain (expn, exp_domain, conds, nconds, stmt);
 
   /* Now generate condition for the base argument.
      Note it does not use the helper function
@@ -660,9 +665,9 @@ gen_conditions_for_pow (gcall *pow_call,
   bc = TREE_CODE (base);
 
   if (bc == REAL_CST)
-    gen_conditions_for_pow_cst_base (base, expn, conds, nconds);
+    gen_conditions_for_pow_cst_base (base, expn, conds, nconds, pow_call);
   else if (bc == SSA_NAME)
-    gen_conditions_for_pow_int_base (base, expn, conds, nconds);
+    gen_conditions_for_pow_int_base (base, expn, conds, nconds, pow_call);
   else
     gcc_unreachable ();
 }
@@ -852,7 +857,7 @@ gen_shrink_wrap_conditions (gcall *bi_ca
       inp_domain domain = get_no_error_domain (fnc);
       *nconds = 0;
       arg = gimple_call_arg (bi_call, 0);
-      gen_conditions_for_domain (arg, domain, conds, nconds);
+      gen_conditions_for_domain (arg, domain, conds, nconds, bi_call);
     }
 
   return;
@@ -1290,6 +1295,8 @@ pass_call_cdce::execute (function *fun)
     return 0;
 
   shrink_wrap_conditional_dead_built_in_calls (cond_dead_built_in_calls);
+  if (get_range_query (fun) != get_global_range_query ())
+    disable_ranger (fun);
   free_dominance_info (CDI_POST_DOMINATORS);
   /* As we introduced new control-flow we need to insert PHI-nodes
      for the call-clobbers of the remaining call.  */

Comment 13 Jakub Jelinek 2023-03-30 13:30:01 UTC

Created attachment 54789 [details]
gcc13-pr91645.patch

This seems to work right on the testcase I've been trying:
float sqrtf (float);
float foo (float x) { return sqrtf (x); }
float bar (float x) { if (__builtin_isless (x, 0)) __builtin_unreachable (); return sqrtf (x); }
float baz (float x) { if (!__builtin_isless (x, 0)) __builtin_unreachable (); return sqrtf (x); }
float qux (float x) { x = x * x; return sqrtf (x); }

Do we want this for GCC 13, or at least parts of it like the range-op-float.cc fixes and/or the value-range.cc change, or defer everything for GCC 14?

Comment 14 GCC Commits 2023-03-31 11:43:13 UTC

The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:e02c9d9116f243643c0daba8dbcc5d1795c827c3

commit r13-6956-ge02c9d9116f243643c0daba8dbcc5d1795c827c3
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Mar 31 13:41:34 2023 +0200

    range-op-float, value-range: Fix up handling of UN{LT,LE,GT,GE,EQ}_EXPR and handle comparisons in get_tree_range [PR91645]
    
    When looking into PR91645, I've noticed we handle UN{LT,LE,GT,GE,EQ}_EXPR
    comparisons incorrectly.
    All those are unordered or ..., we correctly return [1, 1] if one or
    both operands are known NANs, and correctly ask the non-UN prefixed
    op to fold_range if neither operand may be NAN.
    But for the case where one or both operands may be NAN, we always
    return [0, 1].  The UN* fold_range tries to handle it by asking
    the non-UN prefixed fold_range and if it returns [1, 1] return that,
    if it returns [0, 0] or [0, 1] return [0, 1], which makes sense,
    because the maybe NAN means that it is the non-UN prefixed fold_range
    unioned with [1, 1] in case the maybe NAN is actually NAN at runtime.
    The problem is that the non-UN prefixed fold_range always returns [0, 1]
    because those fold_range implementations are like:
      if (op1.known_isnan () || op2.known_isnan ())
        r = range_false (type);
      else if (!maybe_isnan (op1, op2))
        {
    ...
        }
      else
        r = range_true_and_false (type);
    and so if maybe_isnan, they always return [0, 1].  Now, thinking about it,
    this is unnecessary pessimization, for the case where the ... block
    returns range_false (type) we actually could do it also if maybe_isnan (op1,
    op2), because if one or both operands are NAN, the comparison will be false,
    and if neither is NAN, the comparison will be also false.  Will fix
    incrementally today.
    Anyway, the following patch fixes it by asking the non-UN prefixed
    fold_range on ranges with NAN cleared, which I think does the right
    thing in all cases.
    
    Another change in the patch is that range_query::get_tree_range
    always returned VARYING for comparisons, this patch allows to ask about
    those as well (they are very much like binary ops, except they take
    the important type from the types of the operands rather than result).
    
    Initially I've developed this patch together with changes to tree-call-cdce.cc,
    but those result in one regression and apparently aren't actually needed to
    fix this bug, the range-op-float.cc changes are enough.
    
    2023-03-31  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/91645
            * range-op-float.cc (foperator_unordered_lt::fold_range,
            foperator_unordered_le::fold_range,
            foperator_unordered_gt::fold_range,
            foperator_unordered_ge::fold_range,
            foperator_unordered_equal::fold_range): Call the ordered
            fold_range on ranges with cleared NaNs.
            * value-query.cc (range_query::get_tree_range): Handle also
            COMPARISON_CLASS_P trees.
    
            * gcc.target/i386/pr103559-1.c: New test.
            * gcc.target/i386/pr103559-2.c: New test.
            * gcc.target/i386/pr103559-3.c: New test.
            * gcc.target/i386/pr103559-4.c: New test.