Bug 120231 - GCC fails to notice that (double)u64 is non-negative
Summary: GCC fails to notice that (double)u64 is non-negative
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 16.0
: P3 normal
Target Milestone: ---
Assignee: Jakub Jelinek
URL:
Keywords: missed-optimization
Depends on:
Blocks: VRP
  Show dependency treegraph
 
Reported: 2025-05-12 09:26 UTC by Alex Coplan
Modified: 2025-09-02 01:08 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2025-05-12 00:00:00


Attachments
gcc16-pr120231.patch (3.29 KB, patch)
2025-06-04 21:15 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Coplan 2025-05-12 09:26:57 UTC
For the following function:

_Bool f(unsigned long u64)
{
    return (double)u64 >= 0.0;
}

at -O3 on AArch64, GCC generates:

f:
        ucvtf   d31, x0
        fcmpe   d31, #0.0
        cset    w0, ge
        ret

whereas LLVM gives:

f:
        mov     w0, #1
        ret

i.e. it folds the comparison to constant true.  GCC should do the same.  This would be useful in the context of code like the following:

unsigned long sqrtint(unsigned long a) {
    return __builtin_sqrt(a);
}

for which GCC currently generates:

sqrtint:
        ucvtf   d0, x0
        fcmp    d0, #0.0
        bpl     .L5
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      sqrt
        ldp     x29, x30, [sp], 16
        fcvtzu  x0, d0
        ret
.L5:
        fsqrt   d0, d0
        fcvtzu  x0, d0
        ret

i.e. it tests if the input is non-negative so that errno can be set correctly, but the test is redundant since the input is necessarily non-negative.  Indeed LLVM generates:

sqrtint:
        ucvtf   d0, x0
        fsqrt   d0, d0
        fcvtzu  x0, d0
        ret

for this case.
Comment 1 Jakub Jelinek 2025-05-12 09:40:24 UTC
I think we don't have value range handlers for casts from integers to floating point, from floating point to integers and their reverse operations, and it would certainly be desirable to add those.
Say if we know that some integer SSA_NAME has [42, 48] range, then the frange when that is converted to say double will be [42.0, 48.0].  Of course, when the boundary values can't be converted exactly, we need to take it into account etc.
Comment 2 Andrew Macleod 2025-05-12 17:40:37 UTC
We do not have cast operators between int and float.  We are also missing some dispatch code for them as we haven't actually used some of those patterns yet.

I am going to checked in a patch to trunk shortly using this PR as a tag.  It supplies the needed patterns and blank versions of the required operator methods for each cast. These are located near the bottom of range-op-float.cc.   These functions currently return false, which is the default behaviour if the functions did not exist.


// Cast Float to Integer
bool operator_cast::fold_range (irange &lhs, tree type, const frange &op1, const irange &, relation_trio) const;
bool operator_cast::op1_range (frange &op1, tree type, const irange &lhs, const irange &, relation_trio) const;

// Cast Integer to Float.
bool operator_cast::fold_range (frange &lhs, tree type, const irange &op1, const frange &, relation_trio) const;
bool operator_cast::op1_range (irange &op1, tree type, const frange &lhs, const frange &, relation_trio) const;


Feel free to populate them for this PR.  If I get a chance and no one else does, I may do some simple population. but it wont be complex since I'm not familiar with working on floats.
Comment 3 Jakub Jelinek 2025-05-12 18:10:29 UTC
Yeah, I'd definitely appreciate blank handlers for those, which I can gradually try to implement.  Working now on big-endian _BitInt, so it won't be immediately, but will try to get to it before summer.
Comment 4 GCC Commits 2025-05-12 20:51:19 UTC
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>:

https://gcc.gnu.org/g:6f375445ef09d5c97d5bcc0fcb6069612217963e

commit r16-571-g6f375445ef09d5c97d5bcc0fcb6069612217963e
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Mon May 12 11:41:37 2025 -0400

    Add dispatch for casts between integer and float.
    
    GCC currently does not implement range operators for casting between
    integers and float.  This patch adds the missing dispatch patterns and
    routines to facilitate implmenting these casts.
    
            PR tree-optimization/120231
            * range-op-float.cc (operator_cast::fold_range): New variants.
            (operator_cast::op1_range): Likewise.
            * range-op-mixed.h (operator_cast::fold_range): Likewise.
            (operator_cast::op1_range): Likewise
            * range-op.cc (range_op_handler::fold_range): Add RO_FIF dispatch.
            (range_op_handler::op1_range): Add RO_IFF and RO_FII patterns.
            (range_operator::fold_range): Provide new variant default.
            (range_operator::op1_range): Likewise.
            * range-op.h (range_operator): Add new variant methods.
Comment 5 Joe Ramsay 2025-05-22 11:01:35 UTC
Hello! I have discovered a similar issue, commenting in case another data point is useful. The following:

#include <math.h>

_Float16 norm(_Float16 re, _Float16 im) {
    return sqrtf(re * re + im * im);
}

compiled with -march=armv8-a+fp16 -O3 gives

norm:
        fmul    h1, h1, h1
        fmadd   h1, h0, h0, h1
        fcvt    s0, h1
        fcmp    s0, #0.0
        bpl     .L4
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        str     s1, [sp, 28]
        bl      sqrtf
        ldr     s1, [sp, 28]
        ldp     x29, x30, [sp], 32
        fsqrt   h0, h1
        ret
.L4:
        fsqrt   h0, h1
        ret

and with -ffast-math added gives:

norm:
        fmul    h1, h1, h1
        fmadd   h0, h0, h0, h1
        fsqrt   h0, h0
        ret

I think GCC is being overly cautious here - the fast FSQRT case is fine without fast-math (the optimal sequence is emitted without -ffast-math for single- and double-precision floats).
Comment 6 Alex Coplan 2025-05-22 11:12:06 UTC
I suppose that example boils down to whether code like:

_Bool f(_Float16 a) {
    return a * a >= 0;
}
_Bool g(float a) {
    return a * a >= 0;
}

can be optimised to return true.  We currently do it with -ffast-math but not without.
Comment 7 Jakub Jelinek 2025-05-22 11:14:35 UTC
(In reply to Alex Coplan from comment #6)
> I suppose that example boils down to whether code like:
> 
> _Bool f(_Float16 a) {
>     return a * a >= 0;
> }
> _Bool g(float a) {
>     return a * a >= 0;
> }
> 
> can be optimised to return true.  We currently do it with -ffast-math but
> not without.

And it is correct like that.  If a is a NaN (qNaN or sNaN), then a * a >= 0 is false.
Comment 8 Jakub Jelinek 2025-05-22 11:15:54 UTC
In any case, #c5 and onwards is completely unrelated to this PR, which is about value ranges for casts from integers to floating point and vice versa.  So, please move that elsewhere.
Comment 9 Jakub Jelinek 2025-06-02 14:25:24 UTC
Apparently we are missing range implementation of casts between different floating point types as well.

Trying now:
--- gcc/range-op-mixed.h.jj	2025-05-20 08:14:06.520404648 +0200
+++ gcc/range-op-mixed.h	2025-06-02 14:30:37.304673412 +0200
@@ -473,14 +473,15 @@ public:
   bool fold_range (prange &r, tree type,
 		   const irange &op1, const prange &op2,
 		   relation_trio rel = TRIO_VARYING) const final override;
+  bool fold_range (frange &r, tree type,
+		   const frange &op1, const frange &op2,
+		   relation_trio = TRIO_VARYING) const final override;
   bool fold_range (irange &r, tree type,
-		   const frange &lh,
-		   const irange &rh,
-		   relation_trio = TRIO_VARYING) const;
+		   const frange &op1, const irange &op2,
+		   relation_trio = TRIO_VARYING) const final override;
   bool fold_range (frange &r, tree type,
-		   const irange &lh,
-		   const frange &rh,
-		   relation_trio = TRIO_VARYING) const;
+		   const irange &op1, const frange &op2,
+		   relation_trio = TRIO_VARYING) const final override;
 
   bool op1_range (irange &r, tree type,
 		  const irange &lhs, const irange &op2,
@@ -495,13 +496,14 @@ public:
 		  const irange &lhs, const prange &op2,
 		  relation_trio rel = TRIO_VARYING) const final override;
   bool op1_range (frange &r, tree type,
-		  const irange &lhs,
-		  const irange &op2,
-		  relation_trio = TRIO_VARYING) const;
+		  const frange &lhs, const frange &op2,
+		  relation_trio = TRIO_VARYING) const final override;
+  bool op1_range (frange &r, tree type,
+		  const irange &lhs, const irange &op2,
+		  relation_trio = TRIO_VARYING) const final override;
   bool op1_range (irange &r, tree type,
-		  const frange &lhs,
-		  const frange &op2,
-		  relation_trio = TRIO_VARYING) const;
+		  const frange &lhs, const frange &op2,
+		  relation_trio = TRIO_VARYING) const final override;
 
   relation_kind lhs_op1_relation (const irange &lhs,
 				  const irange &op1, const irange &op2,
--- gcc/range-op-float.cc.jj	2025-05-20 08:14:06.519404661 +0200
+++ gcc/range-op-float.cc	2025-06-02 15:23:14.171003089 +0200
@@ -2899,6 +2899,107 @@ private:
   }
 } fop_div;
 
+bool
+operator_cast::fold_range (frange &r, tree type, const frange &op1,
+			   const frange &, relation_trio) const
+{
+  REAL_VALUE_TYPE lb, ub;
+  enum machine_mode mode = TYPE_MODE (type);
+  bool mode_composite = MODE_COMPOSITE_P (mode);
+
+  if (empty_range_varying (r, type, op1, op1))
+    return true;
+  if (!MODE_HAS_NANS (mode) && op1.maybe_isnan ())
+    {
+      r.set_varying (type);
+      return true;
+    }
+  if (op1.known_isnan ())
+    {
+      r.set_nan (type);
+      return true;
+    }
+
+  const REAL_VALUE_TYPE &lh_lb = op1.lower_bound ();
+  const REAL_VALUE_TYPE &lh_ub = op1.upper_bound ();
+  real_convert (&lb, mode, &lh_lb);
+  real_convert (&ub, mode, &lh_ub);
+
+  if (flag_rounding_math)
+    {
+      if (real_less (&lh_lb, &lb))
+	{
+	  if (mode_composite
+	      && (real_isdenormal (&lb, mode) || real_iszero (&lb)))
+	    {
+	      // IBM extended denormals only have DFmode precision.
+	      REAL_VALUE_TYPE tmp, tmp2;
+	      real_convert (&tmp2, DFmode, &lh_lb);
+	      real_nextafter (&tmp, REAL_MODE_FORMAT (DFmode), &tmp2,
+			      &dconstninf);
+	      real_convert (&lb, mode, &tmp);
+	    }
+	  else
+	    frange_nextafter (mode, lb, dconstninf);
+	}
+      if (real_less (&ub, &lh_ub))
+	{
+	  if (mode_composite
+	      && (real_isdenormal (&ub, mode) || real_iszero (&ub)))
+	    {
+	      // IBM extended denormals only have DFmode precision.
+	      REAL_VALUE_TYPE tmp, tmp2;
+	      real_convert (&tmp2, DFmode, &lh_ub);
+	      real_nextafter (&tmp, REAL_MODE_FORMAT (DFmode), &tmp2,
+			      &dconstinf);
+	      real_convert (&ub, mode, &tmp);
+	    }
+	  else
+	    frange_nextafter (mode, ub, dconstinf);
+	}
+    }
+
+  r.set (type, lb, ub, op1.get_nan_state ());
+
+  if (flag_trapping_math
+      && MODE_HAS_INFINITIES (TYPE_MODE (type))
+      && r.known_isinf ()
+      && !op1.known_isinf ())
+    {
+      REAL_VALUE_TYPE inf = r.lower_bound ();
+      if (real_isneg (&inf))
+	{
+	  REAL_VALUE_TYPE min = real_min_representable (type);
+	  r.set (type, inf, min);
+	}
+      else
+	{
+	  REAL_VALUE_TYPE max = real_max_representable (type);
+	  r.set (type, max, inf);
+	}
+    }
+
+  r.flush_denormals_to_zero ();
+  return true;
+}
+
+// Implement fold for a cast from float to another float.
+bool
+operator_cast::op1_range (frange &r, tree type, const frange &lhs,
+			  const frange &op2, relation_trio) const
+{
+  if (lhs.undefined_p ())
+    return false;
+  tree lhs_type = lhs.type ();
+  frange wlhs = float_widen_lhs_range (lhs_type, lhs);
+  auto save_flag_rounding_math = flag_rounding_math;
+  flag_rounding_math = 1;
+  bool ret = float_binary_op_range_finish (fold_range (r, type, wlhs, op2),
+					   r, type, wlhs);
+  flag_rounding_math = save_flag_rounding_math;
+  return ret;
+}
+
 // Implement fold for a cast from float to an int.
 bool
 operator_cast::fold_range (irange &, tree, const frange &,
Comment 10 GCC Commits 2025-06-04 15:23:29 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:b7960a3f966a0f87888de0fc588999d026918449

commit r16-1108-gb7960a3f966a0f87888de0fc588999d026918449
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Jun 4 17:21:51 2025 +0200

    ranger: Add support for float <-> float casts [PR120231]
    
    I've noticed we don't even support say float -> double and other
    scalar floating point to scalar floating point conversions in the
    ranger, we just end up with VARYING for those.
    
    The following patch attempts to fix that.
    The reverse cast case uses float_binary_op_range_finish e.g. because
    if the result isn't infinite, then the source couldn't be infinite
    either even if the reverse fold_range would suggest that.
    And special cases the case of guaranteed widening cast (where
    we have assurance that all the source type values are exactly
    representable in the destination type; using ieee_bits for that).
    
    2025-06-04  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/120231
            * range-op-mixed.h (operator_cast::fold_range): Add overload
            with 3 {,const} frange & operands.  Change parameter names and
            add final override keywords for float <-> integer cast overloads.
            (operator_cast::op1_range): Likewise.
            * range-op-float.cc (operator_cast::fold_range): New overload
            with 3 {,const} frange & operands.
            (operator_cast::op1_range): Likewise.
    
            * gcc.dg/tree-ssa/pr120231-1.c: New test.
Comment 11 Jakub Jelinek 2025-06-04 21:15:32 UTC
Created attachment 61583 [details]
gcc16-pr120231.patch

Untested patch for the float -> int and int -> float casts and their reverse ops.

Unfortunately it regresses
FAIL: g++.dg/tree-ssa/loop-split-1.C   scan-tree-dump-times lsplit "loop split" 1
FAIL: gfortran.dg/inline_matmul_16.f90   -O   scan-tree-dump-times optimized "_gfortran_matmul" 1
tests, so I'll need to study those up tomorrow.
Comment 12 GCC Commits 2025-06-05 16:11:57 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:b6b238ddcb119bb51555ead9be0fa7b06b8a6be7

commit r16-1191-gb6b238ddcb119bb51555ead9be0fa7b06b8a6be7
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Jun 5 18:10:22 2025 +0200

    ranger: Add support for float <-> int casts [PR120231]
    
    The following patch adds support for float <-> integer conversions in
    ranger.
    The patch reverts part of the r16-571 changes, those changes were right
    for fold_range, but not for op1_range, where RO_IFI and RO_FIF are actually
    called rather than RO_IFF and RO_FII that the patch expected.
    Also, the float -> int operation actually uses FIX_TRUNC_EXPR tree code
    rather than NOP_EXPR or CONVERT_EXPR and int -> float uses FLOAT_EXPR,
    but I think we can just handle all of them using operator_cast, at least
    as long as we don't try to use VIEW_CONVERT_EXPR using that too; not really
    sure handling VCE at least for floating to integral or vice versa would
    be actually useful though.
    
    The patch "regressed" two tests, gfortran.dg/inline_matmul_16.f90 and
    g++.dg/tree-ssa/loop-split-1.C.  In the first case, there is a loop doing
    matmul on various sizes of matrices, up to 10x10 matrices, and Fortran
    FE given the options emits two implementations of the matmul, one inline
    for the case where the matmul has less than 1000 elements and one for
    larger matmuls.  The check for whatever reason uses floating point
    calculations and before this patch we weren't able to prove that all the
    matrices will be smaller than the cutoff and the test was checking for
    presence of the fallback call; with the patch we are able to figure it
    out and only keep the inline copy.  I've duplicated the test, once
    unmodified source which doesn't expect _gfortran_matmul string in optimized
    dump anymore, and another copy which uses volatile ten instead of 10 in
    loop upper bounds so that it has to keep the fallback and scans for it.
    The other test is g++.dg/tree-ssa/loop-split-1.C, which does
    constexpr unsigned s = 100000000;
    ...
        for(unsigned i = 0; i < s; ++i)
        {
            if(i == 0)
                a[i] = b[i] * c[i];
            else
                a[i] = (b[i] + c[i]) * c[i-1] * std::log(i);
        }
    and for some reason the successful loop splitting for which the test
    searches in a dump file is dependent on the errno fallback of std::log,
    where we do t = std::log((double)i); if ((double)i) u> 0); else log ((double)i);
    But i goes only from 1 to 100000000, so (double)i has the range
    [1.0, 100000000.0] with the patch and so we see it will never need errno
    nor raise exception.  I've tested adding + d for it where d is 0.0 but
    modifiable in some other TU, and tested it also with r14-2851 and r14-2852,
    where the former FAILed the test both unmodified and modified, while
    the latter PASSed both versions.
    
    2025-06-05  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/120231
            * range-op.cc (range_op_table::range_op_table): Register op_cast
            also for FLOAT_EXPR and FIX_TRUNC_EXPR.
            (RO_III): Adjust comment.
            (range_op_handler::op1_range): Handle RO_IFI rather than RO_IFF.
            Don't handle RO_FII.
            (range_operator::op1_range): Remove overload with
            irange &, tree, const frange &, const frange &, relation_trio
            and frange &, tree, const irange &, const irange &, relation_trio
            arguments.  Add overload with
            irange &, tree, const frange &, const irange &, relation_trio
            arguments.
            * range-op-mixed.h (operator_cast::op1_range): Remove overload with
            irange &, tree, const frange &, const frange &, relation_trio
            and frange &, tree, const irange &, const irange &, relation_trio
            arguments.  Add overload with
            irange &, tree, const frange &, const irange &, relation_trio and
            frange &, tree, const irange &, const frange &, relation_trio
            arguments.
            * range-op.h (range_operator::op1_cast): Remove overload with
            irange &, tree, const frange &, const frange &, relation_trio
            and frange &, tree, const irange &, const irange &, relation_trio
            arguments.  Add overload with
            irange &, tree, const frange &, const irange &, relation_trio
            arguments.
            * range-op-float.cc (operator_cast::fold_range): Implement
            float to int and int to float casts.
            (operator_cast::op1_range): Remove overload with
            irange &, tree, const frange &, const frange &, relation_trio
            and frange &, tree, const irange &, const irange &, relation_trio
            arguments.  Add overload with
            irange &, tree, const frange &, const irange &, relation_trio and
            frange &, tree, const irange &, const frange &, relation_trio
            arguments and implement reverse op of float to int and int to float
            cast there.
    
            * gcc.dg/tree-ssa/pr120231-2.c: New test.
            * gcc.dg/tree-ssa/pr120231-3.c: New test.
            * gfortran.dg/inline_matmul_16.f90: Don't expect any _gfortran_matmul
            strings in optimized dump.
            * gfortran.dg/inline_matmul_26.f90: New test.
            * g++.dg/tree-ssa/loop-split-1.C (d): New variable.
            (main): Use std::log (i + d) instead of std::log (i).
Comment 13 Jakub Jelinek 2025-06-05 16:14:32 UTC
Should be implemented now.
Comment 14 chenglulu 2025-06-11 06:30:24 UTC
Hi:

With the patch, the following testcase generates an incorrect assembly on LoongArch.

test.c
```
unsigned int step;
double sqrt (double x);
void test1 (double);
void
test ()
{
  test1 (0.5 / sqrt (1. + step));
}
```
# cc1 test.c -o test.s -Ofast -fdump-tree-dom3

```
test:
.LFB0 = .
	.cfi_startproc
	pcalau12i	$r12,%pc_hi20(.LC0)
	fld.d	$f0,$r12,%pc_lo12(.LC0)
	b	%plt(test1)
```
I found that in the dom3 optimization, the range calculation of _5 is incorrect.

```
Exporting new  global ranges:
============================
Global Exported: _5 = [frange] double [5.0e-1 (0x0.8p+0), 5.0e-1 (0x0.8p+0)]
========= Done =============
void test ()
{
  unsigned int step.0_1;
  double _2;
  double _3;
  double _4;
  double _5;

  <bb 2> [local count: 1073741824]:
  step.0_1 = step;
  _2 = (double) step.0_1;
  _3 = _2 + 1.0e+0; 
  _4 = .RSQRT (_3); 
  _5 = _4 * 5.0e-1; 
  test1 (5.0e-1);
  return;

}     
```
Comment 15 Sam James 2025-06-12 09:03:53 UTC
(In reply to chenglulu from comment #14)
> Hi:
> 
> With the patch, the following testcase generates an incorrect assembly on
> LoongArch.
> [...]

(For the record, filed as PR120638).