For the following function: _Bool f(unsigned long u64) { return (double)u64 >= 0.0; } at -O3 on AArch64, GCC generates: f: ucvtf d31, x0 fcmpe d31, #0.0 cset w0, ge ret whereas LLVM gives: f: mov w0, #1 ret i.e. it folds the comparison to constant true. GCC should do the same. This would be useful in the context of code like the following: unsigned long sqrtint(unsigned long a) { return __builtin_sqrt(a); } for which GCC currently generates: sqrtint: ucvtf d0, x0 fcmp d0, #0.0 bpl .L5 stp x29, x30, [sp, -16]! mov x29, sp bl sqrt ldp x29, x30, [sp], 16 fcvtzu x0, d0 ret .L5: fsqrt d0, d0 fcvtzu x0, d0 ret i.e. it tests if the input is non-negative so that errno can be set correctly, but the test is redundant since the input is necessarily non-negative. Indeed LLVM generates: sqrtint: ucvtf d0, x0 fsqrt d0, d0 fcvtzu x0, d0 ret for this case.
I think we don't have value range handlers for casts from integers to floating point, from floating point to integers and their reverse operations, and it would certainly be desirable to add those. Say if we know that some integer SSA_NAME has [42, 48] range, then the frange when that is converted to say double will be [42.0, 48.0]. Of course, when the boundary values can't be converted exactly, we need to take it into account etc.
We do not have cast operators between int and float. We are also missing some dispatch code for them as we haven't actually used some of those patterns yet. I am going to checked in a patch to trunk shortly using this PR as a tag. It supplies the needed patterns and blank versions of the required operator methods for each cast. These are located near the bottom of range-op-float.cc. These functions currently return false, which is the default behaviour if the functions did not exist. // Cast Float to Integer bool operator_cast::fold_range (irange &lhs, tree type, const frange &op1, const irange &, relation_trio) const; bool operator_cast::op1_range (frange &op1, tree type, const irange &lhs, const irange &, relation_trio) const; // Cast Integer to Float. bool operator_cast::fold_range (frange &lhs, tree type, const irange &op1, const frange &, relation_trio) const; bool operator_cast::op1_range (irange &op1, tree type, const frange &lhs, const frange &, relation_trio) const; Feel free to populate them for this PR. If I get a chance and no one else does, I may do some simple population. but it wont be complex since I'm not familiar with working on floats.
Yeah, I'd definitely appreciate blank handlers for those, which I can gradually try to implement. Working now on big-endian _BitInt, so it won't be immediately, but will try to get to it before summer.
The master branch has been updated by Andrew Macleod <amacleod@gcc.gnu.org>: https://gcc.gnu.org/g:6f375445ef09d5c97d5bcc0fcb6069612217963e commit r16-571-g6f375445ef09d5c97d5bcc0fcb6069612217963e Author: Andrew MacLeod <amacleod@redhat.com> Date: Mon May 12 11:41:37 2025 -0400 Add dispatch for casts between integer and float. GCC currently does not implement range operators for casting between integers and float. This patch adds the missing dispatch patterns and routines to facilitate implmenting these casts. PR tree-optimization/120231 * range-op-float.cc (operator_cast::fold_range): New variants. (operator_cast::op1_range): Likewise. * range-op-mixed.h (operator_cast::fold_range): Likewise. (operator_cast::op1_range): Likewise * range-op.cc (range_op_handler::fold_range): Add RO_FIF dispatch. (range_op_handler::op1_range): Add RO_IFF and RO_FII patterns. (range_operator::fold_range): Provide new variant default. (range_operator::op1_range): Likewise. * range-op.h (range_operator): Add new variant methods.
Hello! I have discovered a similar issue, commenting in case another data point is useful. The following: #include <math.h> _Float16 norm(_Float16 re, _Float16 im) { return sqrtf(re * re + im * im); } compiled with -march=armv8-a+fp16 -O3 gives norm: fmul h1, h1, h1 fmadd h1, h0, h0, h1 fcvt s0, h1 fcmp s0, #0.0 bpl .L4 stp x29, x30, [sp, -32]! mov x29, sp str s1, [sp, 28] bl sqrtf ldr s1, [sp, 28] ldp x29, x30, [sp], 32 fsqrt h0, h1 ret .L4: fsqrt h0, h1 ret and with -ffast-math added gives: norm: fmul h1, h1, h1 fmadd h0, h0, h0, h1 fsqrt h0, h0 ret I think GCC is being overly cautious here - the fast FSQRT case is fine without fast-math (the optimal sequence is emitted without -ffast-math for single- and double-precision floats).
I suppose that example boils down to whether code like: _Bool f(_Float16 a) { return a * a >= 0; } _Bool g(float a) { return a * a >= 0; } can be optimised to return true. We currently do it with -ffast-math but not without.
(In reply to Alex Coplan from comment #6) > I suppose that example boils down to whether code like: > > _Bool f(_Float16 a) { > return a * a >= 0; > } > _Bool g(float a) { > return a * a >= 0; > } > > can be optimised to return true. We currently do it with -ffast-math but > not without. And it is correct like that. If a is a NaN (qNaN or sNaN), then a * a >= 0 is false.
In any case, #c5 and onwards is completely unrelated to this PR, which is about value ranges for casts from integers to floating point and vice versa. So, please move that elsewhere.
Apparently we are missing range implementation of casts between different floating point types as well. Trying now: --- gcc/range-op-mixed.h.jj 2025-05-20 08:14:06.520404648 +0200 +++ gcc/range-op-mixed.h 2025-06-02 14:30:37.304673412 +0200 @@ -473,14 +473,15 @@ public: bool fold_range (prange &r, tree type, const irange &op1, const prange &op2, relation_trio rel = TRIO_VARYING) const final override; + bool fold_range (frange &r, tree type, + const frange &op1, const frange &op2, + relation_trio = TRIO_VARYING) const final override; bool fold_range (irange &r, tree type, - const frange &lh, - const irange &rh, - relation_trio = TRIO_VARYING) const; + const frange &op1, const irange &op2, + relation_trio = TRIO_VARYING) const final override; bool fold_range (frange &r, tree type, - const irange &lh, - const frange &rh, - relation_trio = TRIO_VARYING) const; + const irange &op1, const frange &op2, + relation_trio = TRIO_VARYING) const final override; bool op1_range (irange &r, tree type, const irange &lhs, const irange &op2, @@ -495,13 +496,14 @@ public: const irange &lhs, const prange &op2, relation_trio rel = TRIO_VARYING) const final override; bool op1_range (frange &r, tree type, - const irange &lhs, - const irange &op2, - relation_trio = TRIO_VARYING) const; + const frange &lhs, const frange &op2, + relation_trio = TRIO_VARYING) const final override; + bool op1_range (frange &r, tree type, + const irange &lhs, const irange &op2, + relation_trio = TRIO_VARYING) const final override; bool op1_range (irange &r, tree type, - const frange &lhs, - const frange &op2, - relation_trio = TRIO_VARYING) const; + const frange &lhs, const frange &op2, + relation_trio = TRIO_VARYING) const final override; relation_kind lhs_op1_relation (const irange &lhs, const irange &op1, const irange &op2, --- gcc/range-op-float.cc.jj 2025-05-20 08:14:06.519404661 +0200 +++ gcc/range-op-float.cc 2025-06-02 15:23:14.171003089 +0200 @@ -2899,6 +2899,107 @@ private: } } fop_div; +bool +operator_cast::fold_range (frange &r, tree type, const frange &op1, + const frange &, relation_trio) const +{ + REAL_VALUE_TYPE lb, ub; + enum machine_mode mode = TYPE_MODE (type); + bool mode_composite = MODE_COMPOSITE_P (mode); + + if (empty_range_varying (r, type, op1, op1)) + return true; + if (!MODE_HAS_NANS (mode) && op1.maybe_isnan ()) + { + r.set_varying (type); + return true; + } + if (op1.known_isnan ()) + { + r.set_nan (type); + return true; + } + + const REAL_VALUE_TYPE &lh_lb = op1.lower_bound (); + const REAL_VALUE_TYPE &lh_ub = op1.upper_bound (); + real_convert (&lb, mode, &lh_lb); + real_convert (&ub, mode, &lh_ub); + + if (flag_rounding_math) + { + if (real_less (&lh_lb, &lb)) + { + if (mode_composite + && (real_isdenormal (&lb, mode) || real_iszero (&lb))) + { + // IBM extended denormals only have DFmode precision. + REAL_VALUE_TYPE tmp, tmp2; + real_convert (&tmp2, DFmode, &lh_lb); + real_nextafter (&tmp, REAL_MODE_FORMAT (DFmode), &tmp2, + &dconstninf); + real_convert (&lb, mode, &tmp); + } + else + frange_nextafter (mode, lb, dconstninf); + } + if (real_less (&ub, &lh_ub)) + { + if (mode_composite + && (real_isdenormal (&ub, mode) || real_iszero (&ub))) + { + // IBM extended denormals only have DFmode precision. + REAL_VALUE_TYPE tmp, tmp2; + real_convert (&tmp2, DFmode, &lh_ub); + real_nextafter (&tmp, REAL_MODE_FORMAT (DFmode), &tmp2, + &dconstinf); + real_convert (&ub, mode, &tmp); + } + else + frange_nextafter (mode, ub, dconstinf); + } + } + + r.set (type, lb, ub, op1.get_nan_state ()); + + if (flag_trapping_math + && MODE_HAS_INFINITIES (TYPE_MODE (type)) + && r.known_isinf () + && !op1.known_isinf ()) + { + REAL_VALUE_TYPE inf = r.lower_bound (); + if (real_isneg (&inf)) + { + REAL_VALUE_TYPE min = real_min_representable (type); + r.set (type, inf, min); + } + else + { + REAL_VALUE_TYPE max = real_max_representable (type); + r.set (type, max, inf); + } + } + + r.flush_denormals_to_zero (); + return true; +} + +// Implement fold for a cast from float to another float. +bool +operator_cast::op1_range (frange &r, tree type, const frange &lhs, + const frange &op2, relation_trio) const +{ + if (lhs.undefined_p ()) + return false; + tree lhs_type = lhs.type (); + frange wlhs = float_widen_lhs_range (lhs_type, lhs); + auto save_flag_rounding_math = flag_rounding_math; + flag_rounding_math = 1; + bool ret = float_binary_op_range_finish (fold_range (r, type, wlhs, op2), + r, type, wlhs); + flag_rounding_math = save_flag_rounding_math; + return ret; +} + // Implement fold for a cast from float to an int. bool operator_cast::fold_range (irange &, tree, const frange &,
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:b7960a3f966a0f87888de0fc588999d026918449 commit r16-1108-gb7960a3f966a0f87888de0fc588999d026918449 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Jun 4 17:21:51 2025 +0200 ranger: Add support for float <-> float casts [PR120231] I've noticed we don't even support say float -> double and other scalar floating point to scalar floating point conversions in the ranger, we just end up with VARYING for those. The following patch attempts to fix that. The reverse cast case uses float_binary_op_range_finish e.g. because if the result isn't infinite, then the source couldn't be infinite either even if the reverse fold_range would suggest that. And special cases the case of guaranteed widening cast (where we have assurance that all the source type values are exactly representable in the destination type; using ieee_bits for that). 2025-06-04 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/120231 * range-op-mixed.h (operator_cast::fold_range): Add overload with 3 {,const} frange & operands. Change parameter names and add final override keywords for float <-> integer cast overloads. (operator_cast::op1_range): Likewise. * range-op-float.cc (operator_cast::fold_range): New overload with 3 {,const} frange & operands. (operator_cast::op1_range): Likewise. * gcc.dg/tree-ssa/pr120231-1.c: New test.
Created attachment 61583 [details] gcc16-pr120231.patch Untested patch for the float -> int and int -> float casts and their reverse ops. Unfortunately it regresses FAIL: g++.dg/tree-ssa/loop-split-1.C scan-tree-dump-times lsplit "loop split" 1 FAIL: gfortran.dg/inline_matmul_16.f90 -O scan-tree-dump-times optimized "_gfortran_matmul" 1 tests, so I'll need to study those up tomorrow.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:b6b238ddcb119bb51555ead9be0fa7b06b8a6be7 commit r16-1191-gb6b238ddcb119bb51555ead9be0fa7b06b8a6be7 Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Jun 5 18:10:22 2025 +0200 ranger: Add support for float <-> int casts [PR120231] The following patch adds support for float <-> integer conversions in ranger. The patch reverts part of the r16-571 changes, those changes were right for fold_range, but not for op1_range, where RO_IFI and RO_FIF are actually called rather than RO_IFF and RO_FII that the patch expected. Also, the float -> int operation actually uses FIX_TRUNC_EXPR tree code rather than NOP_EXPR or CONVERT_EXPR and int -> float uses FLOAT_EXPR, but I think we can just handle all of them using operator_cast, at least as long as we don't try to use VIEW_CONVERT_EXPR using that too; not really sure handling VCE at least for floating to integral or vice versa would be actually useful though. The patch "regressed" two tests, gfortran.dg/inline_matmul_16.f90 and g++.dg/tree-ssa/loop-split-1.C. In the first case, there is a loop doing matmul on various sizes of matrices, up to 10x10 matrices, and Fortran FE given the options emits two implementations of the matmul, one inline for the case where the matmul has less than 1000 elements and one for larger matmuls. The check for whatever reason uses floating point calculations and before this patch we weren't able to prove that all the matrices will be smaller than the cutoff and the test was checking for presence of the fallback call; with the patch we are able to figure it out and only keep the inline copy. I've duplicated the test, once unmodified source which doesn't expect _gfortran_matmul string in optimized dump anymore, and another copy which uses volatile ten instead of 10 in loop upper bounds so that it has to keep the fallback and scans for it. The other test is g++.dg/tree-ssa/loop-split-1.C, which does constexpr unsigned s = 100000000; ... for(unsigned i = 0; i < s; ++i) { if(i == 0) a[i] = b[i] * c[i]; else a[i] = (b[i] + c[i]) * c[i-1] * std::log(i); } and for some reason the successful loop splitting for which the test searches in a dump file is dependent on the errno fallback of std::log, where we do t = std::log((double)i); if ((double)i) u> 0); else log ((double)i); But i goes only from 1 to 100000000, so (double)i has the range [1.0, 100000000.0] with the patch and so we see it will never need errno nor raise exception. I've tested adding + d for it where d is 0.0 but modifiable in some other TU, and tested it also with r14-2851 and r14-2852, where the former FAILed the test both unmodified and modified, while the latter PASSed both versions. 2025-06-05 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/120231 * range-op.cc (range_op_table::range_op_table): Register op_cast also for FLOAT_EXPR and FIX_TRUNC_EXPR. (RO_III): Adjust comment. (range_op_handler::op1_range): Handle RO_IFI rather than RO_IFF. Don't handle RO_FII. (range_operator::op1_range): Remove overload with irange &, tree, const frange &, const frange &, relation_trio and frange &, tree, const irange &, const irange &, relation_trio arguments. Add overload with irange &, tree, const frange &, const irange &, relation_trio arguments. * range-op-mixed.h (operator_cast::op1_range): Remove overload with irange &, tree, const frange &, const frange &, relation_trio and frange &, tree, const irange &, const irange &, relation_trio arguments. Add overload with irange &, tree, const frange &, const irange &, relation_trio and frange &, tree, const irange &, const frange &, relation_trio arguments. * range-op.h (range_operator::op1_cast): Remove overload with irange &, tree, const frange &, const frange &, relation_trio and frange &, tree, const irange &, const irange &, relation_trio arguments. Add overload with irange &, tree, const frange &, const irange &, relation_trio arguments. * range-op-float.cc (operator_cast::fold_range): Implement float to int and int to float casts. (operator_cast::op1_range): Remove overload with irange &, tree, const frange &, const frange &, relation_trio and frange &, tree, const irange &, const irange &, relation_trio arguments. Add overload with irange &, tree, const frange &, const irange &, relation_trio and frange &, tree, const irange &, const frange &, relation_trio arguments and implement reverse op of float to int and int to float cast there. * gcc.dg/tree-ssa/pr120231-2.c: New test. * gcc.dg/tree-ssa/pr120231-3.c: New test. * gfortran.dg/inline_matmul_16.f90: Don't expect any _gfortran_matmul strings in optimized dump. * gfortran.dg/inline_matmul_26.f90: New test. * g++.dg/tree-ssa/loop-split-1.C (d): New variable. (main): Use std::log (i + d) instead of std::log (i).
Should be implemented now.
Hi: With the patch, the following testcase generates an incorrect assembly on LoongArch. test.c ``` unsigned int step; double sqrt (double x); void test1 (double); void test () { test1 (0.5 / sqrt (1. + step)); } ``` # cc1 test.c -o test.s -Ofast -fdump-tree-dom3 ``` test: .LFB0 = . .cfi_startproc pcalau12i $r12,%pc_hi20(.LC0) fld.d $f0,$r12,%pc_lo12(.LC0) b %plt(test1) ``` I found that in the dom3 optimization, the range calculation of _5 is incorrect. ``` Exporting new global ranges: ============================ Global Exported: _5 = [frange] double [5.0e-1 (0x0.8p+0), 5.0e-1 (0x0.8p+0)] ========= Done ============= void test () { unsigned int step.0_1; double _2; double _3; double _4; double _5; <bb 2> [local count: 1073741824]: step.0_1 = step; _2 = (double) step.0_1; _3 = _2 + 1.0e+0; _4 = .RSQRT (_3); _5 = _4 * 5.0e-1; test1 (5.0e-1); return; } ```
(In reply to chenglulu from comment #14) > Hi: > > With the patch, the following testcase generates an incorrect assembly on > LoongArch. > [...] (For the record, filed as PR120638).