Bug 24021 - VRP does not work with floating points
Summary: VRP does not work with floating points
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.1.0
: P2 enhancement
Target Milestone: 13.0
Assignee: Aldy Hernandez
URL:
Keywords: missed-optimization
Depends on:
Blocks: VRP 68097
  Show dependency treegraph
 
Reported: 2005-09-22 20:59 UTC by Andrew Pinski
Modified: 2022-11-28 22:15 UTC (History)
14 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-07-01 00:23:36


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2005-09-22 20:59:19 UTC
Take the following example:
double BG_SplineLength ()
{
  double lastPoint;
  double i;

  for (i = 0.01;i<=1;i+=0.1f)
    if (!(i != 0.0))
      {
	lastPoint = i;
      }
    else
      {
        lastPoint = 2;
      }
  return lastPoint;
}

The loop is useless and we should remove the loop and make the function just return 2.0;
Comment 1 Andrew Pinski 2005-10-23 23:42:58 UTC
Confirmed.
Comment 2 Manuel López-Ibáñez 2016-09-11 19:40:35 UTC
Could this be a summer of code project? Numerically intensive programs should benefit tremendously, shouldn't they?
Comment 3 Vincent Lefèvre 2020-03-04 11:36:27 UTC
But note that the optimization should be modified or disabled in contexts where floating-point exceptions need to be honored, as the i+=0.1f will sometimes trigger the inexact exception.
Comment 4 Andrew Macleod 2021-08-25 20:29:34 UTC
FWIW, we hope to enable floating point range support in ranger for GCC 13. 

One of the post stage 1 tasks is to generalize ranger to use a vrange class, from which we can derive integral (irange), pointer (prange), and float (frange) types. possibly complex integral (cirange) and/or string (srange) have been thrown around if appropriate.

Before we go to develop the frange class, we'll have a working session of some sort to flesh out what it can track, then implement some range-ops for some float tree-codes.
Comment 5 Aldy Hernandez 2022-02-07 16:07:45 UTC
(In reply to Andrew Macleod from comment #4)
> FWIW, we hope to enable floating point range support in ranger for GCC 13. 
> 
> One of the post stage 1 tasks is to generalize ranger to use a vrange class,
> from which we can derive integral (irange), pointer (prange), and float
> (frange) types. possibly complex integral (cirange) and/or string (srange)
> have been thrown around if appropriate.
> 
> Before we go to develop the frange class, we'll have a working session of
> some sort to flesh out what it can track, then implement some range-ops for
> some float tree-codes.

FWIW, I'm on this.  I've begun work on vrange, and simultaneously have started work on a proof-of-concept for floating points that I hope some more savvy floating experts can extend.
Comment 6 Aldy Hernandez 2022-02-09 16:35:03 UTC
Preview of what's to come.

Implementing a bare bones frange class and associated relational operators in range-ops we can get:

void link_error();
void func();

int foo(float f)
{
  if (f > 3.0)
    {
      func();
      if (f < 2.0)
        link_error();
    }
}

Folding statement: if (f_2(D) > 3.0e+0)
Not folded
Folding statement: func ();
Not folded
Folding statement: if (f_2(D) < 2.0e+0)
Folding predicate f_2(D) < 2.0e+0 to 0
Folded into: if (0 != 0)

And we can also get symbolics:

int bar(float f, float g)
{
  if (f > g)
    {
      func();
      if (f < g)
        link_error();
    }
}

Folding statement: if (f_2(D) < g_3(D))
 folding with relation f_2(D) > g_3(D)
Folding predicate f_2(D) < g_3(D) to 0
Folded into: if (0 != 0)

My proof of concept has ranger dumps looking like:

=========== BB 2 ============
Imports: f_2(D)  
Exports: f_2(D)  
f_2(D)  float VARYING
    <bb 2> :
    if (f_2(D) > 3.0e+0)
      goto <bb 3>; [INV]
    else
      goto <bb 5>; [INV]

2->3  (T) f_2(D) :      float (3.0e+0, +Inf]
2->5  (F) f_2(D) :      float [-Inf, 3.0e+0]


Interestingly, since the threader uses ranger, I had to turn off the threader for the above snippets, because ethread gets the first one before evrp even gets a whack at it:

  [1] Registering jump thread: (2, 3) incoming edge;  (3, 5) nocopy; 
path: 2->3->5 SUCCESS
Removing basic block 3
;; basic block 3, loop depth 0
;;  pred:      
func ();
if (f_2(D) < 2.0e+0)
  goto <bb 4>; [INV]
else
  goto <bb 5>; [INV]
;;  succ:       4
;;              5

As I've mentioned, I'm hoping some floating expert can take this across to goal line, as my head will start spinning as soon as we start talking about NANs and such.  The range-op work will likely require floating specialized knowledge.
Comment 7 Jeffrey A. Law 2022-02-09 17:04:54 UTC
Very cool.  ANd no, I'm not enough of an expert on the FP side to shepherd that though.  I would expect it to be exceptionally tricky on the solver side.

Probably the most useful things I've come across would be knowing if a particular value can or can not have certain special values.  ie, [qs]NaN, +-0.0, +-Inf.  Knowing how an value relates to 0 is also quite helpful.  ie, > 0, < 0 and the like.
Comment 8 Aldy Hernandez 2022-02-09 17:27:28 UTC
(In reply to Jeffrey A. Law from comment #7)
> Very cool.  ANd no, I'm not enough of an expert on the FP side to shepherd
> that though.  I would expect it to be exceptionally tricky on the solver
> side.

The solver, or ranger/gori, should be isolated from changes.  The main goal is to make ranger work with generic vrange's, not iranges or franges.  Once I'm done with the abstraction, it should only be value-range.* (class frange) and range-ops* that should be affected.  That's the theory anyhow ;-).
Comment 9 Vincent Lefèvre 2022-02-09 17:29:37 UTC
(In reply to Aldy Hernandez from comment #6)
> As I've mentioned, I'm hoping some floating expert can take this across to
> goal line, as my head will start spinning as soon as we start talking about
> NANs and such.  The range-op work will likely require floating specialized
> knowledge.

Subnormals might also need to be considered as special cases: "Whether and in what cases subnormal numbers are treated as zeros is implementation defined." will be added to C23 (some behaviors are dictated by the hardware, e.g. ARM in some non-IEEE configurations), but I've asked for clarification in the CFP mailing-list.
Comment 10 Vincent Lefèvre 2022-02-09 17:31:03 UTC
(In reply to Vincent Lefèvre from comment #9)
> Subnormals might also need to be considered as special cases: "Whether and
> in what cases subnormal numbers are treated as zeros is implementation
> defined." will be added to C23 (some behaviors are dictated by the hardware,
> e.g. ARM in some non-IEEE configurations), but I've asked for clarification
> in the CFP mailing-list.

Some details in http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2797.htm
Comment 11 Jakub Jelinek 2022-02-09 18:30:32 UTC
Various match.pd patterns use predicates like:
(simplify
 (mult @0 real_zerop@1)
 (if (!tree_expr_maybe_nan_p (@0)
      && (!HONOR_NANS (type) || !tree_expr_maybe_infinite_p (@0))
      && (!HONOR_SIGNED_ZEROS (type) || tree_expr_nonnegative_p (@0)))
  @1))
   
/* In IEEE floating point, x*1 is not equivalent to x for snans.
   Likewise for complex arithmetic with signed zeros.  */
(simplify
 (mult @0 real_onep)
 (if (!tree_expr_maybe_signaling_nan_p (@0)
      && (!HONOR_SIGNED_ZEROS (type)
          || !COMPLEX_FLOAT_TYPE_P (type)))
  (non_lvalue @0)))

etc. which would be great if it could use frange.  Though, I think we also run here into the problem that match.pd right now avoids the ranger because it could reliably only allow walks from uses to SSA_NAME_DEF_STMTs and not the other direction.
But of course those few simplifications are simple enough that they could be also repeated somewhere in the e?vrp code.

One thing to consider is that at runtime, not all arithmetics might be as precise as what mpfr does at compile time, so in some cases it should expect a few ulps or maybe even more than that as possible errors (especially for library functions).  And also take into account different rounding modes if user wants that to be honored.

As for exceptions, I guess one thing is that ranger computes ranges and another thing is that optimization based on that will sometimes need to punt if it could optimize away visible side-effects the user cares about.
Comment 12 Andrew Macleod 2022-02-09 18:31:11 UTC
(In reply to Aldy Hernandez from comment #8)
> (In reply to Jeffrey A. Law from comment #7)
> > Very cool.  ANd no, I'm not enough of an expert on the FP side to shepherd
> > that though.  I would expect it to be exceptionally tricky on the solver
> > side.
> 
> The solver, or ranger/gori, should be isolated from changes.  The main goal
> is to make ranger work with generic vrange's, not iranges or franges.  Once
> I'm done with the abstraction, it should only be value-range.* (class
> frange) and range-ops* that should be affected.  That's the theory anyhow
> ;-).

Yeah, once we get things in place, there will be a skeleton frange class representation that does some set of basic operations.
 
One or more floating point experts are welcome then to flush out additional representational requirements in the frange class and range-op implementations of any tree-codes operations.   There shouldn't be any changed required anywhere else. VRP/threader/relations/folding/etc are all driven by range-ops.

And we are of course happy to help when the time comes and GCC13 opens up and the abstraction code goes in.  If someone is really keen, they may be able to work on a development branch a bit earlier than that... 

I also have a short list of practical floating point considerations I collected a couple of years ago. So I don't lose it:

finite
non-zero/positive/negative
normalized
in the range [-1,1] (for trig functions)
which I would expect to come from adding some domain/range knowledge from standard math functions, and propagating range info from integer stuff from which floating point values are derived, on top of the usual stuff.

e.g. in embedded world floating-point values from peripherals would commonly come from some PWM feature via integer division by a constant denominator, which could tell us a fair bit (even more if it could use builtin expect information about the numerator).

as far as existing consumers for this info - I'm not sure if glibc still has special finite versions of functions, but those could be used based on range info. likewise, knowing a value is normal or in a bounded range could allow for 'relaxed' versions of library functions to be used. there should also be some inline sequences or other code transforms that are controlled by finite-math-only and also enabled for provably finite values.
Comment 13 Andrew Macleod 2022-02-09 18:35:31 UTC
> etc. which would be great if it could use frange.  Though, I think we also
> run here into the problem that match.pd right now avoids the ranger because
> it could reliably only allow walks from uses to SSA_NAME_DEF_STMTs and not
> the other direction.

That only happens with pointers and the non-null property right now, and even that is going away to start GCC13. I have the prototype underway now.

.
> One thing to consider is that at runtime, not all arithmetics might be as
> precise as what mpfr does at compile time, so in some cases it should expect
> a few ulps or maybe even more than that as possible errors (especially for
> library functions).  And also take into account different rounding modes if
> user wants that to be honored.
> 

We'll leave those to the experts :-)

> As for exceptions, I guess one thing is that ranger computes ranges and
> another thing is that optimization based on that will sometimes need to punt
> if it could optimize away visible side-effects the user cares about.

yeah, like with no-delete-null-pointer, there may be flag checking required for some things.
Comment 14 Vincent Lefèvre 2022-02-09 19:05:00 UTC
(In reply to Jakub Jelinek from comment #11)
> And also take into account different rounding modes if
> user wants that to be honored.

And that would eliminate the need to consider the possibility of double rounding in case of intermediate extended precision (as with x87).
Comment 15 Richard Biener 2022-02-10 09:33:07 UTC
I think that for match.pd we should rely mostly on global ranges which means
extending SSA name ranges for floats.  Indeed I think the most important
parts will be tracking whether values can be NaNs, Infs or whether they are
known to be negative, positive or zero.
Comment 16 Jakub Jelinek 2022-02-10 12:02:03 UTC
But just tracking those fpclassify/signbit properties wouldn't be enough, because in many cases e.g. whether something can be infinite or not will depend on more precise value ranges.
If we track just a bitmask, can the value be:
zero
subnormal
normal
infinite
qNaN
sNaN
have positive signbit
have negative signbit
then even on simple multiplication or addition we'll need to assume from normal * normal or normal + normal that it can be infinite.  When we know that
one operand is [-15.123 - epsilon, 23.152 + epsilon] and the other is
[256.0 - epsilon, 512.0 + epsilon], we actually can find out the result will not be infinite etc.
But sure, tracking in a bitmask the above properties in addition to some approximate range is useful.
Comment 17 rguenther@suse.de 2022-02-10 12:21:25 UTC
On Thu, 10 Feb 2022, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24021
> 
> --- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> But just tracking those fpclassify/signbit properties wouldn't be enough,
> because in many cases e.g. whether something can be infinite or not will depend
> on more precise value ranges.
> If we track just a bitmask, can the value be:
> zero
> subnormal
> normal
> infinite
> qNaN
> sNaN
> have positive signbit
> have negative signbit
> then even on simple multiplication or addition we'll need to assume from normal
> * normal or normal + normal that it can be infinite.  When we know that
> one operand is [-15.123 - epsilon, 23.152 + epsilon] and the other is
> [256.0 - epsilon, 512.0 + epsilon], we actually can find out the result will
> not be infinite etc.
> But sure, tracking in a bitmask the above properties in addition to some
> approximate range is useful.

Yes.  What I was trying to say is that while the value-range
propagation process should likely track actual FP value ranges it
remains to be seen whether we need to store those into the SSA name
info or whether tracking a set of flags is good enough in practice
(given that match.pd should generally not invoke ranger in
resolving mode but look at what's in SSA annotations).
Comment 18 Vincent Lefèvre 2022-02-10 14:17:40 UTC
I'm wondering whether it is possible to check on actual code what is needed. For instance, assume that you have a program that produces always the same results, e.g. by running it over a fixed dataset. GCC could get some information about actual FP values (a bit like profile generation). Then check what benefit you can get by using these data at compile time (contrary to optimization with profile use, you assume here that the obtained information is necessarily valid, which is true as long as the program is run on the chosen dataset). The difficulty is to find whether some benefit can be obtained by VRP, but this should give an upper bound on the speedup you can hope to get. So perhaps this can give some useful information about what to focus on.
Comment 19 Andrew Macleod 2022-04-05 19:34:39 UTC
We can use the original testcase as the litmus test for basic support if we compile it with

-O2  -fno-tree-fre  -fno-tree-dominator-opts

The unroller will unroll the loop and the VRP2 pass will be presented with:

 <bb 2> [local count: 97603129]:
 i_1 = 1.1000000149011611438876201418679556809365749359130859375e-1;
 i_17 = i_1 + 1.00000001490116119384765625e-1;
 i_22 = i_17 + 1.00000001490116119384765625e-1;
 i_27 = i_22 + 1.00000001490116119384765625e-1;
 i_32 = i_27 + 1.00000001490116119384765625e-1;
 i_37 = i_32 + 1.00000001490116119384765625e-1;
 i_42 = i_37 + 1.00000001490116119384765625e-1;
 i_47 = i_42 + 1.00000001490116119384765625e-1;
 i_52 = i_47 + 1.00000001490116119384765625e-1;
 if (i_52 == 0.0)
   goto <bb 4>; [50.00%]
 else
   goto <bb 3>; [50.00%]

 <bb 3> [local count: 48801565]:

 <bb 4> [local count: 97603129]:
 # lastPoint_12 = PHI <i_52(2), 2.0e+0(3)>
 return lastPoint_12;

Which is we can track floating point ranges in VRP and simplification, recognize that i_52 != 0.0 and VRP2 should be able to resolve this to 

return 2.0e+0;
Comment 20 Aldy Hernandez 2022-04-06 09:34:21 UTC
(In reply to Andrew Macleod from comment #19)
> We can use the original testcase as the litmus test for basic support if we
> compile it with
> 
> -O2  -fno-tree-fre  -fno-tree-dominator-opts
> 
> The unroller will unroll the loop and the VRP2 pass will be presented with:
> 
>  <bb 2> [local count: 97603129]:
>  i_1 = 1.1000000149011611438876201418679556809365749359130859375e-1;
>  i_17 = i_1 + 1.00000001490116119384765625e-1;
>  i_22 = i_17 + 1.00000001490116119384765625e-1;
>  i_27 = i_22 + 1.00000001490116119384765625e-1;
>  i_32 = i_27 + 1.00000001490116119384765625e-1;
>  i_37 = i_32 + 1.00000001490116119384765625e-1;
>  i_42 = i_37 + 1.00000001490116119384765625e-1;
>  i_47 = i_42 + 1.00000001490116119384765625e-1;
>  i_52 = i_47 + 1.00000001490116119384765625e-1;
>  if (i_52 == 0.0)
>    goto <bb 4>; [50.00%]
>  else
>    goto <bb 3>; [50.00%]
> 
>  <bb 3> [local count: 48801565]:
> 
>  <bb 4> [local count: 97603129]:
>  # lastPoint_12 = PHI <i_52(2), 2.0e+0(3)>
>  return lastPoint_12;
> 
> Which is we can track floating point ranges in VRP and simplification,
> recognize that i_52 != 0.0 and VRP2 should be able to resolve this to 
> 
> return 2.0e+0;

We can definitely get this with the current work.  All that's missing is the snippet implementing the PLUS_EXPR operator.  A simple implementation (with likely the NAN bits wrong :)) follows:

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index d1c7a3b571b..cba7166ad2b 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -1055,6 +1055,54 @@ foperator_mult::fold_range (frange &r, tree type,
   return true;
 }
 
+class foperator_plus : public range_operator
+{
+  using range_operator::fold_range;
+public:
+  virtual bool fold_range (frange &r, tree type,
+			   const frange &lh,
+			   const frange &rh,
+			   relation_kind rel = VREL_NONE) const override;
+} fop_plus;
+
+bool
+foperator_plus::fold_range (frange &r, tree type,
+			    const frange &lh,
+			    const frange &rh,
+			    relation_kind) const
+{
+  if (empty_range_varying (r, type, lh, rh))
+    return true;
+
+  if (lh.get_prop (FRANGE_PROP_NAN)
+      || rh.get_prop (FRANGE_PROP_NAN))
+    {
+      r.set_varying (type);
+      return true;
+    }
+
+  enum tree_code code_lb, code_ub;
+  tree lh_lb = lh.lower_bound (&code_lb);
+  tree lh_ub = lh.upper_bound (&code_ub);
+  tree rh_lb = rh.lower_bound ();
+  tree rh_ub = rh.upper_bound ();
+
+  REAL_VALUE_TYPE res_lb, res_ub;
+  real_arithmetic (&res_lb, PLUS_EXPR,
+		   TREE_REAL_CST_PTR (lh_lb),
+		   TREE_REAL_CST_PTR (rh_lb));
+  real_arithmetic (&res_ub, PLUS_EXPR,
+		   TREE_REAL_CST_PTR (lh_ub),
+		   TREE_REAL_CST_PTR (rh_ub));
+
+  r.set (code_lb,
+	 build_real (type, res_lb),
+	 build_real (type, res_ub),
+	 code_ub);
+  return true;
+}
+
+
 class floating_table : public range_op_table
 {
 public:
@@ -1081,6 +1129,7 @@ floating_table::floating_table ()
   set (UNORDERED_EXPR, fop_unordered);
   set (ORDERED_EXPR, fop_ordered);
   set (MULT_EXPR, fop_mult);
+  set (PLUS_EXPR, fop_plus);
 }
 
 #if CHECKING_P


And voila...

./cc1 a.c -quiet -I/tmp -O2 -fno-tree-fre -fno-tree-dominator-opts -fno-thread-jumps -fdump-tree-vrp2-details=/dev/stdout

Folding statement: i_1 = 1.1000000149011611438876201418679556809365749359130859375e-1;
Queued stmt for removal.  Folds to: 1.1000000149011611438876201418679556809365749359130859375e-1
Folding statement: i_17 = i_1 + 1.00000001490116119384765625e-1;
Queued stmt for removal.  Folds to: 2.1000000298023223377352763918679556809365749359130859375e-1
...
...
...
double BG_SplineLength ()
{
  double i;
  double lastPoint;

  <bb 2> [local count: 97603129]:
  return 2.0e+0;

}

Notice the -fno-thread-jumps, because otherwise the now floating point aware threader will thread the conditional before VRP2 gets a chance to see it.

I should probably take this PR ;-).
Comment 21 Jakub Jelinek 2022-04-06 09:46:27 UTC
(In reply to Aldy Hernandez from comment #20)
> (In reply to Andrew Macleod from comment #19)
> > We can use the original testcase as the litmus test for basic support if we
> > compile it with
> > 
> > -O2  -fno-tree-fre  -fno-tree-dominator-opts
> > 
> > The unroller will unroll the loop and the VRP2 pass will be presented with:
> > 
> >  <bb 2> [local count: 97603129]:
> >  i_1 = 1.1000000149011611438876201418679556809365749359130859375e-1;
> >  i_17 = i_1 + 1.00000001490116119384765625e-1;
> >  i_22 = i_17 + 1.00000001490116119384765625e-1;
> >  i_27 = i_22 + 1.00000001490116119384765625e-1;
> >  i_32 = i_27 + 1.00000001490116119384765625e-1;
> >  i_37 = i_32 + 1.00000001490116119384765625e-1;
> >  i_42 = i_37 + 1.00000001490116119384765625e-1;
> >  i_47 = i_42 + 1.00000001490116119384765625e-1;
> >  i_52 = i_47 + 1.00000001490116119384765625e-1;
> >  if (i_52 == 0.0)
> >    goto <bb 4>; [50.00%]
> >  else
> >    goto <bb 3>; [50.00%]
> > 
> >  <bb 3> [local count: 48801565]:
> > 
> >  <bb 4> [local count: 97603129]:
> >  # lastPoint_12 = PHI <i_52(2), 2.0e+0(3)>
> >  return lastPoint_12;
> > 
> > Which is we can track floating point ranges in VRP and simplification,
> > recognize that i_52 != 0.0 and VRP2 should be able to resolve this to 
> > 
> > return 2.0e+0;
> 
> We can definitely get this with the current work.  All that's missing is the
> snippet implementing the PLUS_EXPR operator.  A simple implementation (with
> likely the NAN bits wrong :)) follows:
> 
> diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
> index d1c7a3b571b..cba7166ad2b 100644
> --- a/gcc/range-op-float.cc
> +++ b/gcc/range-op-float.cc
> @@ -1055,6 +1055,54 @@ foperator_mult::fold_range (frange &r, tree type,
>    return true;
>  }
>  
> +class foperator_plus : public range_operator
> +{
> +  using range_operator::fold_range;
> +public:
> +  virtual bool fold_range (frange &r, tree type,
> +			   const frange &lh,
> +			   const frange &rh,
> +			   relation_kind rel = VREL_NONE) const override;
> +} fop_plus;
> +
> +bool
> +foperator_plus::fold_range (frange &r, tree type,
> +			    const frange &lh,
> +			    const frange &rh,
> +			    relation_kind) const
> +{
> +  if (empty_range_varying (r, type, lh, rh))
> +    return true;
> +
> +  if (lh.get_prop (FRANGE_PROP_NAN)
> +      || rh.get_prop (FRANGE_PROP_NAN))
> +    {
> +      r.set_varying (type);
> +      return true;
> +    }
> +
> +  enum tree_code code_lb, code_ub;
> +  tree lh_lb = lh.lower_bound (&code_lb);
> +  tree lh_ub = lh.upper_bound (&code_ub);
> +  tree rh_lb = rh.lower_bound ();
> +  tree rh_ub = rh.upper_bound ();
> +
> +  REAL_VALUE_TYPE res_lb, res_ub;
> +  real_arithmetic (&res_lb, PLUS_EXPR,
> +		   TREE_REAL_CST_PTR (lh_lb),
> +		   TREE_REAL_CST_PTR (rh_lb));
> +  real_arithmetic (&res_ub, PLUS_EXPR,
> +		   TREE_REAL_CST_PTR (lh_ub),
> +		   TREE_REAL_CST_PTR (rh_ub));
> +
> +  r.set (code_lb,
> +	 build_real (type, res_lb),
> +	 build_real (type, res_ub),
> +	 code_ub);
> +  return true;
> +}

This doesn't take flag_rounding_math or not always exactly precise floating point computations into account.
It is also missing real_convert after real_arithmetics that performs at least
some of the rounding (and perhaps the inexact return value from real_arithmetic
should be taken into account).
Without flag_rounding_math and on non-MODE_COMPOSITE_P the basic arithmetics
will be probably exact most of the time, except perhaps for denormals which are
sometimes flushed to zero.  But as soon as one jumps to even sqrt and other math functions, one needs to allow some epsilon up for upper bound and down for lower bound, similarly for the basic ops with inexact and flag_rounding_math.
For MODE_COMPOSITE_P, our emulation doesn't match what is done at runtime, so we need to be more forgiving.
Comment 22 Aldy Hernandez 2022-04-06 09:57:57 UTC
> This doesn't take flag_rounding_math or not always exactly precise floating
> point computations into account.
> It is also missing real_convert after real_arithmetics that performs at least
> some of the rounding (and perhaps the inexact return value from
> real_arithmetic
> should be taken into account).
> Without flag_rounding_math and on non-MODE_COMPOSITE_P the basic arithmetics
> will be probably exact most of the time, except perhaps for denormals which
> are
> sometimes flushed to zero.  But as soon as one jumps to even sqrt and other
> math functions, one needs to allow some epsilon up for upper bound and down
> for lower bound, similarly for the basic ops with inexact and
> flag_rounding_math.
> For MODE_COMPOSITE_P, our emulation doesn't match what is done at runtime,
> so we need to be more forgiving.

Definitely.  I'm pretty sure there's even more stuff I'm missing :).  We're only providing the core infrastructure-- the ability for ranger / vrp to handle basic floats.  We're hoping some float savvy hacker can fill in the rest.

The above does handle open and closed internals, though.  So it should be able to handle (5.0, 10.0] + [1.0, 1.0] if I understand correctly.  But I don't pretend to remotely know anything about floats.  I'm hoping the range-op entries can be implemented by a floating expert.

A bare bones implementation would provide relops, and maybe ?? singleton floats ([1.23, 1.23]).
Comment 23 Aldy Hernandez 2022-04-06 10:30:21 UTC
(In reply to Aldy Hernandez from comment #22)
> > This doesn't take flag_rounding_math or not always exactly precise floating
> > point computations into account.
> > It is also missing real_convert after real_arithmetics that performs at least
> > some of the rounding (and perhaps the inexact return value from
> > real_arithmetic
> > should be taken into account).
> > Without flag_rounding_math and on non-MODE_COMPOSITE_P the basic arithmetics
> > will be probably exact most of the time, except perhaps for denormals which
> > are
> > sometimes flushed to zero.  But as soon as one jumps to even sqrt and other
> > math functions, one needs to allow some epsilon up for upper bound and down
> > for lower bound, similarly for the basic ops with inexact and
> > flag_rounding_math.
> > For MODE_COMPOSITE_P, our emulation doesn't match what is done at runtime,
> > so we need to be more forgiving.
> 
> Definitely.  I'm pretty sure there's even more stuff I'm missing :).  We're
> only providing the core infrastructure-- the ability for ranger / vrp to
> handle basic floats.  We're hoping some float savvy hacker can fill in the
> rest.
> 
> The above does handle open and closed internals, though.  So it should be
> able to handle (5.0, 10.0] + [1.0, 1.0] if I understand correctly.  But I
> don't pretend to remotely know anything about floats.  I'm hoping the
> range-op entries can be implemented by a floating expert.
> 
> A bare bones implementation would provide relops, and maybe ?? singleton
> floats ([1.23, 1.23]).

Or...we could just use const_binop() instead of calling real_arithmetic directly.  Wouldn't that handle all the cases you mention?
Comment 24 Jakub Jelinek 2022-04-06 10:47:31 UTC
It would, but it would also give up quite often.
For VRP we can do better, because we don't have just the options exactly correct answer or give up, we can have ranges.
So, say for flag_rounding_math, we can do real_arithmetics and real_convert, if inexact or any rounding happens during real_convert, we can ensure that on upper bound we round towards positive infinity and for lower bound towards negative infinity.  The flush of denormals can be also handled.  And for composite modes, either indeed give up or do something roughly ok too.
For math functions have some param with worst ulp error to be used for VRP.  Another case is signed zeros, if it is unsure which zero sign would be used we can e.g. have a [-0.0, 0.0] range etc.
And, it would be nice to have some way to express result is or might be a qNaN, sNaN, +/- infinity.  While the infinities could stand at the end of ranges,
so have [5.0, +inf] range, for NaNs it would be nice to know whether the value can or can't be a NaN and even if it has to be NaN, whether it must be a particular NaN or any NaN.
Comment 25 Aldy Hernandez 2022-08-01 13:27:04 UTC
Adding some notes here as I work through this PR...

Even with floating aware VRP, we won't be able to do much because SCEV (which ranger and VRP use) does not work with non-integers.

At EVRP time we see:

double BG_SplineLength ()
{
  double i;
  double lastPoint;

  <bb 2> :
  goto <bb 6>; [INV]

  <bb 3> :
  if (i_3 == 0.0)
    goto <bb 5>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 4> :

  <bb 5> :
  # lastPoint_1 = PHI <i_3(3), 2.0e+0(4)>
  i_10 = i_3 + 1.00000001490116119384765625e-1;

  <bb 6> :
  # lastPoint_2 = PHI <lastPoint_5(D)(2), lastPoint_1(5)>
  # i_3 = PHI <1.00000000000000002081668171172168513294309377670288085938e-2(2), i_10(5)>
  if (i_3 <= 1.0e+0)
    goto <bb 3>; [INV]
  else
    goto <bb 7>; [INV]

  <bb 7> :
  return lastPoint_2;

}

On the 6->3 edge we know that i_3 is [-INF, 1.0], so even adding a range-op entry for the PLUS_EXPR, we'd know that i_3 is [-INF, 1.1], which is not enough to fold the i_3 == 0.0 conditional.  If you convert this testcase to integer and turn off SCEV (and unrolling and FRE, etc), you'll notice that VRP doesn't do much here (in either legacy nor ranger mode).

We would need SCEV to give us [1.0, 1.1] in order to fold the i_3 == 0.0 conditional.  For that matter, if I hack SCEV to give us the expected end points, I can get evrp to fold the conditional.  I think once VRP is FP aware, we can close this PR because this particular testcase is SCEV specific.  So for this particular testcase, perhaps we could open a SCEV+FP PR?

As an aside, the second conditional will not be folded by VRP (legacy or ranger), even with SCEV, even if you convert the above testcase to integer...it's either gotten later by unrolling+FRE for FP, or cddce for integers.

What I'm trying to say is that even with FP VRP, which we'll have shortly, the first conditional won't be folded because SCEV doesn't do floats, and the second one won't be, because it's not VRP's job.
Comment 26 Aldy Hernandez 2022-08-01 14:30:45 UTC
(In reply to Andrew Macleod from comment #19)
> We can use the original testcase as the litmus test for basic support if we
> compile it with
> 
> -O2  -fno-tree-fre  -fno-tree-dominator-opts
> 
> The unroller will unroll the loop and the VRP2 pass will be presented with:
> 
>  <bb 2> [local count: 97603129]:
>  i_1 = 1.1000000149011611438876201418679556809365749359130859375e-1;
>  i_17 = i_1 + 1.00000001490116119384765625e-1;
>  i_22 = i_17 + 1.00000001490116119384765625e-1;
>  i_27 = i_22 + 1.00000001490116119384765625e-1;
>  i_32 = i_27 + 1.00000001490116119384765625e-1;
>  i_37 = i_32 + 1.00000001490116119384765625e-1;
>  i_42 = i_37 + 1.00000001490116119384765625e-1;
>  i_47 = i_42 + 1.00000001490116119384765625e-1;
>  i_52 = i_47 + 1.00000001490116119384765625e-1;
>  if (i_52 == 0.0)
>    goto <bb 4>; [50.00%]
>  else
>    goto <bb 3>; [50.00%]
> 
>  <bb 3> [local count: 48801565]:
> 
>  <bb 4> [local count: 97603129]:
>  # lastPoint_12 = PHI <i_52(2), 2.0e+0(3)>
>  return lastPoint_12;
> 
> Which is we can track floating point ranges in VRP and simplification,
> recognize that i_52 != 0.0 and VRP2 should be able to resolve this to 
> 
> return 2.0e+0;

Errr, just saw this.  I if we let the unroller do it's thing, we can catch this at VRP2, and continue using this testcase.

And sure enough, my current work catches this:

Folding statement: i_5 = 1.1000000149011611438876201418679556809365749359130859375e-1;
Queued stmt for removal.  Folds to: 1.1000000149011611438876201418679556809365749359130859375e-1
Folding statement: i_17 = i_5 + 1.00000001490116119384765625e-1;
Queued stmt for removal.  Folds to: 2.100000029802322476513154470012523233890533447265625e-1
...
...
...
Folding statement: if (i_52 == 0.0)
gimple_simplified to if (0 != 0)
gimple_simplified to if (0 != 0)
Folded into: if (0 != 0)
Comment 27 GCC Commits 2022-11-08 15:54:07 UTC
The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>:

https://gcc.gnu.org/g:9d96a286992a0fd9ecdd6a58cd9a413c8c49f477

commit r13-3812-g9d96a286992a0fd9ecdd6a58cd9a413c8c49f477
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Thu Oct 13 08:14:16 2022 +0200

    [PR24021] Implement PLUS_EXPR range-op entry for floats.
    
    This is the range-op entry for floating point PLUS_EXPR.  It's the
    most intricate range entry we have so far, because we need to keep
    track of rounding and target FP formats.  This will be the last FP
    entry I commit, mostly to avoid disturbing the tree any further, and
    also because what we have so far is enough for a solid VRP.
    
    So far we track NANs and signs correctly.  We also handle relationals
    (symbolics and numeric), both ordered and unordered, ABS_EXPR and
    NEGATE_EXPR which are used to fold __builtin_isinf, and __builtin_sign
    (__builtin_copysign is coming up).  All in all, I think this provide
    more than enough for basic VRP on floats, as well as provide a basis
    to flesh out the rest if there's interest.
    
    My goal with this entry is to provide a template for additional binary
    operators, as they tend to follow a similar pattern: handle NANs, do
    the arithmetic while keeping track of rounding, and adjust for NAN.  I
    may abstract the general parts as we do for irange's fold_range and
    wi_fold.
    
            PR tree-optimization/24021
    
    gcc/ChangeLog:
    
            * range-op-float.cc (propagate_nans): New.
            (frange_nextafter): New.
            (frange_arithmetic): New.
            (class foperator_plus): New.
            (floating_op_table::floating_op_table): Add PLUS_EXPR entry.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/tree-ssa/vrp-float-plus.c: New test.
Comment 28 Aldy Hernandez 2022-11-08 15:57:14 UTC
It's my pleasure to finally close this PR.  VRP handles floats.  Obviously not every operator has been implemented, but enough to call VRP floating-point aware, and certainly enough to handle this case.