GCC Bugzilla – Full Text Bug Listing

- Home
- | New
- | Browse
- | Search
- | [?]
- | Reports
- | Help
- | New Account
- | Log In
- | Forgot Password

Summary: | VRP does not work with floating points | ||
---|---|---|---|

Product: | gcc | Reporter: | Andrew Pinski <pinskia> |

Component: | tree-optimization | Assignee: | Aldy Hernandez <aldyh> |

Status: | RESOLVED FIXED | ||

Severity: | enhancement | CC: | aldyh, aldyh, amacleod, dimhen, dje, fw, gcc-bugs, jakub, jeffreyalaw, law, manu, rguenth, sayle, vincent-gcc |

Priority: | P2 | Keywords: | missed-optimization |

Version: | 4.1.0 | ||

Target Milestone: | 13.0 | ||

Host: | Target: | ||

Build: | Known to work: | ||

Known to fail: | Last reconfirmed: | 2007-07-01 00:23:36 | |

Bug Depends on: | |||

Bug Blocks: | 85316, 68097 |

Description
Andrew Pinski
2005-09-22 20:59:19 UTC
Confirmed. Could this be a summer of code project? Numerically intensive programs should benefit tremendously, shouldn't they? But note that the optimization should be modified or disabled in contexts where floating-point exceptions need to be honored, as the i+=0.1f will sometimes trigger the inexact exception. FWIW, we hope to enable floating point range support in ranger for GCC 13. One of the post stage 1 tasks is to generalize ranger to use a vrange class, from which we can derive integral (irange), pointer (prange), and float (frange) types. possibly complex integral (cirange) and/or string (srange) have been thrown around if appropriate. Before we go to develop the frange class, we'll have a working session of some sort to flesh out what it can track, then implement some range-ops for some float tree-codes. (In reply to Andrew Macleod from comment #4) > FWIW, we hope to enable floating point range support in ranger for GCC 13. > > One of the post stage 1 tasks is to generalize ranger to use a vrange class, > from which we can derive integral (irange), pointer (prange), and float > (frange) types. possibly complex integral (cirange) and/or string (srange) > have been thrown around if appropriate. > > Before we go to develop the frange class, we'll have a working session of > some sort to flesh out what it can track, then implement some range-ops for > some float tree-codes. FWIW, I'm on this. I've begun work on vrange, and simultaneously have started work on a proof-of-concept for floating points that I hope some more savvy floating experts can extend. Preview of what's to come. Implementing a bare bones frange class and associated relational operators in range-ops we can get: void link_error(); void func(); int foo(float f) { if (f > 3.0) { func(); if (f < 2.0) link_error(); } } Folding statement: if (f_2(D) > 3.0e+0) Not folded Folding statement: func (); Not folded Folding statement: if (f_2(D) < 2.0e+0) Folding predicate f_2(D) < 2.0e+0 to 0 Folded into: if (0 != 0) And we can also get symbolics: int bar(float f, float g) { if (f > g) { func(); if (f < g) link_error(); } } Folding statement: if (f_2(D) < g_3(D)) folding with relation f_2(D) > g_3(D) Folding predicate f_2(D) < g_3(D) to 0 Folded into: if (0 != 0) My proof of concept has ranger dumps looking like: =========== BB 2 ============ Imports: f_2(D) Exports: f_2(D) f_2(D) float VARYING <bb 2> : if (f_2(D) > 3.0e+0) goto <bb 3>; [INV] else goto <bb 5>; [INV] 2->3 (T) f_2(D) : float (3.0e+0, +Inf] 2->5 (F) f_2(D) : float [-Inf, 3.0e+0] Interestingly, since the threader uses ranger, I had to turn off the threader for the above snippets, because ethread gets the first one before evrp even gets a whack at it: [1] Registering jump thread: (2, 3) incoming edge; (3, 5) nocopy; path: 2->3->5 SUCCESS Removing basic block 3 ;; basic block 3, loop depth 0 ;; pred: func (); if (f_2(D) < 2.0e+0) goto <bb 4>; [INV] else goto <bb 5>; [INV] ;; succ: 4 ;; 5 As I've mentioned, I'm hoping some floating expert can take this across to goal line, as my head will start spinning as soon as we start talking about NANs and such. The range-op work will likely require floating specialized knowledge. Very cool. ANd no, I'm not enough of an expert on the FP side to shepherd that though. I would expect it to be exceptionally tricky on the solver side. Probably the most useful things I've come across would be knowing if a particular value can or can not have certain special values. ie, [qs]NaN, +-0.0, +-Inf. Knowing how an value relates to 0 is also quite helpful. ie, > 0, < 0 and the like. (In reply to Jeffrey A. Law from comment #7) > Very cool. ANd no, I'm not enough of an expert on the FP side to shepherd > that though. I would expect it to be exceptionally tricky on the solver > side. The solver, or ranger/gori, should be isolated from changes. The main goal is to make ranger work with generic vrange's, not iranges or franges. Once I'm done with the abstraction, it should only be value-range.* (class frange) and range-ops* that should be affected. That's the theory anyhow ;-). (In reply to Aldy Hernandez from comment #6) > As I've mentioned, I'm hoping some floating expert can take this across to > goal line, as my head will start spinning as soon as we start talking about > NANs and such. The range-op work will likely require floating specialized > knowledge. Subnormals might also need to be considered as special cases: "Whether and in what cases subnormal numbers are treated as zeros is implementation defined." will be added to C23 (some behaviors are dictated by the hardware, e.g. ARM in some non-IEEE configurations), but I've asked for clarification in the CFP mailing-list. (In reply to Vincent Lefèvre from comment #9) > Subnormals might also need to be considered as special cases: "Whether and > in what cases subnormal numbers are treated as zeros is implementation > defined." will be added to C23 (some behaviors are dictated by the hardware, > e.g. ARM in some non-IEEE configurations), but I've asked for clarification > in the CFP mailing-list. Some details in http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2797.htm Various match.pd patterns use predicates like: (simplify (mult @0 real_zerop@1) (if (!tree_expr_maybe_nan_p (@0) && (!HONOR_NANS (type) || !tree_expr_maybe_infinite_p (@0)) && (!HONOR_SIGNED_ZEROS (type) || tree_expr_nonnegative_p (@0))) @1)) /* In IEEE floating point, x*1 is not equivalent to x for snans. Likewise for complex arithmetic with signed zeros. */ (simplify (mult @0 real_onep) (if (!tree_expr_maybe_signaling_nan_p (@0) && (!HONOR_SIGNED_ZEROS (type) || !COMPLEX_FLOAT_TYPE_P (type))) (non_lvalue @0))) etc. which would be great if it could use frange. Though, I think we also run here into the problem that match.pd right now avoids the ranger because it could reliably only allow walks from uses to SSA_NAME_DEF_STMTs and not the other direction. But of course those few simplifications are simple enough that they could be also repeated somewhere in the e?vrp code. One thing to consider is that at runtime, not all arithmetics might be as precise as what mpfr does at compile time, so in some cases it should expect a few ulps or maybe even more than that as possible errors (especially for library functions). And also take into account different rounding modes if user wants that to be honored. As for exceptions, I guess one thing is that ranger computes ranges and another thing is that optimization based on that will sometimes need to punt if it could optimize away visible side-effects the user cares about. (In reply to Aldy Hernandez from comment #8) > (In reply to Jeffrey A. Law from comment #7) > > Very cool. ANd no, I'm not enough of an expert on the FP side to shepherd > > that though. I would expect it to be exceptionally tricky on the solver > > side. > > The solver, or ranger/gori, should be isolated from changes. The main goal > is to make ranger work with generic vrange's, not iranges or franges. Once > I'm done with the abstraction, it should only be value-range.* (class > frange) and range-ops* that should be affected. That's the theory anyhow > ;-). Yeah, once we get things in place, there will be a skeleton frange class representation that does some set of basic operations. One or more floating point experts are welcome then to flush out additional representational requirements in the frange class and range-op implementations of any tree-codes operations. There shouldn't be any changed required anywhere else. VRP/threader/relations/folding/etc are all driven by range-ops. And we are of course happy to help when the time comes and GCC13 opens up and the abstraction code goes in. If someone is really keen, they may be able to work on a development branch a bit earlier than that... I also have a short list of practical floating point considerations I collected a couple of years ago. So I don't lose it: finite non-zero/positive/negative normalized in the range [-1,1] (for trig functions) which I would expect to come from adding some domain/range knowledge from standard math functions, and propagating range info from integer stuff from which floating point values are derived, on top of the usual stuff. e.g. in embedded world floating-point values from peripherals would commonly come from some PWM feature via integer division by a constant denominator, which could tell us a fair bit (even more if it could use builtin expect information about the numerator). as far as existing consumers for this info - I'm not sure if glibc still has special finite versions of functions, but those could be used based on range info. likewise, knowing a value is normal or in a bounded range could allow for 'relaxed' versions of library functions to be used. there should also be some inline sequences or other code transforms that are controlled by finite-math-only and also enabled for provably finite values. > etc. which would be great if it could use frange. Though, I think we also > run here into the problem that match.pd right now avoids the ranger because > it could reliably only allow walks from uses to SSA_NAME_DEF_STMTs and not > the other direction. That only happens with pointers and the non-null property right now, and even that is going away to start GCC13. I have the prototype underway now. . > One thing to consider is that at runtime, not all arithmetics might be as > precise as what mpfr does at compile time, so in some cases it should expect > a few ulps or maybe even more than that as possible errors (especially for > library functions). And also take into account different rounding modes if > user wants that to be honored. > We'll leave those to the experts :-) > As for exceptions, I guess one thing is that ranger computes ranges and > another thing is that optimization based on that will sometimes need to punt > if it could optimize away visible side-effects the user cares about. yeah, like with no-delete-null-pointer, there may be flag checking required for some things. (In reply to Jakub Jelinek from comment #11) > And also take into account different rounding modes if > user wants that to be honored. And that would eliminate the need to consider the possibility of double rounding in case of intermediate extended precision (as with x87). I think that for match.pd we should rely mostly on global ranges which means extending SSA name ranges for floats. Indeed I think the most important parts will be tracking whether values can be NaNs, Infs or whether they are known to be negative, positive or zero. But just tracking those fpclassify/signbit properties wouldn't be enough, because in many cases e.g. whether something can be infinite or not will depend on more precise value ranges. If we track just a bitmask, can the value be: zero subnormal normal infinite qNaN sNaN have positive signbit have negative signbit then even on simple multiplication or addition we'll need to assume from normal * normal or normal + normal that it can be infinite. When we know that one operand is [-15.123 - epsilon, 23.152 + epsilon] and the other is [256.0 - epsilon, 512.0 + epsilon], we actually can find out the result will not be infinite etc. But sure, tracking in a bitmask the above properties in addition to some approximate range is useful. ```
On Thu, 10 Feb 2022, jakub at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24021
>
> --- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> But just tracking those fpclassify/signbit properties wouldn't be enough,
> because in many cases e.g. whether something can be infinite or not will depend
> on more precise value ranges.
> If we track just a bitmask, can the value be:
> zero
> subnormal
> normal
> infinite
> qNaN
> sNaN
> have positive signbit
> have negative signbit
> then even on simple multiplication or addition we'll need to assume from normal
> * normal or normal + normal that it can be infinite. When we know that
> one operand is [-15.123 - epsilon, 23.152 + epsilon] and the other is
> [256.0 - epsilon, 512.0 + epsilon], we actually can find out the result will
> not be infinite etc.
> But sure, tracking in a bitmask the above properties in addition to some
> approximate range is useful.
Yes. What I was trying to say is that while the value-range
propagation process should likely track actual FP value ranges it
remains to be seen whether we need to store those into the SSA name
info or whether tracking a set of flags is good enough in practice
(given that match.pd should generally not invoke ranger in
resolving mode but look at what's in SSA annotations).
```
I'm wondering whether it is possible to check on actual code what is needed. For instance, assume that you have a program that produces always the same results, e.g. by running it over a fixed dataset. GCC could get some information about actual FP values (a bit like profile generation). Then check what benefit you can get by using these data at compile time (contrary to optimization with profile use, you assume here that the obtained information is necessarily valid, which is true as long as the program is run on the chosen dataset). The difficulty is to find whether some benefit can be obtained by VRP, but this should give an upper bound on the speedup you can hope to get. So perhaps this can give some useful information about what to focus on. We can use the original testcase as the litmus test for basic support if we compile it with -O2 -fno-tree-fre -fno-tree-dominator-opts The unroller will unroll the loop and the VRP2 pass will be presented with: <bb 2> [local count: 97603129]: i_1 = 1.1000000149011611438876201418679556809365749359130859375e-1; i_17 = i_1 + 1.00000001490116119384765625e-1; i_22 = i_17 + 1.00000001490116119384765625e-1; i_27 = i_22 + 1.00000001490116119384765625e-1; i_32 = i_27 + 1.00000001490116119384765625e-1; i_37 = i_32 + 1.00000001490116119384765625e-1; i_42 = i_37 + 1.00000001490116119384765625e-1; i_47 = i_42 + 1.00000001490116119384765625e-1; i_52 = i_47 + 1.00000001490116119384765625e-1; if (i_52 == 0.0) goto <bb 4>; [50.00%] else goto <bb 3>; [50.00%] <bb 3> [local count: 48801565]: <bb 4> [local count: 97603129]: # lastPoint_12 = PHI <i_52(2), 2.0e+0(3)> return lastPoint_12; Which is we can track floating point ranges in VRP and simplification, recognize that i_52 != 0.0 and VRP2 should be able to resolve this to return 2.0e+0; (In reply to Andrew Macleod from comment #19) > We can use the original testcase as the litmus test for basic support if we > compile it with > > -O2 -fno-tree-fre -fno-tree-dominator-opts > > The unroller will unroll the loop and the VRP2 pass will be presented with: > > <bb 2> [local count: 97603129]: > i_1 = 1.1000000149011611438876201418679556809365749359130859375e-1; > i_17 = i_1 + 1.00000001490116119384765625e-1; > i_22 = i_17 + 1.00000001490116119384765625e-1; > i_27 = i_22 + 1.00000001490116119384765625e-1; > i_32 = i_27 + 1.00000001490116119384765625e-1; > i_37 = i_32 + 1.00000001490116119384765625e-1; > i_42 = i_37 + 1.00000001490116119384765625e-1; > i_47 = i_42 + 1.00000001490116119384765625e-1; > i_52 = i_47 + 1.00000001490116119384765625e-1; > if (i_52 == 0.0) > goto <bb 4>; [50.00%] > else > goto <bb 3>; [50.00%] > > <bb 3> [local count: 48801565]: > > <bb 4> [local count: 97603129]: > # lastPoint_12 = PHI <i_52(2), 2.0e+0(3)> > return lastPoint_12; > > Which is we can track floating point ranges in VRP and simplification, > recognize that i_52 != 0.0 and VRP2 should be able to resolve this to > > return 2.0e+0; We can definitely get this with the current work. All that's missing is the snippet implementing the PLUS_EXPR operator. A simple implementation (with likely the NAN bits wrong :)) follows: diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc index d1c7a3b571b..cba7166ad2b 100644 --- a/gcc/range-op-float.cc +++ b/gcc/range-op-float.cc @@ -1055,6 +1055,54 @@ foperator_mult::fold_range (frange &r, tree type, return true; } +class foperator_plus : public range_operator +{ + using range_operator::fold_range; +public: + virtual bool fold_range (frange &r, tree type, + const frange &lh, + const frange &rh, + relation_kind rel = VREL_NONE) const override; +} fop_plus; + +bool +foperator_plus::fold_range (frange &r, tree type, + const frange &lh, + const frange &rh, + relation_kind) const +{ + if (empty_range_varying (r, type, lh, rh)) + return true; + + if (lh.get_prop (FRANGE_PROP_NAN) + || rh.get_prop (FRANGE_PROP_NAN)) + { + r.set_varying (type); + return true; + } + + enum tree_code code_lb, code_ub; + tree lh_lb = lh.lower_bound (&code_lb); + tree lh_ub = lh.upper_bound (&code_ub); + tree rh_lb = rh.lower_bound (); + tree rh_ub = rh.upper_bound (); + + REAL_VALUE_TYPE res_lb, res_ub; + real_arithmetic (&res_lb, PLUS_EXPR, + TREE_REAL_CST_PTR (lh_lb), + TREE_REAL_CST_PTR (rh_lb)); + real_arithmetic (&res_ub, PLUS_EXPR, + TREE_REAL_CST_PTR (lh_ub), + TREE_REAL_CST_PTR (rh_ub)); + + r.set (code_lb, + build_real (type, res_lb), + build_real (type, res_ub), + code_ub); + return true; +} + + class floating_table : public range_op_table { public: @@ -1081,6 +1129,7 @@ floating_table::floating_table () set (UNORDERED_EXPR, fop_unordered); set (ORDERED_EXPR, fop_ordered); set (MULT_EXPR, fop_mult); + set (PLUS_EXPR, fop_plus); } #if CHECKING_P And voila... ./cc1 a.c -quiet -I/tmp -O2 -fno-tree-fre -fno-tree-dominator-opts -fno-thread-jumps -fdump-tree-vrp2-details=/dev/stdout Folding statement: i_1 = 1.1000000149011611438876201418679556809365749359130859375e-1; Queued stmt for removal. Folds to: 1.1000000149011611438876201418679556809365749359130859375e-1 Folding statement: i_17 = i_1 + 1.00000001490116119384765625e-1; Queued stmt for removal. Folds to: 2.1000000298023223377352763918679556809365749359130859375e-1 ... ... ... double BG_SplineLength () { double i; double lastPoint; <bb 2> [local count: 97603129]: return 2.0e+0; } Notice the -fno-thread-jumps, because otherwise the now floating point aware threader will thread the conditional before VRP2 gets a chance to see it. I should probably take this PR ;-). (In reply to Aldy Hernandez from comment #20) > (In reply to Andrew Macleod from comment #19) > > We can use the original testcase as the litmus test for basic support if we > > compile it with > > > > -O2 -fno-tree-fre -fno-tree-dominator-opts > > > > The unroller will unroll the loop and the VRP2 pass will be presented with: > > > > <bb 2> [local count: 97603129]: > > i_1 = 1.1000000149011611438876201418679556809365749359130859375e-1; > > i_17 = i_1 + 1.00000001490116119384765625e-1; > > i_22 = i_17 + 1.00000001490116119384765625e-1; > > i_27 = i_22 + 1.00000001490116119384765625e-1; > > i_32 = i_27 + 1.00000001490116119384765625e-1; > > i_37 = i_32 + 1.00000001490116119384765625e-1; > > i_42 = i_37 + 1.00000001490116119384765625e-1; > > i_47 = i_42 + 1.00000001490116119384765625e-1; > > i_52 = i_47 + 1.00000001490116119384765625e-1; > > if (i_52 == 0.0) > > goto <bb 4>; [50.00%] > > else > > goto <bb 3>; [50.00%] > > > > <bb 3> [local count: 48801565]: > > > > <bb 4> [local count: 97603129]: > > # lastPoint_12 = PHI <i_52(2), 2.0e+0(3)> > > return lastPoint_12; > > > > Which is we can track floating point ranges in VRP and simplification, > > recognize that i_52 != 0.0 and VRP2 should be able to resolve this to > > > > return 2.0e+0; > > We can definitely get this with the current work. All that's missing is the > snippet implementing the PLUS_EXPR operator. A simple implementation (with > likely the NAN bits wrong :)) follows: > > diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc > index d1c7a3b571b..cba7166ad2b 100644 > --- a/gcc/range-op-float.cc > +++ b/gcc/range-op-float.cc > @@ -1055,6 +1055,54 @@ foperator_mult::fold_range (frange &r, tree type, > return true; > } > > +class foperator_plus : public range_operator > +{ > + using range_operator::fold_range; > +public: > + virtual bool fold_range (frange &r, tree type, > + const frange &lh, > + const frange &rh, > + relation_kind rel = VREL_NONE) const override; > +} fop_plus; > + > +bool > +foperator_plus::fold_range (frange &r, tree type, > + const frange &lh, > + const frange &rh, > + relation_kind) const > +{ > + if (empty_range_varying (r, type, lh, rh)) > + return true; > + > + if (lh.get_prop (FRANGE_PROP_NAN) > + || rh.get_prop (FRANGE_PROP_NAN)) > + { > + r.set_varying (type); > + return true; > + } > + > + enum tree_code code_lb, code_ub; > + tree lh_lb = lh.lower_bound (&code_lb); > + tree lh_ub = lh.upper_bound (&code_ub); > + tree rh_lb = rh.lower_bound (); > + tree rh_ub = rh.upper_bound (); > + > + REAL_VALUE_TYPE res_lb, res_ub; > + real_arithmetic (&res_lb, PLUS_EXPR, > + TREE_REAL_CST_PTR (lh_lb), > + TREE_REAL_CST_PTR (rh_lb)); > + real_arithmetic (&res_ub, PLUS_EXPR, > + TREE_REAL_CST_PTR (lh_ub), > + TREE_REAL_CST_PTR (rh_ub)); > + > + r.set (code_lb, > + build_real (type, res_lb), > + build_real (type, res_ub), > + code_ub); > + return true; > +} This doesn't take flag_rounding_math or not always exactly precise floating point computations into account. It is also missing real_convert after real_arithmetics that performs at least some of the rounding (and perhaps the inexact return value from real_arithmetic should be taken into account). Without flag_rounding_math and on non-MODE_COMPOSITE_P the basic arithmetics will be probably exact most of the time, except perhaps for denormals which are sometimes flushed to zero. But as soon as one jumps to even sqrt and other math functions, one needs to allow some epsilon up for upper bound and down for lower bound, similarly for the basic ops with inexact and flag_rounding_math. For MODE_COMPOSITE_P, our emulation doesn't match what is done at runtime, so we need to be more forgiving. ```
> This doesn't take flag_rounding_math or not always exactly precise floating
> point computations into account.
> It is also missing real_convert after real_arithmetics that performs at least
> some of the rounding (and perhaps the inexact return value from
> real_arithmetic
> should be taken into account).
> Without flag_rounding_math and on non-MODE_COMPOSITE_P the basic arithmetics
> will be probably exact most of the time, except perhaps for denormals which
> are
> sometimes flushed to zero. But as soon as one jumps to even sqrt and other
> math functions, one needs to allow some epsilon up for upper bound and down
> for lower bound, similarly for the basic ops with inexact and
> flag_rounding_math.
> For MODE_COMPOSITE_P, our emulation doesn't match what is done at runtime,
> so we need to be more forgiving.
Definitely. I'm pretty sure there's even more stuff I'm missing :). We're only providing the core infrastructure-- the ability for ranger / vrp to handle basic floats. We're hoping some float savvy hacker can fill in the rest.
The above does handle open and closed internals, though. So it should be able to handle (5.0, 10.0] + [1.0, 1.0] if I understand correctly. But I don't pretend to remotely know anything about floats. I'm hoping the range-op entries can be implemented by a floating expert.
A bare bones implementation would provide relops, and maybe ?? singleton floats ([1.23, 1.23]).
```
(In reply to Aldy Hernandez from comment #22) > > This doesn't take flag_rounding_math or not always exactly precise floating > > point computations into account. > > It is also missing real_convert after real_arithmetics that performs at least > > some of the rounding (and perhaps the inexact return value from > > real_arithmetic > > should be taken into account). > > Without flag_rounding_math and on non-MODE_COMPOSITE_P the basic arithmetics > > will be probably exact most of the time, except perhaps for denormals which > > are > > sometimes flushed to zero. But as soon as one jumps to even sqrt and other > > math functions, one needs to allow some epsilon up for upper bound and down > > for lower bound, similarly for the basic ops with inexact and > > flag_rounding_math. > > For MODE_COMPOSITE_P, our emulation doesn't match what is done at runtime, > > so we need to be more forgiving. > > Definitely. I'm pretty sure there's even more stuff I'm missing :). We're > only providing the core infrastructure-- the ability for ranger / vrp to > handle basic floats. We're hoping some float savvy hacker can fill in the > rest. > > The above does handle open and closed internals, though. So it should be > able to handle (5.0, 10.0] + [1.0, 1.0] if I understand correctly. But I > don't pretend to remotely know anything about floats. I'm hoping the > range-op entries can be implemented by a floating expert. > > A bare bones implementation would provide relops, and maybe ?? singleton > floats ([1.23, 1.23]). Or...we could just use const_binop() instead of calling real_arithmetic directly. Wouldn't that handle all the cases you mention? It would, but it would also give up quite often. For VRP we can do better, because we don't have just the options exactly correct answer or give up, we can have ranges. So, say for flag_rounding_math, we can do real_arithmetics and real_convert, if inexact or any rounding happens during real_convert, we can ensure that on upper bound we round towards positive infinity and for lower bound towards negative infinity. The flush of denormals can be also handled. And for composite modes, either indeed give up or do something roughly ok too. For math functions have some param with worst ulp error to be used for VRP. Another case is signed zeros, if it is unsure which zero sign would be used we can e.g. have a [-0.0, 0.0] range etc. And, it would be nice to have some way to express result is or might be a qNaN, sNaN, +/- infinity. While the infinities could stand at the end of ranges, so have [5.0, +inf] range, for NaNs it would be nice to know whether the value can or can't be a NaN and even if it has to be NaN, whether it must be a particular NaN or any NaN. Adding some notes here as I work through this PR... Even with floating aware VRP, we won't be able to do much because SCEV (which ranger and VRP use) does not work with non-integers. At EVRP time we see: double BG_SplineLength () { double i; double lastPoint; <bb 2> : goto <bb 6>; [INV] <bb 3> : if (i_3 == 0.0) goto <bb 5>; [INV] else goto <bb 4>; [INV] <bb 4> : <bb 5> : # lastPoint_1 = PHI <i_3(3), 2.0e+0(4)> i_10 = i_3 + 1.00000001490116119384765625e-1; <bb 6> : # lastPoint_2 = PHI <lastPoint_5(D)(2), lastPoint_1(5)> # i_3 = PHI <1.00000000000000002081668171172168513294309377670288085938e-2(2), i_10(5)> if (i_3 <= 1.0e+0) goto <bb 3>; [INV] else goto <bb 7>; [INV] <bb 7> : return lastPoint_2; } On the 6->3 edge we know that i_3 is [-INF, 1.0], so even adding a range-op entry for the PLUS_EXPR, we'd know that i_3 is [-INF, 1.1], which is not enough to fold the i_3 == 0.0 conditional. If you convert this testcase to integer and turn off SCEV (and unrolling and FRE, etc), you'll notice that VRP doesn't do much here (in either legacy nor ranger mode). We would need SCEV to give us [1.0, 1.1] in order to fold the i_3 == 0.0 conditional. For that matter, if I hack SCEV to give us the expected end points, I can get evrp to fold the conditional. I think once VRP is FP aware, we can close this PR because this particular testcase is SCEV specific. So for this particular testcase, perhaps we could open a SCEV+FP PR? As an aside, the second conditional will not be folded by VRP (legacy or ranger), even with SCEV, even if you convert the above testcase to integer...it's either gotten later by unrolling+FRE for FP, or cddce for integers. What I'm trying to say is that even with FP VRP, which we'll have shortly, the first conditional won't be folded because SCEV doesn't do floats, and the second one won't be, because it's not VRP's job. (In reply to Andrew Macleod from comment #19) > We can use the original testcase as the litmus test for basic support if we > compile it with > > -O2 -fno-tree-fre -fno-tree-dominator-opts > > The unroller will unroll the loop and the VRP2 pass will be presented with: > > <bb 2> [local count: 97603129]: > i_1 = 1.1000000149011611438876201418679556809365749359130859375e-1; > i_17 = i_1 + 1.00000001490116119384765625e-1; > i_22 = i_17 + 1.00000001490116119384765625e-1; > i_27 = i_22 + 1.00000001490116119384765625e-1; > i_32 = i_27 + 1.00000001490116119384765625e-1; > i_37 = i_32 + 1.00000001490116119384765625e-1; > i_42 = i_37 + 1.00000001490116119384765625e-1; > i_47 = i_42 + 1.00000001490116119384765625e-1; > i_52 = i_47 + 1.00000001490116119384765625e-1; > if (i_52 == 0.0) > goto <bb 4>; [50.00%] > else > goto <bb 3>; [50.00%] > > <bb 3> [local count: 48801565]: > > <bb 4> [local count: 97603129]: > # lastPoint_12 = PHI <i_52(2), 2.0e+0(3)> > return lastPoint_12; > > Which is we can track floating point ranges in VRP and simplification, > recognize that i_52 != 0.0 and VRP2 should be able to resolve this to > > return 2.0e+0; Errr, just saw this. I if we let the unroller do it's thing, we can catch this at VRP2, and continue using this testcase. And sure enough, my current work catches this: Folding statement: i_5 = 1.1000000149011611438876201418679556809365749359130859375e-1; Queued stmt for removal. Folds to: 1.1000000149011611438876201418679556809365749359130859375e-1 Folding statement: i_17 = i_5 + 1.00000001490116119384765625e-1; Queued stmt for removal. Folds to: 2.100000029802322476513154470012523233890533447265625e-1 ... ... ... Folding statement: if (i_52 == 0.0) gimple_simplified to if (0 != 0) gimple_simplified to if (0 != 0) Folded into: if (0 != 0) The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>: https://gcc.gnu.org/g:9d96a286992a0fd9ecdd6a58cd9a413c8c49f477 commit r13-3812-g9d96a286992a0fd9ecdd6a58cd9a413c8c49f477 Author: Aldy Hernandez <aldyh@redhat.com> Date: Thu Oct 13 08:14:16 2022 +0200 [PR24021] Implement PLUS_EXPR range-op entry for floats. This is the range-op entry for floating point PLUS_EXPR. It's the most intricate range entry we have so far, because we need to keep track of rounding and target FP formats. This will be the last FP entry I commit, mostly to avoid disturbing the tree any further, and also because what we have so far is enough for a solid VRP. So far we track NANs and signs correctly. We also handle relationals (symbolics and numeric), both ordered and unordered, ABS_EXPR and NEGATE_EXPR which are used to fold __builtin_isinf, and __builtin_sign (__builtin_copysign is coming up). All in all, I think this provide more than enough for basic VRP on floats, as well as provide a basis to flesh out the rest if there's interest. My goal with this entry is to provide a template for additional binary operators, as they tend to follow a similar pattern: handle NANs, do the arithmetic while keeping track of rounding, and adjust for NAN. I may abstract the general parts as we do for irange's fold_range and wi_fold. PR tree-optimization/24021 gcc/ChangeLog: * range-op-float.cc (propagate_nans): New. (frange_nextafter): New. (frange_arithmetic): New. (class foperator_plus): New. (floating_op_table::floating_op_table): Add PLUS_EXPR entry. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/vrp-float-plus.c: New test. It's my pleasure to finally close this PR. VRP handles floats. Obviously not every operator has been implemented, but enough to call VRP floating-point aware, and certainly enough to handle this case. |