With -Ofast GCC doesn't reassociate constant multiplies or negates away from divisors to allow for more reciprocal division optimizations. It is also possible to avoid divisions (or multiplies) involving immediates in comparisons that check for a positive/negative result. float f1(float x, float y) { return x / (y * y); } // -> x * (1/y) * (1/y) float f2(float x, float y) { return x / (y * 3.0f); } // -> (x/3) / y float f3(float x, float y) { return x / -y; } // -> (-x) / y int f4(float x) { return (1.0f / x) < 0.0f; } // -> x < 0.0f int f5(float x) { return (x / 2.0f) <= 0.0f; } // -> x <= 0.0f A quick experiment shows the first transformation could remove almost 100 divisions from SPEC2006, the 2nd 50. The first transformation is only useful if there is at least one other division by y, so likely best done in the division reciprocal optimization phase.
Confirmed. Note the precision changes for all of them are different than the current transform so careful evaluation needs to be done.
The transforms int f4(float x) { return (1.0f / x) < 0.0f; } // -> x < 0.0f int f5(float x) { return (x / 2.0f) <= 0.0f; } // -> x <= 0.0f can be done as match.pd patterns, no?
(In reply to ktkachov from comment #2) > The transforms > > int f4(float x) { return (1.0f / x) < 0.0f; } // -> x < 0.0f > int f5(float x) { return (x / 2.0f) <= 0.0f; } // -> x <= 0.0f > > can be done as match.pd patterns, no? Note f5 is already transformed to (x * 0.5f) <= 0.0f even with -O2. The more general form for that one is (X * C0) <= C1 -> X <= C1/C0.
On Wed, 24 Aug 2016, ktkachov at gcc dot gnu.org wrote: > int f4(float x) { return (1.0f / x) < 0.0f; } // -> x < 0.0f Requires -fno-trapping-math, as this could lose an overflow or underflow from the division (but provided subnormals are supported, it can't underflow to zero for IEEE types). > int f5(float x) { return (x / 2.0f) <= 0.0f; } // -> x <= 0.0f Requires -funsafe-math-optimizations or similar; this is not a correct transformation for the least positive subnormal. (In principle, given -fno-rounding-math -fno-trapping-math, you could convert this to x <= FLT_TRUE_MIN and be correct.)
Author: law Date: Fri Sep 15 16:17:55 2017 New Revision: 252827 URL: https://gcc.gnu.org/viewcvs?rev=252827&root=gcc&view=rev Log: 2017-09-15 Jackson Woodruff <jackson.woodruff@arm.com> PR tree-optimization/71026 * match.pd: Move RDIV patterns from fold-const.c * fold-const.c (distribute_real_division): Removed. (fold_binary_loc): Remove calls to distribute_real_divison. PR tree-optimization/71026 * gcc/testsuire/gcc.dg/fold-div-1.c: Use -O1. Modified: trunk/gcc/ChangeLog trunk/gcc/match.pd trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/fold-div-1.c
(In reply to Jeffrey A. Law from comment #5) > Author: law > Date: Fri Sep 15 16:17:55 2017 > New Revision: 252827 > > URL: https://gcc.gnu.org/viewcvs?rev=252827&root=gcc&view=rev > Log: > 2017-09-15 Jackson Woodruff <jackson.woodruff@arm.com> > > PR tree-optimization/71026 > * match.pd: Move RDIV patterns from fold-const.c > * fold-const.c (distribute_real_division): Removed. > (fold_binary_loc): Remove calls to distribute_real_divison. > > PR tree-optimization/71026 > * gcc/testsuire/gcc.dg/fold-div-1.c: Use -O1. > > Modified: > trunk/gcc/ChangeLog > trunk/gcc/match.pd > trunk/gcc/testsuite/ChangeLog > trunk/gcc/testsuite/gcc.dg/fold-div-1.c This seems to leave out the fold-const.c changes...
Author: wilco Date: Tue Oct 17 13:22:48 2017 New Revision: 253812 URL: https://gcc.gnu.org/viewcvs?rev=253812&root=gcc&view=rev Log: Factor out division by squares and remove division around comparisons (0/2) Commit gcc/fold-const.c missing from r252827: gcc/ PR 71026/tree-optimization * fold-const.c (distribute_real_division): Removed. (fold_binary_loc): Remove calls to distribute_real_divison. Modified: trunk/gcc/ChangeLog trunk/gcc/fold-const.c
Author: wilco Date: Tue Nov 7 12:38:55 2017 New Revision: 254497 URL: https://gcc.gnu.org/viewcvs?rev=254497&root=gcc&view=rev Log: PR71026: Canonicalize negates in division Canonicalize x / (- y) into (-x) / y. This moves negates out of the RHS of a division in order to allow further simplifications and potentially more reciprocal CSEs. 2017-11-07 Wilco Dijkstra <wdijkstr@arm.com> Jackson Woodruff <jackson.woodruff@arm.com> gcc/ PR tree-optimization/71026 * match.pd: Canonicalize negate in division. testsuite/ PR 71026/tree-optimization/71026 * gcc.dg/div_neg: New test. Added: trunk/gcc/testsuite/gcc.dg/div_neg.c Modified: trunk/gcc/ChangeLog trunk/gcc/match.pd trunk/gcc/testsuite/ChangeLog
Author: wilco Date: Thu Nov 16 11:54:49 2017 New Revision: 254816 URL: https://gcc.gnu.org/viewcvs?rev=254816&root=gcc&view=rev Log: Canonicalize constant multiplies in division This patch implements some of the optimizations discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026. Canonicalize x / (C1 * y) into (x * C2) / y. This moves constant multiplies out of the RHS of a division in order to allow further simplifications (such as (C1 * x) / (C2 * y) -> (C3 * x) / y) and to enable more reciprocal CSEs. 2017-11-16 Wilco Dijkstra <wdijkstr@arm.com> Jackson Woodruff <jackson.woodruff@arm.com> gcc/ PR tree-optimization/71026 * match.pd: Canonicalize constant multiplies in division. gcc/testsuite/ PR tree-optimization/71026 * gcc.dg/cse_recip.c: New test. Added: trunk/gcc/testsuite/gcc.dg/cse_recip.c Modified: trunk/gcc/ChangeLog trunk/gcc/match.pd trunk/gcc/testsuite/ChangeLog
Author: wilco Date: Fri Nov 24 16:03:13 2017 New Revision: 255141 URL: https://gcc.gnu.org/viewcvs?rev=255141&root=gcc&view=rev Log: Factor out division by squares This patch implements the some of the division optimizations discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026. The division reciprocal optimization now handles divisions by squares: x / (y * y) -> x * (1 / y) * (1 / y) This requires at least one more division by y before it triggers - the 3 divisions of (1/ y) are then CSEd into a single division. Overall this changes 1 division into 1 multiply, which is generally much faster. 2017-11-24 Jackson Woodruff <jackson.woodruff@arm.com> gcc/ PR tree-optimization/71026 * tree-ssa-math-opts (is_division_by_square, is_square_of): New. (insert_reciprocals): Change to insert reciprocals before a division by a square and to insert the square of a reciprocal. (execute_cse_reciprocals_1): Change to consider division by a square. (register_division_in): Add importance parameter. testsuite/ PR tree-optimization/71026 * gfortran.dg/extract_recip_1.f: New test. * gcc.dg/extract_recip_3.c: New test. * gcc.dg/extract_recip_4.c: New test. Added: trunk/gcc/testsuite/gcc.dg/extract_recip_3.c trunk/gcc/testsuite/gcc.dg/extract_recip_4.c trunk/gcc/testsuite/gfortran.dg/extract_recip_1.f Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-math-opts.c
Author: wilco Date: Wed Nov 14 12:45:29 2018 New Revision: 266142 URL: https://gcc.gnu.org/viewcvs?rev=266142&root=gcc&view=rev Log: Simplify floating point comparisons This patch implements some of the optimizations discussed in PR71026. Simplify (C / x >= 0.0) into x >= 0.0 with -funsafe-math-optimizations (since C / x can underflow to zero if x is huge, it's not safe otherwise). If C is negative the comparison is reversed. Simplify (x * C1) > C2 into x > (C2 / C1) with -funsafe-math-optimizations. If C1 is negative the comparison is reversed. gcc/ PR 71026/tree-optimization * match.pd: Simplify floating point comparisons. gcc/testsuite/ PR 71026/tree-optimization * gcc.dg/div-cmp-1.c: New test. * gcc.dg/div-cmp-2.c: New test. Added: trunk/gcc/testsuite/gcc.dg/div-cmp-1.c trunk/gcc/testsuite/gcc.dg/div-cmp-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/match.pd trunk/gcc/testsuite/ChangeLog
It looks the only case left to do is f5: x * C <= 0.0 -> x <= 0.0 if C >= 1.0 x * C <= 0.0 -> x < FLT_MIN/C if C < 1.0
(In reply to Wilco from comment #12) > It looks the only case left to do is f5: > > x * C <= 0.0 -> x <= 0.0 if C >= 1.0 > x * C <= 0.0 -> x < FLT_MIN/C if C < 1.0 Commit r266142 (from #c11) added this to match.pd: (for cmp (lt le gt ge) neg_cmp (gt ge lt le) /* Simplify (x * C1) cmp C2 -> x cmp (C2 / C1), where C1 != 0. */ (simplify (cmp (mult @0 REAL_CST@1) REAL_CST@2) (with { tree tem = const_binop (RDIV_EXPR, type, @2, @1); } (if (tem && !(REAL_VALUE_ISINF (TREE_REAL_CST (tem)) || (real_zerop (tem) && !real_zerop (@1)))) (switch (if (real_less (&dconst0, TREE_REAL_CST_PTR (@1))) (cmp @0 { tem; })) (if (real_less (TREE_REAL_CST_PTR (@1), &dconst0)) (neg_cmp @0 { tem; }))))))) Doesn't || (real_zerop (tem) && !real_zerop (@1)) prevent > x * C <= 0.0 -> x <= 0.0 if C >= 1.0 from happening?