Bug 71026 - Missing division optimizations
Summary: Missing division optimizations
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 7.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: spec
  Show dependency treegraph
 
Reported: 2016-05-09 16:08 UTC by Wilco
Modified: 2019-06-17 15:29 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-05-10 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Wilco 2016-05-09 16:08:15 UTC
With -Ofast GCC doesn't reassociate constant multiplies or negates away from divisors to allow for more reciprocal division optimizations. It is also possible to avoid divisions (or multiplies) involving immediates in comparisons that check for a positive/negative result.

float f1(float x, float y) { return x / (y * y); }    // -> x * (1/y) * (1/y)
float f2(float x, float y) { return x / (y * 3.0f); } // -> (x/3) / y
float f3(float x, float y) { return x / -y; }         // -> (-x) / y
int f4(float x) { return (1.0f / x) < 0.0f; }         // -> x < 0.0f
int f5(float x) { return (x / 2.0f) <= 0.0f; }        // -> x <= 0.0f

A quick experiment shows the first transformation could remove almost 100 divisions from SPEC2006, the 2nd 50. The first transformation is only useful if there is at least one other division by y, so likely best done in the division reciprocal optimization phase.
Comment 1 Richard Biener 2016-05-10 08:59:50 UTC
Confirmed.  Note the precision changes for all of them are different than
the current transform so careful evaluation needs to be done.
Comment 2 ktkachov 2016-08-24 14:05:42 UTC
The transforms

int f4(float x) { return (1.0f / x) < 0.0f; }         // -> x < 0.0f
int f5(float x) { return (x / 2.0f) <= 0.0f; }        // -> x <= 0.0f

can be done as match.pd patterns, no?
Comment 3 Wilco 2016-08-24 14:47:20 UTC
(In reply to ktkachov from comment #2)
> The transforms
> 
> int f4(float x) { return (1.0f / x) < 0.0f; }         // -> x < 0.0f
> int f5(float x) { return (x / 2.0f) <= 0.0f; }        // -> x <= 0.0f
> 
> can be done as match.pd patterns, no?

Note f5 is already transformed to (x * 0.5f) <= 0.0f even with -O2. The more general form for that one is (X * C0) <= C1 -> X <= C1/C0.
Comment 4 jsm-csl@polyomino.org.uk 2016-08-24 21:34:35 UTC
On Wed, 24 Aug 2016, ktkachov at gcc dot gnu.org wrote:

> int f4(float x) { return (1.0f / x) < 0.0f; }         // -> x < 0.0f

Requires -fno-trapping-math, as this could lose an overflow or underflow 
from the division (but provided subnormals are supported, it can't 
underflow to zero for IEEE types).

> int f5(float x) { return (x / 2.0f) <= 0.0f; }        // -> x <= 0.0f

Requires -funsafe-math-optimizations or similar; this is not a correct 
transformation for the least positive subnormal.  (In principle, given 
-fno-rounding-math -fno-trapping-math, you could convert this to x <= 
FLT_TRUE_MIN and be correct.)
Comment 5 Jeffrey A. Law 2017-09-15 16:18:27 UTC
Author: law
Date: Fri Sep 15 16:17:55 2017
New Revision: 252827

URL: https://gcc.gnu.org/viewcvs?rev=252827&root=gcc&view=rev
Log:
2017-09-15  Jackson Woodruff  <jackson.woodruff@arm.com>

	PR tree-optimization/71026
	* match.pd: Move RDIV patterns from fold-const.c
	* fold-const.c (distribute_real_division): Removed.
	(fold_binary_loc): Remove calls to distribute_real_divison.

	PR tree-optimization/71026
	* gcc/testsuire/gcc.dg/fold-div-1.c: Use -O1.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/match.pd
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/fold-div-1.c
Comment 6 Wilco 2017-09-15 17:26:52 UTC
(In reply to Jeffrey A. Law from comment #5)
> Author: law
> Date: Fri Sep 15 16:17:55 2017
> New Revision: 252827
> 
> URL: https://gcc.gnu.org/viewcvs?rev=252827&root=gcc&view=rev
> Log:
> 2017-09-15  Jackson Woodruff  <jackson.woodruff@arm.com>
> 
> 	PR tree-optimization/71026
> 	* match.pd: Move RDIV patterns from fold-const.c
> 	* fold-const.c (distribute_real_division): Removed.
> 	(fold_binary_loc): Remove calls to distribute_real_divison.
> 
> 	PR tree-optimization/71026
> 	* gcc/testsuire/gcc.dg/fold-div-1.c: Use -O1.
> 
> Modified:
>     trunk/gcc/ChangeLog
>     trunk/gcc/match.pd
>     trunk/gcc/testsuite/ChangeLog
>     trunk/gcc/testsuite/gcc.dg/fold-div-1.c

This seems to leave out the fold-const.c changes...
Comment 7 Wilco 2017-10-17 13:23:19 UTC
Author: wilco
Date: Tue Oct 17 13:22:48 2017
New Revision: 253812

URL: https://gcc.gnu.org/viewcvs?rev=253812&root=gcc&view=rev
Log:
Factor out division by squares and remove division around comparisons (0/2)

Commit gcc/fold-const.c missing from r252827:

    gcc/
	PR 71026/tree-optimization
	* fold-const.c (distribute_real_division): Removed.
	(fold_binary_loc): Remove calls to distribute_real_divison.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
Comment 8 Wilco 2017-11-07 12:39:27 UTC
Author: wilco
Date: Tue Nov  7 12:38:55 2017
New Revision: 254497

URL: https://gcc.gnu.org/viewcvs?rev=254497&root=gcc&view=rev
Log:
PR71026: Canonicalize negates in division

Canonicalize x / (- y) into (-x) / y.

This moves negates out of the RHS of a division in order to
allow further simplifications and potentially more reciprocal CSEs.

2017-11-07  Wilco Dijkstra  <wdijkstr@arm.com>
	    Jackson Woodruff  <jackson.woodruff@arm.com>

    gcc/
	PR tree-optimization/71026
	* match.pd: Canonicalize negate in division.

    testsuite/
	PR 71026/tree-optimization/71026
	* gcc.dg/div_neg: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/div_neg.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/match.pd
    trunk/gcc/testsuite/ChangeLog
Comment 9 Wilco 2017-11-16 11:55:21 UTC
Author: wilco
Date: Thu Nov 16 11:54:49 2017
New Revision: 254816

URL: https://gcc.gnu.org/viewcvs?rev=254816&root=gcc&view=rev
Log:
Canonicalize constant multiplies in division

This patch implements some of the optimizations discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.

Canonicalize x / (C1 * y) into (x * C2) / y.

This moves constant multiplies out of the RHS of a division in order
to allow further simplifications (such as (C1 * x) / (C2 * y) ->
(C3 * x) / y) and to enable more reciprocal CSEs.

2017-11-16  Wilco Dijkstra  <wdijkstr@arm.com>
	    Jackson Woodruff  <jackson.woodruff@arm.com>

    gcc/
	PR tree-optimization/71026
	* match.pd: Canonicalize constant multiplies in division.

    gcc/testsuite/
	PR tree-optimization/71026
	* gcc.dg/cse_recip.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/cse_recip.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/match.pd
    trunk/gcc/testsuite/ChangeLog
Comment 10 Wilco 2017-11-24 16:03:44 UTC
Author: wilco
Date: Fri Nov 24 16:03:13 2017
New Revision: 255141

URL: https://gcc.gnu.org/viewcvs?rev=255141&root=gcc&view=rev
Log:
Factor out division by squares

This patch implements the some of the division optimizations discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.

The division reciprocal optimization now handles divisions by squares:

     x / (y * y) -> x  * (1 / y) * (1 / y)

This requires at least one more division by y before it triggers - the
3 divisions of (1/ y) are then CSEd into a single division.  Overall
this changes 1 division into 1 multiply, which is generally much faster.


2017-11-24  Jackson Woodruff  <jackson.woodruff@arm.com>

    gcc/
	PR tree-optimization/71026
	* tree-ssa-math-opts (is_division_by_square, is_square_of): New.
	(insert_reciprocals): Change to insert reciprocals before a division
	by a square and to insert the square of a reciprocal.
	(execute_cse_reciprocals_1): Change to consider division by a square.
	(register_division_in): Add importance parameter.

    testsuite/
	PR tree-optimization/71026
	* gfortran.dg/extract_recip_1.f: New test.
	* gcc.dg/extract_recip_3.c: New test.
	* gcc.dg/extract_recip_4.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/extract_recip_3.c
    trunk/gcc/testsuite/gcc.dg/extract_recip_4.c
    trunk/gcc/testsuite/gfortran.dg/extract_recip_1.f
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-math-opts.c
Comment 11 Wilco 2018-11-14 12:46:21 UTC
Author: wilco
Date: Wed Nov 14 12:45:29 2018
New Revision: 266142

URL: https://gcc.gnu.org/viewcvs?rev=266142&root=gcc&view=rev
Log:
Simplify floating point comparisons

This patch implements some of the optimizations discussed in PR71026.

Simplify (C / x >= 0.0) into x >= 0.0 with -funsafe-math-optimizations
(since C / x can underflow to zero if x is huge, it's not safe otherwise).
If C is negative the comparison is reversed.

Simplify (x * C1) > C2 into x > (C2 / C1) with -funsafe-math-optimizations.
If C1 is negative the comparison is reversed.

    gcc/
	PR 71026/tree-optimization
	* match.pd: Simplify floating point comparisons.

    gcc/testsuite/
	PR 71026/tree-optimization
	* gcc.dg/div-cmp-1.c: New test.
	* gcc.dg/div-cmp-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/div-cmp-1.c
    trunk/gcc/testsuite/gcc.dg/div-cmp-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/match.pd
    trunk/gcc/testsuite/ChangeLog
Comment 12 Wilco 2018-11-14 15:45:36 UTC
It looks the only case left to do is f5:

x * C <= 0.0 -> x <= 0.0 if C >= 1.0
x * C <= 0.0 -> x < FLT_MIN/C if C < 1.0
Comment 13 Christophe Lyon 2019-06-17 15:29:05 UTC
(In reply to Wilco from comment #12)
> It looks the only case left to do is f5:
> 
> x * C <= 0.0 -> x <= 0.0 if C >= 1.0
> x * C <= 0.0 -> x < FLT_MIN/C if C < 1.0

Commit r266142 (from #c11) added this to match.pd:
 (for cmp (lt le gt ge)
      neg_cmp (gt ge lt le)
  /* Simplify (x * C1) cmp C2 -> x cmp (C2 / C1), where C1 != 0.  */
  (simplify
   (cmp (mult @0 REAL_CST@1) REAL_CST@2)
   (with
    { tree tem = const_binop (RDIV_EXPR, type, @2, @1); }
    (if (tem
         && !(REAL_VALUE_ISINF (TREE_REAL_CST (tem))
             || (real_zerop (tem) && !real_zerop (@1))))
     (switch
      (if (real_less (&dconst0, TREE_REAL_CST_PTR (@1)))
       (cmp @0 { tem; }))
      (if (real_less (TREE_REAL_CST_PTR (@1), &dconst0))
       (neg_cmp @0 { tem; })))))))

Doesn't || (real_zerop (tem) && !real_zerop (@1)) prevent
> x * C <= 0.0 -> x <= 0.0 if C >= 1.0
from happening?