This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)


On 08/29/2017 01:13 PM, Richard Biener wrote:
On Tue, Aug 29, 2017 at 1:35 PM, Jackson Woodruff
<jackson.woodruff@foss.arm.com> wrote:
Hi all,

Apologies again to those CC'ed, who (again) received this twice.

Joseph: Yes you are correct. I misread the original thread, now fixed.

Richard: I've moved the optimizations out of fold-const.c. One has been
replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C) I've
deleted as it only introduced a new optimization when running with the flags
'-O0 -funsafe-math-optimizations'.

Hmm, how did you verify that, that it only adds sth with -O0
-funsafe-math-optimizations?

By checking with various flags, although not exhaustively. I looked for reasons for the behavior in match.pd (explained below).

I have also since discovered that the combinations of '-funsafe-math-optimizations -frounding-math' (at all levels) and '-fno-recriprocal-math -funsafe-math-optimizations' mean this pattern adds something.

Is that because in GIMPLE the reassoc pass should do this transform?
It's because the pattern that changes (X / C) -> X * (1 / C) is gated with O1:

  (for cst (REAL_CST COMPLEX_CST VECTOR_CST)
   (simplify
    (rdiv @0 cst@1)
->    (if (optimize)
->     (if (flag_reciprocal_math
      && !real_zerop (@1))
      (with
{ tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @1); }
       (if (tem)
        (mult @0 { tem; } )))
      (if (cst != COMPLEX_CST)
       (with { tree inverse = exact_inverse (type, @1); }
        (if (inverse)
         (mult @0 { inverse; } ))))))))


I've flagged the two lines that are particularly relevant to this.

Removing this pattern, as I would expect, means that the divisions in the above optimization (and the one further down) are not removed.

So then there is the question of edge cases. This pattern is (ignoring the second case) going to fail when const_binop returns null. Looking through that function says that it will fail (for reals) when:

- Either argument is null (not the case)

- The operation is not in the list (PLUS_EXPR,
  MINUS_EXPR, MULT_EXPR, RDIV_EXPR, MIN_EXPR, MAX_EXPR)
  (again not the case)

- We honor Signalling NaNs and one of the operands is a sNaN.

- The operation is a division, and the second argument is zero 	
  and dividing by 0.0 raises an exception.

- The result is infinity and neither of the operands were infinity
  and flag_trapping_math is set.

- The result isn't exact and flag_rounding_math is set.


For (x / ( y * C) -> (x / C) / y), I will add some gating for each of these so that the pattern is never executed if one of these would be the case.

The additional cases where this isn't converted to a multiplication by the reciprocal appear to be when -freciprocal-math is used, but we don't have -fno-rounding-math, or funsafe-math-optimizations.

For the removal of (x / C +- y / C -> (x +- y) / C), there is a trade-off of whether it is worth having an extra pattern for these edge cases. On this I'm not sure.


You added

+/* Simplify x / (- y) to -x / y.  */
+(simplify
+ (rdiv @0 (negate @1))
+ (rdiv (negate @0) @1))

Sorry, this was unclear, the changes to fold const should be split up from the additions to match.pd.

This was one of the changes discussed before. The idea is to canonicalize this so that y can be extracted in the cse_reciprocals pass.



for

       /* (-A) / (-B) -> A / B  */
       if (TREE_CODE (arg0) == NEGATE_EXPR && negate_expr_p (arg1))
         return fold_build2_loc (loc, RDIV_EXPR, type,
                             TREE_OPERAND (arg0, 0),
                             negate_expr (arg1));
       if (TREE_CODE (arg1) == NEGATE_EXPR && negate_expr_p (arg0))
         return fold_build2_loc (loc, RDIV_EXPR, type,
                             negate_expr (arg0),
                             TREE_OPERAND (arg1, 0));

presumably?  This isn't equivalent.  It's more like

(simplify
   (rdiv (negate_expr_p @0) (negate @1))
   (rdiv (negate @0) @1))
(simplify
   (rdiv (negate @0) (negate_expr_p @1))
   (rdiv @0 (negate @1)))

and you should remove the corresponding fold-const.c code.

Please do these changes independently to aid bisecting in case of issues.

I'll resubmit two different patches when I can get them separated. It should also make it easier to review when it is clearer what belongs where.


  (if (flag_reciprocal_math)
- /* Convert (A/B)/C to A/(B*C)  */
+ /* Convert (A/B)/C to A/(B*C) where neither B nor C are constant.  */
   (simplify
    (rdiv (rdiv:s @0 @1) @2)
-   (rdiv @0 (mult @1 @2)))
+   (if (TREE_CODE (@1) != REAL_CST && TREE_CODE (@1) != REAL_CST)
+    (rdiv @0 (mult @1 @2))))

why?  I guess to avoid ping-poning with

Yes, although there is a mistake there. It should be:

(if (TREE_CODE (@1) != REAL_CST && TREE_CODE (@2) != REAL_CST)

I'll fix that up.


+ /* Simplify x / (C * y) to (x / C) / y where C is a constant.  */
+ (simplify
+  (rdiv @0
+   (mult @1 REAL_CST@2))
+  (rdiv (rdiv @0 @2) @1))

?  If so why not just disallow for @1 == REAL_CST?

The ping-ponging issue is there even when only one of the operands is a constant (which of course it has to be for this pattern to apply). For example, something like x / (y * 3), with '-O0 -ffast-math',
when the reciprocal isn't computed, ping pongs back and forth.

I'm not sure that it can be fixed without a condition on the pattern already there.


On O1 and up, the pattern that replaces 'x / C' with 'x * (1 / C)'
is enabled and then the pattern is covered by that and
(x * C +- y * C -> C * (x +- y)) (which is already present in match.pd)

I have also updated the testcase for those optimizations to use 'O1' to
avoid that case.


On 08/24/2017 10:06 PM, Jeff Law wrote:

On 08/17/2017 03:55 AM, Wilco Dijkstra wrote:

Richard Biener wrote:

On Tue, Aug 15, 2017 at 4:11 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com>
wrote:

Richard Biener wrote:

We also change the association of

        x / (y * C) -> (x / C) / y

If C is a constant.


Why's that profitable?


It enables (x * C1) / (y * C2) -> (x * C1/C2) / y for example.
Also 1/y is now available to the reciprocal optimization, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 for details.


Sure, but on its own it's going to be slower.  So this isn't the
correct way to enable those followup transforms.


How can it be any slower? It's one division and one multiply in both
cases.

x / (y * C) -> (x / C) / y

Goes from one division and one multiplication to two divisions.  I'm
guessing that's what Richi is (reasonably) concerned about.

So it may be the case that we need more complex pattern matching here
for when to perform the first transformation to ultimately enable the
second.


The only case where we don't remove the division but still execute this
pattern is when run with'-O0 -freciprocal-math'.

As long as we have 'O1' or greater (and -freciprocal-math), that is enough
to enable the removal of the reciprocal.

I don't see this.  I presume you mean this happens in the recip pass?

It's one of the match.pd patterns that actually does the extraction since C is always constant. I've copied in the pattern above in response to one of your previous comments.

But we don't optimize this when optimizing for size (but the above pattern > still applies) or when targetm.min_divisions_for_recip_mul is too large

It was my understanding that this is used to specify the number of divisions you have to be able to replace with a multiplication before it is worthwhile inserting an extra multiplication.

So for situations like this, I think that this is not quite what is being measured, because there isn't an extra multiplication being inserted?


So I still think this is the wrong place to do this and instead the recip
pass should be extended.



Jackson.


Jeff




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]