[Bug middle-end/19988] [4.0/4.1/4.2/4.3 Regression] pessimizes fp multiply-add/subtract combo

stevenj at alum dot mit dot edu gcc-bugzilla@gcc.gnu.org
Wed Aug 1 01:01:00 GMT 2007



------- Comment #11 from stevenj at alum dot mit dot edu  2007-08-01 01:01 -------
Given that there seems to be no progress, or even any prospect of progress, on
Roger's "un-CSE" suggestion, isn't there a simpler fix?

The basic issue here seems to be the abovementioned 2004-02-07 patch identified
by Andrew Pinski:

        Optimize A - B as A + (-B), if B is easily negated.

Here we have a concrete example, which appears in real applications, of how
this can hurt performance: basically, it destroys common subexpressions if A+B
also appears in the code.  Roger opined that A + (-B) seems "more canonical"
and "should help" with other optimizations, but he didn't give any specific
examples where this actually helps performance. Absent such examples, isn't it
better to revert that part of the patch?

I mentioned the case of the PowerPC, where gcc previously (3.4) destroyed an
fma by pulling out C*Y via CSE in X +/- (C*Y).  But gcc 4+ preserves the fmas
(saving one fp op) at the cost of an extra constant load, which hardly seems an
improvement.  A better behavior would be, for architectures with fma, to honor
fma's that appear explicitly in the source code, since CSE of C*Y will never
yield an improvement for X +/- C*Y on those architectures (see above). 
However, even without that fix, the old behavior of leaving X - C*Y as-is seems
preferable. [Even if somehow an extra load is better than an extra multiply on
fma architectures(?), marginally improving gcc for PowerPC etc. at the expense
of x86 seems a poor tradeoff.]

The following patch (to gcc-4.2.1) fixes the problem for me (it removes the
extra constant load and the extra multiplication from the example code I
posted), by just disabling the A - B -> A + (-B) transformation for
floating-point expressions, and hence reverting to the gcc-3.4 behavior in that
case.  

+++ fold-const.c        2007-07-31 20:50:29.512587550 -0400
@@ -9038,11 +9038,8 @@

       /* A - B -> A + (-B) if B is easily negatable.  */
       if (negate_expr_p (arg1)
-         && ((FLOAT_TYPE_P (type)
-               /* Avoid this transformation if B is a positive REAL_CST.  */
-              && (TREE_CODE (arg1) != REAL_CST
-                  ||  REAL_VALUE_NEGATIVE (TREE_REAL_CST (arg1))))
-             || (INTEGRAL_TYPE_P (type) && flag_wrapv && !flag_trapv)))
+         /* !FLOAT_TYPE_P (type) might increase # of fp constants */
+         && (INTEGRAL_TYPE_P (type) && flag_wrapv && !flag_trapv))
        return fold_build2 (PLUS_EXPR, type,
                            fold_convert (type, arg0),
                            fold_convert (type, negate_expr (arg1)));

Better yet, disable it for the integer case too (deleting the transformation
entirely), since this transformation can still destroy common subexpressions
even when you have direct constants:

+++ fold-const.c      2007-07-31 20:57:35.339929690 -0400
@@ -9036,17 +9036,6 @@
          && operand_equal_p (arg0, arg1, 0))
        return fold_convert (type, integer_zero_node);

-      /* A - B -> A + (-B) if B is easily negatable.  */
-      if (negate_expr_p (arg1)
-         && ((FLOAT_TYPE_P (type)
-               /* Avoid this transformation if B is a positive REAL_CST.  */
-              && (TREE_CODE (arg1) != REAL_CST
-                  ||  REAL_VALUE_NEGATIVE (TREE_REAL_CST (arg1))))
-             || (INTEGRAL_TYPE_P (type) && flag_wrapv && !flag_trapv)))
-       return fold_build2 (PLUS_EXPR, type,
-                           fold_convert (type, arg0),
-                           fold_convert (type, negate_expr (arg1)));
-
       /* Try folding difference of addresses.  */
       {
        HOST_WIDE_INT diff;


What is the downside?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988



More information about the Gcc-bugs mailing list