[Bug middleend/19988] [4.0/4.1/4.2/4.3 Regression] pessimizes fp multiplyadd/subtract combo
stevenj at alum dot mit dot edu
gccbugzilla@gcc.gnu.org
Wed Aug 1 01:01:00 GMT 2007
 Comment #11 from stevenj at alum dot mit dot edu 20070801 01:01 
Given that there seems to be no progress, or even any prospect of progress, on
Roger's "unCSE" suggestion, isn't there a simpler fix?
The basic issue here seems to be the abovementioned 20040207 patch identified
by Andrew Pinski:
Optimize A  B as A + (B), if B is easily negated.
Here we have a concrete example, which appears in real applications, of how
this can hurt performance: basically, it destroys common subexpressions if A+B
also appears in the code. Roger opined that A + (B) seems "more canonical"
and "should help" with other optimizations, but he didn't give any specific
examples where this actually helps performance. Absent such examples, isn't it
better to revert that part of the patch?
I mentioned the case of the PowerPC, where gcc previously (3.4) destroyed an
fma by pulling out C*Y via CSE in X +/ (C*Y). But gcc 4+ preserves the fmas
(saving one fp op) at the cost of an extra constant load, which hardly seems an
improvement. A better behavior would be, for architectures with fma, to honor
fma's that appear explicitly in the source code, since CSE of C*Y will never
yield an improvement for X +/ C*Y on those architectures (see above).
However, even without that fix, the old behavior of leaving X  C*Y asis seems
preferable. [Even if somehow an extra load is better than an extra multiply on
fma architectures(?), marginally improving gcc for PowerPC etc. at the expense
of x86 seems a poor tradeoff.]
The following patch (to gcc4.2.1) fixes the problem for me (it removes the
extra constant load and the extra multiplication from the example code I
posted), by just disabling the A  B > A + (B) transformation for
floatingpoint expressions, and hence reverting to the gcc3.4 behavior in that
case.
+++ foldconst.c 20070731 20:50:29.512587550 0400
@@ 9038,11 +9038,8 @@
/* A  B > A + (B) if B is easily negatable. */
if (negate_expr_p (arg1)
 && ((FLOAT_TYPE_P (type)
 /* Avoid this transformation if B is a positive REAL_CST. */
 && (TREE_CODE (arg1) != REAL_CST
  REAL_VALUE_NEGATIVE (TREE_REAL_CST (arg1))))
  (INTEGRAL_TYPE_P (type) && flag_wrapv && !flag_trapv)))
+ /* !FLOAT_TYPE_P (type) might increase # of fp constants */
+ && (INTEGRAL_TYPE_P (type) && flag_wrapv && !flag_trapv))
return fold_build2 (PLUS_EXPR, type,
fold_convert (type, arg0),
fold_convert (type, negate_expr (arg1)));
Better yet, disable it for the integer case too (deleting the transformation
entirely), since this transformation can still destroy common subexpressions
even when you have direct constants:
+++ foldconst.c 20070731 20:57:35.339929690 0400
@@ 9036,17 +9036,6 @@
&& operand_equal_p (arg0, arg1, 0))
return fold_convert (type, integer_zero_node);
 /* A  B > A + (B) if B is easily negatable. */
 if (negate_expr_p (arg1)
 && ((FLOAT_TYPE_P (type)
 /* Avoid this transformation if B is a positive REAL_CST. */
 && (TREE_CODE (arg1) != REAL_CST
  REAL_VALUE_NEGATIVE (TREE_REAL_CST (arg1))))
  (INTEGRAL_TYPE_P (type) && flag_wrapv && !flag_trapv)))
 return fold_build2 (PLUS_EXPR, type,
 fold_convert (type, arg0),
 fold_convert (type, negate_expr (arg1)));

/* Try folding difference of addresses. */
{
HOST_WIDE_INT diff;
What is the downside?

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988
More information about the Gccbugs
mailing list