This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[patch] tree-optimize.c: Run forwprop right before VRP.


Hi,

Attached is a patch to run forwprop right before VRP.

Consider

int
foo (int a)
{
  int b = a != 0;
  if (b)
    if (a != 0)
      return 1;
  return 0;
}

Note that 'b' is used in only one place, namely the first "if"
statement.  In this case, VRP does not insert any ASSERT_EXPR because
there is no use of 'b' below the first "if" statement.  Nor is there
use of 'a' below the second "if" statement.  Having inserted no
ASSERT_EXPR, VRP cannot determine that the condition in the second
"if" statement is always true.

The patch fixes this problem by running forwprop before VRP.  In doing
so, we propagate the definition of 'b', which is 'a != 0', into the
sole use of b, obtaining

int
foo (int a)
{
  if (a != 0)
    if (a != 0)
      return 1;
  return 0;
}

which is then trivial for VRP to optimize.

Here are some justifications.

1) With the current pass ordering, the number of COND_EXPRs that VRP
   folds into unconditional jumps does increase from 50 to 277 while
   compiling cc1-i files.  An impressive 454% improvement!

   With a TCB-like pass ordering

     NEXT_PASS (pass_ccp);
     NEXT_PASS (pass_copy_prop);
     NEXT_PASS (pass_fre);
     NEXT_PASS (pass_dce);
     NEXT_PASS (pass_vrp);

   but without this patch, VRP folds 1039 COND_EXPRs into unconditonal
   jumps while compiling cc1-i files.  If we run forwprop immediately
   before VRP, VRP folds 1189 COND_EXPRs into unconditional jumps.  A
   14% improvement!  (There are other cases where we fold COND_EXPRs
   into an SSA_NAME, but I am not including those as they are not as
   cool as folding them into unconditonal jumps.)

2) We do not want to insert ASSERT_EXPRs for variables that are dead
   after COND_EXPRs because the incremental SSA update isn't free.
   Running DCE before VRP exactly solves this problem in the original
   pass ordering.  In the new pass ordering, running forwprop between
   DCE and VRP still maintains dead-code free instruction stream
   because forwprop removes statements as they become dead as a result
   of forward propagation.

3) You might wonder if running VRP may give rise to more forwprop
   opportunities by making some variables used only once.  To answer
   this question, I put another forwprop where the first forwprop was
   originally at, that is, right before the first PHI-OPT.  As far as
   I can tell from compiling cc1-i files and MICO's files, the
   forwprop between merge_phi and PHIOPT did not find any new forward
   propagation opportunity.

4) merge_phi seems to be unaffected.  With or with this patch,
   merge_phi removes the same number of basic blocks, namely 2841
   blocks (or 1.7% of all basic blocks), while compiling cc1-i files
   although the number of all basic blocks is slightly different (less
   than 0.02%) with and without the patch.

5) We could have VRP walk the use-def chains upward starting from a
   COND_EXPR and insert ASSERT_EXPRs on those variables that appear in
   the chain.  For example, if we have

     c = a | b;
     if (c == 0)

   then we could insert "a = 0;" and "b = 0;" on the "then" arm of the
   "if" statement (as described in PR 20580), but that adds some
   complexity to VRP.  By running forwprop before VRP, we can skip
   this kind of complexity.  (I'm still interested in solving PR
   20580, though.)

6) Last but not least, the compile-time stays almost unchanged.
   Without this patch, compiling cc1-i files takes 177.655 seconds.
   With this patch, it takes 177.784 seconds.  The patched version
   takes 0.072% more.

As mentioned above, the improvement that this patch brings in the
current pass ordering is very small, but I am wondering if I could
apply the patch anyway so that Diego could keep this new order,
forwprop followed by VRP, in mind when he reorders tree passes
(assuming he agrees that this order is a good idea).  The testcase
itself should also ensure that forwprop is run before VRP.

Tested on i686-pc-linux-gnu.  OK to apply?

Kazu Hirata

2005-04-15  Kazu Hirata  <kazu@cs.umass.edu>

	PR tree-optimization/21001
	* tree-optimize.c (init_tree_optimization_passes): Move the
	first pass_forwprop immediately before pass_vrp.

2005-04-15  Kazu Hirata  <kazu@cs.umass.edu>

	PR tree-optimization/21001
	* gcc.dg/tree-ssa/pr21001.c: New.

Index: tree-optimize.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-optimize.c,v
retrieving revision 2.83
diff -u -d -p -r2.83 tree-optimize.c
--- tree-optimize.c	13 Apr 2005 04:29:39 -0000	2.83
+++ tree-optimize.c	15 Apr 2005 15:06:25 -0000
@@ -356,9 +356,9 @@ init_tree_optimization_passes (void)
   NEXT_PASS (pass_dominator);
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_dce);
+  NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_vrp);
   NEXT_PASS (pass_merge_phi);
-  NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_phiopt);
   NEXT_PASS (pass_may_alias);
   NEXT_PASS (pass_tail_recursion);
--- /dev/null	2005-04-13 14:59:22.325563136 -0400
+++ testsuite/gcc.dg/tree-ssa/pr21001.c	2005-04-13 15:08:31.000000000 -0400
@@ -0,0 +1,20 @@
+/* PR tree-optimization/21001
+   VRP did not insert ASSERT_EXPRs when the variable tested in a
+   COND_EXPR is a single-use variable.  By propagating the definition
+   of the single-use variable into the COND_EXPR, we can get useful
+   range infomation out of the conditional.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-dominator-opts -fdump-tree-vrp-details" } */
+
+int
+foo (int a)
+{
+  int b = a != 0;
+  if (b)
+    if (a != 0)
+      return 1;
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Folding predicate" 1 "vrp"} } */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]