This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[patch] tree-optimize.c: Run forwprop right before VRP.
- From: Kazu Hirata <kazu at cs dot umass dot edu>
- To: gcc-patches at gcc dot gnu dot org
- Cc: dnovillo at redhat dot com
- Date: Fri, 15 Apr 2005 16:14:36 -0400 (EDT)
- Subject: [patch] tree-optimize.c: Run forwprop right before VRP.
Hi,
Attached is a patch to run forwprop right before VRP.
Consider
int
foo (int a)
{
int b = a != 0;
if (b)
if (a != 0)
return 1;
return 0;
}
Note that 'b' is used in only one place, namely the first "if"
statement. In this case, VRP does not insert any ASSERT_EXPR because
there is no use of 'b' below the first "if" statement. Nor is there
use of 'a' below the second "if" statement. Having inserted no
ASSERT_EXPR, VRP cannot determine that the condition in the second
"if" statement is always true.
The patch fixes this problem by running forwprop before VRP. In doing
so, we propagate the definition of 'b', which is 'a != 0', into the
sole use of b, obtaining
int
foo (int a)
{
if (a != 0)
if (a != 0)
return 1;
return 0;
}
which is then trivial for VRP to optimize.
Here are some justifications.
1) With the current pass ordering, the number of COND_EXPRs that VRP
folds into unconditional jumps does increase from 50 to 277 while
compiling cc1-i files. An impressive 454% improvement!
With a TCB-like pass ordering
NEXT_PASS (pass_ccp);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_fre);
NEXT_PASS (pass_dce);
NEXT_PASS (pass_vrp);
but without this patch, VRP folds 1039 COND_EXPRs into unconditonal
jumps while compiling cc1-i files. If we run forwprop immediately
before VRP, VRP folds 1189 COND_EXPRs into unconditional jumps. A
14% improvement! (There are other cases where we fold COND_EXPRs
into an SSA_NAME, but I am not including those as they are not as
cool as folding them into unconditonal jumps.)
2) We do not want to insert ASSERT_EXPRs for variables that are dead
after COND_EXPRs because the incremental SSA update isn't free.
Running DCE before VRP exactly solves this problem in the original
pass ordering. In the new pass ordering, running forwprop between
DCE and VRP still maintains dead-code free instruction stream
because forwprop removes statements as they become dead as a result
of forward propagation.
3) You might wonder if running VRP may give rise to more forwprop
opportunities by making some variables used only once. To answer
this question, I put another forwprop where the first forwprop was
originally at, that is, right before the first PHI-OPT. As far as
I can tell from compiling cc1-i files and MICO's files, the
forwprop between merge_phi and PHIOPT did not find any new forward
propagation opportunity.
4) merge_phi seems to be unaffected. With or with this patch,
merge_phi removes the same number of basic blocks, namely 2841
blocks (or 1.7% of all basic blocks), while compiling cc1-i files
although the number of all basic blocks is slightly different (less
than 0.02%) with and without the patch.
5) We could have VRP walk the use-def chains upward starting from a
COND_EXPR and insert ASSERT_EXPRs on those variables that appear in
the chain. For example, if we have
c = a | b;
if (c == 0)
then we could insert "a = 0;" and "b = 0;" on the "then" arm of the
"if" statement (as described in PR 20580), but that adds some
complexity to VRP. By running forwprop before VRP, we can skip
this kind of complexity. (I'm still interested in solving PR
20580, though.)
6) Last but not least, the compile-time stays almost unchanged.
Without this patch, compiling cc1-i files takes 177.655 seconds.
With this patch, it takes 177.784 seconds. The patched version
takes 0.072% more.
As mentioned above, the improvement that this patch brings in the
current pass ordering is very small, but I am wondering if I could
apply the patch anyway so that Diego could keep this new order,
forwprop followed by VRP, in mind when he reorders tree passes
(assuming he agrees that this order is a good idea). The testcase
itself should also ensure that forwprop is run before VRP.
Tested on i686-pc-linux-gnu. OK to apply?
Kazu Hirata
2005-04-15 Kazu Hirata <kazu@cs.umass.edu>
PR tree-optimization/21001
* tree-optimize.c (init_tree_optimization_passes): Move the
first pass_forwprop immediately before pass_vrp.
2005-04-15 Kazu Hirata <kazu@cs.umass.edu>
PR tree-optimization/21001
* gcc.dg/tree-ssa/pr21001.c: New.
Index: tree-optimize.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-optimize.c,v
retrieving revision 2.83
diff -u -d -p -r2.83 tree-optimize.c
--- tree-optimize.c 13 Apr 2005 04:29:39 -0000 2.83
+++ tree-optimize.c 15 Apr 2005 15:06:25 -0000
@@ -356,9 +356,9 @@ init_tree_optimization_passes (void)
NEXT_PASS (pass_dominator);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_dce);
+ NEXT_PASS (pass_forwprop);
NEXT_PASS (pass_vrp);
NEXT_PASS (pass_merge_phi);
- NEXT_PASS (pass_forwprop);
NEXT_PASS (pass_phiopt);
NEXT_PASS (pass_may_alias);
NEXT_PASS (pass_tail_recursion);
--- /dev/null 2005-04-13 14:59:22.325563136 -0400
+++ testsuite/gcc.dg/tree-ssa/pr21001.c 2005-04-13 15:08:31.000000000 -0400
@@ -0,0 +1,20 @@
+/* PR tree-optimization/21001
+ VRP did not insert ASSERT_EXPRs when the variable tested in a
+ COND_EXPR is a single-use variable. By propagating the definition
+ of the single-use variable into the COND_EXPR, we can get useful
+ range infomation out of the conditional. */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-dominator-opts -fdump-tree-vrp-details" } */
+
+int
+foo (int a)
+{
+ int b = a != 0;
+ if (b)
+ if (a != 0)
+ return 1;
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Folding predicate" 1 "vrp"} } */