[patch 2/2] PR27313 Transform conditional stores
Michael Matz
matz@suse.de
Thu Apr 12 19:35:00 GMT 2007
Hi,
so, this is the second patch for fixing PR27313. It's inspired by
456.hmmer of SPEC2006, with the important situation looking like so:
int *p, *q, sc;
if ((sc = q[0]) > *p) *p = sc;
if ((sc = q[1]) > *p) *p = sc;
if ((sc = q[2]) > *p) *p = sc;
I.e. a MINMAX pattern, where both arguments are memory accesses, the
destination is the same all the time, and we can't prove that destination
and the other source don't alias. Additionally this is natural data,
hence any jump prediction is bound to fail here. This is in the hot loop
of 456.hmmer, and if we transform these into a jump-less sequence via
conditional moves we get a relatively large speedup of about 16%.
How we'd like to transform this is so:
1) prove that the writes don't trap. Trivial, as the same value is
accessed already in the condition. Done by the first patch.
2) Reorder above sequence to be an _un_conditional store, relying on other
optimization to transform this into a MINMAX pattern. Done by this
patch.
For (2) we rewrite the code (conceptually) like so:
if (cond) *p = x;
-->
int temp;
if (cond) temp = x;
else temp = *p;
*p = temp;
We do that only if the then-block only contains one instruction (the
store). The value '*p' will be available most of the time (it's used in
the condition after all, and someone also made the store provably
non-trapping), so the later dominator and phi optimizers will turn this
eventually into:
int t2, t3;
t2 = *p;
t3 = MAX_EXPR (t2, x);
*p = t3
As the current phi optimizer provides already the right infrastructure for
iterating over the blocks, I've chosen to implement the transformation
using the same functions. It can't really be run at the same time as
phiopt itself, as it creates opportunities for doing other phiopt
transformation, but only after a couple of cleanups.
For the curious, here's the result of that patch on SPEC2006:
base flags: -mtune=athlon64 -msse3 -O3 -ffast-math
peak flags: -mtune=athlon64 -msse3 -O3 -ffast-math -ftree-nontrap
No compile regressions introduced by the patch alone:
400.perlbench NR NR
401.bzip2 9650 1190 8.09 * 9650 1180 8.18 *
403.gcc 8050 1090 7.37 * 8050 1070 7.52 *
429.mcf 9120 1420 6.40 * 9120 1420 6.40 *
445.gobmk 10490 839 12.5 * 10490 828 12.7 *
456.hmmer 9330 1080 8.66 * 9330 941 9.92 *
458.sjeng 12100 1090 11.1 * 12100 1080 11.2 *
462.libquantum 20720 1830 11.3 * 20720 1830 11.3 *
464.h264ref 22130 1500 14.7 * 22130 1500 14.8 *
471.omnetpp 6250 896 6.98 * 6250 892 7.00 *
473.astar 7020 997 7.04 * 7020 1000 7.01 *
483.xalancbmk 6900 1320 5.23 * 6900 1320 5.25 *
410.bwaves 13590 1920 7.08 * 13590 1910 7.11 *
416.gamess 19580 1520 12.8 * 19580 1530 12.8 *
433.milc 9180 982 9.35 * 9180 984 9.33 *
434.zeusmp 9100 1000 9.06 * 9100 1000 9.07 *
435.gromacs 7140 980 7.28 * 7140 980 7.28 *
436.cactusADM 11950 1580 7.56 * 11950 1560 7.68 *
437.leslie3d 9400 1320 7.11 * 9400 1330 7.08 *
444.namd 8020 768 10.4 * 8020 762 10.5 *
447.dealII NR NR
450.soplex 8340 1080 7.71 * 8340 1080 7.71 *
453.povray 5320 385 13.8 * 5320 388 13.7 *
454.calculix 8250 1790 4.60 * 8250 1790 4.60 *
459.GemsFDTD 10610 1790 5.92 * 10610 1780 5.95 *
465.tonto 9840 1390 7.10 * 9840 1370 7.18 *
470.lbm 13740 1660 8.28 * 13740 1640 8.38 *
481.wrf NR NR
482.sphinx3 19490 1670 11.7 * 19490 1650 11.8 *
So, overall it's a win, improving many testcases, with 456.hmmer only
running 941 second instead of 1080.
As I said, tested together with the first patch on x86_64-linux, with
flag_tree_nontrap forced to be true.
Okay for trunk?
Ciao,
Michael.
--
PR middle-end/27313
* tree-pass.h (pass_cselim): Declare new pass.
* passes.c (init_optimization_passes): Link in pass_cselim.
* tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Renamed from
tree_ssa_phiopt; add do_store_elim parameter, handle it by calling
cond_store_replacement.
(condstoretemp): New static variable.
(cond_store_replacement): New function.
(tree_ssa_phiopt, tree_ssa_cs_elim): New wrappers around
tree_ssa_phiopt_worker.
(pass_cselim): Define new pass.
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 123256)
+++ tree-pass.h (working copy)
@@ -283,6 +283,7 @@ extern struct tree_opt_pass pass_cse_rec
extern struct tree_opt_pass pass_cse_sincos;
extern struct tree_opt_pass pass_warn_function_return;
extern struct tree_opt_pass pass_warn_function_noreturn;
+extern struct tree_opt_pass pass_cselim;
extern struct tree_opt_pass pass_phiopt;
extern struct tree_opt_pass pass_forwprop;
extern struct tree_opt_pass pass_nontrap;
Index: passes.c
===================================================================
--- passes.c (revision 123256)
+++ passes.c (working copy)
@@ -532,6 +532,7 @@ init_optimization_passes (void)
NEXT_PASS (pass_vrp);
NEXT_PASS (pass_dce);
NEXT_PASS (pass_nontrap);
+ NEXT_PASS (pass_cselim);
NEXT_PASS (pass_dominator);
/* The only const/copy propagation opportunities left after
Index: tree-ssa-phiopt.c
===================================================================
--- tree-ssa-phiopt.c (revision 123256)
+++ tree-ssa-phiopt.c (working copy)
@@ -35,7 +35,7 @@ Software Foundation, 51 Franklin Street,
#include "tree-dump.h"
#include "langhooks.h"
-static unsigned int tree_ssa_phiopt (void);
+static unsigned int tree_ssa_phiopt_worker (bool);
static bool conditional_replacement (basic_block, basic_block,
edge, edge, tree, tree, tree);
static bool value_replacement (basic_block, basic_block,
@@ -44,6 +44,7 @@ static bool minmax_replacement (basic_bl
edge, edge, tree, tree, tree);
static bool abs_replacement (basic_block, basic_block,
edge, edge, tree, tree, tree);
+static bool cond_store_replacement (basic_block, basic_block, edge, edge);
static void replace_phi_edge_with_variable (basic_block, edge, tree, tree);
static basic_block *blocks_in_phiopt_order (void);
@@ -136,11 +137,28 @@ static basic_block *blocks_in_phiopt_ord
static unsigned int
tree_ssa_phiopt (void)
{
+ return tree_ssa_phiopt_worker (false);
+}
+
+static unsigned int
+tree_ssa_cs_elim (void)
+{
+ return tree_ssa_phiopt_worker (true);
+}
+
+static tree condstoretemp;
+
+static unsigned int
+tree_ssa_phiopt_worker (bool do_store_elim)
+{
basic_block bb;
basic_block *bb_order;
unsigned n, i;
bool cfgchanged = false;
+ if (do_store_elim)
+ condstoretemp = NULL_TREE;
+
/* Search every basic block for COND_EXPR we may be able to optimize.
We walk the blocks in order that guarantees that a block with
@@ -211,36 +229,56 @@ tree_ssa_phiopt (void)
|| single_pred (bb1) != bb)
continue;
- phi = phi_nodes (bb2);
-
- /* Check to make sure that there is only one PHI node.
- TODO: we could do it with more than one iff the other PHI nodes
- have the same elements for these two edges. */
- if (!phi || PHI_CHAIN (phi) != NULL)
- continue;
-
- arg0 = PHI_ARG_DEF_TREE (phi, e1->dest_idx);
- arg1 = PHI_ARG_DEF_TREE (phi, e2->dest_idx);
+ if (do_store_elim)
+ {
+ /* bb1 is the middle block, bb2 the join block, bb the split block
+ e1 the fallthrough edge from bb1 to bb2. We can't do the
+ optimization if the join block has more than two predecessors. */
+ if (EDGE_COUNT (bb2->preds) > 2)
+ continue;
+ if (cond_store_replacement (bb1, bb2, e1, e2))
+ cfgchanged = true;
+ }
+ else
+ {
+ phi = phi_nodes (bb2);
- /* Something is wrong if we cannot find the arguments in the PHI
- node. */
- gcc_assert (arg0 != NULL && arg1 != NULL);
-
- /* Do the replacement of conditional if it can be done. */
- if (conditional_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
- cfgchanged = true;
- else if (value_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
- cfgchanged = true;
- else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
- cfgchanged = true;
- else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
- cfgchanged = true;
+ /* Check to make sure that there is only one PHI node.
+ TODO: we could do it with more than one iff the other PHI nodes
+ have the same elements for these two edges. */
+ if (!phi || PHI_CHAIN (phi) != NULL)
+ continue;
+
+ arg0 = PHI_ARG_DEF_TREE (phi, e1->dest_idx);
+ arg1 = PHI_ARG_DEF_TREE (phi, e2->dest_idx);
+
+ /* Something is wrong if we cannot find the arguments in the PHI
+ node. */
+ gcc_assert (arg0 != NULL && arg1 != NULL);
+
+ /* Do the replacement of conditional if it can be done. */
+ if (conditional_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
+ cfgchanged = true;
+ else if (value_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
+ cfgchanged = true;
+ else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
+ cfgchanged = true;
+ else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
+ cfgchanged = true;
+ }
}
free (bb_order);
- /* If the CFG has changed, we should cleanup the CFG. */
- return cfgchanged ? TODO_cleanup_cfg : 0;
+ /* If the CFG has changed, we should cleanup the CFG. */
+ if (cfgchanged && do_store_elim)
+ {
+ bsi_commit_edge_inserts ();
+ return TODO_cleanup_cfg | TODO_update_ssa_only_virtuals;
+ }
+ else if (cfgchanged)
+ return TODO_cleanup_cfg;
+ return 0;
}
/* Returns the list of basic blocks in the function in an order that guarantees
@@ -991,6 +1029,72 @@ abs_replacement (basic_block cond_bb, ba
return true;
}
+static bool
+cond_store_replacement (basic_block middle_bb, basic_block join_bb,
+ edge e0, edge e1)
+{
+ tree assign = last_and_only_stmt (middle_bb);
+ tree lhs, rhs, newexpr, name;
+ tree newphi;
+ block_stmt_iterator bsi;
+
+ /* Check if bb1 contains of only one store. */
+ if (!assign
+ || TREE_CODE (assign) != GIMPLE_MODIFY_STMT)
+ return false;
+
+ lhs = GIMPLE_STMT_OPERAND (assign, 0);
+ if (!INDIRECT_REF_P (lhs))
+ return false;
+ rhs = GIMPLE_STMT_OPERAND (assign, 1);
+ if (TREE_CODE (rhs) != SSA_NAME)
+ return false;
+ /* Prove that we can move the store down. */
+ if (!TREE_THIS_NOTRAP (lhs))
+ return false;
+
+ mark_symbols_for_renaming (assign);
+ bsi = bsi_for_stmt (assign);
+ bsi_remove (&bsi, true);
+
+ /* Build and insert the assignment of the end result to the temporary
+ that we will return. */
+ if (!condstoretemp || TREE_TYPE (lhs) != TREE_TYPE (condstoretemp))
+ {
+ condstoretemp = create_tmp_var (TREE_TYPE (lhs), "cstore");
+ get_var_ann (condstoretemp);
+ }
+
+ add_referenced_var (condstoretemp);
+
+ lhs = unshare_expr (lhs);
+ newexpr = build_gimple_modify_stmt (condstoretemp, lhs);
+ name = make_ssa_name (condstoretemp, newexpr);
+ GIMPLE_STMT_OPERAND (newexpr, 0) = name;
+ mark_symbols_for_renaming (newexpr);
+ bsi_insert_on_edge (e1, newexpr);
+
+ newphi = create_phi_node (condstoretemp, join_bb);
+ add_phi_arg (newphi, rhs, e0);
+ add_phi_arg (newphi, name, e1);
+
+ lhs = unshare_expr (lhs);
+ newexpr = build_gimple_modify_stmt (lhs, PHI_RESULT (newphi));
+ mark_symbols_for_renaming (newexpr);
+
+ bsi = bsi_start (join_bb);
+ while (!bsi_end_p (bsi) && TREE_CODE (bsi_stmt (bsi)) == LABEL_EXPR)
+ bsi_next (&bsi);
+ if (bsi_end_p (bsi))
+ {
+ bsi = bsi_last (join_bb);
+ bsi_insert_after (&bsi, newexpr, BSI_NEW_STMT);
+ }
+ else
+ bsi_insert_before (&bsi, newexpr, BSI_NEW_STMT);
+
+ return true;
+}
/* Always do these optimizations if we have SSA
trees to work on. */
@@ -1020,3 +1124,24 @@ struct tree_opt_pass pass_phiopt =
| TODO_verify_stmts, /* todo_flags_finish */
0 /* letter */
};
+
+struct tree_opt_pass pass_cselim =
+{
+ "cselim", /* name */
+ gate_phiopt, /* gate */
+ tree_ssa_cs_elim, /* execute */
+ NULL, /* sub */
+ NULL, /* next */
+ 0, /* static_pass_number */
+ TV_TREE_PHIOPT, /* tv_id */
+ PROP_cfg | PROP_ssa | PROP_alias, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_dump_func
+ | TODO_ggc_collect
+ | TODO_verify_ssa
+ | TODO_verify_flow
+ | TODO_verify_stmts, /* todo_flags_finish */
+ 0 /* letter */
+};
More information about the Gcc-patches
mailing list