[patch 2/2] PR27313 Transform conditional stores

Michael Matz matz@suse.de
Thu Apr 12 19:35:00 GMT 2007


Hi,

so, this is the second patch for fixing PR27313.  It's inspired by 
456.hmmer from SPEC2006, where the important situation looks like this:

  int *p, *q, sc;
  if ((sc = q[0]) > *p) *p = sc;
  if ((sc = q[1]) > *p) *p = sc;
  if ((sc = q[2]) > *p) *p = sc;

I.e. a MINMAX pattern where both arguments are memory accesses, the 
destination is the same every time, and we can't prove that the 
destination and the other source don't alias.  Additionally this is 
natural data, so any branch prediction is bound to fail here.  This is in 
the hot loop of 456.hmmer, and if we transform these into a jump-less 
sequence via conditional moves we get a relatively large speedup of about 
16%.
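
For illustration, the jump-less form we are aiming for looks roughly like 
this at the source level (a sketch, assuming the backend then selects a 
conditional move for the ternary operator):

  int t;
  sc = q[0];
  t  = *p;                 /* unconditional load */
  *p = sc > t ? sc : t;    /* unconditional store, value picked via cmov */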

We'd like to transform this in two steps:
1) Prove that the writes don't trap.  Trivial, as the same value is
   already accessed in the condition.  Done by the first patch.
2) Reorder the above sequence into an _un_conditional store, relying on
   later optimizations to transform this into a MINMAX pattern.  Done by
   this patch.

For (2) we rewrite the code (conceptually) like so:

if (cond) *p = x;

-->

int temp;
if (cond) temp = x;
else      temp = *p;
*p = temp;
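
Applied to the first of the hmmer statements above, the rewrite reads 
(hand-expanded sketch):

  int temp;
  sc = q[0];
  if (sc > *p) temp = sc;
  else         temp = *p;
  *p = temp;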

We do that only if the then-block contains a single statement (the 
store).  The value '*p' will be available most of the time (it's used in 
the condition after all, and someone also made the store provably 
non-trapping), so the later dominator and PHI optimizers will eventually 
turn this into:

int t2, t3;
t2 = *p;
t3 = MAX_EXPR (t2, x);
*p = t3;

As the current PHI optimizer already provides the right infrastructure 
for iterating over the blocks, I've chosen to implement the transformation 
using the same functions.  It can't really be run at the same time as 
phiopt itself, as it creates opportunities for other phiopt 
transformations, but those only become visible after a couple of cleanups.
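
Schematically, the pass reuses phiopt's block walk and matches the same 
half-diamond; in the naming of the patch below, bb is the split block 
ending in the condition, bb1 the middle block containing only the store, 
and bb2 the join block (rough sketch):

    bb    if (cond) goto bb1;
    | \
    |  bb1   *p = x;
    | /
    bb2   new PHI + unconditional store

The compensating load of '*p' is inserted on the direct edge from bb to 
bb2, and the new PHI in bb2 merges it with the value stored in bb1.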

For the curious, here's the result of that patch on SPEC2006:
base flags: -mtune=athlon64 -msse3 -O3 -ffast-math
peak flags: -mtune=athlon64 -msse3 -O3 -ffast-math -ftree-nontrap
No compile regressions are introduced by the patch alone (columns: SPEC 
reference time, run time in seconds, and the resulting ratio, for base 
and peak respectively):

400.perlbench                               NR                              NR
401.bzip2        9650       1190       8.09 *    9650       1180       8.18 *
403.gcc          8050       1090       7.37 *    8050       1070       7.52 *
429.mcf          9120       1420       6.40 *    9120       1420       6.40 *
445.gobmk       10490        839      12.5  *   10490        828      12.7  *
456.hmmer        9330       1080       8.66 *    9330        941       9.92 *
458.sjeng       12100       1090      11.1  *   12100       1080      11.2  *
462.libquantum  20720       1830      11.3  *   20720       1830      11.3  *
464.h264ref     22130       1500      14.7  *   22130       1500      14.8  *
471.omnetpp      6250        896       6.98 *    6250        892       7.00 *
473.astar        7020        997       7.04 *    7020       1000       7.01 *
483.xalancbmk    6900       1320       5.23 *    6900       1320       5.25 *

410.bwaves      13590   1920          7.08  *   13590   1910          7.11  *
416.gamess      19580   1520         12.8   *   19580   1530         12.8   *
433.milc         9180    982          9.35  *    9180    984          9.33  *
434.zeusmp       9100   1000          9.06  *    9100   1000          9.07  *
435.gromacs      7140    980          7.28  *    7140    980          7.28  *
436.cactusADM   11950   1580          7.56  *   11950   1560          7.68  *
437.leslie3d     9400   1320          7.11  *    9400   1330          7.08  *
444.namd         8020    768         10.4   *    8020    762         10.5   *
447.dealII                                  NR                              NR
450.soplex       8340   1080          7.71  *    8340   1080          7.71  *
453.povray       5320    385         13.8   *    5320    388         13.7   *
454.calculix     8250   1790          4.60  *    8250   1790          4.60  *
459.GemsFDTD    10610   1790          5.92  *   10610   1780          5.95  *
465.tonto        9840   1390          7.10  *    9840   1370          7.18  *
470.lbm         13740   1660          8.28  *   13740   1640          8.38  *
481.wrf                                     NR                              NR
482.sphinx3     19490   1670         11.7   *   19490   1650         11.8   *

So, overall it's a win, improving many testcases, with 456.hmmer 
running only 941 seconds instead of 1080.

As I said, tested together with the first patch on x86_64-linux, with 
flag_tree_nontrap forced to be true.

Okay for trunk?


Ciao,
Michael.
-- 
	PR middle-end/27313

	* tree-pass.h (pass_cselim): Declare new pass.
	* passes.c (init_optimization_passes): Link in pass_cselim.
	* tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Renamed from
	tree_ssa_phiopt; add do_store_elim parameter, handle it by calling
	cond_store_replacement.
	(condstoretemp): New static variable.
	(cond_store_replacement): New function.
	(tree_ssa_phiopt, tree_ssa_cs_elim): New wrappers around
	tree_ssa_phiopt_worker.
	(pass_cselim): Define new pass.

Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 123256)
+++ tree-pass.h	(working copy)
@@ -283,6 +283,7 @@ extern struct tree_opt_pass pass_cse_rec
 extern struct tree_opt_pass pass_cse_sincos;
 extern struct tree_opt_pass pass_warn_function_return;
 extern struct tree_opt_pass pass_warn_function_noreturn;
+extern struct tree_opt_pass pass_cselim;
 extern struct tree_opt_pass pass_phiopt;
 extern struct tree_opt_pass pass_forwprop;
 extern struct tree_opt_pass pass_nontrap;
Index: passes.c
===================================================================
--- passes.c	(revision 123256)
+++ passes.c	(working copy)
@@ -532,6 +532,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_vrp);
       NEXT_PASS (pass_dce);
       NEXT_PASS (pass_nontrap);
+      NEXT_PASS (pass_cselim);
       NEXT_PASS (pass_dominator);
 
       /* The only const/copy propagation opportunities left after
Index: tree-ssa-phiopt.c
===================================================================
--- tree-ssa-phiopt.c	(revision 123256)
+++ tree-ssa-phiopt.c	(working copy)
@@ -35,7 +35,7 @@ Software Foundation, 51 Franklin Street,
 #include "tree-dump.h"
 #include "langhooks.h"
 
-static unsigned int tree_ssa_phiopt (void);
+static unsigned int tree_ssa_phiopt_worker (bool);
 static bool conditional_replacement (basic_block, basic_block,
 				     edge, edge, tree, tree, tree);
 static bool value_replacement (basic_block, basic_block,
@@ -44,6 +44,7 @@ static bool minmax_replacement (basic_bl
 				edge, edge, tree, tree, tree);
 static bool abs_replacement (basic_block, basic_block,
 			     edge, edge, tree, tree, tree);
+static bool cond_store_replacement (basic_block, basic_block, edge, edge);
 static void replace_phi_edge_with_variable (basic_block, edge, tree, tree);
 static basic_block *blocks_in_phiopt_order (void);
 
@@ -136,11 +137,28 @@ static basic_block *blocks_in_phiopt_ord
 static unsigned int
 tree_ssa_phiopt (void)
 {
+  return tree_ssa_phiopt_worker (false);
+}
+
+static unsigned int
+tree_ssa_cs_elim (void)
+{
+  return tree_ssa_phiopt_worker (true);
+}
+
+static tree condstoretemp;
+
+static unsigned int
+tree_ssa_phiopt_worker (bool do_store_elim)
+{
   basic_block bb;
   basic_block *bb_order;
   unsigned n, i;
   bool cfgchanged = false;
 
+  if (do_store_elim)
+    condstoretemp = NULL_TREE;
+
   /* Search every basic block for COND_EXPR we may be able to optimize.
 
      We walk the blocks in order that guarantees that a block with
@@ -211,36 +229,56 @@ tree_ssa_phiopt (void)
           || single_pred (bb1) != bb)
 	continue;
 
-      phi = phi_nodes (bb2);
-
-      /* Check to make sure that there is only one PHI node.
-         TODO: we could do it with more than one iff the other PHI nodes
-	 have the same elements for these two edges.  */
-      if (!phi || PHI_CHAIN (phi) != NULL)
-	continue;
-
-      arg0 = PHI_ARG_DEF_TREE (phi, e1->dest_idx);
-      arg1 = PHI_ARG_DEF_TREE (phi, e2->dest_idx);
+      if (do_store_elim)
+	{
+	  /* bb1 is the middle block, bb2 the join block, bb the split block
+	     e1 the fallthrough edge from bb1 to bb2.  We can't do the
+	     optimization if the join block has more than two predecessors.  */
+	  if (EDGE_COUNT (bb2->preds) > 2)
+	    continue;
+	  if (cond_store_replacement (bb1, bb2, e1, e2))
+	    cfgchanged = true;
+	}
+      else
+	{
+	  phi = phi_nodes (bb2);
 
-      /* Something is wrong if we cannot find the arguments in the PHI
-	 node.  */
-      gcc_assert (arg0 != NULL && arg1 != NULL);
-
-      /* Do the replacement of conditional if it can be done.  */
-      if (conditional_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
-	cfgchanged = true;
-      else if (value_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
-	cfgchanged = true;
-      else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
-	cfgchanged = true;
-      else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
-	cfgchanged = true;
+	  /* Check to make sure that there is only one PHI node.
+	     TODO: we could do it with more than one iff the other PHI nodes
+	     have the same elements for these two edges.  */
+	  if (!phi || PHI_CHAIN (phi) != NULL)
+	    continue;
+
+	  arg0 = PHI_ARG_DEF_TREE (phi, e1->dest_idx);
+	  arg1 = PHI_ARG_DEF_TREE (phi, e2->dest_idx);
+
+	  /* Something is wrong if we cannot find the arguments in the PHI
+	     node.  */
+	  gcc_assert (arg0 != NULL && arg1 != NULL);
+
+	  /* Do the replacement of conditional if it can be done.  */
+	  if (conditional_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
+	    cfgchanged = true;
+	  else if (value_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
+	    cfgchanged = true;
+	  else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
+	    cfgchanged = true;
+	  else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
+	    cfgchanged = true;
+	}
     }
 
   free (bb_order);
   
-  /* If the CFG has changed, we should cleanup the CFG. */
-  return cfgchanged ? TODO_cleanup_cfg : 0;
+  /* If the CFG has changed, we should cleanup the CFG.  */
+  if (cfgchanged && do_store_elim)
+    {
+      bsi_commit_edge_inserts ();
+      return TODO_cleanup_cfg | TODO_update_ssa_only_virtuals;
+    }
+  else if (cfgchanged)
+    return TODO_cleanup_cfg;
+  return 0;
 }
 
 /* Returns the list of basic blocks in the function in an order that guarantees
@@ -991,6 +1029,72 @@ abs_replacement (basic_block cond_bb, ba
   return true;
 }
 
+static bool
+cond_store_replacement (basic_block middle_bb, basic_block join_bb,
+			edge e0, edge e1)
+{
+  tree assign = last_and_only_stmt (middle_bb);
+  tree lhs, rhs, newexpr, name;
+  tree newphi;
+  block_stmt_iterator bsi;
+
+  /* Check if bb1 consists of only one store.  */
+  if (!assign
+      || TREE_CODE (assign) != GIMPLE_MODIFY_STMT)
+    return false;
+
+  lhs = GIMPLE_STMT_OPERAND (assign, 0);
+  if (!INDIRECT_REF_P (lhs))
+    return false;
+  rhs = GIMPLE_STMT_OPERAND (assign, 1);
+  if (TREE_CODE (rhs) != SSA_NAME)
+    return false;
+  /* Prove that we can move the store down.  */
+  if (!TREE_THIS_NOTRAP (lhs))
+    return false;
+    
+  mark_symbols_for_renaming (assign);
+  bsi = bsi_for_stmt (assign);
+  bsi_remove (&bsi, true);
+
+  /* Build and insert the assignment of the end result to the temporary
+     that we will return.  */
+  if (!condstoretemp || TREE_TYPE (lhs) != TREE_TYPE (condstoretemp))
+    {
+      condstoretemp = create_tmp_var (TREE_TYPE (lhs), "cstore");
+      get_var_ann (condstoretemp);
+    }
+
+  add_referenced_var (condstoretemp);
+
+  lhs = unshare_expr (lhs);
+  newexpr = build_gimple_modify_stmt (condstoretemp, lhs);
+  name = make_ssa_name (condstoretemp, newexpr);
+  GIMPLE_STMT_OPERAND (newexpr, 0) = name;
+  mark_symbols_for_renaming (newexpr);
+  bsi_insert_on_edge (e1, newexpr);
+
+  newphi = create_phi_node (condstoretemp, join_bb);
+  add_phi_arg (newphi, rhs, e0);
+  add_phi_arg (newphi, name, e1);
+
+  lhs = unshare_expr (lhs);
+  newexpr = build_gimple_modify_stmt (lhs, PHI_RESULT (newphi));
+  mark_symbols_for_renaming (newexpr);
+
+  bsi = bsi_start (join_bb);
+  while (!bsi_end_p (bsi) && TREE_CODE (bsi_stmt (bsi)) == LABEL_EXPR)
+    bsi_next (&bsi);
+  if (bsi_end_p (bsi))
+    {
+      bsi = bsi_last (join_bb);
+      bsi_insert_after (&bsi, newexpr, BSI_NEW_STMT);
+    }
+  else
+    bsi_insert_before (&bsi, newexpr, BSI_NEW_STMT);
+
+  return true;
+}
 
 /* Always do these optimizations if we have SSA
    trees to work on.  */
@@ -1020,3 +1124,24 @@ struct tree_opt_pass pass_phiopt =
     | TODO_verify_stmts,		/* todo_flags_finish */
   0					/* letter */
 };
+
+struct tree_opt_pass pass_cselim =
+{
+  "cselim",				/* name */
+  gate_phiopt,				/* gate */
+  tree_ssa_cs_elim,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_TREE_PHIOPT,			/* tv_id */
+  PROP_cfg | PROP_ssa | PROP_alias,	/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func
+    | TODO_ggc_collect
+    | TODO_verify_ssa
+    | TODO_verify_flow
+    | TODO_verify_stmts,		/* todo_flags_finish */
+  0					/* letter */
+};


