[PATCH][RFC] final-value replacement from DCE

Richard Biener rguenther@suse.de
Wed May 29 13:39:00 GMT 2019


The following tries to address PR90648 by performing final
value replacement from DCE when DCE knows the final value
computation is not used during loop iteration.  This fits
neatly enough into existing tricks performed by DCE like
removing unused malloc/free pairs.

There's a few complications, one is it fails to bootstrap
because it exposes a few uninit warning false positives,
another is that -fno-tree-sccp is no longer effective.
As written this turns gcc.dg/pr34027-1.c into a division
again (I did not copy the expression_expensive checking).
It seems to also need -ftrapv adjustements (gcc.dg/pr81661.c).

The goal of this patch is to remove the SCCP pass, or rather
us unconditionally replacing loop-closed PHIs with final
value computations which we've got complaints in the past
already that it duplicates computation that is readily
available.  I've not yet figured testsuite fallout from that
change.

For the -fno-tree-sccp I consider to simply honor that
flag in the DCE path, for the gcc.dg/pr34027-1.c I'll
re-install the expression_expensive checking.  I'll
also fix the -ftrapv issue.

Does this otherwise look a sensible way forward?

Thanks,
Richard.

FAIL: gcc.dg/builtin-object-size-1.c execution test
FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/pr81661.c (internal compiler error)
FAIL: gcc.dg/pr81661.c (test for excess errors)
XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " \\\\+ " 0
FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times"
FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vectorized 1 loops" 1

Running target unix//-m32
FAIL: gcc.dg/builtin-object-size-1.c execution test
FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/pr81661.c (internal compiler error)
FAIL: gcc.dg/pr81661.c (test for excess errors)
XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " \\\\+ " 0
FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times"
FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.target/i386/pr85073.c scan-assembler-times test 1

2019-05-29  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/90594
	* tree-scalar-evolution.h (final_value_replacement): Declare.
	* tree-scalar-evolution.c (final_value_replacement): Split out
	worker on one PHI from ...
	(final_value_replacement_loop): ... here.
	* gimple-fold.h (rewrite_seq_to_defined_overflow): Declare.
	* gimple-fold.c (rewrite_seq_to_defined_overflow): Wrap
	around rewrite_to_defined_overflow to handle a whole sequence.
	* tree-ssa-dce.c (lcphi_map): New.
	(mark_operands_necessary): New walk_tree worker.
	(propagate_necessity): Elide defs of LC PHIs we can perform value
	replacement on.
	(remove_dead_phis): Perform value replacement on LC PHIs that
	had their def removed.
	(perform_tree_ssa_dce): Go into LC SSA in aggressive mode.
	Precompute final value replacements.

	* gcc.dg/tree-ssa/loop-14.c: Adjust for new expected place
	for transform.

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c	(revision 271644)
+++ gcc/tree-scalar-evolution.c	(working copy)
@@ -3601,6 +3601,35 @@ expression_expensive_p (tree expr)
     }
 }
 
+tree
+final_value_replacement (gphi *phi, edge exit, bool *folded_casts)
+{
+  struct loop *ex_loop
+    = superloop_at_depth (exit->src->loop_father,
+			  loop_depth (exit->dest->loop_father) + 1);
+
+  tree def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
+  if (virtual_operand_p (def))
+    return NULL_TREE;
+
+  if (!POINTER_TYPE_P (TREE_TYPE (def))
+      && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
+    return NULL_TREE;
+
+  *folded_casts = false;
+  def = analyze_scalar_evolution_in_loop (ex_loop, exit->src->loop_father, def,
+					  folded_casts);
+  def = compute_overall_effect_of_inner_loop (ex_loop, def);
+  if (!tree_does_not_contain_chrecs (def)
+      || chrec_contains_symbols_defined_in_loop (def, ex_loop->num)
+      /* Moving the computation from the loop may prolong life range
+	 of some ssa names, which may cause problems if they appear
+	 on abnormal edges.  */
+      || contains_abnormal_ssa_name_p (def))
+    return NULL_TREE;
+  return def;
+}
+
 /* Do final value replacement for LOOP, return true if we did anything.  */
 
 bool
Index: gcc/tree-scalar-evolution.h
===================================================================
--- gcc/tree-scalar-evolution.h	(revision 271644)
+++ gcc/tree-scalar-evolution.h	(working copy)
@@ -34,6 +34,7 @@ extern tree instantiate_scev (edge, stru
 extern tree resolve_mixers (struct loop *, tree, bool *);
 extern void gather_stats_on_scev_database (void);
 extern bool final_value_replacement_loop (struct loop *);
+extern tree final_value_replacement (gphi *, edge, bool *);
 extern unsigned int scev_const_prop (void);
 extern bool expression_expensive_p (tree);
 extern bool simple_iv_with_niters (struct loop *, struct loop *, tree,
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	(revision 271644)
+++ gcc/gimple-fold.c	(working copy)
@@ -7381,6 +7381,29 @@ rewrite_to_defined_overflow (gimple *stm
   return stmts;
 }
 
+/* Wrapper around rewrite_to_defined_overflow rewriting STMTS in-place.  */
+
+void
+rewrite_seq_to_defined_overflow (gimple_seq *stmts)
+{
+  gimple_stmt_iterator gsi = gsi_start (*stmts);
+  while (!gsi_end_p (gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      if (is_gimple_assign (stmt)
+	  && arith_code_with_undefined_signed_overflow
+		(gimple_assign_rhs_code (stmt)))
+	{
+	  gsi_remove (&gsi, false);
+	  gsi_insert_seq_before (&gsi,
+				 rewrite_to_defined_overflow (stmt),
+				 GSI_SAME_STMT);
+	}
+      else
+	gsi_next (&gsi);
+    }
+}
+
 
 /* The valueization hook we use for the gimple_build API simplification.
    This makes us match fold_buildN behavior by only combining with
Index: gcc/gimple-fold.h
===================================================================
--- gcc/gimple-fold.h	(revision 271644)
+++ gcc/gimple-fold.h	(working copy)
@@ -60,6 +60,7 @@ extern bool gimple_fold_builtin_sprintf
 extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
 extern bool arith_code_with_undefined_signed_overflow (tree_code);
 extern gimple_seq rewrite_to_defined_overflow (gimple *);
+extern void rewrite_seq_to_defined_overflow (gimple_seq *);
 extern void replace_call_with_value (gimple_stmt_iterator *, tree);
 extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree);
 
Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c	(revision 271644)
+++ gcc/tree-ssa-dce.c	(working copy)
@@ -67,6 +67,8 @@ along with GCC; see the file COPYING3.
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-propagate.h"
 #include "gimple-fold.h"
+#include "tree-ssa-loop-manip.h"
+#include "gimplify-me.h"
 
 static struct stmt_stats
 {
@@ -114,6 +116,9 @@ static bool cfg_altered;
 /* When non-NULL holds map from basic block index into the postorder.  */
 static int *bb_postorder;
 
+/* Map from loop-closed PHI argument to final value replacement tree.  */
+hash_map<tree, std::pair<tree, bool> > *lcphi_map;
+
 
 /* If STMT is not already marked necessary, mark it, and add it to the
    worklist if ADD_TO_WORKLIST is true.  */
@@ -181,6 +186,19 @@ mark_operand_necessary (tree op)
   worklist.safe_push (stmt);
 }
 
+/* Helper for walk_tree calling mark_operand_necessary.  */
+
+static tree
+mark_operands_necessary (tree *tp, int *walk_subtrees, void *)
+{
+  if (EXPR_P (*tp))
+    return NULL_TREE;
+  if (TREE_CODE (*tp) == SSA_NAME)
+    mark_operand_necessary (*tp);
+  *walk_subtrees = 0;
+  return NULL_TREE;
+}
+
 
 /* Mark STMT as necessary if it obviously is.  Add it to the worklist if
    it can make other statements necessary.
@@ -657,6 +675,22 @@ propagate_necessity (bool aggressive)
 	  gphi *phi = as_a <gphi *> (stmt);
 	  size_t k;
 
+	  /* There's the special exception of loop-closed PHI nodes
+	     which we might be able to elide.  Do not mark those.  */
+	  if (aggressive
+	      && gimple_phi_num_args (stmt) == 1)
+	    {
+	      tree def = gimple_phi_arg_def (stmt, 0);
+	      if (std::pair<tree, bool> *repl = lcphi_map->get (def))
+		{
+		  /* Now we can elide marking DEF as well as control
+		     dependences.  But we have to mark the replacement
+		     uses instead, else those may get eliminated.  */
+		  walk_tree (&repl->first, mark_operands_necessary, NULL, NULL);
+		  continue;
+		}
+	    }
+
 	  for (k = 0; k < gimple_phi_num_args (stmt); k++)
             {
 	      tree arg = PHI_ARG_DEF (stmt, k);
@@ -975,6 +1009,43 @@ remove_dead_phis (basic_block bb)
 	  continue;
 	}
 
+      /* If we have a necessary PHI node that got its single operand elided
+         perform final value replacement.  */
+      tree arg0 = gimple_phi_arg_def (phi, 0);
+      if (lcphi_map
+	  && gimple_phi_num_args (phi) == 1
+	  && TREE_CODE (arg0) == SSA_NAME
+	  && !SSA_NAME_IS_DEFAULT_DEF (arg0)
+	  && !gimple_plf (SSA_NAME_DEF_STMT (arg0), STMT_NECESSARY))
+	{
+	  something_changed = true;
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Performing final value replacement: ");
+	      print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
+	      fprintf (dump_file, "\n");
+	    }
+
+	  tree rslt = PHI_RESULT (phi);
+	  std::pair<tree, bool> *repl = lcphi_map->get (arg0);
+	  gcc_assert (repl);
+
+	  remove_phi_node (&gsi, false);
+	  stats.removed_phis++;
+
+	  gimple_seq stmts = NULL;
+	  tree nw = force_gimple_operand (unshare_expr (repl->first),
+					  &stmts, false, NULL_TREE);
+	  if (repl->second)
+	    rewrite_seq_to_defined_overflow (&stmts);
+	  gassign *ass = gimple_build_assign (rslt, nw);
+	  gimple_set_plf (ass, STMT_NECESSARY, true);
+	  gimple_seq_add_stmt_without_update (&stmts, ass);
+	  gimple_stmt_iterator gsi2 = gsi_after_labels (bb);
+	  gsi_insert_seq_before (&gsi2, stmts, GSI_SAME_STMT);
+	  continue;
+	}
+
       gsi_next (&gsi);
     }
   return something_changed;
@@ -1559,6 +1630,7 @@ perform_tree_ssa_dce (bool aggressive)
       scev_initialize ();
       loop_optimizer_init (LOOPS_NORMAL
 			   | LOOPS_HAVE_RECORDED_EXITS);
+      rewrite_into_loop_closed_ssa (NULL, 0);
     }
 
   tree_dce_init (aggressive);
@@ -1574,6 +1646,29 @@ perform_tree_ssa_dce (bool aggressive)
       bitmap_clear (visited_control_parents);
 
       mark_dfs_back_edges ();
+
+      lcphi_map = new hash_map<tree, std::pair<tree, bool> > ();
+      struct loop *loop;
+      FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+	{
+	  if (edge exit = single_exit (loop))
+	    /* When we currently handle a single PHI arg only, we'd
+	       also not be able to insert into the destination block
+	       and thus had to fixup loop-closed SSA form.  */
+	    if (single_pred_p (exit->dest))
+	      {
+		for (gphi_iterator psi = gsi_start_phis (exit->dest);
+		     !gsi_end_p (psi); gsi_next (&psi))
+		  {
+		    gphi *phi = psi.phi ();
+		    tree def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
+		    bool folded_casts;
+		    if (tree repl = final_value_replacement (phi, exit,
+							     &folded_casts))
+		      lcphi_map->put (def, std::make_pair (repl, folded_casts));
+		  }
+	      }
+	}
     }
 
   find_obviously_necessary_stmts (aggressive);
@@ -1604,6 +1699,12 @@ perform_tree_ssa_dce (bool aggressive)
   if (cfg_altered)
     free_dominance_info (CDI_DOMINATORS);
 
+  if (aggressive)
+    {
+      delete lcphi_map;
+      lcphi_map = NULL;
+    }
+
   statistics_counter_event (cfun, "Statements deleted", stats.removed);
   statistics_counter_event (cfun, "PHI nodes deleted", stats.removed_phis);
 
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-14.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/loop-14.c	(revision 271644)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-14.c	(working copy)
@@ -1,6 +1,6 @@
 /* A test for final value replacement.  */
 
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-cddce1" } */
 
 int foo(void);
 
@@ -15,4 +15,4 @@ int bla(void)
   return j;
 }
 
-/* { dg-final { scan-tree-dump-times "\\+ 100" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 100" 1 "cddce1" } } */



More information about the Gcc-patches mailing list