[GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better

Jeff Law law@redhat.com
Thu Nov 30 23:38:00 GMT 2017



This addresses some code quality regressions with some work scheduled
for gcc-9.  However, if anyone is aware of BZs that could be helped with
this patch, don't hesitate to pass them along.  Depending on their
severity we could always re-evaluate the plan for this patch.

The patch optimizes sequences like this:


 # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
 _30 = (unsigned char) iftmp.4_16;

ie, we have a PHI where all its arguments are known (differing)
constants.  The PHI feeds an operation which can be compile-time
evaluated for all the PHI's arguments.

This arises most often where the result of the PHI is type converted via
 NOP_EXPR.  But it also happens for a boolean comparisons, arithmetic
and logicals and other type conversions.

We can optimize away the statement by creating a new PHI node holding
the result of the statement applied to each of the original PHI's
constant operands.  So for the fragment above we'd generate:

 # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
 # _30 = PHI <0(7), 1(8), 0(9)>

These kind of backward propagations can cascade.  Here's an example I
saw during analysis of test results:

 # m_5 = PHI <10(5), 12(6), 14(7)>
<L13>:
  _2 = (long unsigned int) m_5;
  _3 = _2 * 32;
  goto <bb 12>; [100.00%]

After this patch the two statements have been eliminated in favor of
generating PHIs with constant arguments:

And after PHI propagation we have:
  # m_5 = PHI <10(5), 12(6), 14(7)>
  # _2 = PHI <10(5), 12(6), 14(7)>
  # _3 = PHI <320(5), 384(6), 448(7)>
<L13>:
  goto <bb 12>; [100.00%]

DCE will come along and wipe out m_5 and _2 if they are unused.

I initially covered just NOP_EXPR in the proof of concept.   But it's
trivial to extend to cover other unaries, and binary/comparisons where
one operand was already a constant, so I did that.  This applies
surprisingly often during a bootstrap.  An earlier version (which didn't
handle multiple uses of the result of the PHI) triggered over 6000 times
during a bootstrap:

NOP_EXPR    5161
LT_EXPR      658
GE_EXPR      504
NE_EXPR      310
BIT_NOT_EXPR 295
BIT_AND_EXPR  94
PLUS_EXPR     77
NEGATE_EXPR   48
LSHIFT_EXPR   40
EQ_EXPR       34
GT_EXPR       16
BIT_IOR_EXPR  10
MINUS_EXPR     8

There's actually other cases we could handle were

x = PHI (a, 0);
y = a & x;

Turns into

x = PHI (a, 0);
y = PHI (a, 0);


I'm not sure how often these non-constant cases happen -- I haven't
tried to support this or gain any data on how often this kidn of case
might happen.

FWIW, there are cases were the PHI arguments are constant, but we still
can't simplify.  For example:


x = PHI (0, 1, 2)

y = 1234 / x;


When we process this we'll try to simplify 1234 / 0 to a constant which
fails.  We have to gracefully remove the partially initialized PHI node
in that case.  This is tested by existing tests in the suite.

You'll also see that certain tests in the suite such as pr61839_3,
ssa-pre-4.c are twiddled.  I've suppressed phi backwards propagation in
the original test so that it's not compromised.  THere's also a variant
of each test which verifies that the transformation is handled by phi
back propagation.

Bootstrapped and regression tested on x86, both in isolation and in a
larger patch series.

Will ping when gcc-9 opens for development.


Jeff

-------------- next part --------------
	* tree-ssa-phiprop.c (propagate_with_phi_1): Renamed from
	propagate_with_phi.
	(stmt_likely_produces_constant_result): New function.
	(fold_use_into_phi, propagate_with_phi_2):  Likewise.
	(pass_phiprop::execute): Corresponding changes.  Call
	propagate_with_phi_2.

	* gcc.dg/tree-ssa/pr61743-1.c: Use -fno-tree-phiprop.
	* gcc.dg/tree-ssa/pr61839_1.c: Likewise.
	* gcc.dg/tree-ssa/pr61839_3.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-4.c: Likewise.

	* gcc.dg/tree-ssa/pr61743-1a.c: New test.
	* gcc.dg/tree-ssa/pr61839_1a.c: Likewise.
	* gcc.dg/tree-ssa/pr61839_3a.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-4a.c: New test.

diff --git a/gcc/tree-ssa-phiprop.c b/gcc/tree-ssa-phiprop.c
index 494158b..f2ec2b3 100644
--- a/gcc/tree-ssa-phiprop.c
+++ b/gcc/tree-ssa-phiprop.c
@@ -259,8 +259,8 @@ chk_uses (tree, tree *idx, void *data)
    with aliasing issues as we are moving memory reads.  */
 
 static bool
-propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
-		    size_t n)
+propagate_with_phi_1 (basic_block bb, gphi *phi, struct phiprop_d *phivn,
+		      size_t n)
 {
   tree ptr = PHI_RESULT (phi);
   gimple *use_stmt;
@@ -440,6 +440,207 @@ next:;
   return phi_inserted;
 }
 
+/* Return TRUE if USE_STMT, which uses PHI_RESULT will likely produce
+   a constant result if PHI_RESULT is itself a constant.  Else return
+   FALSE.
+
+   It is safe for this routine to always return TRUE.  If the result
+   is not a constant it will be detected and properly handled later.
+
+   This is overly conservative.  If USE_STMT were to produce an SSA_NAME,
+   then that would still work for our purposes.  */
+
+static bool
+stmt_likely_produces_constant_result (gimple *use_stmt, tree phi_result)
+{
+  enum tree_code code = gimple_assign_rhs_code (use_stmt);
+  switch (TREE_CODE_CLASS (code))
+    {
+    /* With few exceptions, a unary operator applied to a constant
+       will result in a constant.  So they are OK without digging
+       deeper into the operator/operands.  */
+    case tcc_unary:
+      return true;
+
+    /* For binary and comparisons we'll generally be able to generate
+       a constant if only one operand is an SSA_NAME.  */
+    case tcc_binary:
+    case tcc_comparison:
+      {
+	tree rhs1 = gimple_assign_rhs1 (use_stmt);
+	tree rhs2 = gimple_assign_rhs2 (use_stmt);
+
+	/* We need to check for RES op CONST and CONST op RES.  Consider
+	   MINUS_EXPR or MIN/MAX where RES could be the first or the second
+	   argument.  */
+	if (rhs1 == phi_result
+	    && TREE_CODE_CLASS (TREE_CODE (rhs2)) == tcc_constant)
+	  return true;
+
+	if (rhs2 == phi_result
+	    && TREE_CODE_CLASS (TREE_CODE (rhs1)) == tcc_constant)
+	  return true;
+
+	/* We do not try to handle two SSA_NAME operands.  */
+	return false;
+      }
+
+    /* There might be some tcc_reference or tcc_exceptional types we
+       could handle with deeper investigation.  COND_EXPR comes to mind.  */
+    default:
+      return false;
+    }
+}
+
+/* PHI in block BB produces RESULT.  PHI_RESULT is used in USE_STMT.
+
+   All of PHIs arguments are simple constants.  See if propagating
+   the PHI arguments into USE_STMT produces a constant.  If that is
+   the case for all the PHI arguments, then STMT is replaced with a
+   new PHI and we return TRUE.  Else make no changes to the IL and
+   return FALSE.
+
+   When applied this transformation reduces runtime computations and
+   replaces them with either constant initializations.  This also enables
+   jump threading to occur earlier in the optimization pipeline.  */
+
+static bool
+fold_use_into_phi (gphi *phi, gimple *use_stmt,
+		   basic_block bb, tree phi_result)
+{
+  /* Now verify the use is an assigment to an SSA_NAME.  */
+  if (!is_gimple_assign (use_stmt)
+      || TREE_CODE (gimple_assign_lhs (use_stmt)) != SSA_NAME)
+    return false;
+
+  /* If USE_STMT does not likely produce a constant result, then
+     do not try to fold this use.  Note this is overly conservative
+     as we could handle USE_STMT simplifying to an SSA_NAME rather than
+     just constants.  */
+  if (!stmt_likely_produces_constant_result (use_stmt, phi_result))
+    return false;
+
+  /* Build a new PHI node to replace the definition of the USE_STMT's LHS.  */
+  gphi *new_phi = create_phi_node (NULL_TREE, bb);
+
+  /* Now fill in its arguments by applying the operator to each
+     argument of the original PHI.  */
+  edge e;
+  edge_iterator ei;
+  tree use_result = gimple_assign_lhs (use_stmt);
+  tree type = TREE_TYPE (use_result);
+  enum tree_code code = gimple_assign_rhs_code (use_stmt);
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      tree old_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+      tree new_arg;
+
+      switch (TREE_CODE_CLASS (code))
+	{
+	case tcc_unary:
+	  new_arg = fold_unary_to_constant (code, type, old_arg);
+	  break;
+
+	case tcc_binary:
+	case tcc_comparison:
+	  {
+	    tree rhs1 = gimple_assign_rhs1 (use_stmt);
+	    tree rhs2 = gimple_assign_rhs2 (use_stmt);
+	  if (rhs1 == phi_result)
+	    new_arg = fold_binary (code, type,
+				   old_arg, gimple_assign_rhs2 (use_stmt));
+	  else if (rhs2 == phi_result)
+	    new_arg = fold_binary (code, type,
+				   gimple_assign_rhs1 (use_stmt), old_arg);
+	  else
+	    gcc_unreachable ();
+	  break;
+	  }
+
+	default:
+	  gcc_unreachable ();
+	}
+
+      /* We did not get a constant.  This can happen (imagine division
+	 by zero).  We have to remove the PHI from the IL, as the PHI
+	 is only partially constructed.  */
+      if (!new_arg)
+	{
+	  /* Set the PHI_RESULT to a new, throw-away SSA_NAME.  This avoids
+	     problems unlinking the "uses" of a currently NULL PHI_RESULT.  */
+	  gimple_phi_set_result (new_phi, make_ssa_name (type));
+
+	  /* Actually remove the PHI from the IL.  It is safe to remove the
+	     PHI if some of its PHI arguments are still uninitialized.  */
+	  gimple_stmt_iterator gsi = gsi_for_stmt (new_phi);
+	  remove_phi_node (&gsi, true);
+	  return false;
+	}
+
+      source_location locus = gimple_phi_arg_location_from_edge (phi, e);
+      add_phi_arg (new_phi, new_arg, e, locus);
+    }
+
+  gimple_phi_set_result (new_phi, use_result);
+
+  /* At this point we have our new PHI installed where we want it.
+     Delete the old PHI and USE_STMT.  */
+  gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+  gsi_remove (&gsi, true);
+  return true;
+}
+
+/* Look for sequences like this:
+
+    # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
+    _30 = (unsigned char) iftmp.4_16;
+
+   We can "hoist" the conversion into the PHI and get it for free, generating:
+
+     # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
+     # _30 = PHI <0(7), 1(8), 0(9)>
+
+   DCE will eliminate the first PHI.  In addition to getting the conversion
+   for free, removal of the conversion improves backwards threading.  */
+
+static bool
+propagate_with_phi_2 (basic_block bb, gphi *phi)
+{
+  /* First verify that all the arguments of the PHI are constants.
+
+     This is overly conservative.  Consider:
+
+       y = PHI (x, -1, 0);
+       z = y & x;
+
+     We could transform that into:
+
+       y = PHI (x, -1, 0);
+       z = PHI (x, x, 0);
+
+     It's not clear how important those cases are and we punt them
+     for now.  */
+  use_operand_p arg_p;
+  ssa_op_iter i;
+  FOR_EACH_PHI_ARG (arg_p, phi, i, SSA_OP_USE)
+    {
+      tree arg = USE_FROM_PTR (arg_p);
+      if (TREE_CODE_CLASS (TREE_CODE (arg)) != tcc_constant)
+	return false;
+    }
+
+  /* Not iterate over each immediate use to see if the statement can
+     be implemented as a PHI rather than a real statement.  */
+  gimple *use_stmt;
+  imm_use_iterator ui;
+  bool retval = false;
+  tree phi_result = PHI_RESULT (phi);
+  FOR_EACH_IMM_USE_STMT (use_stmt, ui, phi_result)
+    retval |= fold_use_into_phi (phi, use_stmt, bb, phi_result);
+
+  return retval;
+}
+
 /* Main entry for phiprop pass.  */
 
 namespace {
@@ -492,7 +693,10 @@ pass_phiprop::execute (function *fun)
 				  single_succ (ENTRY_BLOCK_PTR_FOR_FN (fun)));
   FOR_EACH_VEC_ELT (bbs, i, bb)
     for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-      did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
+      {
+          did_something |= propagate_with_phi_1 (bb, gsi.phi (), phivn, n);
+	  did_something |= propagate_with_phi_2 (bb, gsi.phi ());
+      }
 
   if (did_something)
     gsi_commit_edge_inserts ();

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
index a5c83cf..7771044 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fno-peel-loops" } */
+/* { dg-options "-O3 -fno-tree-phiprop -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fno-peel-loops" } */

 #define N 8
 #define M 14
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c
new file mode 100644
index 0000000..dd64472
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fdump-tree-phiprop -fno-peel-loops" } */
+
+#include "pr61743-1.c"
+
+/* { dg-final { scan-tree-dump-times "PHI <(\?:320|384|448)\\(\[0-9\]+\\), (\?:320|384|448)\\(\[0-9\]+\\), (\?:320|384|448)\\(\[0-9\]+\\)>" 1 "phiprop"} } */
+
+/* The code with PHI back propagation is simpler and we unswitch more.  */
+/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 10 "cunroll" } } */
+/* { dg-final { scan-tree-dump-times "loop with 9 iterations completely unrolled" 2 "cunrolli" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
index 9f8168a..5bf1982 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/61839.  */
 /* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fno-tree-phiprop -fdump-tree-optimized" } */
 /* { dg-require-effective-target int32plus } */
 
 __attribute__ ((noinline))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c
new file mode 100644
index 0000000..99a9016
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c
@@ -0,0 +1,11 @@
+/* PR tree-optimization/61839.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-phiprop" } */
+/* { dg-require-effective-target int32plus } */
+
+#include "pr61839_1.c"
+
+/* Scan for c = 972195717) >> [0, 1] in function foo.  */
+/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1  "vrp1" } } */
+/* Scan for c = 972195717) >> [2, 3] in function bar.  */
+/* { dg-final { scan-tree-dump-times "PHI <(?:243048929|121524464)\\(\[0-9\]+\\), (?:243048929|121524464)\\(\[0-9\]+\\)>" 1 "phiprop"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
index 5ceb073..ad908c0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/61839.  */
 /* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-phiprop -fdump-tree-vrp1 -fdump-tree-optimized" } */
 
 __attribute__ ((noinline))
 int foo (int a, unsigned b)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c
new file mode 100644
index 0000000..3a4d800
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c
@@ -0,0 +1,9 @@
+/* PR tree-optimization/61839.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-phiprop -fdump-tree-optimized" } */
+#include "pr61839_3.c"
+
+/* Scan for a PHI with (12, 13) << 8 in function foo.  */
+/* { dg-final { scan-tree-dump-times "PHI <\(3072|3328\)\\(\[0-9\]+\\), \(3072|3328\)\\(\[0-9\
+]+\\)>" 1 "phiprop"} } */
+/* { dg-final { scan-tree-dump-times "3072" 0  "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
index 5b4fd54..0bc5f73 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fno-tree-phiprop -fdump-tree-pre-stats" } */
 int foo(void)
 {
 	int x, c, y;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c
new file mode 100644
index 0000000..70e8b16
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiprop" } */
+#include "ssa-pre-4.c"
+/* We should eliminate the x+1 computation from this routine, replacing
+   it with a phi of 3, 4 */
+/* { dg-final { scan-tree-dump-times "PHI <\[34\]\\(\[0-9\]+\\), \[34\]\\(\[0-9\]+\\)>" 1 "phiprop"} } */


More information about the Gcc-patches mailing list