This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[patch] PR39077


Hi,

RTL PRE is based on lazy code motion, an algorithm that automatically
handles partial-partial redundancies (partially available, partially
anticipated). This can lead to the inserting of O(N_incoming_edges**2)
new expressions i.e. code explosion,

The attached patch uses a big hammer to fix this problem. In Jeff's
words: "It's overkill, but overkill I can live with".

Bootstrapped&tested on ia64-unknown-linux-gnu. I have no suitable test
case for the test suite, unfortunately.
OK for trunk?

Ciao!
Steven

 	* params.def (PARAM_MAX_GCSE_PRE_INCOMING_EDGES: New param.
	* params.h (MAX_GCSE_PRE_INCOMING_EDGES): Shorthand for it.
	* doc/invoke.texi: Document it.
	* gcse.c (compute_pre_data): Make all expressions not anticpated
	and not transparent if there are more incoming edges than the
	value of the new param.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 156926)
+++ doc/invoke.texi	(working copy)
@@ -7913,6 +7913,16 @@ order to perform the global common subex
 optimization.  If more memory than specified is required, the
 optimization will not be done.

+@item max-gcse-pre-incoming-edges
+If a basic block has more incoming edges than the value of this
+parameter, then RTL PRE will not insert new computations in the
+predecessors of that basic block. These insertions may be necessary
+to eliminate a partial redundant computation, but in unusual
+cases (when a computation is partially available and partially
+anticipated) this can lead to an explosion of inserted computations
+and severe pessimization of the code
+The default value is 20.
+
 @item max-pending-list-length
 The maximum number of pending dependencies scheduling will allow
 before flushing the current state and starting over.  Large functions
Index: params.h
===================================================================
--- params.h	(revision 156926)
+++ params.h	(working copy)
@@ -121,6 +121,8 @@ typedef enum compiler_param
   PARAM_VALUE (PARAM_MAX_PENDING_LIST_LENGTH)
 #define MAX_GCSE_MEMORY \
   ((size_t) PARAM_VALUE (PARAM_MAX_GCSE_MEMORY))
+#define MAX_GCSE_PRE_INCOMING_EDGES \
+  ((unsigned) PARAM_VALUE (PARAM_MAX_GCSE_PRE_INCOMING_EDGES))
 #define GCSE_AFTER_RELOAD_PARTIAL_FRACTION \
   PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_PARTIAL_FRACTION)
 #define GCSE_AFTER_RELOAD_CRITICAL_FRACTION \
Index: gcse.c
===================================================================
--- gcse.c	(revision 156926)
+++ gcse.c	(working copy)
@@ -3269,18 +3269,31 @@ compute_pre_data (void)
       edge e;
       edge_iterator ei;

-      /* If the current block is the destination of an abnormal edge, we
-	 kill all trapping expressions because we won't be able to properly
-	 place the instruction on the edge.  So make them neither
-	 anticipatable nor transparent.  This is fairly conservative.  */
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (e->flags & EDGE_ABNORMAL)
-	  {
-	    sbitmap_difference (antloc[bb->index], antloc[bb->index], trapping_expr);
-	    sbitmap_difference (transp[bb->index], transp[bb->index], trapping_expr);
-	    break;
-	  }
-
+      if (EDGE_COUNT (bb->preds) > MAX_GCSE_PRE_INCOMING_EDGES)
+	{
+	  /* If a block has a large number of incoming edges, then inserting
+	     many expressions in the predecessors to make one/few expression
+	     fully redundant is probably not a profitable transformation.
+	     Make all expressions non-anticipatable and non-transparent.  */
+	  sbitmap_zero (antloc[bb->index]);
+	  sbitmap_zero (transp[bb->index]);
+	}
+      else
+	{
+	  /* If the current block is the destination of an abnormal edge, we
+	     kill all trapping expressions because we won't be able to properly
+	     place the instruction on the edge.  So make them neither
+	     anticipatable nor transparent.  This is fairly conservative.  */
+	  FOR_EACH_EDGE (e, ei, bb->preds)
+	    if (e->flags & EDGE_ABNORMAL)
+	      {
+		sbitmap_difference (antloc[bb->index], antloc[bb->index],
+				    trapping_expr);
+		sbitmap_difference (transp[bb->index], transp[bb->index],
+				    trapping_expr);
+		break;
+	      }
+	}
       sbitmap_a_or_b (ae_kill[bb->index], transp[bb->index], comp[bb->index]);
       sbitmap_not (ae_kill[bb->index], ae_kill[bb->index]);
     }
Index: params.def
===================================================================
--- params.def	(revision 156926)
+++ params.def	(working copy)
@@ -202,6 +202,16 @@ DEFPARAM(PARAM_MAX_GCSE_MEMORY,
 	 "The maximum amount of memory to be allocated by GCSE",
 	 50 * 1024 * 1024, 0, 0)

+/* If a basic block has more incoming edges than the value of  this
+   parameter, then no insertings will be done in predecessors to make
+   a partial redundancy fully redundant.
+   This is to avoid inserting O(incoming_edges^2) insertions, see
+   e.g. PR 39077.  */
+DEFPARAM(PARAM_MAX_GCSE_PRE_INCOMING_EDGES,
+	 "max-gcse-pre-incoming-edges",
+	 "The maximum number of predecessors that PRE is allowed to insert
new computations into",
+	 20, 0, 0)
+
 /* This is the threshold ratio when to perform partial redundancy
    elimination after reload. We perform partial redundancy elimination
    when the following holds:


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]