This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[patch] PR39077
- From: Steven Bosscher <stevenb dot gcc at gmail dot com>
- To: Jeff Law <law at redhat dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Sat, 20 Feb 2010 18:43:27 +0100
- Subject: [patch] PR39077
Hi,
RTL PRE is based on lazy code motion, an algorithm that automatically
handles partial-partial redundancies (partially available, partially
anticipated). This can lead to the insertion of O(N_incoming_edges**2)
new expressions, i.e. code explosion.
The attached patch uses a big hammer to fix this problem. In Jeff's
words: "It's overkill, but overkill I can live with".
Bootstrapped & tested on ia64-unknown-linux-gnu. Unfortunately, I have
no suitable test case for the test suite.
OK for trunk?
Ciao!
Steven
* params.def (PARAM_MAX_GCSE_PRE_INCOMING_EDGES): New param.
* params.h (MAX_GCSE_PRE_INCOMING_EDGES): Shorthand for it.
* doc/invoke.texi: Document it.
* gcse.c (compute_pre_data): Make all expressions not anticipated
and not transparent if there are more incoming edges than the
value of the new param.
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 156926)
+++ doc/invoke.texi (working copy)
@@ -7913,6 +7913,16 @@ order to perform the global common subex
optimization. If more memory than specified is required, the
optimization will not be done.
+@item max-gcse-pre-incoming-edges
+If a basic block has more incoming edges than the value of this
+parameter, then RTL PRE will not insert new computations in the
+predecessors of that basic block. These insertions may be necessary
+to eliminate a partially redundant computation, but in unusual
+cases (when a computation is partially available and partially
+anticipated) this can lead to an explosion of inserted computations
+and severe pessimization of the code.
+The default value is 20.
+
@item max-pending-list-length
The maximum number of pending dependencies scheduling will allow
before flushing the current state and starting over. Large functions
Index: params.h
===================================================================
--- params.h (revision 156926)
+++ params.h (working copy)
@@ -121,6 +121,8 @@ typedef enum compiler_param
PARAM_VALUE (PARAM_MAX_PENDING_LIST_LENGTH)
#define MAX_GCSE_MEMORY \
((size_t) PARAM_VALUE (PARAM_MAX_GCSE_MEMORY))
+#define MAX_GCSE_PRE_INCOMING_EDGES \
+ ((unsigned) PARAM_VALUE (PARAM_MAX_GCSE_PRE_INCOMING_EDGES))
#define GCSE_AFTER_RELOAD_PARTIAL_FRACTION \
PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_PARTIAL_FRACTION)
#define GCSE_AFTER_RELOAD_CRITICAL_FRACTION \
Index: gcse.c
===================================================================
--- gcse.c (revision 156926)
+++ gcse.c (working copy)
@@ -3269,18 +3269,31 @@ compute_pre_data (void)
edge e;
edge_iterator ei;
- /* If the current block is the destination of an abnormal edge, we
- kill all trapping expressions because we won't be able to properly
- place the instruction on the edge. So make them neither
- anticipatable nor transparent. This is fairly conservative. */
- FOR_EACH_EDGE (e, ei, bb->preds)
- if (e->flags & EDGE_ABNORMAL)
- {
- sbitmap_difference (antloc[bb->index], antloc[bb->index], trapping_expr);
- sbitmap_difference (transp[bb->index], transp[bb->index], trapping_expr);
- break;
- }
-
+ if (EDGE_COUNT (bb->preds) > MAX_GCSE_PRE_INCOMING_EDGES)
+ {
+ /* If a block has a large number of incoming edges, then inserting
+ many expressions in the predecessors to make one/few expression
+ fully redundant is probably not a profitable transformation.
+ Make all expressions non-anticipatable and non-transparent. */
+ sbitmap_zero (antloc[bb->index]);
+ sbitmap_zero (transp[bb->index]);
+ }
+ else
+ {
+ /* If the current block is the destination of an abnormal edge, we
+ kill all trapping expressions because we won't be able to properly
+ place the instruction on the edge. So make them neither
+ anticipatable nor transparent. This is fairly conservative. */
+ FOR_EACH_EDGE (e, ei, bb->preds)
+ if (e->flags & EDGE_ABNORMAL)
+ {
+ sbitmap_difference (antloc[bb->index], antloc[bb->index],
+ trapping_expr);
+ sbitmap_difference (transp[bb->index], transp[bb->index],
+ trapping_expr);
+ break;
+ }
+ }
sbitmap_a_or_b (ae_kill[bb->index], transp[bb->index], comp[bb->index]);
sbitmap_not (ae_kill[bb->index], ae_kill[bb->index]);
}
Index: params.def
===================================================================
--- params.def (revision 156926)
+++ params.def (working copy)
@@ -202,6 +202,16 @@ DEFPARAM(PARAM_MAX_GCSE_MEMORY,
"The maximum amount of memory to be allocated by GCSE",
50 * 1024 * 1024, 0, 0)
+/* If a basic block has more incoming edges than the value of this
+ parameter, then no insertions will be done in predecessors to make
+ a partial redundancy fully redundant.
+ This is to avoid making O(incoming_edges^2) insertions, see
+ e.g. PR 39077. */
+DEFPARAM(PARAM_MAX_GCSE_PRE_INCOMING_EDGES,
+ "max-gcse-pre-incoming-edges",
+ "The maximum number of predecessors that PRE is allowed to insert new computations into",
+ 20, 0, 0)
+
/* This is the threshold ratio when to perform partial redundancy
elimination after reload. We perform partial redundancy elimination
when the following holds: