This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Quadratic behavior due to tree-cfg.c:tree_redirect_edge_and_branch
- From: Steven Bosscher <stevenb at suse dot de>
- To: gcc at gcc dot gnu dot org
- Date: Sun, 19 Sep 2004 22:32:04 +0200
- Subject: Quadratic behavior due to tree-cfg.c:tree_redirect_edge_and_branch
- Organization: SUSE Labs
Hmm,
So I fixed one example of this interesting O(N^2) behavior in DOM
(PR15524), only to find that we do the same thing in other places.
Say we have a very large switch statement. We represent switches
as a condition and a vector of labels indexed by the runtime value
of the condition. But every now and then we can determine the
condition at compile time and we'll try to optimize the switch away,
or we want to split all critical edges, so we may have to replace
the label in the label vector with something else.
Take for example tree-cfg.c:split_critical_edges:
FOR_ALL_BB (bb)
{
for (e = bb->succ; e ; e = e->succ_next)
if (EDGE_CRITICAL_P (e) && !(e->flags & EDGE_ABNORMAL))
{
split_edge (e);
}
}
In other words, walk all edges and split the critical ones. Now,
split_edge uses tree_redirect_and_branch. This *should* be cheap,
but unfortunately it is not if a block ends in a switch. Namely,
if the last statement is a SWITCH_EXPR, we do
case SWITCH_EXPR:
{
tree vec = SWITCH_LABELS (stmt);
size_t i, n = TREE_VEC_LENGTH (vec);
for (i = 0; i < n; ++i)
{
tree elt = TREE_VEC_ELT (vec, i);
if (label_to_block (CASE_LABEL (elt)) == e->dest)
CASE_LABEL (elt) = label;
}
}
break;
In english: we look at all labels in the switch label vector.
So the algorithm for splitting critical edges out of blocks ending
in a SWITCH_EXPR has the following behavior,
for all outgoing edges of the block ending in a SWITCH_EXPR
for all labels in the SWITCH_EXPR
replace the label if necessary
This is O(E*L) where E is the number of outgoing edges and L is the
number of case labels in the SWITCH_EXPR. And we have L >= E, so
this is quadratic behavior.
As a result, many passes that should be trivial and fast, are really
unbearably slow. For a limited version of the test case of PR15524,
the top most expensive passes are:
dominator optimization : 30.74 (35%) usr 2.78 (80%) sys 33.91 (34%) wall
tree CFG cleanup : 17.74 (20%) usr 0.02 ( 1%) sys 19.29 (19%) wall
tree branch prediction : 13.86 (16%) usr 0.01 ( 0%) sys 13.88 (14%) wall
tree split crit edges : 7.66 ( 9%) usr 0.00 ( 0%) sys 7.66 ( 8%) wall
and the top of the profile according to oprofile is,
samples % samples % image name symbol name
437578 14.7650 14777 14.3664 cc1 tree_redirect_edge_and_branch
Ouch.
Note that at -O2 or better we always split critical edges for PRE, so
this behavior does not only show up in artificial test cases, it happens
in real code too. It looks like we really need a reverse mapping from
edges to case labels for each SWITCH_EXPR :-/
Gr.
Steven