This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Quadratic behavior due to tree-cfg.c:tree_redirect_edge_and_branch


Hmm,

So I fixed one example of this interesting O(N^2) behavior in DOM
(PR15524), only to find that we do the same thing in other places.

Say we have a very large switch statement.  We represent switches
as a condition and a vector of labels indexed by the runtime value
of the condition.  But every now and then we can determine the
condition at compile time and we'll try to optimize the switch away,
or we want to split all critical edges, so we may have to replace
the label in the label vector with something else.

Take for example tree-cfg.c:split_critical_edges:

  FOR_ALL_BB (bb)
    {
      for (e = bb->succ; e ; e = e->succ_next)
        if (EDGE_CRITICAL_P (e) && !(e->flags & EDGE_ABNORMAL))
          {
            split_edge (e);
          }
    }

In other words, walk all edges and split the critical ones.  Now,
split_edge uses tree_redirect_and_branch.  This *should* be cheap,
but unfortunately it is not if a block ends in a switch.  Namely,
if the last statement is a SWITCH_EXPR, we do

    case SWITCH_EXPR:
      {
        tree vec = SWITCH_LABELS (stmt);
        size_t i, n = TREE_VEC_LENGTH (vec);

        for (i = 0; i < n; ++i)
          {
            tree elt = TREE_VEC_ELT (vec, i);
            if (label_to_block (CASE_LABEL (elt)) == e->dest)
              CASE_LABEL (elt) = label;
          }
      }
      break;

In english: we look at all labels in the switch label vector.

So the algorithm for splitting critical edges out of blocks ending
in a SWITCH_EXPR has the following behavior,

  for all outgoing edges of the block ending in a SWITCH_EXPR
    for all labels in the SWITCH_EXPR
      replace the label if necessary

This is O(E*L) where E is the number of outgoing edges and L is the
number of case labels in the SWITCH_EXPR.  And we have L >= E, so
this is quadratic behavior.

As a result, many passes that should be trivial and fast, are really
unbearably slow.  For a limited version of the test case of PR15524,
the top most expensive passes are:

 dominator optimization :  30.74 (35%) usr   2.78 (80%) sys  33.91 (34%) wall
 tree CFG cleanup       :  17.74 (20%) usr   0.02 ( 1%) sys  19.29 (19%) wall
 tree branch prediction :  13.86 (16%) usr   0.01 ( 0%) sys  13.88 (14%) wall
 tree split crit edges  :   7.66 ( 9%) usr   0.00 ( 0%) sys   7.66 ( 8%) wall

and the top of the profile according to oprofile is,

samples  %        samples  %        image name   symbol name
437578   14.7650  14777    14.3664  cc1          tree_redirect_edge_and_branch

Ouch.

Note that at -O2 or better we always split critical edges for PRE, so
this behavior does not only show up in artificial test cases, it happens
in real code too.  It looks like we really need a reverse mapping from
edges to case labels for each SWITCH_EXPR :-/

Gr.
Steven



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]