[tcb] Incremental SSA updates

Mon Mar 28 03:12:00 GMT 2005

Hi Diego,

> > In other words, DOM figures out a taken edge using conditional
> > equivalences it creates while following a dominator tree.  With VRP,
> > these conditional equivalences are stored in the combination of
> > ASSERT_EXPRs and PHI nodes.  Plus, these conditional equivalences are
> > pretty easy to get at.  Just follow a ud chain and you will hit an
> > ASSERT_EXPR.
> > 
> OK.  If that's all the information needed by the jump threader,
> then it should be straightforward to replace the tables built by
> DOM with the information already present in the IL.

I wouldn't say "all" but "most".  As far as I have figured out so far,
here is the rough breakdown of jump threading opportunities that my
pass followed by DOM picks up on TCB.

87% VRP-based (a.k.a. my pass)

13% DOM

    9% Threading through a COND_EXPR with its COND_EXPR_COND being
       SSA_NAME op SSA_NAME

    4% Secondary jump threading opportunities
       FRE of loads

"Secondary jump threading opportunities" refers to jump threading
opportunities that are exposed by the VRP-based jump threader.  Note
that jump threading often gives rise to "a_2 = PHI <a_1>;", allowing
further propagation and jump threading opportunities.

"FRE of loads" refers to a jump threading opportunity like so

  a_1 = global_variable;
  if (a_1 == 0) goto here; else <L1>;

<L1>:;
  /* Some statements that do not modify memory.  */
  a_2 = global_variable;
  if (a_2 == 0) goto here; else goto there;

Note that we always go to "there" if we get to the second "if"
statement (assuming there is no other incoming edge to <L1>), but
AFAIK, TCB doesn't have a replacement for this functionality.  (FRE
doesn't eliminate redundant loads, does it?)

"Threading through a COND_EXPR with ..." refers to a jump threading
opportunity of the form

  if (a_1 == b_2)
    /       \
   /         \
 blah;      blah;
   \         /
    \       /
  if (a_1 == b_2)

Note that we have "SSA_NAME op SSA_NAME".  I don't think ASSERT_EXPR
is good at keeping track of this kind of symbolic ranges.  In theory,
we *could* force this kind of structure into

  tem_3 = a_1 == b_2;
  if (tem_3)
    /       \
   /         \
 blah;      blah;
   \         /
    \       /
  tem4 = PHI <1, 0>
  if (tem_4)

so that the newly introduced PHI node is sufficient to determine a
taken edge, but the cost of the transformation wouldn't be negligible.
(FWIW, LLVM seems to only allow a boolean SSA_NAME in COND_EXPR_COND.)

> Now that we can have VRP insert PHI nodes again, it should be a
> matter of trying then.

Yes, In fact, I am already doing so. :-) You'll find
update_ssa (true); in

  http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02388.html

> > Jeff seems to want to thread edges through a block that contains some
> > statements before a COND_EXPR or SWITCH_EXPR.  In that case, DOM would
> > only need to build various tables starting at the beginning of the
> > block being threaded through.
> > 
> Got an example?  Not sure I follow what you're saying here.

Sure.

  # a_3 = PHI <1(31), 0(29), 0(28), 0(26), 0(30)>;
<L6>:;
  a_4 = (_Bool) a_3;
  if (a_4 == 0) goto <L8>; else goto <L37>;

Note that all of the incoming edges can be threaded if you evaluate
the intervening cast for each PHI argument, which is what DOM (with
Jeff's pending improvement) does.  In this example, we already have
enough information in the PHI node to get us started.  For example, if
we want to thread an incoming edge from basic block 31.  We would walk

  a_3 = 1;
  a_4 = (_Bool) a_3;
  if (a_4 == 0) ...

while folding each statement and recording its result.  At the end, we
get

  a_3 = 1;
  a_4 = 1;
  if (0) ...

meaning that we take the "false" edge.  Constant propagation, copy
propagation, and VRP (with PHI insertion) create useful PHI nodes for
us.  That is, a PHI argument is a constant if it should be, or an
SSA_NAME.  In the latter case, the SSA_NAME is annotated with range
information.  So, on TCB, DOM doesn't have to walk every single
statement.  It suffices to walk statements in a block being threaded
through.

Having said all this, I should note that there are still some cases
that the current DOM (with Jeff's pending improvement) catches but the
proposed simplified DOM (or my jump threader rewrite) would miss.
Consider

  /* Several incoming edges.  */
<L6>:;
  a_4 = global_variable;
  if (a_4 == 0) goto <L8>; else goto <L37>;

In this case, DOM effectively performs PRE of loads(!).  That is, if
the value of global_variable is known for one of the incoming edges,
DOM will associate a_4 with the recorded value of global_variable and
then determines a taken edge.  Daniel Berlin seems to be planning to
add load elimination in his PRE, so maybe we can rely on that.

So in conclusion, if you are shooting for a perfect replacement that
does not miss anything that DOM catches, that would take a lot of
work.  But if we are looking for a faster replacement with some misses
and a few additional catches, it's already doable.

Note that my pass currently catches more jump threading opportunities
than (but not a superset of) what DOM catches.  It's not difficult to
extend it to walk the statements leading up to COND_EXPR or
SWITCH_EXPR and/or to keep track of COND_EXPR_COND of the form
SSA_NAME op SSA_NAME.

Kazu Hirata