This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [tcb] Incremental SSA updates


On Sun, Mar 27, 2005 at 10:00:43PM -0500, Kazu Hirata wrote:
> Hi Diego,
> 
> > > In other words, DOM figures out a taken edge using conditional
> > > equivalences it creates while following a dominator tree.  With VRP,
> > > these conditional equivalences are stored in the combination of
> > > ASSERT_EXPRs and PHI nodes.  Plus, these conditional equivalences are
> > > pretty easy to get at.  Just follow a ud chain and you will hit an
> > > ASSERT_EXPR.
> > > 
> > OK.  If that's all the information needed by the jump threader,
> > then it should be straightforward to replace the tables built by
> > DOM with the information already present in the IL.
> 
> I wouldn't say "all" but "most".  As far as I have figured out so far,
> here is the rough breakdown of jump threading opportunities that my
> pass followed by DOM picks up on TCB.
> 
> 87% VRP-based (a.k.a. my pass)
> 
> 13% DOM
> 
>     9% Threading through a COND_EXPR with its COND_EXPR_COND being
>        SSA_NAME op SSA_NAME
> 
VRP explicitly adds ASSERT_EXPRs for only one operand.  Adding
assertions for both operands gets in the way of range propagation
afterwards.

>     4% Secondary jump threading opportunities
>        FRE of loads
> 
> "Secondary jump threading opportunities" refers to jump threading
> opportunities that are exposed by the VRP-based jump threader.  Note
> that jump threading often gives rise to "a_2 = PHI <a_1>;", allowing
> further propagation and jump threading opportunities.
> 
VRP could be told to copy propagate range information in these
cases.

> "FRE of loads" refers to a jump threading opportunity like so
> 
>   a_1 = global_variable;
>   if (a_1 == 0) goto here; else <L1>;
> 
> <L1>:;
>   /* Some statements that do not modify memory.  */
>   a_2 = global_variable;
>   if (a_2 == 0) goto here; else goto there;
> 
> Note that we always go to "there" if we get to the second "if"
> statement (assuming there is no other incoming edge to <L1>), but
> AFAIK, TCB doesn't have a replacement for this functionality.  (FRE
> doesn't eliminate redundant loads, does it?)
> 
Hmm, it should.  ISTR adding it last year.

> "Threading through a COND_EXPR with ..." refers to a jump threading
> opportunity of the form
> 
>   if (a_1 == b_2)
>     /       \
>    /         \
>  blah;      blah;
>    \         /
>     \       /
>   if (a_1 == b_2)
> 
> Note that we have "SSA_NAME op SSA_NAME".  I don't think ASSERT_EXPR
> is good at keeping track of this kind of symbolic ranges.  In theory,
> we *could* force this kind of structure into
> 
We do register symbolic ranges in VRP:

a_1: VARYING
b_2: VARYING
a_3: [b_2, b_2]
a_4: ~[b_2, b_2]

foo (a, b)
{
  if (a_1 == b_2) goto <L0>; else goto <L1>;

<L0>:;
  a_3 = b_2;
  baz1 (a_3);
  goto <bb 3> (<L2>);

<L1>:;
  a_4 = a_1;
  bar ();

<L2>:;
  if (a_1 == b_2) goto <L3>; else goto <L4>;

<L3>:;
  baz2 (a_1);

<L4>:;
  return;
}

However, you may not be able to take advantage of them with the
current insertion strategy because (a) we only insert
ASSERT_EXPRs if the operands are used downstream (I only got VRP
to compute the ranges after I inserted uses of a_1 in the
conditional bodies), and (b) we still don't insert PHI nodes at
join blocks.

Both are easy to overcome, and you already have a solution for
(b).

> So in conclusion, if you are shooting for a perfect replacement that
> does not miss anything that DOM catches, that would take a lot of
> work.  But if we are looking for a faster replacement with some misses
> and a few additional catches, it's already doable.
> 
> Note that my pass currently catches more jump threading opportunities
> than (but not a superset of) what DOM catches.  It's not difficult to
> extend it to walk the statements leading up to COND_EXPR or
> SWITCH_EXPR and/or to keep track of COND_EXPR_COND of the form
> SSA_NAME op SSA_NAME.
> 
Well, bit-for-bit identity shouldn't be the only goal.  We have
to consider other factors.  Is the new implementation faster?  Is
it easier to maintain?  Does it provide other opportunities?
The basic metric of counting the number of transformations is not
always the best.

How is the quality of the final code?  If one version makes N+1
transformations than the other one and the final executable is as
fast, then it doesn't matter that it made one more
transformation.

I'd like to hear from Jeff about your plan.  He's been dealing
with the jump threader for far longer and maybe his new
improvements can be added to this other implementation.  It'd be
a shame to keep all this duplicate implementation in the
way that DOM operates.  If it can use information gathered by
other passes, then we win.


Diego.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]