This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Jump threading/bypassing volume 2


Even with all the problems I detailed with the new jump bypassing
algorithm, there's still something to be gained from it.

Specifically, the idea of using block/statement copying to make
updating the SSA graph easier for jump threading is amazingly 
powerful.


Given two edges A->B and B->C in an arbitrary CFG, we can use block
copying to allow us to create A->B'->C.

That is clearly safe if B' is an exact duplicate of B and any PHIs
in C are updated to have a new argument for the edge B'->C and we 
may need to insert PHIs for any objects which are set in B and B'.

With that basic concept in place we can pretty easily see how to
implement jump threading.

Given A->B and B->C where B has multiple outgoing edges and an
arbitrary number of incoming edges.  If traversing the edge A->B
guarantees traversal of B->C then:

  1. Make a copy of B (including its outgoing edges and statements).
     Call the copy B'.  Note B' has no incoming edges or PHIs at this
     time.

  2. Remove the control statement at the end of B' and all outgoing
     edges except B'->C.

  3. Add a new argument to each PHI in C with the same value as the
     existing argument associated with edge B->C.  Associate the
     new arguments with the edge B'->C.

  4. For each PHI in B, find or create a new PHI in B'.  Add an
     argument to the new PHI with the same value as the PHI in B
     associated with the edge A->B.  Associate the new argument in
     the PHI in B' with the edge A->B.

  5. Change the edge A->B to A->B'

     5a. This automatically deletes the now useless PHI arguments
         associated with A->B in B.

     5b. This automatically associates the arguments to PHIs in B'
         with A->B' instead of A->B.

  6. Repeat for other incoming edges into B.

  7. Put any duplicated resources in B and all the B' blocks into
     SSA form.

There's a couple tricks that can be used to minimize the amount of
block duplication, but that's the basic algorithm.

That algorithm is a direct replacement for the old updating code in
tree-ssa-dom.c, so we get all the utility of the old jump threading
code, without the insane contortions to update the SSA form that we
used to do (take a set of variables out of SSA form, modify the CFG,
then put those variables back into SSA form).

Since that algorithm is a drop-in replacement for the old one in
tree-ssa-dom.c, it should result in nearly identical runtime jumping
behavior in the generated code.  Small differences are expected as
the old algorithm would sometimes "cancel" a jump thread in cases
where it didn't know how to update the SSA graph.   And the runtime
results show precisely that:

        Conditional Branches
	Static  Dynamic
OLD	123852  81349225669 
NEW	123850	81345075767



And the compile time behavior is noticeably improved; DOM itself is
roughly 20% faster, which results in a net improvement of around 
1.3% in the tests I've run.

This code also allows us to remove the per-variable out of SSA
code.  I've taken a first stab at that.  There may be more code
that can be removed.

It's also the case that I believe this scheme can be extended to
handle the jump threading cases that are currently missed; the
ability to thread through a block with side effects that must be
preserved is the single biggest class of cases we're missing.  And
being able to do that also means its likely we won't need the 
ping-pong between DOM and DCE to fully thread jumps.

Finally, I believe we can experiment with step #7 to evaluate
various schemes for updating the SSA graph:

  a. rewrite_ssa_into_ssa (which is what's used now)

  b. Incremental statement by statement update using the primitives
     found in the paper from Choi, Sarkar and Schonberg (and 
     advocated by Zadeck)

  c. A scheme found in a paper from Sastry and Ju which is designed
     to update multiple duplicated objects by walking up the dominator
     tree from uses which are not dominated by their sets which
     the authors claim to be more efficient than "b".

Anyway, I'll be posting a patch to change tree-ssa-dom.c to use the
new block duplication based updating scheme shortly.

Jeff






Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]