This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Jump threading/bypassing volume 2
- From: Jeffrey A Law <law at redhat dot com>
- To: gcc at gcc dot gnu dot org
- Date: Mon, 09 Aug 2004 13:01:26 -0600
- Subject: Jump threading/bypassing volume 2
- Organization: Red Hat, Inc
- Reply-to: law at redhat dot com
Even with all the problems I detailed with the new jump bypassing
algorithm, there's still something to be gained from it.
Specifically, the idea of using block/statement copying to make
updating the SSA graph easier for jump threading is amazingly
powerful.
Given two edges A->B and B->C in an arbitrary CFG, we can use block
copying to allow us to create A->B'->C.
That is clearly safe if B' is an exact duplicate of B and any PHIs
in C are updated to have a new argument for the edge B'->C and we
may need to insert PHIs for any objects which are set in B and B'.
With that basic concept in place we can pretty easily see how to
implement jump threading.
Given A->B and B->C where B has multiple outgoing edges and an
arbitrary number of incoming edges. If traversing the edge A->B
guarantees traversal of B->C then:
1. Make a copy of B (including its outgoing edges and statements).
Call the copy B'. Note B' has no incoming edges or PHIs at this
time.
2. Remove the control statement at the end of B' and all outgoing
edges except B'->C.
3. Add a new argument to each PHI in C with the same value as the
existing argument associated with edge B->C. Associate the
new arguments with the edge B'->C.
4. For each PHI in B, find or create a new PHI in B'. Add an
argument to the new PHI with the same value as the PHI in B
associated with the edge A->B. Associate the new argument in
the PHI in B' with the edge A->B.
5. Change the edge A->B to A->B'
5a. This automatically deletes the now useless PHI arguments
associated with A->B in B.
5b. This automatically associates the arguments to PHIs in B'
with A->B' instead of A->B.
6. Repeat for other incoming edges into B.
7. Put any duplicated resources in B and all the B' blocks into
SSA form.
There's a couple tricks that can be used to minimize the amount of
block duplication, but that's the basic algorithm.
That algorithm is a direct replacement for the old updating code in
tree-ssa-dom.c, so we get all the utility of the old jump threading
code, without the insane contortions to update the SSA form that we
used to do (take a set of variables out of SSA form, modify the CFG,
then put those variables back into SSA form).
Since that algorithm is a drop-in replacement for the old one in
tree-ssa-dom.c, it should result in nearly identical runtime jumping
behavior in the generated code. Small differences are expected as
the old algorithm would sometimes "cancel" a jump thread in cases
where it didn't know how to update the SSA graph. And the runtime
results show precisely that:
Conditional Branches
Static Dynamic
OLD 123852 81349225669
NEW 123850 81345075767
And the compile time behavior is noticeably improved; DOM itself is
roughly 20% faster, which results in a net improvement of around
1.3% in the tests I've run.
This code also allows us to remove the per-variable out of SSA
code. I've taken a first stab at that. There may be more code
that can be removed.
It's also the case that I believe this scheme can be extended to
handle the jump threading cases that are currently missed; the
ability to thread through a block with side effects that must be
preserved is the single biggest class of cases we're missing. And
being able to do that also means its likely we won't need the
ping-pong between DOM and DCE to fully thread jumps.
Finally, I believe we can experiment with step #7 to evaluate
various schemes for updating the SSA graph:
a. rewrite_ssa_into_ssa (which is what's used now)
b. Incremental statement by statement update using the primitives
found in the paper from Choi, Sarkar and Schonberg (and
advocated by Zadeck)
c. A scheme found in a paper from Sastry and Ju which is designed
to update multiple duplicated objects by walking up the dominator
tree from uses which are not dominated by their sets which
the authors claim to be more efficient than "b".
Anyway, I'll be posting a patch to change tree-ssa-dom.c to use the
new block duplication based updating scheme shortly.
Jeff