This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Speedup loop header copying [part of PR 46590]
- From: Michael Matz <matz at suse dot de>
- To: gcc-patches at gcc dot gnu dot org
- Date: Sun, 2 Sep 2012 21:35:10 +0200 (CEST)
- Subject: Speedup loop header copying [part of PR 46590]
Hi,
as the bug report tells us one speed problem is loop header copying, in
particular the update_ssa call that is done for each and every copied loop
header but touches all blocks in a function.
Now, one idea was to use an optimized update_ssa that works only on the
relevant subset of blocks (it's dominance frontiers that are the problem).
I've experimented with the original formulation of frontiers as per Cytron
which allows to calculate the domfrontier of one basic block lazily. The
end result was none, no speedup, no slowdown. I haven't investigated,
but I guess the problem is that too often most of the blocks are relevant
for most of the header copies, either because of virtual ops or because
calculating the domfrontier of a block needs all domfrontiers of all
dominated childs (i.e. the domfrontier of the entry needs domfrontiers of
all blocks). So, no cake there.
But actually there's no reason that we need to keep SSA form uptodate
during the multiple header copyings. We use gimple_duplicate_sese_region
to do the copying which updates the SSA web before returning (actually
loop header copying is the only caller of it). The next thing done is
just another call to gimple_duplicate_sese_region to copy some other BBs,
then some split edges, repeat from start. We can just as well defer the
whole SSA web updating to after we've duplicated everything we want.
That's what this patch does. Time for various things on the testcase
(with -O1):
without with patch
tree SSA rewrite 26.2s 4.8s
tree SSA incremental 21.7s 4.6s
dominance computation 15.0s 4.2s
dominance frontiers 25.6s 6.7s
TOTAL 135.6s 67.8s
Regstrapped on x86_64-linux, no regressions. Okay for trunk?
Ciao,
Michael.
------------------
PR tree-optimization/46590
* tree-cfg.c (gimple_duplicate_sese_region): Don't update
SSA web here ...
* tree-ssa-loop-ch.c (copy_loop_headers): ... but here.
Index: tree-cfg.c
===================================================================
--- tree-cfg.c (revision 190803)
+++ tree-cfg.c (working copy)
@@ -5530,9 +5530,10 @@ add_phi_args_after_copy (basic_block *re
important exit edge EXIT. By important we mean that no SSA name defined
inside region is live over the other exit edges of the region. All entry
edges to the region must go to ENTRY->dest. The edge ENTRY is redirected
- to the duplicate of the region. SSA form, dominance and loop information
- is updated. The new basic blocks are stored to REGION_COPY in the same
- order as they had in REGION, provided that REGION_COPY is not NULL.
+ to the duplicate of the region. Dominance and loop information is
+ updated, but not the SSA web. The new basic blocks are stored to
+ REGION_COPY in the same order as they had in REGION, provided that
+ REGION_COPY is not NULL.
The function returns false if it is unable to copy the region,
true otherwise. */
@@ -5593,8 +5594,6 @@ gimple_duplicate_sese_region (edge entry
free_region_copy = true;
}
- gcc_assert (!need_ssa_update_p (cfun));
-
/* Record blocks outside the region that are dominated by something
inside. */
doms = NULL;
@@ -5663,9 +5662,6 @@ gimple_duplicate_sese_region (edge entry
/* Add the other PHI node arguments. */
add_phi_args_after_copy (region_copy, n_region, NULL);
- /* Update the SSA web. */
- update_ssa (TODO_update_ssa);
-
if (free_region_copy)
free (region_copy);
Index: tree-ssa-loop-ch.c
===================================================================
--- tree-ssa-loop-ch.c (revision 190803)
+++ tree-ssa-loop-ch.c (working copy)
@@ -241,6 +241,7 @@ copy_loop_headers (void)
split_edge (loop_latch_edge (loop));
}
+ update_ssa (TODO_update_ssa);
free (bbs);
free (copied_bbs);