This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, PR43864] Gimple level duplicate block cleanup.
On Sun, Jul 17, 2011 at 8:33 PM, Tom de Vries <vries@codesourcery.com> wrote:
> Bootstrapped and reg-tested on x86_64. ?Ok for trunk (after ARM testing)?
+static int
+same_succ_equal (const void *ve1, const void *ve2)
+{
...
+ if (bitmap_bit_p (e1->bbs, ENTRY_BLOCK)
+ || bitmap_bit_p (e1->bbs, EXIT_BLOCK)
+ || bitmap_bit_p (e2->bbs, ENTRY_BLOCK)
+ || bitmap_bit_p (e2->bbs, EXIT_BLOCK))
+ return 0;
that's odd - what are these checks for?
+ if (dump_file)
+ {
+ fprintf (dump_file, "initial worklist:\n");
with dump_flags & TDF_DETAILS
I'm now at merge_calls and wondering about alias info again. We are
probably safe for the per-pointer information because we are not
operating flow-sensitive for memory and for merging require value-equivalence
for SSA names. For calls the same should be true - we are not
flow- or context-sensitive, and even if we were context-sentitive we
require equivalent arguments (for memory arguments we should be safe
because of the non-flow-sensitivity).
So, did you actually run into problems? If not then I suggest to remove
merge_calls completely (and the related changes that it requires).
+/* Create or update a vop phi in BB2. Use VUSE1 arguments for all the
+ REDIRECTED_EDGES, or if VUSE1 is NULL_TREE, use BB_VOP_AT_EXIT. If a new
+ phis is created, use the phi instead of VUSE2 in BB2. */
+
+static void
+update_vuses (tree vuse1, tree vuse2, basic_block bb2,
+ VEC (edge,heap) *redirected_edges)
...
+ if (vuse2 == NULL_TREE)
+ return;
hm, that's the case when there is no VUSE that is dominated by BB2
(or is in BB2). Ok, might happen.
+ locus1 = gimple_location (SSA_NAME_DEF_STMT (arg));
+ add_phi_arg (phi, arg, e, locus1);
I don't think virtual operand PHIs should have locations, just use
UNKNOWN_LOCATION here.
+ locus2 = gimple_location (def_stmt2);
Likewise.
+ /* Create a phi, first with default argument vuse2 for all preds. */
+ lhs = make_ssa_name (SSA_NAME_VAR (vuse2), NULL);
+ VN_INFO_GET (lhs);
+ phi = create_phi_node (lhs, bb2);
+ SSA_NAME_DEF_STMT (lhs) = phi;
+ FOR_EACH_EDGE (e, ei, bb2->preds)
+ add_phi_arg (phi, vuse2, e, locus2);
+
+ /* Now overwrite the arguments associated with the redirected edges with
+ vuse1. */
+ for (i = 0; i < EDGE_COUNT (redirected_edges); ++i)
+ {
+ e = VEC_index (edge, redirected_edges, i);
+ gcc_assert (PHI_ARG_DEF_FROM_EDGE (phi, e));
No need for this assert.
+ if (vuse1)
+ arg = vuse1;
+ else
+ arg = BB_VOP_AT_EXIT (e->src);
+ SET_PHI_ARG_DEF (phi, e->dest_idx, arg);
+ locus1 = gimple_location (SSA_NAME_DEF_STMT (arg));
See above.
+ gimple_phi_arg_set_location (phi, e->dest_idx, locus1);
+ }
Can you maybe merge this with the update-existing-phi-case? They
look all too similar.
+ /* Replace uses of vuse2 in bb2 with phi. */
+ FOR_EACH_IMM_USE_STMT (stmt, iter, vuse2)
+ {
+ if (gimple_code (stmt) == GIMPLE_PHI)
Does FOR_EACH_IMM_USE_ON_STMT really not work for PHIs?
Other code doesn't seem to care.
+ {
+ edge e;
+ if (stmt == phi)
+ continue;
+ e = find_edge (bb2, gimple_bb (stmt));
+ if (e == NULL)
+ continue;
+ use_p = PHI_ARG_DEF_PTR_FROM_EDGE (stmt, e);
+ SET_USE (use_p, lhs);
+ update_stmt (stmt);
+ }
+ else if (gimple_bb (stmt) == bb2)
That check looks odd. A use can very well appear in a forwarder block.
+ {
+ FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+ SET_USE (use_p, lhs);
+ update_stmt (stmt);
+ }
+ }
+/* Scans the vdefs and vuses of the insn of BB, and returns the vop at entry in
+ VOP_AT_ENTRY, and the vop at exit in VOP_AT_EXIT. */
+
+static void
+insn_vops (basic_block bb, tree *vop_at_entry, tree *vop_at_exit)
it's easier to start from the bb end and walk until you see the
first vdef or vuse. Then you have *vop_at_exit. From there
just walk the SSA_NAME_DEF_STMTs of the vuse until you
hit one whose definition is not in BB - and you have *vop_at_entry.
That way you can avoid scanning most of the stmts.
The function also has an odd name ;) It should be something like
vops_at_bb_entry_and_exit.
+static void
+vop_at_entry (basic_block bb1, basic_block bb2, tree *vop_at_entry1,
+ tree *vop_at_entry2)
so you don't need the vop at exit at all? The function is a bit unclear
to me given it does so much stuff other than just computing the BBs
entry VOPs ...
+static void
+replace_block_by (basic_block bb1, basic_block bb2, bool update_vops)
+{
can you add some comments before the different phases of update?
I _think_ I understand what it does, but ...
+/* Runs tail merge optimization. */
+
+unsigned int
+tail_merge_optimize (unsigned int todo)
+{
+ int nr_bbs_removed_total = 0;
+ int nr_bbs_removed;
+ bool loop_entered = false;
+ int iteration_nr = 0;
+ bool update_vops = ((todo & TODO_update_ssa_only_virtuals) == 0
+ || !symbol_marked_for_renaming (gimple_vop (cfun)));
you need to simplify this to
bool update_vops = !symbol_marked_for_renaming (gimple_vop (cfun));
+ if (nr_bbs_removed == 0)
+ break;
+
+ free_dominance_info (CDI_DOMINATORS);
we might want to limit the number of iterations we perform - especially
as you are re-computing dominators on each iteration. What's the
maximum number of iterations you see during a GCC bootstrap?
+ if (dump_file)
+ fprintf (dump_file, "htab collision / search: %f\n",
+ htab_collisions (same_succ_htab));
in general without dump_flags & TDF_DETAILS passes should print
at most things when they did a transformation (some even don't do that).
Please double-check all your dump-prints.
+ todo |= (TODO_verify_ssa | TODO_verify_stmts | TODO_verify_flow
+ | TODO_dump_func);
should all be already set.
@ -4945,6 +4944,9 @@ execute_pre (bool do_fre)
scev_finalize ();
fini_pre (do_fre);
+ todo |= tail_merge_optimize (todo);
+ free_scc_vn ();
Please only run tail_merge_optimize once. As we are running through
this code three times at -O2. I suggest to try it in the !do_fre case
only which we only run once (as PRE). If that doesn't work out
nicely we need to find other means (like introduce a
pass_fre_and_tail_merge which passes down another flag and replace
one FRE pass with that new combined pass).
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h (revision 175801)
+++ gcc/tree-flow.h (working copy)
@@ -806,6 +806,9 @@ bool multiplier_allowed_in_address_p (HO
unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
bool may_be_nonaddressable_p (tree expr);
+/* In tree-ssa-tail-merge.c. */
+extern unsigned int tail_merge_optimize (unsigned int);
Eh, tree-flow.h kitchen-sink ;) Please put it into tree-pass.h instead.
That said - I'm reasonably happy with the pass now, but it's rather large
(this review took 40min again ...) so I appreciate a second look from
somebody else.
Btw, can you expand a bit on the amount of testcases?
Thanks,
Richard.
> Thanks,
> - Tom
>
> 2011-07-17 ?Tom de Vries ?<tom@codesourcery.com>
>
> ? ? ? ?PR middle-end/43864
> ? ? ? ?* tree-ssa-tail-merge.c: New file.
> ? ? ? ?(struct same_succ): Define.
> ? ? ? ?(same_succ_t, const_same_succ_t): New typedef.
> ? ? ? ?(struct bb_cluster): Define.
> ? ? ? ?(bb_cluster_t, const_bb_cluster_t): New typedef.
> ? ? ? ?(struct aux_bb_info): Define.
> ? ? ? ?(BB_SIZE, BB_SAME_SUCC, BB_CLUSTER, BB_VOP_AT_EXIT): Define.
> ? ? ? ?(gvn_uses_equal): New function.
> ? ? ? ?(same_succ_print, same_succ_print_traverse, same_succ_hash)
> ? ? ? ?(inverse_flags, same_succ_equal, same_succ_alloc, same_succ_delete)
> ? ? ? ?(same_succ_reset): New function.
> ? ? ? ?(same_succ_htab, same_succ_edge_flags)
> ? ? ? ?(deleted_bbs, deleted_bb_preds): New var.
> ? ? ? ?(debug_same_succ): New function.
> ? ? ? ?(worklist): New var.
> ? ? ? ?(print_worklist, add_to_worklist, find_same_succ_bb, find_same_succ)
> ? ? ? ?(init_worklist, delete_worklist, delete_basic_block_same_succ)
> ? ? ? ?(same_succ_flush_bbs, update_worklist): New function.
> ? ? ? ?(print_cluster, debug_cluster, same_predecessors)
> ? ? ? ?(add_bb_to_cluster, new_cluster, delete_cluster): New function.
> ? ? ? ?(all_clusters): New var.
> ? ? ? ?(alloc_cluster_vectors, reset_cluster_vectors, delete_cluster_vectors)
> ? ? ? ?(merge_clusters, set_cluster): New function.
> ? ? ? ?(gimple_equal_p, find_duplicate, same_phi_alternatives_1)
> ? ? ? ?(same_phi_alternatives, bb_has_non_vop_phi, find_clusters_1)
> ? ? ? ?(find_clusters): New function.
> ? ? ? ?(merge_calls, update_vuses, vop_phi, insn_vops, vop_at_entry)
> ? ? ? ?(replace_block_by): New function.
> ? ? ? ?(update_bbs): New var.
> ? ? ? ?(apply_clusters): New function.
> ? ? ? ?(update_debug_stmt, update_debug_stmts): New function.
> ? ? ? ?(tail_merge_optimize): New function.
> ? ? ? ?tree-flow.h (tail_merge_optimize): Declare.
> ? ? ? ?* tree-ssa-pre.c (execute_pre): Use tail_merge_optimize.
> ? ? ? ?* Makefile.in (OBJS-common): Add tree-ssa-tail-merge.o.
> ? ? ? ?(tree-ssa-tail-merge.o): New rule.
> ? ? ? ?* opts.c (default_options_table): Set OPT_ftree_tail_merge by default at
> ? ? ? ?OPT_LEVELS_2_PLUS.
> ? ? ? ?* tree-ssa-sccvn.c (vn_valueize): Move to ...
> ? ? ? ?* tree-ssa-sccvn.h (vn_valueize): Here.
> ? ? ? ?* tree-ssa-alias.h (pt_solution_ior_into_shared): Declare.
> ? ? ? ?* tree-ssa-structalias.c (find_what_var_points_to): Factor out and
> ? ? ? ?use ...
> ? ? ? ?(pt_solution_share): New function.
> ? ? ? ?(pt_solution_unshare, pt_solution_ior_into_shared): New function.
> ? ? ? ?(delete_points_to_sets): Nullify shared_bitmap_table after deletion.
> ? ? ? ?* timevar.def (TV_TREE_TAIL_MERGE): New timevar.
> ? ? ? ?* common.opt (ftree-tail-merge): New switch.
> ? ? ? ?* params.def (PARAM_MAX_TAIL_MERGE_COMPARISONS): New parameter.
> ? ? ? ?* doc/invoke.texi (Optimization Options, -O2): Add -ftree-tail-merge.
> ? ? ? ?(-ftree-tail-merge, max-tail-merge-comparisons): New item.
>