PR19038 causes regression of 3.9% in SPEC int
Jeffrey A Law
law@redhat.com
Wed Dec 29 19:30:00 GMT 2004
On Fri, 2004-12-24 at 14:26 +0100, Steven Bosscher wrote:
> On Friday 24 December 2004 11:02, Roger Sayle wrote:
> > On Fri, 24 Dec 2004, Mostafa Hagog wrote:
> > > Performing the loop-header-copying is a sort of "work around" to this PR,
> > > I have ported the loop-header-copying from tree's to RTL. This caused a
> > > total 3.9% improvements with the rtl loop header copying (lhc).
> >
> > That's PR tree-optimization/19038, *not* PR target/19039.
>
> http://gcc.gnu.org/PR19038 for the quick link lovers.
>
>
> > Would you mind posting your RTL-based loop header copying patch to
> > gcc-patches? Not only might it be catching more optimizations than
> > out-of-ssa is causing, but as a solution it might be more suitable
> > for gcc 4.0 (depending upon compile-time performance impact, ugliness,
> > intrusivenss, corrections to Andrew's "quick fix", etc...).
>
> I'm unimpressed by either RTL loop header copying, or Andrew's fix,
> both just paper over the underlying problem. The real problem is
> that apparently we are creating a situation where copy propagation
> makes coalescing impossible. Fix that and you don't get the extra
> basic block - bug fixed, done.
>
>
> > Very impressive performance numbers though.
>
> Indeed. We really should fix this one way or another for GCC 4.0.
Well, this ought to fix the poor placement of the copy instruction.
Bootstrapped and regression tested on i686-pc-linux-gnu.
-------------- next part --------------
* tree-outof-ssa.c (insert_backedge_copies): New function.
(rewrite_out_of_ssa): Use it.
Index: tree-outof-ssa.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-outof-ssa.c,v
retrieving revision 2.37
diff -c -p -r2.37 tree-outof-ssa.c
*** tree-outof-ssa.c 24 Dec 2004 05:23:08 -0000 2.37
--- tree-outof-ssa.c 29 Dec 2004 19:13:54 -0000
*************** remove_ssa_form (FILE *dump, var_map map
*** 2362,2367 ****
--- 2362,2456 ----
dump_file = save;
}
+ /* Search every PHI node for arguments associated with backedges which
+ we can trivially determine will need a copy (the argument is either
+ not an SSA_NAME or the argument has a different underlying variable
+ than the PHI result).
+
+ Insert a copy from the PHI argument to a new destination at the
+ end of the block with the backedge to the top of the loop. Update
+ the PHI argument to reference this new destination. */
+
+ static void
+ insert_backedge_copies (void)
+ {
+ basic_block bb;
+
+ FOR_EACH_BB (bb)
+ {
+ tree phi;
+
+ for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
+ {
+ tree result = PHI_RESULT (phi);
+ tree result_var;
+ int i;
+
+ if (!is_gimple_reg (result))
+ continue;
+
+ result_var = SSA_NAME_VAR (result);
+ for (i = 0; i < PHI_NUM_ARGS (phi); i++)
+ {
+ tree arg = PHI_ARG_DEF (phi, i);
+ edge e = PHI_ARG_EDGE (phi, i);
+
+ /* If the argument is not an SSA_NAME, then we will
+ need a constant initialization. If the argument is
+ an SSA_NAME with a different underlying variable and
+ we are not combining temporaries, then we will
+ need a copy statement. */
+ if ((e->flags & EDGE_DFS_BACK)
+ && (TREE_CODE (arg) != SSA_NAME
+ || (!flag_tree_combine_temps
+ && SSA_NAME_VAR (arg) != result_var)))
+ {
+ tree stmt, name, last = NULL;
+ block_stmt_iterator bsi;
+
+ bsi = bsi_last (PHI_ARG_EDGE (phi, i)->src);
+ if (!bsi_end_p (bsi))
+ last = bsi_stmt (bsi);
+
+ /* In theory the only way we ought to get back to the
+ start of a loop should be with a COND_EXPR or GOTO_EXPR.
+ However, better safe than sorry.
+
+ If the block ends with a control statment or
+ something that might throw, then we have to
+ insert this assignment before the last
+ statement. Else insert it after the last statement. */
+ if (last && stmt_ends_bb_p (last))
+ {
+ /* If the last statement in the block is the definition
+ site of the PHI argument, then we can't insert
+ anything after it. */
+ if (TREE_CODE (arg) == SSA_NAME
+ && SSA_NAME_DEF_STMT (arg) == last)
+ continue;
+ }
+
+ /* Create a new instance of the underlying
+ variable of the PHI result. */
+ stmt = build (MODIFY_EXPR, TREE_TYPE (result_var),
+ NULL, PHI_ARG_DEF (phi, i));
+ name = make_ssa_name (result_var, stmt);
+ TREE_OPERAND (stmt, 0) = name;
+
+ /* Insert the new statement into the block and update
+ the PHI node. */
+ if (last && stmt_ends_bb_p (last))
+ bsi_insert_before (&bsi, stmt, BSI_NEW_STMT);
+ else
+ bsi_insert_after (&bsi, stmt, BSI_NEW_STMT);
+ modify_stmt (stmt);
+ SET_PHI_ARG_DEF (phi, i, name);
+ }
+ }
+ }
+ }
+ }
+
/* Take the current function out of SSA form, as described in
R. Morgan, ``Building an Optimizing Compiler'',
Butterworth-Heinemann, Boston, MA, 1998. pp 176-186. */
*************** rewrite_out_of_ssa (void)
*** 2373,2378 ****
--- 2462,2475 ----
int var_flags = 0;
int ssa_flags = (SSANORM_REMOVE_ALL_PHIS | SSANORM_USE_COALESCE_LIST);
+ /* If elimination of a PHI requires inserting a copy on a backedge,
+ then we will have to split the backedge which has numerous
+ undesirable performance effects.
+
+ A significant number of such cases can be handled here by inserting
+ copies into the loop itself. */
+ insert_backedge_copies ();
+
if (!flag_tree_live_range_split)
ssa_flags |= SSANORM_COALESCE_PARTITIONS;
More information about the Gcc-patches
mailing list