[Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

Jeff Law law@redhat.com
Wed Aug 19 21:53:00 GMT 2015


On 08/15/2015 11:01 AM, Ajit Kumar Agarwal wrote:
>
>
>  From cf2b64cc1d6623424d770f2a9ea257eb7e58e887 Mon Sep 17 00:00:00 2001
> From: Ajit Kumar Agarwal<ajitkum@xilix.com>
> Date: Sat, 15 Aug 2015 18:19:14 +0200
> Subject: [PATCH] [Patch,tree-optimization]: Add new path Splitting pass on
>   tree ssa representation.
>
> Added a new pass on path splitting on tree SSA representation. The path
> splitting optimization does the CFG transformation of join block of the
> if-then-else same as the loop latch node is moved and merged with the
> predecessor blocks after preserving the SSA representation.
>
> ChangeLog:
> 2015-08-15  Ajit Agarwal<ajitkum@xilinx.com>
>
> 	* gcc/Makefile.in: Add the build of the new file
> 	tree-ssa-path-split.c
Instead:

	* Makefile.in (OBJS): Add tree-ssa-path-split.o.


> 	* gcc/opts.c (OPT_ftree_path_split) : Add an entry for
> 	Path splitting pass with optimization flag greater and
> 	equal to O2.

	* opts.c (default_options_table): Add entry for path splitting
	optimization at -O2 and above.



> 	* gcc/passes.def (path_split): add new path splitting pass.
Capitalize "add".




> 	* gcc/tree-ssa-path-split.c: New.
Use "New file".

> 	* gcc/tracer.c (transform_duplicate): New.
Use "New function".

> 	* gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New.
> 	* gcc/testsuite/gcc.dg/path-split-1.c: New.
These belong in gcc/testsuite/ChangeLog and remove the "gcc/testsuite" 
prefix.

> 	* gcc/doc/invoke.texi
> 	(ftree-path-split): Document.
> 	(fdump-tree-path_split): Document.
Should just be two lines instead of three.

And more generally, there's no need to prefix ChangeLog entries with "gcc/".

Now that the ChangeLog nits are out of the way, let's get to stuff 
that's more interesting.



>
> Signed-off-by:Ajit Agarwalajitkum@xilinx.com
> ---
>   gcc/Makefile.in                              |   1 +
>   gcc/common.opt                               |   4 +
>   gcc/doc/invoke.texi                          |  16 +-
>   gcc/opts.c                                   |   1 +
>   gcc/passes.def                               |   1 +
>   gcc/testsuite/gcc.dg/path-split-1.c          |  65 ++++++
>   gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c |  60 +++++
>   gcc/timevar.def                              |   1 +
>   gcc/tracer.c                                 |  37 +--
>   gcc/tree-pass.h                              |   1 +
>   gcc/tree-ssa-path-split.c                    | 330 +++++++++++++++++++++++++++
>   11 files changed, 503 insertions(+), 14 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/path-split-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
>   create mode 100644 gcc/tree-ssa-path-split.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index e80eadf..1d02582 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2378,6 +2378,10 @@ ftree-vrp
>   Common Report Var(flag_tree_vrp) Init(0) Optimization
>   Perform Value Range Propagation on trees
>
> +ftree-path-split
> +Common Report Var(flag_tree_path_split) Init(0) Optimization
> +Perform Path Splitting
Maybe "Perform Path Splitting for loop backedges" or something which is 
a little more descriptive.  The above isn't exactly right, so don't use 
it as-is.



> @@ -9068,6 +9075,13 @@ enabled by default at @option{-O2} and higher.  Null pointer check
>   elimination is only done if @option{-fdelete-null-pointer-checks} is
>   enabled.
>
> +@item -ftree-path-split
> +@opindex ftree-path-split
> +Perform Path Splitting  on trees.  The join blocks of IF-THEN-ELSE same
> +as loop latch node is moved to its predecessor and the loop latch node
> +will be forwarding block.  This is enabled by default at @option{-O2}
> +and higher.
Needs some work.  Maybe something along the lines of

When two paths of execution merge immediately before a loop latch node, 
try to duplicate the merge node into the two paths.

> diff --git a/gcc/passes.def b/gcc/passes.def
> index 6b66f8f..20ddf3d 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
>   	  NEXT_PASS (pass_ccp);
>   	  /* After CCP we rewrite no longer addressed locals into SSA
>   	     form if possible.  */
> +          NEXT_PASS (pass_path_split);
>   	  NEXT_PASS (pass_forwprop);
>   	  NEXT_PASS (pass_sra_early);
I can't recall if we've discussed the location of the pass at all.  I'm 
not objecting to this location, but would like to hear why you chose 
this particular location in the optimization pipeline.

>   	  /* pass_build_ealias is a dummy pass that ensures that we
> diff --git a/gcc/testsuite/gcc.dg/path-split-1.c b/gcc/testsuite/gcc.dg/path-split-1.c
ISTM the two tests should be combined into a single test.  I didn't see 
a functional difference in the test() function between those two tests.

I believe you can still create/scan debugging dumps with dg-do run test.


> +DEFTIMEVAR (TV_TREE_PATH_SPLIT  , "tree path_split")
tree path split rather than using underscores

> diff --git a/gcc/tracer.c b/gcc/tracer.c
> index cad7ab1..206692f 100644
> --- a/gcc/tracer.c
> +++ b/gcc/tracer.c
> @@ -58,11 +58,13 @@
>   #include "fibonacci_heap.h"
>
>   static int count_insns (basic_block);
> -static bool ignore_bb_p (const_basic_block);
> +bool ignore_bb_p (const_basic_block);
>   static bool better_p (const_edge, const_edge);
>   static edge find_best_successor (basic_block);
>   static edge find_best_predecessor (basic_block);
>   static int find_trace (basic_block, basic_block *);
> +basic_block transform_duplicate(basic_block bb,
> +                                basic_block  bb2);
Please create a tracer.h and put the newly exported prototypes in 
tracer.h.  Then include tracer.h in tracer.c and tree-ssa-path-split.c.

> @@ -224,6 +226,24 @@ find_trace (basic_block bb, basic_block *trace)
>     return i;
>   }
>
> +/* Transform the block that needs to be duplicated.  */
> +
> +basic_block
> +transform_duplicate(basic_block bb,
> +                    basic_block  bb2)
Space between the name of the function and first paren.  It looks like 
these two lines should be joined.  Single space between the type and the 
name of the argument.

Ultimately there's not a lot of shared code between the tracer and path 
splitting, which is basically what I expected.  Nevertheless, sharing a 
single implementation of those routines is probably wise.


> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "cfghooks.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "rtl.h"
> +#include "ssa.h"
> +#include "flags.h"
> +#include "alias.h"
> +#include "fold-const.h"
> +#include "stor-layout.h"
> +#include "calls.h"
> +#include "cfganal.h"
> +#include "internal-fn.h"
> +#include "gimple-fold.h"
> +#include "tree-eh.h"
> +#include "gimple-iterator.h"
> +#include "gimple-walk.h"
> +#include "tree-cfg.h"
> +#include "tree-ssa-loop-manip.h"
> +#include "tree-ssa-loop-niter.h"
> +#include "tree-ssa-loop.h"
> +#include "tree-into-ssa.h"
> +#include "tree-ssa.h"
> +#include "tree-pass.h"
> +#include "tree-dump.h"
> +#include "gimple-pretty-print.h"
> +#include "diagnostic-core.h"
> +#include "intl.h"
> +#include "cfgloop.h"
> +#include "tree-scalar-evolution.h"
> +#include "tree-ssa-propagate.h"
> +#include "tree-chrec.h"
> +#include "insn-config.h"
> +#include "expmed.h"
> +#include "dojump.h"
> +#include "explow.h"
> +#include "emit-rtl.h"
> +#include "varasm.h"
> +#include "stmt.h"
> +#include "expr.h"
> +#include "insn-codes.h"
> +#include "optabs.h"
> +#include "fibonacci_heap.h"
I'm having a hard time seeing how all these are needed.  Especially the 
RTL related includes.  These really need to be trimmed.


> +
> +extern basic_block transform_duplicate(basic_block bb, basic_block  bb2);
> +extern bool ignore_bb_p (const_basic_block bb);
With the prototypes moved to tracer.h, you won't want the extern 
declarations in here.

> +
> +/* This function gets the join blocks same as the source
> +   node of the loop latch nodes and form the trace with
> +   the join block and its predecessor.  */
I'm having a hard time parsing this comment. Please try to rewrite it to 
be clearer.

I suspect this routine is more complicated than it needs to be because 
of how you're searching for our subgraphs.


> +
> +static int
> +find_trace_loop_latch_same_as_join_blk (basic_block bb,
> +                                        basic_block *trace)
> +{
> +  vec<basic_block> bbs;
> +  basic_block bb1;
> +  unsigned int i;
> +  edge_iterator ei;
> +  edge e1;
> +  bool found = false;
> +  int n = 0;
> +
> +  bbs = get_all_dominated_blocks (CDI_DOMINATORS,
> +                                  bb );
> +  FOR_EACH_VEC_ELT (bbs, i, bb1)
> +  {
> +    FOR_EACH_EDGE (e1, ei, bb->succs)
> +    {
> +      if ( bb1 == e1->dest)
> +        {
> +          found = true;
> +        }
> +    }
> +
> +    if (!found && bb1 != bb)
> +      {
> +        found = false;
> +        FOR_EACH_EDGE (e1, ei, bb1->succs)
> +        {
> +          if (e1->flags & (EDGE_DFS_BACK))
> +            {
> +              trace[1] = e1->src;
> +              n++;
> +              found = true;
> +            }
> +        }
It seems to me all this can be changed to look for the backedges via the 
loop tree.



> +
> +        if (found && EDGE_COUNT(bb1->preds) == 2)
Space between EDGE_COUNT and the open paren.

You'd want to keep this test, then do some kind of dominance test like 
we talked about earlier in the discussion of this patch.

Jeff
> +          {
> +            FOR_EACH_EDGE (e1, ei, bb1->preds)
> +            {
> +              if (single_succ_p(e1->src) &&
> +                  single_succ_edge (e1->src)->flags & EDGE_FALLTHRU)
And you'd keep these tests in some form.

> +
> +/* This function performs the feasibility tests for path splitting
> +   to perform. Return false if the feasibility for path splitting
> +   is not done and returns true if the feasbility for path splitting
> +   is done. Following feasibility tests are performed.
> +
> +   1. Return false if the join block has rhs casting for assign
> +      gimple statements.
> +   2. If the number of phis is greater than 1 or the phi node in
> +      the join block has virtual operand return false.
> +   3. Return true if the number of sequential statements is
> +      greater than 2.
> +   4. If the phi result in the phi node of the join block is not
> +      used inside the same join block return false.
> +   7. Otherwise returns true.  */
These seem totally arbitrary.  What's the reason behind each of these 
restrictions?  None should be a correctness requirement AFAICT.  Others 
(like the number of statements) should probably be conditionalized on 
size vs speed optimizations.



  The join
> +   same as the source of the loop latch node is identified along
> +   with their predecessors.
I couldn't parse this sentence.



Based on the feasibility tests for
> +   path splitting the path splitting is performed by adding the
> +   join blocks into the predecessors after propagating the phi
> +   result with the corresponding phi arguments for the predecessors.
> +   The  tree-cfg-cleanup will merge the blocks in the predecessors
> +   path and the update-ssa will update the ssa representation after
> +   the path splitting is performed.  */
It would probably help to show a visual representation of what's 
happening.  ie, show a little CFG before and after.



> +
> +static bool
> +perform_path_splitting ()
> +{
> +  bool changed = false;
> +  basic_block trace[2] = {NULL,NULL};
> +  basic_block bb;
> +  auto_vec<fibonacci_node<long, basic_block_def>*> blocks;
> +  blocks.safe_grow_cleared (last_basic_block_for_fn (cfun));
> +  fibonacci_heap<long, basic_block_def> heap (LONG_MIN);
> +
> +  initialize_original_copy_tables();
> +  calculate_dominance_info (CDI_DOMINATORS);

> +
> +  FOR_EACH_BB_FN (bb, cfun)
> +  {
> +    if (!ignore_bb_p (bb))
> +      blocks[bb->index] = heap.insert (-bb->frequency, bb);
> +  }
> +
> +  while (!heap.empty())
Wouldn't it be easier to walk the loop tree?  We've got a fairly 
specific subgraph that we're looking for.  Why not walk the loop tree, 
using the latch as a point to search backwards for the shape of the 
subgraph we're interested in transforming?

That would seem to be a lot less expensive than looking at every block 
looking for the start of the subgraphs we're interested in.  That'd also 
seem to eliminate all the fibheap stuff.

> +
> +static unsigned int
> +execute_path_split (void)
> +{
> +  bool changed;
> +
> +  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
> +      return 0;
> +
> +  mark_dfs_back_edges ();
ISTM you could check the return value from mark_dfs_back_edges and do 
nothing if no back edges were found.

> +
> +static bool
> +gate_path_split(void)
> +{
> +  return flag_tree_path_split != 0;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_path_split =
> +{
> +  GIMPLE_PASS, /* type */
> +  "path_split", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_TREE_PATH_SPLIT, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa, /* todo_flags_finish */
It seems to me that you're probably missing some properties_required. 
You depend on a proper CFG and SSA graph, so at the least PROP_cfg and 
PROP_ssa.

Presumably you don't set TODO_cleanup_cfg so that you can avoid the cfg 
cleanup if nothing gets changed.  Is there some reason you don't do the 
same for the SSA graph update?

Jeff





More information about the Gcc-patches mailing list