This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation


On 08/15/2015 11:01 AM, Ajit Kumar Agarwal wrote:


 From cf2b64cc1d6623424d770f2a9ea257eb7e58e887 Mon Sep 17 00:00:00 2001
From: Ajit Kumar Agarwal<ajitkum@xilix.com>
Date: Sat, 15 Aug 2015 18:19:14 +0200
Subject: [PATCH] [Patch,tree-optimization]: Add new path Splitting pass on
  tree ssa representation.

Added a new pass on path splitting on tree SSA representation. The path
splitting optimization does the CFG transformation of join block of the
if-then-else same as the loop latch node is moved and merged with the
predecessor blocks after preserving the SSA representation.

ChangeLog:
2015-08-15  Ajit Agarwal<ajitkum@xilinx.com>

	* gcc/Makefile.in: Add the build of the new file
	tree-ssa-path-split.c
Instead:

	* Makefile.in (OBJS): Add tree-ssa-path-split.o.


	* gcc/opts.c (OPT_ftree_path_split) : Add an entry for
	Path splitting pass with optimization flag greater and
	equal to O2.

	* opts.c (default_options_table): Add entry for path splitting
	optimization at -O2 and above.



	* gcc/passes.def (path_split): add new path splitting pass.
Capitalize "add".




	* gcc/tree-ssa-path-split.c: New.
Use "New file".

	* gcc/tracer.c (transform_duplicate): New.
Use "New function".

	* gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New.
	* gcc/testsuite/gcc.dg/path-split-1.c: New.
These belong in gcc/testsuite/ChangeLog and remove the "gcc/testsuite" prefix.

	* gcc/doc/invoke.texi
	(ftree-path-split): Document.
	(fdump-tree-path_split): Document.
Should just be two lines instead of three.

And more generally, there's no need to prefix ChangeLog entries with "gcc/".

Now that the ChangeLog nits are out of the way, let's get to stuff that's more interesting.




Signed-off-by:Ajit Agarwalajitkum@xilinx.com
---
  gcc/Makefile.in                              |   1 +
  gcc/common.opt                               |   4 +
  gcc/doc/invoke.texi                          |  16 +-
  gcc/opts.c                                   |   1 +
  gcc/passes.def                               |   1 +
  gcc/testsuite/gcc.dg/path-split-1.c          |  65 ++++++
  gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c |  60 +++++
  gcc/timevar.def                              |   1 +
  gcc/tracer.c                                 |  37 +--
  gcc/tree-pass.h                              |   1 +
  gcc/tree-ssa-path-split.c                    | 330 +++++++++++++++++++++++++++
  11 files changed, 503 insertions(+), 14 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/path-split-1.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
  create mode 100644 gcc/tree-ssa-path-split.c

diff --git a/gcc/common.opt b/gcc/common.opt
index e80eadf..1d02582 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2378,6 +2378,10 @@ ftree-vrp
  Common Report Var(flag_tree_vrp) Init(0) Optimization
  Perform Value Range Propagation on trees

+ftree-path-split
+Common Report Var(flag_tree_path_split) Init(0) Optimization
+Perform Path Splitting
Maybe "Perform Path Splitting for loop backedges" or something which is a little more descriptive. The above isn't exactly right, so don't use it as-is.



@@ -9068,6 +9075,13 @@ enabled by default at @option{-O2} and higher.  Null pointer check
  elimination is only done if @option{-fdelete-null-pointer-checks} is
  enabled.

+@item -ftree-path-split
+@opindex ftree-path-split
+Perform Path Splitting  on trees.  The join blocks of IF-THEN-ELSE same
+as loop latch node is moved to its predecessor and the loop latch node
+will be forwarding block.  This is enabled by default at @option{-O2}
+and higher.
Needs some work.  Maybe something along the lines of

When two paths of execution merge immediately before a loop latch node, try to duplicate the merge node into the two paths.

diff --git a/gcc/passes.def b/gcc/passes.def
index 6b66f8f..20ddf3d 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
  	  NEXT_PASS (pass_ccp);
  	  /* After CCP we rewrite no longer addressed locals into SSA
  	     form if possible.  */
+          NEXT_PASS (pass_path_split);
  	  NEXT_PASS (pass_forwprop);
  	  NEXT_PASS (pass_sra_early);
I can't recall if we've discussed the location of the pass at all. I'm not objecting to this location, but would like to hear why you chose this particular location in the optimization pipeline.

  	  /* pass_build_ealias is a dummy pass that ensures that we
diff --git a/gcc/testsuite/gcc.dg/path-split-1.c b/gcc/testsuite/gcc.dg/path-split-1.c
ISTM the two tests should be combined into a single test. I didn't see a functional difference in the test() function between those two tests.

I believe you can still create/scan debugging dumps with dg-do run test.


+DEFTIMEVAR (TV_TREE_PATH_SPLIT  , "tree path_split")
tree path split rather than using underscores

diff --git a/gcc/tracer.c b/gcc/tracer.c
index cad7ab1..206692f 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -58,11 +58,13 @@
  #include "fibonacci_heap.h"

  static int count_insns (basic_block);
-static bool ignore_bb_p (const_basic_block);
+bool ignore_bb_p (const_basic_block);
  static bool better_p (const_edge, const_edge);
  static edge find_best_successor (basic_block);
  static edge find_best_predecessor (basic_block);
  static int find_trace (basic_block, basic_block *);
+basic_block transform_duplicate(basic_block bb,
+                                basic_block  bb2);
Please create a tracer.h and put the newly exported prototypes in tracer.h. Then include tracer.h in tracer.c and tree-ssa-path-split.c.

@@ -224,6 +226,24 @@ find_trace (basic_block bb, basic_block *trace)
    return i;
  }

+/* Transform the block that needs to be duplicated.  */
+
+basic_block
+transform_duplicate(basic_block bb,
+                    basic_block  bb2)
Space between the name of the function and first paren. It looks like these two lines should be joined. Single space between the type and the name of the argument.

Ultimately there's not a lot of shared code between the tracer and path splitting, which is basically what I expected. Nevertheless, sharing a single implementation of those routines is probably wise.


+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "cfghooks.h"
+#include "tree.h"
+#include "gimple.h"
+#include "rtl.h"
+#include "ssa.h"
+#include "flags.h"
+#include "alias.h"
+#include "fold-const.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "cfganal.h"
+#include "internal-fn.h"
+#include "gimple-fold.h"
+#include "tree-eh.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "tree-cfg.h"
+#include "tree-ssa-loop-manip.h"
+#include "tree-ssa-loop-niter.h"
+#include "tree-ssa-loop.h"
+#include "tree-into-ssa.h"
+#include "tree-ssa.h"
+#include "tree-pass.h"
+#include "tree-dump.h"
+#include "gimple-pretty-print.h"
+#include "diagnostic-core.h"
+#include "intl.h"
+#include "cfgloop.h"
+#include "tree-scalar-evolution.h"
+#include "tree-ssa-propagate.h"
+#include "tree-chrec.h"
+#include "insn-config.h"
+#include "expmed.h"
+#include "dojump.h"
+#include "explow.h"
+#include "emit-rtl.h"
+#include "varasm.h"
+#include "stmt.h"
+#include "expr.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "fibonacci_heap.h"
I'm having a hard time seeing how all these are needed. Especially the RTL related includes. These really need to be trimmed.


+
+extern basic_block transform_duplicate(basic_block bb, basic_block  bb2);
+extern bool ignore_bb_p (const_basic_block bb);
With the prototypes moved to tracer.h, you won't want the extern declarations in here.

+
+/* This function gets the join blocks same as the source
+   node of the loop latch nodes and form the trace with
+   the join block and its predecessor.  */
I'm having a hard time parsing this comment. Please try to rewrite it to be clearer.

I suspect this routine is more complicated than it needs to be because of how you're searching for our subgraphs.


+
+static int
+find_trace_loop_latch_same_as_join_blk (basic_block bb,
+                                        basic_block *trace)
+{
+  vec<basic_block> bbs;
+  basic_block bb1;
+  unsigned int i;
+  edge_iterator ei;
+  edge e1;
+  bool found = false;
+  int n = 0;
+
+  bbs = get_all_dominated_blocks (CDI_DOMINATORS,
+                                  bb );
+  FOR_EACH_VEC_ELT (bbs, i, bb1)
+  {
+    FOR_EACH_EDGE (e1, ei, bb->succs)
+    {
+      if ( bb1 == e1->dest)
+        {
+          found = true;
+        }
+    }
+
+    if (!found && bb1 != bb)
+      {
+        found = false;
+        FOR_EACH_EDGE (e1, ei, bb1->succs)
+        {
+          if (e1->flags & (EDGE_DFS_BACK))
+            {
+              trace[1] = e1->src;
+              n++;
+              found = true;
+            }
+        }
It seems to me all this can be changed to look for the backedges via the loop tree.



+
+        if (found && EDGE_COUNT(bb1->preds) == 2)
Space between EDGE_COUNT and the open paren.

You'd want to keep this test, then do some kind of dominance test like we talked about earlier in the discussion of this patch.

Jeff
+          {
+            FOR_EACH_EDGE (e1, ei, bb1->preds)
+            {
+              if (single_succ_p(e1->src) &&
+                  single_succ_edge (e1->src)->flags & EDGE_FALLTHRU)
And you'd keep these tests in some form.

+
+/* This function performs the feasibility tests for path splitting
+   to perform. Return false if the feasibility for path splitting
+   is not done and returns true if the feasbility for path splitting
+   is done. Following feasibility tests are performed.
+
+   1. Return false if the join block has rhs casting for assign
+      gimple statements.
+   2. If the number of phis is greater than 1 or the phi node in
+      the join block has virtual operand return false.
+   3. Return true if the number of sequential statements is
+      greater than 2.
+   4. If the phi result in the phi node of the join block is not
+      used inside the same join block return false.
+   7. Otherwise returns true.  */
These seem totally arbitrary. What's the reason behind each of these restrictions? None should be a correctness requirement AFAICT. Others (like the number of statements) should probably be conditionalized on size vs speed optimizations.



 The join
+   same as the source of the loop latch node is identified along
+   with their predecessors.
I couldn't parse this sentence.



Based on the feasibility tests for
+   path splitting the path splitting is performed by adding the
+   join blocks into the predecessors after propagating the phi
+   result with the corresponding phi arguments for the predecessors.
+   The  tree-cfg-cleanup will merge the blocks in the predecessors
+   path and the update-ssa will update the ssa representation after
+   the path splitting is performed.  */
It would probably help to show a visual representation of what's happening. ie, show a little CFG before and after.



+
+static bool
+perform_path_splitting ()
+{
+  bool changed = false;
+  basic_block trace[2] = {NULL,NULL};
+  basic_block bb;
+  auto_vec<fibonacci_node<long, basic_block_def>*> blocks;
+  blocks.safe_grow_cleared (last_basic_block_for_fn (cfun));
+  fibonacci_heap<long, basic_block_def> heap (LONG_MIN);
+
+  initialize_original_copy_tables();
+  calculate_dominance_info (CDI_DOMINATORS);

+
+  FOR_EACH_BB_FN (bb, cfun)
+  {
+    if (!ignore_bb_p (bb))
+      blocks[bb->index] = heap.insert (-bb->frequency, bb);
+  }
+
+  while (!heap.empty())
Wouldn't it be easier to walk the loop tree? We've got a fairly specific subgraph that we're looking for. Why not walk the loop tree, using the latch as a point to search backwards for the shape of the subgraph we're interested in transforming?

That would seem to be a lot less expensive than looking at every block looking for the start of the subgraphs we're interested in. That'd also seem to eliminate all the fibheap stuff.

+
+static unsigned int
+execute_path_split (void)
+{
+  bool changed;
+
+  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
+      return 0;
+
+  mark_dfs_back_edges ();
ISTM you could check the return value from mark_dfs_back_edges and do nothing if no back edges were found.

+
+static bool
+gate_path_split(void)
+{
+  return flag_tree_path_split != 0;
+}
+
+namespace {
+
+const pass_data pass_data_path_split =
+{
+  GIMPLE_PASS, /* type */
+  "path_split", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_TREE_PATH_SPLIT, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
It seems to me that you're probably missing some properties_required. You depend on a proper CFG and SSA graph, so at the least PROP_cfg and PROP_ssa.

Presumably you don't set TODO_cleanup_cfg so that you can avoid the cfg cleanup if nothing gets changed. Is there some reason you don't do the same for the SSA graph update?

Jeff




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]