This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

From: Jeff Law <law at redhat dot com>
To: Ajit Kumar Agarwal <ajit dot kumar dot agarwal at xilinx dot com>, Richard Biener <richard dot guenther at gmail dot com>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Vinod Kathail <vinodk at xilinx dot com>, Shail Aditya Gupta <shailadi at xilinx dot com>, Vidhumouli Hunsigida <vidhum at xilinx dot com>, Nagaraju Mekala <nmekala at xilinx dot com>
Date: Wed, 19 Aug 2015 15:46:09 -0600
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
Authentication-results: sourceware.org; auth=none
References: <37378DC5BCD0EE48BA4B082E0B55DFAA41F3F56C at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <CAFiYyc1wXMQ_69b=Q9V=bz5KAiLAp8C9VugpfyOZqt9iiDMzXQ at mail dot gmail dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA41F41150 at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <CAFiYyc0T+8CeeruKp-S2byhyBXW41ub7esqin+OuAYAy8q1WJA at mail dot gmail dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA41F42541 at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <CAFiYyc23ZRxFv4S=XKQUX2VgT113GDZVVjOYViSoYy4yP11cdg at mail dot gmail dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA4295155D at XAP-PVEXMBX02 dot xlnx dot xilinx dot com> <37378DC5BCD0EE48BA4B082E0B55DFAA4295ADCB at XAP-PVEXMBX02 dot xlnx dot xilinx dot com>

On 08/15/2015 11:01 AM, Ajit Kumar Agarwal wrote:

 From cf2b64cc1d6623424d770f2a9ea257eb7e58e887 Mon Sep 17 00:00:00 2001
From: Ajit Kumar Agarwal<ajitkum@xilix.com>
Date: Sat, 15 Aug 2015 18:19:14 +0200
Subject: [PATCH] [Patch,tree-optimization]: Add new path Splitting pass on
  tree ssa representation.

Added a new pass on path splitting on tree SSA representation. The path
splitting optimization does the CFG transformation of join block of the
if-then-else same as the loop latch node is moved and merged with the
predecessor blocks after preserving the SSA representation.

ChangeLog:
2015-08-15  Ajit Agarwal<ajitkum@xilinx.com>

	* gcc/Makefile.in: Add the build of the new file
	tree-ssa-path-split.c

Instead:

	* Makefile.in (OBJS): Add tree-ssa-path-split.o.

	* gcc/opts.c (OPT_ftree_path_split) : Add an entry for
	Path splitting pass with optimization flag greater and
	equal to O2.


	* opts.c (default_options_table): Add entry for path splitting
	optimization at -O2 and above.

	* gcc/passes.def (path_split): add new path splitting pass.

Capitalize "add".

	* gcc/tree-ssa-path-split.c: New.

Use "New file".

	* gcc/tracer.c (transform_duplicate): New.

Use "New function".

	* gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New.
	* gcc/testsuite/gcc.dg/path-split-1.c: New.

These belong in gcc/testsuite/ChangeLog and remove the "gcc/testsuite"prefix.

	* gcc/doc/invoke.texi
	(ftree-path-split): Document.
	(fdump-tree-path_split): Document.

Should just be two lines instead of three.

And more generally, there's no need to prefix ChangeLog entries with "gcc/".

Now that the ChangeLog nits are out of the way, let's get to stuffthat's more interesting.


Signed-off-by:Ajit Agarwalajitkum@xilinx.com
---
  gcc/Makefile.in                              |   1 +
  gcc/common.opt                               |   4 +
  gcc/doc/invoke.texi                          |  16 +-
  gcc/opts.c                                   |   1 +
  gcc/passes.def                               |   1 +
  gcc/testsuite/gcc.dg/path-split-1.c          |  65 ++++++
  gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c |  60 +++++
  gcc/timevar.def                              |   1 +
  gcc/tracer.c                                 |  37 +--
  gcc/tree-pass.h                              |   1 +
  gcc/tree-ssa-path-split.c                    | 330 +++++++++++++++++++++++++++
  11 files changed, 503 insertions(+), 14 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/path-split-1.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
  create mode 100644 gcc/tree-ssa-path-split.c

diff --git a/gcc/common.opt b/gcc/common.opt
index e80eadf..1d02582 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2378,6 +2378,10 @@ ftree-vrp
  Common Report Var(flag_tree_vrp) Init(0) Optimization
  Perform Value Range Propagation on trees

+ftree-path-split
+Common Report Var(flag_tree_path_split) Init(0) Optimization
+Perform Path Splitting

Maybe "Perform Path Splitting for loop backedges" or something which isa little more descriptive. The above isn't exactly right, so don't useit as-is.

@@ -9068,6 +9075,13 @@ enabled by default at @option{-O2} and higher.  Null pointer check
  elimination is only done if @option{-fdelete-null-pointer-checks} is
  enabled.

+@item -ftree-path-split
+@opindex ftree-path-split
+Perform Path Splitting  on trees.  The join blocks of IF-THEN-ELSE same
+as loop latch node is moved to its predecessor and the loop latch node
+will be forwarding block.  This is enabled by default at @option{-O2}
+and higher.

Needs some work.  Maybe something along the lines of

When two paths of execution merge immediately before a loop latch node,try to duplicate the merge node into the two paths.

diff --git a/gcc/passes.def b/gcc/passes.def
index 6b66f8f..20ddf3d 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
  	  NEXT_PASS (pass_ccp);
  	  /* After CCP we rewrite no longer addressed locals into SSA
  	     form if possible.  */
+          NEXT_PASS (pass_path_split);
  	  NEXT_PASS (pass_forwprop);
  	  NEXT_PASS (pass_sra_early);

I can't recall if we've discussed the location of the pass at all. I'mnot objecting to this location, but would like to hear why you chosethis particular location in the optimization pipeline.

  	  /* pass_build_ealias is a dummy pass that ensures that we
diff --git a/gcc/testsuite/gcc.dg/path-split-1.c b/gcc/testsuite/gcc.dg/path-split-1.c

ISTM the two tests should be combined into a single test. I didn't seea functional difference in the test() function between those two tests.


I believe you can still create/scan debugging dumps with dg-do run test.

+DEFTIMEVAR (TV_TREE_PATH_SPLIT  , "tree path_split")

tree path split rather than using underscores

diff --git a/gcc/tracer.c b/gcc/tracer.c
index cad7ab1..206692f 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -58,11 +58,13 @@
  #include "fibonacci_heap.h"

  static int count_insns (basic_block);
-static bool ignore_bb_p (const_basic_block);
+bool ignore_bb_p (const_basic_block);
  static bool better_p (const_edge, const_edge);
  static edge find_best_successor (basic_block);
  static edge find_best_predecessor (basic_block);
  static int find_trace (basic_block, basic_block *);
+basic_block transform_duplicate(basic_block bb,
+                                basic_block  bb2);

Please create a tracer.h and put the newly exported prototypes intracer.h. Then include tracer.h in tracer.c and tree-ssa-path-split.c.

@@ -224,6 +226,24 @@ find_trace (basic_block bb, basic_block *trace)
    return i;
  }

+/* Transform the block that needs to be duplicated.  */
+
+basic_block
+transform_duplicate(basic_block bb,
+                    basic_block  bb2)

Space between the name of the function and first paren. It looks likethese two lines should be joined. Single space between the type and thename of the argument.

Ultimately there's not a lot of shared code between the tracer and pathsplitting, which is basically what I expected. Nevertheless, sharing asingle implementation of those routines is probably wise.

+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "cfghooks.h"
+#include "tree.h"
+#include "gimple.h"
+#include "rtl.h"
+#include "ssa.h"
+#include "flags.h"
+#include "alias.h"
+#include "fold-const.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "cfganal.h"
+#include "internal-fn.h"
+#include "gimple-fold.h"
+#include "tree-eh.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "tree-cfg.h"
+#include "tree-ssa-loop-manip.h"
+#include "tree-ssa-loop-niter.h"
+#include "tree-ssa-loop.h"
+#include "tree-into-ssa.h"
+#include "tree-ssa.h"
+#include "tree-pass.h"
+#include "tree-dump.h"
+#include "gimple-pretty-print.h"
+#include "diagnostic-core.h"
+#include "intl.h"
+#include "cfgloop.h"
+#include "tree-scalar-evolution.h"
+#include "tree-ssa-propagate.h"
+#include "tree-chrec.h"
+#include "insn-config.h"
+#include "expmed.h"
+#include "dojump.h"
+#include "explow.h"
+#include "emit-rtl.h"
+#include "varasm.h"
+#include "stmt.h"
+#include "expr.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "fibonacci_heap.h"

I'm having a hard time seeing how all these are needed. Especially theRTL related includes. These really need to be trimmed.

+
+extern basic_block transform_duplicate(basic_block bb, basic_block  bb2);
+extern bool ignore_bb_p (const_basic_block bb);

With the prototypes moved to tracer.h, you won't want the externdeclarations in here.

+
+/* This function gets the join blocks same as the source
+   node of the loop latch nodes and form the trace with
+   the join block and its predecessor.  */

I'm having a hard time parsing this comment. Please try to rewrite it tobe clearer.

I suspect this routine is more complicated than it needs to be becauseof how you're searching for our subgraphs.

+
+static int
+find_trace_loop_latch_same_as_join_blk (basic_block bb,
+                                        basic_block *trace)
+{
+  vec<basic_block> bbs;
+  basic_block bb1;
+  unsigned int i;
+  edge_iterator ei;
+  edge e1;
+  bool found = false;
+  int n = 0;
+
+  bbs = get_all_dominated_blocks (CDI_DOMINATORS,
+                                  bb );
+  FOR_EACH_VEC_ELT (bbs, i, bb1)
+  {
+    FOR_EACH_EDGE (e1, ei, bb->succs)
+    {
+      if ( bb1 == e1->dest)
+        {
+          found = true;
+        }
+    }
+
+    if (!found && bb1 != bb)
+      {
+        found = false;
+        FOR_EACH_EDGE (e1, ei, bb1->succs)
+        {
+          if (e1->flags & (EDGE_DFS_BACK))
+            {
+              trace[1] = e1->src;
+              n++;
+              found = true;
+            }
+        }

It seems to me all this can be changed to look for the backedges via theloop tree.

+
+        if (found && EDGE_COUNT(bb1->preds) == 2)

Space between EDGE_COUNT and the open paren.

You'd want to keep this test, then do some kind of dominance test likewe talked about earlier in the discussion of this patch.


Jeff

+          {
+            FOR_EACH_EDGE (e1, ei, bb1->preds)
+            {
+              if (single_succ_p(e1->src) &&
+                  single_succ_edge (e1->src)->flags & EDGE_FALLTHRU)

And you'd keep these tests in some form.

+
+/* This function performs the feasibility tests for path splitting
+   to perform. Return false if the feasibility for path splitting
+   is not done and returns true if the feasbility for path splitting
+   is done. Following feasibility tests are performed.
+
+   1. Return false if the join block has rhs casting for assign
+      gimple statements.
+   2. If the number of phis is greater than 1 or the phi node in
+      the join block has virtual operand return false.
+   3. Return true if the number of sequential statements is
+      greater than 2.
+   4. If the phi result in the phi node of the join block is not
+      used inside the same join block return false.
+   7. Otherwise returns true.  */

These seem totally arbitrary. What's the reason behind each of theserestrictions? None should be a correctness requirement AFAICT. Others(like the number of statements) should probably be conditionalized onsize vs speed optimizations.




 The join

+   same as the source of the loop latch node is identified along
+   with their predecessors.

I couldn't parse this sentence.



Based on the feasibility tests for

+   path splitting the path splitting is performed by adding the
+   join blocks into the predecessors after propagating the phi
+   result with the corresponding phi arguments for the predecessors.
+   The  tree-cfg-cleanup will merge the blocks in the predecessors
+   path and the update-ssa will update the ssa representation after
+   the path splitting is performed.  */

It would probably help to show a visual representation of what'shappening. ie, show a little CFG before and after.

+
+static bool
+perform_path_splitting ()
+{
+  bool changed = false;
+  basic_block trace[2] = {NULL,NULL};
+  basic_block bb;
+  auto_vec<fibonacci_node<long, basic_block_def>*> blocks;
+  blocks.safe_grow_cleared (last_basic_block_for_fn (cfun));
+  fibonacci_heap<long, basic_block_def> heap (LONG_MIN);
+
+  initialize_original_copy_tables();
+  calculate_dominance_info (CDI_DOMINATORS);

+
+  FOR_EACH_BB_FN (bb, cfun)
+  {
+    if (!ignore_bb_p (bb))
+      blocks[bb->index] = heap.insert (-bb->frequency, bb);
+  }
+
+  while (!heap.empty())

Wouldn't it be easier to walk the loop tree? We've got a fairlyspecific subgraph that we're looking for. Why not walk the loop tree,using the latch as a point to search backwards for the shape of thesubgraph we're interested in transforming?

That would seem to be a lot less expensive than looking at every blocklooking for the start of the subgraphs we're interested in. That'd alsoseem to eliminate all the fibheap stuff.

+
+static unsigned int
+execute_path_split (void)
+{
+  bool changed;
+
+  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
+      return 0;
+
+  mark_dfs_back_edges ();

ISTM you could check the return value from mark_dfs_back_edges and donothing if no back edges were found.

+
+static bool
+gate_path_split(void)
+{
+  return flag_tree_path_split != 0;
+}
+
+namespace {
+
+const pass_data pass_data_path_split =
+{
+  GIMPLE_PASS, /* type */
+  "path_split", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_TREE_PATH_SPLIT, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */

It seems to me that you're probably missing some properties_required.You depend on a proper CFG and SSA graph, so at the least PROP_cfg andPROP_ssa.

Presumably you don't set TODO_cleanup_cfg so that you can avoid the cfgcleanup if nothing gets changed. Is there some reason you don't do thesame for the SSA graph update?


Jeff

Follow-Ups:
- RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
  - From: Ajit Kumar Agarwal

References:
- RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
  - From: Ajit Kumar Agarwal

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]