This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[patch] jump threading multiple paths that start from the same BB


[One more time, but without rejected HTML mail, because apparently this is my first post to gcc-patches *ever* ;-)].

Howdy!

While poking around in the backwards threader I noticed that we bail if we have already seen a starting BB.

      /* Do not jump-thread twice from the same block.  */
      if (bitmap_bit_p (threaded_blocks, entry->src->index)

This limitation discards paths that are sub-paths of paths that have already been threaded.

The following patch scans the remaining to-be-threaded paths to identify any that start from the same point, and are thus sub-paths of the just-threaded path. By removing the common prefix of blocks in upcoming threadable paths, and then rewiring the first non-common block appropriately, we expose new threading opportunities, since we are no longer starting from the same BB. We also simplify the would-be threaded paths, because we don't duplicate already duplicated paths.

This sounds confusing, but the documentation for the entry point to my patch (adjust_paths_after_duplication) shows an actual example:

+/* After an FSM path has been jump threaded, adjust the remaining FSM
+   paths that are subsets of this path, so these paths can be safely
+   threaded within the context of the new threaded path.
+
+   For example, suppose we have just threaded:
+
+   5 -> 6 -> 7 -> 8 -> 12      =>      5 -> 6' -> 7' -> 8' -> 12'
+
+   And we have an upcoming threading candidate:
+   5 -> 6 -> 7 -> 8 -> 15 -> 20
+
+   This function adjusts the upcoming path into:
+   8' -> 15 -> 20

Notice that we will no longer have two paths that start from the same BB. One will start with bb5, while the adjusted path will start with bb8'. With this we kill two birds with one stone: we are able to thread more paths, and these paths avoid duplicating a whole mess of things that have already been threaded.
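For what it's worth, the trimming step can be sketched in isolation. This is a toy model under made-up names (model_edge, trim_common_prefix, copy_of are illustrative stand-ins, not GCC's jump_thread_edge / get_bb_copy API), just to show the prefix-chopping and rewiring:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

/* Toy model: a path is a sequence of (src, dest) block-index pairs.
   COPY_OF stands in for get_bb_copy, mapping a block to its duplicate.  */
typedef std::pair<int, int> model_edge;
typedef std::vector<model_edge> model_path;

/* Remove from CAND the prefix of edges it shares with the just-threaded
   CURR, and rewire the first differing edge so it now starts from the
   duplicate of its source block.  An empty result means the candidate
   was entirely contained in CURR and should be dropped.  */
model_path
trim_common_prefix (const model_path &curr, model_path cand,
		    int (*copy_of) (int))
{
  size_t minlen = std::min (curr.size (), cand.size ());
  size_t j = 0;
  while (j < minlen && cand[j] == curr[j])
    j++;
  /* Shared every edge: the candidate is fully subsumed.  */
  if (j == cand.size ())
    return model_path ();
  if (j > 0)
    {
      /* Rewire the first non-common edge, e.g. 8 -> 15 becomes 8' -> 15,
	 then chop off the common prefix.  */
      cand[j].first = copy_of (cand[j].first);
      cand.erase (cand.begin (), cand.begin () + j);
    }
  return cand;
}
```

Running the example from the comment above (5 -> 6 -> 7 -> 8 -> 12 just threaded, candidate 5 -> 6 -> 7 -> 8 -> 15 -> 20) yields 8' -> 15 -> 20, as advertised.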

The attached patch is a subset of some upcoming work that can live on its own. It bootstraps and regtests. Also, by running it on a handful of .ii files, I can see that we are able to thread sub-paths that we previously dropped on the floor. More is better, right? :)

To test this, I stole Jeff's method of using cachegrind to benchmark instruction counts and conditional branches (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html).

Basically, I bootstrapped two compilers, with and without improvements, and used each to build a stage1 trunk. Each of these stage1 trunks was then used to compile 246 .ii GCC files I have lying around from a bootstrap at some random point last year. I used no special flags on the builds apart from --enable-languages=c,c++.

Although I would have wished for a larger improvement, this work comes for free, as it's just a subset of other work I'm hacking on.

Without further ado, here are my monumental, earth shattering improvements:

Conditional branches
   Without patch: 411846839709
   With    patch: 411831330316
        %changed: -0.0037660%

Number of instructions
   Without patch: 2271844650634
   With    patch: 2271761153578
        %changed: -0.0036754%
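(For the curious, the %changed figures above come out as the delta taken relative to the patched count; a quick sketch, where percent_changed is a made-up helper and not part of the patch:

```cpp
/* Reproduce the %changed figures: the difference between the two runs,
   relative to the patched count.  percent_changed is illustrative only.  */
double
percent_changed (long long without_patch, long long with_patch)
{
  return (double) (with_patch - without_patch)
	 / (double) with_patch * 100.0;
}
```

e.g. percent_changed (411846839709, 411831330316) gives -0.0037660 to seven decimals, matching the conditional-branch row above.)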


OK for trunk?
Aldy

p.s. There is a lot of debugging/dumping code in my patch, which I can gladly remove if/when approved. It helps keep my head straight while looking at this spaghetti :).

gcc/

	* tree-ssa-threadupdate.c (mark_threaded_blocks): Avoid
	dereferencing path[] beyond its length.
	(debug_path): New.
	(debug_all_paths): New.
	(rewire_first_differing_edge): New.
	(adjust_paths_after_duplication): New.
	(duplicate_thread_path): Call adjust_paths_after_duplication.
	Add new argument.
	(thread_through_all_blocks): Add new argument to
	duplicate_thread_path.

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 1dab0f1fab4..53ac7181b4b 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1763,7 +1763,8 @@ mark_threaded_blocks (bitmap threaded_blocks)
     {
       vec<jump_thread_edge *> *path = paths[i];
 
-      if ((*path)[1]->type != EDGE_COPY_SRC_JOINER_BLOCK)
+      if (path->length () > 1
+	  && (*path)[1]->type != EDGE_COPY_SRC_JOINER_BLOCK)
 	{
 	  edge e = (*path)[0]->e;
 	  e->aux = (void *)path;
@@ -1783,7 +1784,8 @@ mark_threaded_blocks (bitmap threaded_blocks)
     {
       vec<jump_thread_edge *> *path = paths[i];
 
-      if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK)
+      if (path->length () > 1
+	  && (*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK)
 	{
 	  /* Attach the path to the starting edge if none is yet recorded.  */
 	  if ((*path)[0]->e->aux == NULL)
@@ -1812,7 +1814,8 @@ mark_threaded_blocks (bitmap threaded_blocks)
       vec<jump_thread_edge *> *path = paths[i];
       edge e = (*path)[0]->e;
 
-      if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && e->aux == path)
+      if (path->length () > 1
+	  && (*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && e->aux == path)
 	{
 	  unsigned int j;
 	  for (j = 1; j < path->length (); j++)
@@ -1978,6 +1981,169 @@ bb_in_bbs (basic_block bb, basic_block *bbs, int n)
   return false;
 }
 
+DEBUG_FUNCTION void
+debug_path (FILE *dump_file, int pathno)
+{
+  vec<jump_thread_edge *> *p = paths[pathno];
+  fprintf (dump_file, "path: ");
+  for (unsigned i = 0; i < p->length (); ++i)
+    fprintf (dump_file, "%d -> %d, ",
+	     (*p)[i]->e->src->index, (*p)[i]->e->dest->index);
+  fprintf (dump_file, "\n");
+}
+
+DEBUG_FUNCTION void
+debug_all_paths ()
+{
+  for (unsigned i = 0; i < paths.length (); ++i)
+    debug_path (stderr, i);
+}
+
+/* Rewire a jump_thread_edge so that the source block is now a
+   threaded source block.
+
+   PATH_NUM is an index into the global path table PATHS.
+   EDGE_NUM is the jump thread edge number into said path.
+
+   Returns TRUE if we were able to successfully rewire the edge.  */
+
+static bool
+rewire_first_differing_edge (unsigned path_num, unsigned edge_num)
+{
+  vec<jump_thread_edge *> *path = paths[path_num];
+  edge &e = (*path)[edge_num]->e;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "rewiring edge candidate: %d -> %d\n",
+	     e->src->index, e->dest->index);
+  basic_block src_copy = get_bb_copy (e->src);
+  if (src_copy == NULL)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "ignoring candidate: there is no src COPY\n");
+      return false;
+    }
+  edge new_edge = find_edge (src_copy, e->dest);
+  /* If the previously threaded paths created a flow graph where we
+     can no longer figure out where to go, give up.  */
+  if (new_edge == NULL)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "ignoring candidate: we lost our way\n");
+      return false;
+    }
+  e = new_edge;
+  return true;
+}
+
+/* After an FSM path has been jump threaded, adjust the remaining FSM
+   paths that are subsets of this path, so these paths can be safely
+   threaded within the context of the new threaded path.
+
+   For example, suppose we have just threaded:
+
+   5 -> 6 -> 7 -> 8 -> 12	=>	5 -> 6' -> 7' -> 8' -> 12'
+
+   And we have an upcoming threading candidate:
+   5 -> 6 -> 7 -> 8 -> 15 -> 20
+
+   This function adjusts the upcoming path into:
+   8' -> 15 -> 20
+
+   CURR_PATH_NUM is an index into the global paths table.  It
+   specifies the path that was just threaded.  */
+
+static void
+adjust_paths_after_duplication (unsigned curr_path_num)
+{
+  vec<jump_thread_edge *> *curr_path = paths[curr_path_num];
+  gcc_assert ((*curr_path)[0]->type == EDGE_FSM_THREAD);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "just threaded: ");
+      debug_path (dump_file, curr_path_num);
+    }
+
+  /* Iterate through all the other paths and adjust them.  */
+  for (unsigned cand_path_num = 0; cand_path_num < paths.length (); )
+    {
+      if (cand_path_num == curr_path_num)
+	{
+	  ++cand_path_num;
+	  continue;
+	}
+      /* Make sure the candidate to adjust starts with the same path
+	 as the recently threaded path and is an FSM thread.  */
+      vec<jump_thread_edge *> *cand_path = paths[cand_path_num];
+      if ((*cand_path)[0]->type != EDGE_FSM_THREAD
+	  || (*cand_path)[0]->e != (*curr_path)[0]->e)
+	{
+	  ++cand_path_num;
+	  continue;
+	}
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "adjusting candidate: ");
+	  debug_path (dump_file, cand_path_num);
+	}
+
+      /* Chop off from the candidate path any prefix it shares with
+	 the recently threaded path.  */
+      unsigned minlength = MIN (curr_path->length (), cand_path->length ());
+      unsigned j;
+      for (j = 0; j < minlength; ++j)
+	{
+	  edge cand_edge = (*cand_path)[j]->e;
+	  edge curr_edge = (*curr_path)[j]->e;
+
+	  /* Once the prefix no longer matches, adjust the first
+	     non-matching edge to point from an adjusted edge to
+	     wherever it was going.  */
+	  if (cand_edge != curr_edge)
+	    {
+	      gcc_assert (cand_edge->src == curr_edge->src);
+	      if (!rewire_first_differing_edge (cand_path_num, j))
+		goto remove_candidate_from_list;
+	      break;
+	    }
+	}
+      if (j == minlength)
+	{
+	  /* If we consumed the max subgraph we could look at, and
+	     still didn't find any different edges, it's the
+	     last edge after MINLENGTH.  */
+	  if (cand_path->length () > minlength)
+	    {
+	      if (!rewire_first_differing_edge (cand_path_num, j))
+		goto remove_candidate_from_list;
+	    }
+	  else if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "adjusting first edge after MINLENGTH.\n");
+	}
+      if (j > 0)
+	{
+	  /* If we are removing everything, delete the entire candidate.  */
+	  if (j == cand_path->length ())
+	    {
+	    remove_candidate_from_list:
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "adjusted candidate: [EMPTY]\n");
+	      delete_jump_thread_path (cand_path);
+	      paths.unordered_remove (cand_path_num);
+	      continue;
+	    }
+	  /* Otherwise, just remove the redundant sub-path.  */
+	  cand_path->block_remove (0, j);
+	}
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "adjusted candidate: ");
+	  debug_path (dump_file, cand_path_num);
+	}
+      ++cand_path_num;
+    }
+}
+
 /* Duplicates a jump-thread path of N_REGION basic blocks.
    The ENTRY edge is redirected to the duplicate of the region.
 
@@ -1985,11 +2151,14 @@ bb_in_bbs (basic_block bb, basic_block *bbs, int n)
    and create a single fallthru edge pointing to the same destination as the
    EXIT edge.
 
+   CURRENT_PATH_NO is an index into the global paths[] table
+   specifying the jump-thread path.
+
    Returns false if it is unable to copy the region, true otherwise.  */
 
 static bool
 duplicate_thread_path (edge entry, edge exit, basic_block *region,
-		       unsigned n_region)
+		       unsigned n_region, unsigned current_path_no)
 {
   unsigned i;
   struct loop *loop = entry->dest->loop_father;
@@ -2000,6 +2169,12 @@ duplicate_thread_path (edge entry, edge exit, basic_block *region,
   if (!can_copy_bbs_p (region, n_region))
     return false;
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nabout to thread: ");
+      debug_path (dump_file, current_path_no);
+    }
+
   /* Some sanity checking.  Note that we do not check for all possible
      missuses of the functions.  I.e. if you ask to copy something weird,
      it will work, but the state of structures probably will not be
@@ -2128,6 +2303,8 @@ duplicate_thread_path (edge entry, edge exit, basic_block *region,
 
   free (region_copy);
 
+  adjust_paths_after_duplication (current_path_no);
+
   free_original_copy_tables ();
   return true;
 }
@@ -2251,7 +2428,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
       for (unsigned int j = 0; j < len - 1; j++)
 	region[j] = (*path)[j]->e->dest;
 
-      if (duplicate_thread_path (entry, exit, region, len - 1))
+      if (duplicate_thread_path (entry, exit, region, len - 1, i))
 	{
 	  /* We do not update dominance info.  */
 	  free_dominance_info (CDI_DOMINATORS);
