This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Enable multiple duplicated blocks on threading path



So when we first started looking at the FSA/FSM optimization for coremark, I speculated that it could be done in the jump threader and that by doing so we'd see benefits in more general codes.

This patch enables the code to allow multiple duplicated blocks on a threading path. It won't catch the coremark case as-is because of the loop issues (which I'll be returning to immediately).

We can measure the benefits quite easily using valgrind/cachegrind.

I've used the trunk compiler with and without this patch to build gcc-4.7.3 (no special options, so it includes checking). I then run .i files through that 4.7.3 compiler under the watchful eye of valgrind.

Valgrind gives me nice data about the number of dynamic branches, # instructions and the like. Some of you may recall similar tests in 2011 :-)

This patch allows GCC to eliminate .34% of the dynamic conditional branches and .26% of the total number of dynamic instructions. Furthermore, if you compare the total number of dynamic branches eliminated vs total number of instructions eliminated, you'd find that for each dynamic branch eliminated, we also eliminate 3.3 other instructions.

To compare, the 2011 testing saw a .63% reduction in dynamic branches and a .50% reduction in total dynamic instructions eliminated. We eliminated 3.4 other instructions for each dynamic branch eliminated.

So this picks up fewer cases that the 2011 changes. But it's still significant. The secondary effects are consistent, which isn't a great surprise I suppose.

There's still some obvious improvements that can be made here, but this code is far enough along that it should go in now so that focus can move to the loop issues to enable the FSA/FSM coremark optimization.



Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Installed on the trunk.

	* tree-ssa-threadupdate.c: Fix trailing whitespace.
	* tree-ssa-threadupdate.h: Likewise.

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index afd7ac4..1b7c73d 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -119,7 +119,7 @@ struct redirection_data : typed_free_remove<redirection_data>
      and wire up its single remaining outgoing edge to the thread path.
 
      The other is a joiner block where we leave the control statement
-     in place, but wire one of the outgoing edges to a thread path. 
+     in place, but wire one of the outgoing edges to a thread path.
 
      In theory we could have multiple block duplicates in a jump
      threading path, but I haven't tried that.
@@ -470,7 +470,7 @@ ssa_fix_duplicate_block_edges (struct redirection_data *rd,
   vec<jump_thread_edge *> *path = THREAD_PATH (e);
 
   for (unsigned int count = 0, i = 1; i < path->length (); i++)
-    { 
+    {
       /* If we were threading through an joiner block, then we want
 	 to keep its control statement and redirect an outgoing edge.
 	 Else we want to remove the control statement & edges, then create
@@ -549,10 +549,10 @@ ssa_create_duplicates (struct redirection_data **slot,
   struct redirection_data *rd = *slot;
 
   /* The second duplicated block in a jump threading path is specific
-     to the path.  So it gets stored in RD rather than in LOCAL_DATA. 
+     to the path.  So it gets stored in RD rather than in LOCAL_DATA.
 	
      Each time we're called, we have to look through the path and see
-     if a second block needs to be duplicated. 
+     if a second block needs to be duplicated.
 
      Note the search starts with the third edge on the path.  The first
      edge is the incoming edge, the second edge always has its source
@@ -567,7 +567,7 @@ ssa_create_duplicates (struct redirection_data **slot,
 	  break;
 	}
     }
-  
+
   /* Create a template block if we have not done so already.  Otherwise
      use the template to create a new block.  */
   if (local_info->template_block == NULL)
@@ -732,7 +732,7 @@ redirection_block_p (basic_block bb)
    the appropriate duplicate of BB.
 
    If NOLOOP_ONLY is true, we only perform the threading as long as it
-   does not affect the structure of the loops in a nontrivial way. 
+   does not affect the structure of the loops in a nontrivial way.
 
    If JOINERS is true, then thread through joiner blocks as well.  */
 
@@ -892,7 +892,7 @@ thread_block_1 (basic_block bb, bool noloop_only, bool joiners)
    By doing things this way we can be as aggressive as possible and
    not worry that copying a joiner block will create a jump threading
    opportunity.  */
-  
+
 static bool
 thread_block (basic_block bb, bool noloop_only)
 {
@@ -1591,7 +1591,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
     }
 
   /* Assume we had a jump thread path which went from the latch to the exit
-     and a path which goes from outside to inside the same loop.  
+     and a path which goes from outside to inside the same loop.
 
      If the latch to exit was handled first, we will thread it and clear
      loop->header.
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 4617b9c..4950170 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -1,5 +1,5 @@
 /* Communication between registering jump thread requests and
-   updating the SSA/CFG for jump threading. 
+   updating the SSA/CFG for jump threading.
    Copyright (C) 2013 Free Software Foundation, Inc.
 
 This file is part of GCC.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]