This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] move superblock formation to Tree-SSA

From: Robert Kidd <rkidd at crhc dot uiuc dot edu>
To: Steven Bosscher <stevenb dot gcc at gmail dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: Wed, 3 Jan 2007 13:36:20 -0600
Subject: Re: [PATCH] move superblock formation to Tree-SSA
References: <7CA33963-EED3-4025-A2BA-1AA881F8F060@crhc.uiuc.edu> <200612271815.32780.steven.bosscher@gmail.com>

On Dec 27, 2006, at 11:15 AM, Steven Bosscher wrote:

Hi Bob,

On Tuesday 26 December 2006 18:15, Robert Kidd wrote:

This patch moves Superblock formation from RTL to the Tree-SSA
representation.  This patch does not cause a significant change in
performance in itself,


Can you quantify the performance changes with this patch, both of
the generated code and of compilation time?

I don't have precise numbers for the change in compilation time, but it seems to be negligible. I'll collect some more detailed numbers. I haven't yet run a rigorous spec run for this patch. I ran a spec run on a four processor Itanium, but I ran four instances in parallel to quickly test for regressions. The change for this run was in the noise, but this was admittedly a pretty noisy run. I'll run a more rigorous test.

What happened with the twolf regression you had before with this
patch?

The twolf regression was caused when the tracer pass duplicated loop headers as it formed Superblocks. I removed that code from this version of the patch. Loop header duplication should happen as a natural side effect of branch target expansion. IIRC, duplication is also done elsewhere in GCC, which may allow loop unrolling to process the Superblocks generated by this patch.

It may be helpful to schedule the pass after one round of CCP and
DCE.  The amount of dead code just after rewriting into SSA form is
sometimes very high.

That makes sense. I'll take a look.

but it is a first step to performing high
level structural optimizations.
Anything specific in mind, or in the works?

I have a branch target expansion pass in the works. This pass enlarges superblocks by targeting frequently executed CFG edges that flow from one superblock to another and duplicating the target.

-/* Return true if BB has been seen - it is connected to some trace
-   already.  */
+/* A bit BB->index is set if BB has already been seen, i.e. it is
+   connected to some trace already.  */
+bitmap bb_seen;


You really want an sbitmap here.  You'll be setting this for all basic
blocks IIUC, and bitmap is a potentially quadratic structure.  You can
use an initially oversized sbitmap instead, and when you have created
more new basic blocks than the number of extra bits you asked for, you
can sbitmap_resize the thing.

I'm assuming you always immediately mark newly created basic blocks as
visited.  If not, you have to watch that you don't query bits beyond
the size of the sbitmap.

It is the case that I mark duplicated blocks as visited. I'll look into sbitmap.

- if (seen (bb2) || (e->flags & (EDGE_DFS_BACK | EDGE_COMPLEX)) - || find_best_successor (bb2) != e) + if (bb_seen_p (bb2) || (e->flags & (EDGE_DFS_BACK | EDGE_COMPLEX)) + || find_best_successor (bb2) != e)
I don't see a change between the two find_best_successor lines.
Accidental whitespace change there???

Yep, I'll fix that.

+#ifdef ENABLE_CHECKING
+      gcc_assert (!bb_seen_p (bb));
+#endif

No need for the ENABLE_CHECKING test.

 /* Main entry point to this file.  FLAGS is the set of flags to pass
    to cfg_layout_initialize().  */

-void
-tracer (unsigned int flags)
+static unsigned int
+tracer (void)
 {
   if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1)
-    return;
+    return 0;

-  cfg_layout_initialize (flags);
   mark_dfs_back_edges ();


Need to update the comment before this function.  It has no FLAGS
argument so it does not pass it to cfg_layout_initialize(), in fact
it doesn't even have to call that function anymore...

+ /* FIXME: We really only need to do this when we know tail duplication + has altered the CFG. */ + free_dominance_info (CDI_DOMINATORS);


Alternatively, you could update dominance information on the fly.  I
have no idea whether this difficult with the CFG transformations that
you do.

+/* The container for a group of structural transformation and
+   optimization */

Nit: Comments in GCC should end with a period followed by two spaces.

-  0,                                    /* todo_flags_start */
-  TODO_dump_func,                       /* todo_flags_finish */
-  'T'                                   /* letter */
+  TODO_dump_func,                       /* todo_flags_start */
+  TODO_dump_func
+    | TODO_update_ssa
+    | TODO_verify_ssa,                  /* todo_flags_finish */
+  0                                     /* letter */
 };


Not sure if we want to dump before and after the transformations.
Usually you can just look at the dump of the previous pass.  I don't
think we have many passes that dump before and after.  Following the
common practice is usually a good idea ;-)

Indeed.

Regarding Honza's comments about the placement in the pass queue:
Although it's hard to tell without numbers, I don't think the
placement of this pass currently matters very much, because:

1. the heuristics to pick the code to duplicate is pretty arbitrary
2. by running tracer earlier we expose optimization opportunities
   earlier.

We should look at the numbers, but I wouldn't expect much of a
difference.

I would be surprised if forming Superblocks at both the Tree and RTL levels gained much speed over Tree only. Inlining, loop unrolling, and Superblock formation may add error to the profile information to the point where it becomes difficult to do profile based optimizations that late.

Thanks
Robert

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]