This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Gimple loop splitting


On 11/12/2015 09:52 AM, Michael Matz wrote:
Hello,

this new pass implements loop iteration space splitting for loops that
contain a conditional that's always true for one part of the iteration
space and false for the other, i.e. such situations:
FWIW, Ajit suggested the same transformation earlier this year. During that discussion Richi indicated that for hmmer this transformation would enable vectorization.


This transformation is in itself a good one but can also be an enabler for
the vectorizer.
Agreed.


  It does increase code size, when the loop body contains
also unconditional code (that one is duplicated), so we only transform hot
loops.
Probably ought to be disabled when we're not optimizing for speed as well.




  I'm a bit unsure of the placement of the new pass, or if it should
be an own pass at all.  Right now I've placed it after unswitching and
scev_cprop, before loop distribution.  Ideally I think all three, together
with loop fusion and an gimple unroller should be integrated into one loop
nest optimizer, alas, we aren't there yet.
Given its impact on the looping structure, I'd think early in the loop optimizer. Given the similarities with unswitching, I think before/after unswitching is a natural first cut. We can always iterate if it looks like putting it elsewhere would make sense.



I've regstrapped this pass enabled with -O2 on x86-64-linux, without
regressions.  I've also checked cpu2006 (the non-fortran part) for
correctness, not yet for performance.  In the end it should probably only
be enabled for -O3+ (although if the whole loop body is conditional it
makes sense to also have it with -O2 because code growth is very small
then).
Very curious on the performance side, so if you could get some #s on that, it'd be greatly appreciated.

I'd be comfortable with this at -O2, but won't object if you'd prefer -O3.



So, okay for trunk?


Ciao,
Michael.
	* passes.def (pass_loop_split): Add.
	* timevar.def (TV_LOOP_SPLIT): Add.
	* tree-pass.h (make_pass_loop_split): Declare.
	* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
	* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
	cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
	gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
	(split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
	split_loop, tree_ssa_split_loops,
	make_pass_loop_split): New functions.
	(pass_data_loop_split): New.
	(pass_loop_split): New.

testsuite/
	* gcc.dg/loop-split.c: New test.
Please clean up the #if 0/#if 1 code in the new tests. You might also want to clean out the TRACE stuff. Essentially the tests look like you just dropped in a test you'd been running by hand until now :-)

I don't see any negative tests -- ie tests that should not be split due to boundary conditions. Do you have any from development? If so it'd be good to have those too.


Index: tree-ssa-loop-manip.h
===================================================================
--- tree-ssa-loop-manip.h	(revision 229763)
+++ tree-ssa-loop-manip.h	(working copy)
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc

  extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
  		       bool, tree *, tree *);
+extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
+					    struct loop *);
  extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
  extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
  extern void verify_loop_closed_ssa (bool);
Index: tree-ssa-loop-unswitch.c
===================================================================
--- tree-ssa-loop-unswitch.c	(revision 229763)
+++ tree-ssa-loop-unswitch.c	(working copy)
Given the amount of new code, unless there's a strong need, I'd prefer this transformation to be implemented in its own file.



+
+/* Give an induction variable GUARD_IV, and its affine descriptor IV,
+   find the loop phi node in LOOP defining it directly, or create
+   such phi node.  Return that phi node.  */
+
+static gphi *
+find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv * /*iv*/)
+{
+  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
+  gphi *phi;
+  if ((phi = dyn_cast <gphi *> (def))
+      && gimple_bb (phi) == loop->header)
+    return phi;
+
+  /* XXX Create the PHI instead.  */
+  return NULL;
So right now we just punt if we need to create the PHI? Does that happen with any kind of regularity in practice?


+}
+
+/* Checks if LOOP contains an conditional block whose condition
+   depends on which side in the iteration space it is, and if so
+   splits the iteration space into two loops.  Returns true if the
+   loop was split.  NITER must contain the iteration descriptor for the
+   single exit of LOOP.  */
+
+static bool
+split_loop (struct loop *loop, struct tree_niter_desc *niter)
This should probably be broken up a bit more.  It's loooong as-is.

Without looking at how much stuff would have to be passed around, diddling the exit edge of the first loop, phi updates for the 2nd loop, fix iteration space of 2nd loop, exit block fixup might be a good initial cut at breaking this down into something of manageable size. Not sure if the setup and initial versioning should be broken out or not.


+	initialize_original_copy_tables ();
+	basic_block cond_bb;
+	struct loop *floop = loop_version (loop, cond, &cond_bb,
+					   REG_BR_PROB_BASE, REG_BR_PROB_BASE,
+					   REG_BR_PROB_BASE, false);
+	gcc_assert (floop);
+	update_ssa (TODO_update_ssa);
+
+	/* Now diddle the exit edge of the first loop (floop->join in the
+	   above) to either go to the common exit (join) or to the second
+	   loop, depending on if there are still iterations left, or not.
+	   We split the floop exit edge and insert a copy of the
+	   original exit expression into the new block, that either
+	   skips the second loop or goes to it.  */
So after diddling, haven't we mucked up the dominator tree and the SSA graph? You're iterating over each PHI in two loop headers and fixing the SSA graph by hand AFAICT. But ISTM the dominator tree is still mucked up, right? I'm thinking specifically about the 2nd loop. Though perhaps it just works since after all your transformations it'll still be immediately dominated by the same block as before your transformations.

Overall I think this looks real good. THe biggest problem IMHO is breaking down that monster function a bit. I'm a bit concerned by the dominator tree state. Worst case is we have to rebuild the dominators before ensuring we're LCSSA form, and even that doesn't seem too bad. As I mentioned, it may actually be the case that we're OK on the dominator tree, kindof by accident more than design.


Jeff




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]