[PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns

Segher Boessenkool segher@kernel.crashing.org
Mon Jul 18 16:34:00 GMT 2016


Hi Bernd,

Thanks for the review.


On Fri, Jul 15, 2016 at 02:42:24PM +0200, Bernd Schmidt wrote:
> I still have misgivings about all the changes needed to the following 
> passes, but I guess there's no choice but to live with it. So, I'm 
> trying to look at this patch, but I'm finding it fairly impenetrable and 
> underdocumented.

I'll add some more comments with a fresh eye.

> >+  /* The concerns for which we want a prologue placed on this BB.  */
> >+  sbitmap pro_concern;
> >+
> >+  /* The concerns for which we placed code at the start of the BB.  */
> >+  sbitmap head_concern;
> 
> What's the difference?

Concerns in head_concern have the prologue code placed at the start
of the bb; concerns in pro_concern have that code placed before the
existing code in this bb, but after the code in any predecessor bb.
It will be inserted on an edge, unless it can be a head_concern or a
tail_concern.

head_concern and tail_concern reduce code size, in the (quite frequent)
cases where cross jumping does a sub-par job.  Originally I didn't have
these; it seems I didn't document them well enough.

> >+  /* The frequency of executing the prologue for this BB and all BBs
> >+     dominated by it.  */
> >+  gcov_type cost;
> 
> Is this frequency consideration the only thing that attempts to prevent 
> placing prologue insns into loops?

Yes.  The algorithm makes sure the prologues are executed as infrequently
as possible.  If a block that would get a prologue has the same frequency
as a predecessor does, and that predecessor always has that first block as
eventual successor, the prologue is moved to the earlier block (this
handles the case where both have a frequency of zero, and other cases
where the range of freq is too limited).

> >+/* Destroy the pass-specific data.  */
> >+static void
> >+fini_separate_shrink_wrap (void)
> >+{
> >+  basic_block bb;
> >+  FOR_ALL_BB_FN (bb, cfun)
> >+    if (bb->aux)
> >+      {
> >+	sbitmap_free (SW (bb)->has_concern);
> >+	sbitmap_free (SW (bb)->pro_concern);
> >+	sbitmap_free (SW (bb)->head_concern);
> >+	sbitmap_free (SW (bb)->tail_concern);
> >+	free (bb->aux);
> >+	bb->aux = 0;
> >+      }
> >+}
> 
> Almost makes me want to ask for an sbitmap variant allocated on obstacks.

Heh, yes.  I'll have a look.

> >+      /* If this block does have the concern itself, or it is cheaper to
> >+         put the prologue here than in all the descendants that need it,
> >+	 mark it so.  If it is the same cost, put it here if there is no
> >+	 block reachable from this block that does not need the prologue.
> >+	 The actual test is a bit more stringent but catches most cases.  */
> 
> There's some oddness here with the leading whitespace.

Will fix.

> >+/* Mark HAS_CONCERN for every block dominated by at least one block with
> >+   PRO_CONCERN set, starting at HEAD.  */
> 
> I see a lot of code dealing with the placement of prologue 
> parts/concerns/components, but very little dealing with how to place 
> epilogues, leading me to wonder whether we could do better wrt the 
> latter. Shouldn't there be some mirror symmetry, i.e. 
> spread_concerns_backwards?

That is unfortunately harder to do (the "global" prologue block dominates
everywhere we could put a prologue component, but this is not true for
epilogues -- there can be more exits).

It is also true the epilogues already are sort of optimal in execution
cost: the epilogues are executed at most as often as the prologues,
which are placed optimally by construction.  The win from placing the
epilogues better is from infinite loops and non-returning abnormal
exits; but also you can get somewhat smaller code.

So I left this as a future improvement.

> >+    {
> >+      if (first_visit)
> >+	{
> >+	  bitmap_ior (SW (bb)->has_concern, SW (bb)->pro_concern, concern);
> >+
> >+	  if (first_dom_son (CDI_DOMINATORS, bb))
> >+	    {
> >+	      concern = SW (bb)->has_concern;
> >+	      bb = first_dom_son (CDI_DOMINATORS, bb);
> >+	      continue;
> >+	    }
> 
> Calling first_dom_son twice with the same args?

I thought it was more readable this way.  It's two derefs and an add or
two.  Maybe we should make it an inline function?

> More importantly, this 
> first_visit business seems very confusing. I'd try to find a way to 
> merge this if with the places that set first_visit to true.

That will break the early-out optimisation I think?  I'll have to look
again.  All the loops here are O(n) (with n the number of edges, or
blocks); but the place_prologues loop is called once for every component,
as well.  So the early-out helps quite a lot here.

> Also - 
> instead of having a "continue;" here it seems the code inside the if 
> represents an inner loop that should be written explicitly. There are 
> two loops with such a structure.

This "loop" is just a non-recursive tree traversal.  The complicated
part is doing the early-out at just the right time.

I'll document it better.

> >+/* If we cannot handle placing some concern's prologues or epilogues where
> >+   we decided we should place them, unmark that concern in CONCERNS so
> >+   that it is not wrapped separately.  */
> >+static void
> >+disqualify_problematic_concerns (sbitmap concerns)
> >+{
> >+  sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (concerns));
> >+  sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (concerns));
> >+
> >+  basic_block bb;
> >+  FOR_EACH_BB_FN (bb, cfun)
> >+    {
> >+      edge e;
> >+      edge_iterator ei;
> >+      FOR_EACH_EDGE (e, ei, bb->succs)
> >+	{
> >+	  bitmap_and_compl (epi, SW (e->src)->has_concern,
> >+			    SW (e->dest)->has_concern);
> >+	  bitmap_and_compl (pro, SW (e->dest)->has_concern,
> >+			    SW (e->src)->has_concern);
> 
> What is the purpose of this?

Of the and_compl's?  epi is those components that need an epilogue on
this edge, pro is those that need a prologue on this edge.  I.e. epi
is those components that are at src but not at dest, and pro is the
other way around.

> >+/* Place code for prologues and epilogues for CONCERNS where we can put
> >+   that code at the start of basic blocks.  */
> >+static void
> >+do_common_heads_for_concerns (sbitmap concerns)
> 
> The function name should probably have some combination of the strings 
> emit_ and _at or _into to make it clear what it's doing. This and the 
> following function have some logical operations on the bitmaps which are 
> not explained anywhere. In general a much better overview of the 
> intended operation of this pass is needed.

It's hard to think of good names.  I'll try harder.

> >+	{
> >+	  bitmap_and_compl (epi, SW (e->src)->has_concern,
> >+			    SW (e->dest)->has_concern);
> >+	  bitmap_and_compl (pro, SW (e->dest)->has_concern,
> >+			    SW (e->src)->has_concern);

epi/pro are those concerns that need an epilogue resp. prologue ...

> >+	  bitmap_and (epi, epi, concerns);
> >+	  bitmap_and (pro, pro, concerns);

... but only handle those concerns we still consider (i.e. those that
are not disqualified one way or another) ...

> >+	  bitmap_and_compl (epi, epi, SW (e->dest)->head_concern);
> >+	  bitmap_and_compl (pro, pro, SW (e->dest)->head_concern);

... don't put the *logues we already put in a head on the edge as well ...

> >+	  bitmap_and_compl (epi, epi, SW (e->src)->tail_concern);
> >+	  bitmap_and_compl (pro, pro, SW (e->src)->tail_concern);

... and likewise for the tail.


Segher



More information about the Gcc-patches mailing list