[PATCH] Restrict LOOP_ALIGN to loop headers only.

Richard Biener richard.guenther@gmail.com
Tue Jul 9 10:39:00 GMT 2019


On Tue, Jul 9, 2019 at 12:23 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Jul 9, 2019 at 12:22 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Jul 9, 2019 at 11:56 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> > >
> > > > Hi.
> > > >
> > > > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
> > > > basic blocks for which it makes the biggest sense. I quite some binary
> > > > size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
> > > > positive.
> > > >
> > > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> > > >
> > > > Ready to be installed?
> > > The original idea of distinction between jump alignment and loop
> > > alignment was that they have two basic meanings:
> > >  1) jump alignment is there to avoid jumping just to the end of decode
> > >  window (if the window is aligned) so CPU will get stuck after reaching
> > >  the jump and also to possibly reduce code cache polution by populating
> > >  by code that is not executed
> > >  2) loop alignment aims to fit loop in as few cache windows as possible
> > >
> > > Now if you have loop laid in a way that header of loop is not first
> > > basic block, 2) IMO still apply.  I.e.
> > >
> > >                 jump loop
> > > :loopback
> > >                 loop body
> > > :loop
> > >                 if cond jump to loopback
> > >
> > > So dropping loop alignment for those does not seem to make much sense
> > > from high level.  We may want to have differnt alignment for loops
> > > starting by header and loops starting in the middle, but I still liked
> > > more your patch which did bundles for loops.
> > >
> > > modern x86 chips are not very good testing targets on it.  I guess
> > > generic changes to alignment needs to be tested on other chips too.
> > >
> > > Honza
> > > > Thanks,
> > > > Martin
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 2019-07-09  Martin Liska  <mliska@suse.cz>
> > > >
> > > >       * final.c (compute_alignments): Apply the LOOP_ALIGN only
> > > >       to basic blocks that all loop headers.
> > > > ---
> > > >  gcc/final.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > >
> > >
> > > > diff --git a/gcc/final.c b/gcc/final.c
> > > > index fefc4874b24..ce2678da988 100644
> > > > --- a/gcc/final.c
> > > > +++ b/gcc/final.c
> > > > @@ -739,6 +739,7 @@ compute_alignments (void)
> > > >        if (has_fallthru
> > > >         && !(single_succ_p (bb)
> > > >              && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
> > > > +       && bb->loop_father->header == bb
> >
> > I agree that the above is the wrong condition - but I'm not sure we
> > only end up using LOOP_ALIGN for blocks reached by a DFS_BACK
> > edge.  Note that DFS_BACK would have to be applied considering
> > the current CFG layout, simply doing mark_dfs_back_edges doesn't
> > work (we're in CFG layout mode here, no?).
>
> So a "backedge" in this sense would be e->dest->index < e->src->index.
> No?

To me the following would make sense.

Index: gcc/final.c
===================================================================
--- gcc/final.c (revision 273294)
+++ gcc/final.c (working copy)
@@ -669,6 +669,7 @@ compute_alignments (void)
     {
       rtx_insn *label = BB_HEAD (bb);
       bool has_fallthru = 0;
+      bool has_backedge = 0;
       edge e;
       edge_iterator ei;

@@ -693,6 +694,8 @@ compute_alignments (void)
            has_fallthru = 1, fallthru_count += e->count ();
          else
            branch_count += e->count ();
+         if (e->src->index > bb->index)
+           has_backedge = 1;
        }
       if (dump_file)
        {
@@ -736,7 +739,7 @@ compute_alignments (void)
        }
       /* In case block is frequent and reached mostly by non-fallthru edge,
         align it.  It is most likely a first block of loop.  */
-      if (has_fallthru
+      if (has_backedge
          && !(single_succ_p (bb)
               && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
          && optimize_bb_for_speed_p (bb)



More information about the Gcc-patches mailing list