This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2.

From: Zdenek Dvorak <rakdver at kam dot mff dot cuni dot cz>
To: Ayal Zaks <ZAKS at il dot ibm dot com>
Cc: Revital1 Eres <ERES at il dot ibm dot com>, abel at ispras dot ru, andrew_pinski at playstation dot sony dot com, dje at watson dot ibm dot com, gcc-patches at gcc dot gnu dot org, Mircea Namolaru <NAMOLARU at il dot ibm dot com>
Date: Sat, 18 Aug 2007 16:04:52 +0200
Subject: Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2.
References: <OFD551E32A.189B4CD3-ONC2257331.00526D01-C2257338.005EA442@LocalDomain> <OFED032723.E54FA73C-ONC2257338.00711120-C225733A.007F913F@il.ibm.com>
> Revital1 Eres/Haifa/IBM wrote on 15/08/2007 20:13:44:
> 
> > Hello,
> >
> > This is a re-submission of patch 2 of 2
> > (http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01470.html).
> > It includes the comments received in -
> > (*) http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01508.html,
> > http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01582.html,
> > http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01747.html and
> > http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01674.html.
> >
> > (*)  I've missed the last two comments made by Ayal regarding the
> > formatting of the code; If the rest of the patch is OK I would ask to
> > submit those in a follow-up patch.
> >
> > This patch also contains a fix to an ICE reported by Andrey Belevantsev
> > while testing sms on ia64. The ICE caused by mishandling of subregs
> > that appears in lhs of single set insns.  It also appears on spu.
> > Such instructions can indicate that there is more than one def to a reg
> > and we choose to avoid those cases for now.  Relevant testcase was added
> > for this bug.
> >
> > The documentation that was added to modulo-sched.c regarding do-loop
> > was taken from http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01774.html.
> >
> > Thanks to Andrey for testing the current version of trunk with sms flags
> > on ia64.
> >
> > Also thanks to Zdenek for his help.
> >
> > This patch was bootstrapped and tested on PPC and x86_64; with no new
> > regressions, and make-check with and without the patch without new
> > regressions for the C testsuite on spu.
> >
> > OK for mainline?
> >
> 
> In addition to the changes requested by Andrey and Zdenek, here are my
> comments.
> 
> Please add a note some place (e.g. when deciding where to put the branch)
> to revisit the issue of scheduling the insns of the control part relative
> to the branch, now that there's more to this part than a single parallel
> branch-on-count insn.
> 
> Ayal.
> 
> 
> > Thanks,
> > Revital
> >
> > 2007-08-15  Mircea Namolaru  <namolaru@il.ibm.com>
> >             Vladimir Yanovsky  <yanov@il.ibm.com>
> >             Revital Eres  <eres@il.ibm.com>
> >
> >         * config/spu/spu.md: Recognize doloop pattern when -fmodulo-sched
> >         is set.
> >         * modulo-sched.c: Add documentation regarding do-loop.
> >         (doloop_register_get): Support for the new doloop pattern.
> >         (duplicate_insns_of_cycles): Do not duplicate the insn which
> >         changes count_reg as it is already adjusted.
> >         (generate_prolog_epilog): New argument to support the new
> >         loop pattern.
> >         (sms_schedule): Support for the new pattern and do not handle
> >         loops single sets insns with subreg in their lhs or when
> COUNT_REG
> >         has no other occurences in the loop until the decrement.
> >         * loop-doloop.c (doloop_optimize): Support for the a doloop
>                                                             ^^^
> 
> >         pattern.
> >         (doloop_condition_get): Gets an instruction instead of a pattern
> >         and change the return condition when the doloop pattern is
> >         not parallel.
> >         * ddg.c (create_ddg_dep_from_intra_loop_link): Handle only reg
> >         deps when considering to not create edges.
> >
> >         * gcc.dg/sms-1.c: New test.
> >
> 
> > Index: ddg.c
> > ===================================================================
> > --- ddg.c (revision 127390)
> > +++ ddg.c (working copy)
> > @@ -176,13 +176,16 @@
> >        rtx set;
> >
> >        set = single_set (dest_node->insn);
> > -      if (set)
> > +      /* TODO: Handle regsiters that REG_P is not true for them.  */
>                          ^^^^^^^^^                                , i.e.
> subregs.  */
> 
> 
> >
> > +      if (set && REG_P (SET_DEST (set)))
> >          {
> >            int regno = REGNO (SET_DEST (set));
> > -          struct df_ref *first_def =
> > -            df_bb_regno_first_def_find (g->bb, regno);
> > +          struct df_ref *first_def;
> >            struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
> >
> > +          first_def = df_bb_regno_first_def_find (g->bb, regno);
> > +          gcc_assert (first_def);
> > +
> >            if (bitmap_bit_p (bb_info->gen, first_def->id))
> >              return;
> >          }
> > Index: modulo-sched.c
> > ===================================================================
> > --- modulo-sched.c (revision 127390)
> > +++ modulo-sched.c (working copy)
> > @@ -82,8 +82,20 @@
> >        perform modulo variable expansion for live ranges that span more
> than
> >        II cycles (i.e. use register copies to prevent a def from
> overwriting
> >        itself before reaching the use).
> > -*/
> >
> > +   SMS works with countable loops (1) whose control part can be easily
> > +   decoupled from the rest of the loop and (2) whose loop count can
> > +   be easily adjusted.  This is because we peel a constant number of
> > +   iterations into a prologue and epilogue for which we want to avoid
> > +   emitting the control part, and a kernel which is to iterate that
> > +   constant number of iterations less than the original loop.  So the
> > +   control part should be a set of insns clearly identified and having
> > +   its own iv, not otherwise used in the loop (at-least for now), which
> > +   initializes a register before the loop to the number of iterations.
> > +   Currently SMS relay on the do-loop pattern to recognize such loops.
>                     ^^^^^
>                     relies
> , where (1) the control part comprises of all insns defining and/or using a
> certain 'count' register and (2) the loop count can be adjusted by
> modifying this register prior to the loop.
> 
> > +   TODO: Relay on cfgloop analysis instead.  */
>             ^^^^^
>             Rely
> 
> > +
> > +
> >
> >  /* This page defines partial-schedule structures and functions for
> >     modulo scheduling.  */
> > @@ -182,10 +194,11 @@
> >  static void set_node_sched_params (ddg_ptr);
> >  static partial_schedule_ptr sms_schedule_by_order (ddg_ptr, int, int,
> int *);
> >  static void permute_partial_schedule (partial_schedule_ptr ps, rtx
> last);
> > -static void generate_prolog_epilog (partial_schedule_ptr ,struct loop *
> loop, rtx);
> > +static void generate_prolog_epilog (partial_schedule_ptr, struct loop
> *loop,
> > +                                    rtx, rtx);
> >  static void duplicate_insns_of_cycles (partial_schedule_ptr ps,
> >             int from_stage, int to_stage,
> > -           int is_prolog);
> > +           int is_prolog, rtx count_reg);
> >
> >  #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
> >  #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
> > @@ -268,13 +281,29 @@
> >  doloop_register_get (rtx insn ATTRIBUTE_UNUSED)
> >  {
> >  #ifdef HAVE_doloop_end
> > -  rtx pattern, reg, condition;
> > +  rtx reg, condition;
> >
> > -  if (! JUMP_P (insn))
> > +  /* The canonical doloop pattern we expect has one of the following
> > +     forms:
> > +
> > +     1)  (parallel [(set (pc) (if_then_else (condition)
> > +                                            (label_ref (label))
> > +                                            (pc)))
> > +                     (set (reg) (plus (reg) (const_int -1)))
> > +                     (additional clobbers and uses)])
> > +
> > +     2)  (set (reg) (plus (reg) (const_int -1))
> > +         (set (pc) (if_then_else (reg != 0)
> > +                                 (label_ref (label))
> > +                                 (pc))).  */
> > +
> > +  if (!JUMP_P (insn))
> >      return NULL_RTX;
> >
> > -  pattern = PATTERN (insn);
> > -  condition = doloop_condition_get (pattern);
> > +  if (!INSN_P (PREV_INSN (insn)))
> > +    return NULL_RTX;
> > +
> 
> Why do we need to check the insn preceding the (non-parallel) jump?  Looks
> like this check belongs inside doloop_condition_get.
> 
> 
> > +  condition = doloop_condition_get (insn);
> >    if (! condition)
> >      return NULL_RTX;
> >
> > @@ -401,6 +430,7 @@
> >     nreg_moves = ----------------------------------- + 1 - {
> dependence.
> >                              ii                          { 1 if not.
> >  */
> > +
> >  static struct undo_replace_buff_elem *
> >  generate_reg_moves (partial_schedule_ptr ps, bool rescan)
> >  {
> > @@ -583,7 +613,7 @@
> >
> >  static void
> >  duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
> > -      int to_stage, int for_prolog)
> > +      int to_stage, int for_prolog, rtx count_reg)
> >  {
> >    int row;
> >    ps_insn_ptr ps_ij;
> > @@ -595,6 +625,12 @@
> >   int j, i_reg_moves;
> >   rtx reg_move = NULL_RTX;
> >
> > +        /* Do not duplicate the insn which changes count_reg as it is
> > +           already adjusted.   */
> > +        if (INSN_P (u_node->insn) && single_set (u_node->insn)
> > +            && rtx_equal_p (count_reg, SET_DEST (single_set
> (u_node->insn))))
> 
> Why not check if reg_mentioned_p? We want to avoid duplicating insns using
> count_reg, not only those defining it (as we avoid duplicating the latter).
> Also, the control part could more generally be composed of a
> decrement-compare-branch sequence of insns which define and use count_reg.
> 
> 
> > +          continue;
> > +
> >   if (for_prolog)
> >     {
> >       /* SCHED_STAGE (u_node) >= from_stage == 0.  Generate increasing
> > @@ -643,23 +679,26 @@
> >
> >  /* Generate the instructions (including reg_moves) for prolog & epilog.
> */
> >  static void
> > -generate_prolog_epilog (partial_schedule_ptr ps, struct loop * loop, rtx
> count_reg)
> > +generate_prolog_epilog (partial_schedule_ptr ps, struct loop *loop,
> > +                        rtx count_reg, rtx count_init)
> >  {
> >    int i;
> >    int last_stage = PS_STAGE_COUNT (ps) - 1;
> >    edge e;
> > -
> > +
> >    /* Generate the prolog, inserting its insns on the loop-entry edge.
> */
> >    start_sequence ();
> >
> > -  if (count_reg)
> > -   /* Generate a subtract instruction at the beginning of the prolog to
> > -      adjust the loop count by STAGE_COUNT.  */
> > -   emit_insn (gen_sub2_insn (count_reg, GEN_INT (last_stage)));
> > +  if (count_reg && !count_init)
>          ^^^^^^^^^
> This is redundant, as we assert earlier that count_reg is not NULL.
> 
> >
> > +    /* Generate a subtract instruction at the beginning of the prolog to
> > +       adjust the loop count by STAGE_COUNT.  */
> 
> Also add to comment that if loop count is constant (count_init), this
> constant is adjusted by STAGE_COUNT elsewhere -- by the caller to
> generate_prolog_epilog.
> 
> > +    emit_move_insn (count_reg, gen_rtx_PLUS (GET_MODE (count_reg),
> > +                                             count_reg,
> > +                                             GEN_INT (-last_stage)));
> >
> 
> 
> Consider adjusting here the loop count also when it's a constant (instead
> of above suggested addition to comment), i.e.
> 
>     else
>       /* count_init, set new constant iteration count of loop kernel.  */
>       SET_SRC (single_set (count_init)) = GEN_INT (loop_count - stage_count
> + 1);
> 
> 
> >    for (i = 0; i < last_stage; i++)
> > -    duplicate_insns_of_cycles (ps, 0, i, 1);
> > -
> > +    duplicate_insns_of_cycles (ps, 0, i, 1, count_reg);
> > +
> >    /* Put the prolog on the entry edge.  */
> >    e = loop_preheader_edge (loop);
> >    split_edge_and_insert (e, get_insns ());
> > @@ -670,8 +709,8 @@
> >    start_sequence ();
> >
> >    for (i = 0; i < last_stage; i++)
> > -    duplicate_insns_of_cycles (ps, i + 1, last_stage, 0);
> > -
> > +    duplicate_insns_of_cycles (ps, i + 1, last_stage, 0, count_reg);
> > +
> >    /* Put the epilogue on the exit edge.  */
> >    gcc_assert (single_exit (loop));
> >    e = single_exit (loop);
> > @@ -910,18 +949,42 @@
> >        if ( !(count_reg = doloop_register_get (tail)))
> >   continue;
> >
> > +      if (GET_CODE (PATTERN (tail)) != PARALLEL)
> > +      {
> > +        /* Check that the COUNT_REG has no other occurences in the loop
> > +           until the decrement.  */
> > +        bool found = false;
> > +
> > +        for (insn = head; insn != PREV_INSN (tail); insn = NEXT_INSN
> (insn))
> > +          {
> > +            if (!reg_mentioned_p (count_reg, insn))
> > +              continue;
> > +
> > +            found = true;
> > +            break;
> > +          }
> 
> Can simplify to
>          for (insn = head; insn != PREV_INSN (tail); insn = NEXT_INSN
> (insn))
>            if (found = reg_mentioned_p (count_reg, insn))
>              break;
> 
> > +        if (found)
> > +          continue;
> > +      }
> > +
> 
> Fold the above into doloop_register_get(): it should return a valid
> count_reg iff the loop has a control part comprising of all insns defining
> and using this register, whose value dictates the number of loop
> iterations. Not sure why this check for no uses of count_reg inside loop
> applies only to non-parallel loop branch. Also please add a debug dump upon
> failure before continuing.
> 
> 
> >       /* Don't handle BBs with calls or barriers, or !single_set insns,
> >           or auto-increment insns (to avoid creating invalid reg-moves
> >           for the auto-increment insns).
> >           ??? Should handle auto-increment insns.  */
> > -      for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN
> (insn))
> > - if (CALL_P (insn)
> > -     || BARRIER_P (insn)
> > -     || (INSN_P (insn) && !JUMP_P (insn)
> > -  && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)
> > -            || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0))
> > -   break;
> > +     for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN
> (insn))
> > +      {
> > +         rtx set;
> >
> > +        if (CALL_P (insn)
> > +            || BARRIER_P (insn)
> > +            || (INSN_P (insn) && !JUMP_P (insn)
> > +                && !single_set (insn) && GET_CODE (PATTERN (insn)) !=
> USE)
> > +            || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
> > +            || (INSN_P (insn) && (set = single_set (insn))
> > +                && GET_CODE (SET_DEST (set)) == SUBREG))
> > +        break;
> > +      }
> > +
> >        if (insn != NEXT_INSN (tail))
> >   {
> >     if (dump_file)
> > @@ -932,8 +995,11 @@
> >    fprintf (dump_file, "SMS loop-with-barrier\n");
> >                else if (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
> >                  fprintf (dump_file, "SMS reg inc\n");
> > -       else
> > -  fprintf (dump_file, "SMS loop-with-not-single-set\n");
> > +              else if ((INSN_P (insn) && !JUMP_P (insn)
> > +                && !single_set (insn) && GET_CODE (PATTERN (insn)) !=
> USE))
> > +                fprintf (dump_file, "SMS loop-with-not-single-set\n");
> > +              else
> > +               fprintf (dump_file, "SMS loop with subreg in lhs\n");
> >         print_rtl_single (dump_file, insn);
> >       }
> >
> > @@ -1112,9 +1178,9 @@
> >       print_node_sched_params (dump_file, g->num_nodes, g);
> >     /* Generate prolog and epilog.  */
> >
> >     if (count_reg && !count_init)
> > -     generate_prolog_epilog (ps, loop, count_reg);
> > +     generate_prolog_epilog  (ps, loop, count_reg, NULL_RTX);
> >     else
> > -     generate_prolog_epilog (ps, loop, NULL_RTX);
> > +     generate_prolog_epilog (ps, loop, count_reg, count_init);
> 
> Why not simplify into a single:
>       generate_prolog_epilog (ps, loop, count_reg, count_init);
> 
> 
> >
> >     free_undo_replace_buff (reg_move_replaces);
> >   }
> > Index: loop-doloop.c
> > ===================================================================
> > --- loop-doloop.c (revision 127390)
> > +++ loop-doloop.c (working copy)
> > @@ -69,36 +69,54 @@
> >     if it is not a decrement and branch jump insn.  */
> >
> >  rtx
> > -doloop_condition_get (rtx pattern)
> > +doloop_condition_get (rtx doloop_pat)
> >  {
> >    rtx cmp;
> >    rtx inc;
> >    rtx reg;
> >    rtx inc_src;
> >    rtx condition;
> > +  rtx pattern;
> >
> > -  /* The canonical doloop pattern we expect is:
> > +  /* The canonical doloop pattern we expect has one of the following
> > +     forms:
> >
> > -     (parallel [(set (pc) (if_then_else (condition)
> > -                                        (label_ref (label))
> > -                                        (pc)))
> > -                (set (reg) (plus (reg) (const_int -1)))
> > -                (additional clobbers and uses)])
> > +     1)  (parallel [(set (pc) (if_then_else (condition)
> > +                  (label_ref (label))
> > +                (pc)))
> > +              (set (reg) (plus (reg) (const_int -1)))
> > +              (additional clobbers and uses)])
> >
> > -     Some targets (IA-64) wrap the set of the loop counter in
> > -     an if_then_else too.
> > +     The branch must be the first entry of the parallel (also required
> > +     by jump.c), and the second entry of the parallel must be a set of
> > +     the loop counter register.  Some targets (IA-64) wrap the set of
> > +     the loop counter in an if_then_else too.
> >
> > -     In summary, the branch must be the first entry of the
> > -     parallel (also required by jump.c), and the second
> > -     entry of the parallel must be a set of the loop counter
> > -     register.  */
> > +     2)  (set (reg) (plus (reg) (const_int -1))
> > +         (set (pc) (if_then_else (reg != 0)
> > +            (label_ref (label))
> > +            (pc))).  */
> >
> > +  pattern = PATTERN (doloop_pat);
> > +
> >    if (GET_CODE (pattern) != PARALLEL)
> > -    return 0;
> > +    {
> > +      rtx cond;
> >
> > -  cmp = XVECEXP (pattern, 0, 0);
> > -  inc = XVECEXP (pattern, 0, 1);
> > +      cmp = pattern;
> > +      inc = PATTERN (PREV_INSN (doloop_pat));
> 
> This is probably where we need to check that
> +  if (INSN_P (PREV_INSN (doloop_pat)))

it won't harm (and also checking that PREV_INSN (doloop_pat) != NULL_RTX);
however, since doloop_pat comes from the machine description, we could
also just trust it.

Zdenek
Follow-Ups:
- Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2.
  - From: Revital1 Eres
References:
- Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2.
  - From: Ayal Zaks
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]