This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2.


Revital1 Eres/Haifa/IBM wrote on 15/08/2007 20:13:44:

> Hello,
>
> This is a re-submission of patch 2 of 2
> (http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01470.html).
> It includes the comments received in -
> (*) http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01508.html,
> http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01582.html,
> http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01747.html and
> http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01674.html.
>
> (*)  I've missed the last two comments made by Ayal regarding the
> formatting of the code; If the rest of the patch is OK I would ask to
> submit those in a follow-up patch.
>
> This patch also contains a fix to an ICE reported by Andrey Belevantsev
> while testing sms on ia64. The ICE caused by mishandling of subregs
> that appears in lhs of single set insns.  It also appears on spu.
> Such instructions can indicate that there is more than one def to a reg
> and we choose to avoid those cases for now.  Relevant testcase was added
> for this bug.
>
> The documentation that was added to modulo-sched.c regarding do-loop
> was taken from http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01774.html.
>
> Thanks to Andrey for testing the current version of trunk with sms flags
> on ia64.
>
> Also thanks to Zdenek for his help.
>
> This patch was bootstrapped and tested on PPC and x86_64; with no new
> regressions, and make-check with and without the patch without new
> regressions for the C testsuite on spu.
>
> OK for mainline?
>

In addition to the changes requested by Andrey and Zdenek, here are my
comments.

Please add a note some place (e.g. when deciding where to put the branch)
to revisit the issue of scheduling the insns of the control part relative
to the branch, now that there's more to this part than a single parallel
branch-on-count insn.

Ayal.


> Thanks,
> Revital
>
> 2007-08-15  Mircea Namolaru  <namolaru@il.ibm.com>
>             Vladimir Yanovsky  <yanov@il.ibm.com>
>             Revital Eres  <eres@il.ibm.com>
>
>         * config/spu/spu.md: Recognize doloop pattern when -fmodulo-sched
>         is set.
>         * modulo-sched.c: Add documentation regarding do-loop.
>         (doloop_register_get): Support for the new doloop pattern.
>         (duplicate_insns_of_cycles): Do not duplicate the insn which
>         changes count_reg as it is already adjusted.
>         (generate_prolog_epilog): New argument to support the new
>         loop pattern.
>         (sms_schedule): Support for the new pattern and do not handle
>         loops single sets insns with subreg in their lhs or when
COUNT_REG
>         has no other occurences in the loop until the decrement.
>         * loop-doloop.c (doloop_optimize): Support for the a doloop
                                                            ^^^

>         pattern.
>         (doloop_condition_get): Gets an instruction instead of a pattern
>         and change the return condition when the doloop pattern is
>         not parallel.
>         * ddg.c (create_ddg_dep_from_intra_loop_link): Handle only reg
>         deps when considering to not create edges.
>
>         * gcc.dg/sms-1.c: New test.
>

> Index: ddg.c
> ===================================================================
> --- ddg.c (revision 127390)
> +++ ddg.c (working copy)
> @@ -176,13 +176,16 @@
>        rtx set;
>
>        set = single_set (dest_node->insn);
> -      if (set)
> +      /* TODO: Handle regsiters that REG_P is not true for them.  */
                         ^^^^^^^^^                                , i.e.
subregs.  */


>
> +      if (set && REG_P (SET_DEST (set)))
>          {
>            int regno = REGNO (SET_DEST (set));
> -          struct df_ref *first_def =
> -            df_bb_regno_first_def_find (g->bb, regno);
> +          struct df_ref *first_def;
>            struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
>
> +          first_def = df_bb_regno_first_def_find (g->bb, regno);
> +          gcc_assert (first_def);
> +
>            if (bitmap_bit_p (bb_info->gen, first_def->id))
>              return;
>          }
> Index: modulo-sched.c
> ===================================================================
> --- modulo-sched.c (revision 127390)
> +++ modulo-sched.c (working copy)
> @@ -82,8 +82,20 @@
>        perform modulo variable expansion for live ranges that span more
than
>        II cycles (i.e. use register copies to prevent a def from
overwriting
>        itself before reaching the use).
> -*/
>
> +   SMS works with countable loops (1) whose control part can be easily
> +   decoupled from the rest of the loop and (2) whose loop count can
> +   be easily adjusted.  This is because we peel a constant number of
> +   iterations into a prologue and epilogue for which we want to avoid
> +   emitting the control part, and a kernel which is to iterate that
> +   constant number of iterations less than the original loop.  So the
> +   control part should be a set of insns clearly identified and having
> +   its own iv, not otherwise used in the loop (at-least for now), which
> +   initializes a register before the loop to the number of iterations.
> +   Currently SMS relay on the do-loop pattern to recognize such loops.
                    ^^^^^
                    relies
, where (1) the control part comprises of all insns defining and/or using a
certain 'count' register and (2) the loop count can be adjusted by
modifying this register prior to the loop.

> +   TODO: Relay on cfgloop analysis instead.  */
            ^^^^^
            Rely

> +
> +
>
>  /* This page defines partial-schedule structures and functions for
>     modulo scheduling.  */
> @@ -182,10 +194,11 @@
>  static void set_node_sched_params (ddg_ptr);
>  static partial_schedule_ptr sms_schedule_by_order (ddg_ptr, int, int,
int *);
>  static void permute_partial_schedule (partial_schedule_ptr ps, rtx
last);
> -static void generate_prolog_epilog (partial_schedule_ptr ,struct loop *
loop, rtx);
> +static void generate_prolog_epilog (partial_schedule_ptr, struct loop
*loop,
> +                                    rtx, rtx);
>  static void duplicate_insns_of_cycles (partial_schedule_ptr ps,
>             int from_stage, int to_stage,
> -           int is_prolog);
> +           int is_prolog, rtx count_reg);
>
>  #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
>  #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
> @@ -268,13 +281,29 @@
>  doloop_register_get (rtx insn ATTRIBUTE_UNUSED)
>  {
>  #ifdef HAVE_doloop_end
> -  rtx pattern, reg, condition;
> +  rtx reg, condition;
>
> -  if (! JUMP_P (insn))
> +  /* The canonical doloop pattern we expect has one of the following
> +     forms:
> +
> +     1)  (parallel [(set (pc) (if_then_else (condition)
> +                                            (label_ref (label))
> +                                            (pc)))
> +                     (set (reg) (plus (reg) (const_int -1)))
> +                     (additional clobbers and uses)])
> +
> +     2)  (set (reg) (plus (reg) (const_int -1))
> +         (set (pc) (if_then_else (reg != 0)
> +                                 (label_ref (label))
> +                                 (pc))).  */
> +
> +  if (!JUMP_P (insn))
>      return NULL_RTX;
>
> -  pattern = PATTERN (insn);
> -  condition = doloop_condition_get (pattern);
> +  if (!INSN_P (PREV_INSN (insn)))
> +    return NULL_RTX;
> +

Why do we need to check the insn preceding the (non-parallel) jump?  Looks
like this check belongs inside doloop_condition_get.


> +  condition = doloop_condition_get (insn);
>    if (! condition)
>      return NULL_RTX;
>
> @@ -401,6 +430,7 @@
>     nreg_moves = ----------------------------------- + 1 - {
dependence.
>                              ii                          { 1 if not.
>  */
> +
>  static struct undo_replace_buff_elem *
>  generate_reg_moves (partial_schedule_ptr ps, bool rescan)
>  {
> @@ -583,7 +613,7 @@
>
>  static void
>  duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
> -      int to_stage, int for_prolog)
> +      int to_stage, int for_prolog, rtx count_reg)
>  {
>    int row;
>    ps_insn_ptr ps_ij;
> @@ -595,6 +625,12 @@
>   int j, i_reg_moves;
>   rtx reg_move = NULL_RTX;
>
> +        /* Do not duplicate the insn which changes count_reg as it is
> +           already adjusted.   */
> +        if (INSN_P (u_node->insn) && single_set (u_node->insn)
> +            && rtx_equal_p (count_reg, SET_DEST (single_set
(u_node->insn))))

Why not check if reg_mentioned_p? We want to avoid duplicating insns using
count_reg, not only those defining it (as we avoid duplicating the latter).
Also, the control part could more generally be composed of a
decrement-compare-branch sequence of insns which define and use count_reg.


> +          continue;
> +
>   if (for_prolog)
>     {
>       /* SCHED_STAGE (u_node) >= from_stage == 0.  Generate increasing
> @@ -643,23 +679,26 @@
>
>  /* Generate the instructions (including reg_moves) for prolog & epilog.
*/
>  static void
> -generate_prolog_epilog (partial_schedule_ptr ps, struct loop * loop, rtx
count_reg)
> +generate_prolog_epilog (partial_schedule_ptr ps, struct loop *loop,
> +                        rtx count_reg, rtx count_init)
>  {
>    int i;
>    int last_stage = PS_STAGE_COUNT (ps) - 1;
>    edge e;
> -
> +
>    /* Generate the prolog, inserting its insns on the loop-entry edge.
*/
>    start_sequence ();
>
> -  if (count_reg)
> -   /* Generate a subtract instruction at the beginning of the prolog to
> -      adjust the loop count by STAGE_COUNT.  */
> -   emit_insn (gen_sub2_insn (count_reg, GEN_INT (last_stage)));
> +  if (count_reg && !count_init)
         ^^^^^^^^^
This is redundant, as we assert earlier that count_reg is not NULL.

>
> +    /* Generate a subtract instruction at the beginning of the prolog to
> +       adjust the loop count by STAGE_COUNT.  */

Also add to comment that if loop count is constant (count_init), this
constant is adjusted by STAGE_COUNT elsewhere -- by the caller to
generate_prolog_epilog.

> +    emit_move_insn (count_reg, gen_rtx_PLUS (GET_MODE (count_reg),
> +                                             count_reg,
> +                                             GEN_INT (-last_stage)));
>


Consider adjusting here the loop count also when it's a constant (instead
of above suggested addition to comment), i.e.

    else
      /* count_init, set new constant iteration count of loop kernel.  */
      SET_SRC (single_set (count_init)) = GEN_INT (loop_count - stage_count
+ 1);


>    for (i = 0; i < last_stage; i++)
> -    duplicate_insns_of_cycles (ps, 0, i, 1);
> -
> +    duplicate_insns_of_cycles (ps, 0, i, 1, count_reg);
> +
>    /* Put the prolog on the entry edge.  */
>    e = loop_preheader_edge (loop);
>    split_edge_and_insert (e, get_insns ());
> @@ -670,8 +709,8 @@
>    start_sequence ();
>
>    for (i = 0; i < last_stage; i++)
> -    duplicate_insns_of_cycles (ps, i + 1, last_stage, 0);
> -
> +    duplicate_insns_of_cycles (ps, i + 1, last_stage, 0, count_reg);
> +
>    /* Put the epilogue on the exit edge.  */
>    gcc_assert (single_exit (loop));
>    e = single_exit (loop);
> @@ -910,18 +949,42 @@
>        if ( !(count_reg = doloop_register_get (tail)))
>   continue;
>
> +      if (GET_CODE (PATTERN (tail)) != PARALLEL)
> +      {
> +        /* Check that the COUNT_REG has no other occurences in the loop
> +           until the decrement.  */
> +        bool found = false;
> +
> +        for (insn = head; insn != PREV_INSN (tail); insn = NEXT_INSN
(insn))
> +          {
> +            if (!reg_mentioned_p (count_reg, insn))
> +              continue;
> +
> +            found = true;
> +            break;
> +          }

Can simplify to
         for (insn = head; insn != PREV_INSN (tail); insn = NEXT_INSN
(insn))
           if (found = reg_mentioned_p (count_reg, insn))
             break;

> +        if (found)
> +          continue;
> +      }
> +

Fold the above into doloop_register_get(): it should return a valid
count_reg iff the loop has a control part comprising of all insns defining
and using this register, whose value dictates the number of loop
iterations. Not sure why this check for no uses of count_reg inside loop
applies only to non-parallel loop branch. Also please add a debug dump upon
failure before continuing.


>       /* Don't handle BBs with calls or barriers, or !single_set insns,
>           or auto-increment insns (to avoid creating invalid reg-moves
>           for the auto-increment insns).
>           ??? Should handle auto-increment insns.  */
> -      for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN
(insn))
> - if (CALL_P (insn)
> -     || BARRIER_P (insn)
> -     || (INSN_P (insn) && !JUMP_P (insn)
> -  && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)
> -            || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0))
> -   break;
> +     for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN
(insn))
> +      {
> +         rtx set;
>
> +        if (CALL_P (insn)
> +            || BARRIER_P (insn)
> +            || (INSN_P (insn) && !JUMP_P (insn)
> +                && !single_set (insn) && GET_CODE (PATTERN (insn)) !=
USE)
> +            || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
> +            || (INSN_P (insn) && (set = single_set (insn))
> +                && GET_CODE (SET_DEST (set)) == SUBREG))
> +        break;
> +      }
> +
>        if (insn != NEXT_INSN (tail))
>   {
>     if (dump_file)
> @@ -932,8 +995,11 @@
>    fprintf (dump_file, "SMS loop-with-barrier\n");
>                else if (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
>                  fprintf (dump_file, "SMS reg inc\n");
> -       else
> -  fprintf (dump_file, "SMS loop-with-not-single-set\n");
> +              else if ((INSN_P (insn) && !JUMP_P (insn)
> +                && !single_set (insn) && GET_CODE (PATTERN (insn)) !=
USE))
> +                fprintf (dump_file, "SMS loop-with-not-single-set\n");
> +              else
> +               fprintf (dump_file, "SMS loop with subreg in lhs\n");
>         print_rtl_single (dump_file, insn);
>       }
>
> @@ -1112,9 +1178,9 @@
>       print_node_sched_params (dump_file, g->num_nodes, g);
>     /* Generate prolog and epilog.  */
>
>     if (count_reg && !count_init)
> -     generate_prolog_epilog (ps, loop, count_reg);
> +     generate_prolog_epilog  (ps, loop, count_reg, NULL_RTX);
>     else
> -     generate_prolog_epilog (ps, loop, NULL_RTX);
> +     generate_prolog_epilog (ps, loop, count_reg, count_init);

Why not simplify into a single:
      generate_prolog_epilog (ps, loop, count_reg, count_init);


>
>     free_undo_replace_buff (reg_move_replaces);
>   }
> Index: loop-doloop.c
> ===================================================================
> --- loop-doloop.c (revision 127390)
> +++ loop-doloop.c (working copy)
> @@ -69,36 +69,54 @@
>     if it is not a decrement and branch jump insn.  */
>
>  rtx
> -doloop_condition_get (rtx pattern)
> +doloop_condition_get (rtx doloop_pat)
>  {
>    rtx cmp;
>    rtx inc;
>    rtx reg;
>    rtx inc_src;
>    rtx condition;
> +  rtx pattern;
>
> -  /* The canonical doloop pattern we expect is:
> +  /* The canonical doloop pattern we expect has one of the following
> +     forms:
>
> -     (parallel [(set (pc) (if_then_else (condition)
> -                                        (label_ref (label))
> -                                        (pc)))
> -                (set (reg) (plus (reg) (const_int -1)))
> -                (additional clobbers and uses)])
> +     1)  (parallel [(set (pc) (if_then_else (condition)
> +                  (label_ref (label))
> +                (pc)))
> +              (set (reg) (plus (reg) (const_int -1)))
> +              (additional clobbers and uses)])
>
> -     Some targets (IA-64) wrap the set of the loop counter in
> -     an if_then_else too.
> +     The branch must be the first entry of the parallel (also required
> +     by jump.c), and the second entry of the parallel must be a set of
> +     the loop counter register.  Some targets (IA-64) wrap the set of
> +     the loop counter in an if_then_else too.
>
> -     In summary, the branch must be the first entry of the
> -     parallel (also required by jump.c), and the second
> -     entry of the parallel must be a set of the loop counter
> -     register.  */
> +     2)  (set (reg) (plus (reg) (const_int -1))
> +         (set (pc) (if_then_else (reg != 0)
> +            (label_ref (label))
> +            (pc))).  */
>
> +  pattern = PATTERN (doloop_pat);
> +
>    if (GET_CODE (pattern) != PARALLEL)
> -    return 0;
> +    {
> +      rtx cond;
>
> -  cmp = XVECEXP (pattern, 0, 0);
> -  inc = XVECEXP (pattern, 0, 1);
> +      cmp = pattern;
> +      inc = PATTERN (PREV_INSN (doloop_pat));

This is probably where we need to check that
+  if (INSN_P (PREV_INSN (doloop_pat)))


> +      /* We expect the condition to be of the form (reg != 0)  */
> +      cond = XEXP (SET_SRC (cmp), 0);
> +      if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)
> +        return 0;
>
> +    }
> +  else
> +    {
> +      cmp = XVECEXP (pattern, 0, 0);
> +      inc = XVECEXP (pattern, 0, 1);
> +    }
> +
>    /* Check for (set (reg) (something)).  */
>    if (GET_CODE (inc) != SET)
>      return 0;
> @@ -139,7 +157,13 @@
>    if ((XEXP (condition, 0) == reg)
>        || (GET_CODE (XEXP (condition, 0)) == PLUS
>       && XEXP (XEXP (condition, 0), 0) == reg))
> +   {
> +     if (GET_CODE (pattern) != PARALLEL)
> +     /* Generate (reg != 1) condition instead.  */
> +        condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx);
> +
>      return condition;
> +   }
>
>    /* ??? If a machine uses a funny comparison, we could return a
>       canonicalized form here.  */
> @@ -597,9 +621,7 @@
>      {
>        while (NEXT_INSN (doloop_pat) != NULL_RTX)
>   doloop_pat = NEXT_INSN (doloop_pat);
> -      if (JUMP_P (doloop_pat))
> - doloop_pat = PATTERN (doloop_pat);
> -      else
> +      if (!JUMP_P (doloop_pat))
>   doloop_pat = NULL_RTX;
>      }
>
> Index: config/spu/spu.md
> ===================================================================
> --- config/spu/spu.md (revision 127390)
> +++ config/spu/spu.md (working copy)
> @@ -3887,6 +3887,43 @@
>    [(set_attr "type" "br")])
>
>
> +
> + ;; Define the subtract-one-and-jump insns so loop.c
> + ;; knows what to generate.
> + (define_expand "doloop_end"
> +   [(use (match_operand 0 "" ""))      ; loop pseudo
> +    (use (match_operand 1 "" ""))      ; iterations; zero if unknown
> +    (use (match_operand 2 "" ""))      ; max iterations
> +    (use (match_operand 3 "" ""))      ; loop level
> +    (use (match_operand 4 "" ""))]     ; label
> +   ""
> +   "
> + {
> +   /* ??? Enable doloop only if -fmodulo-sched is set.  */
> +   if (optimize > 0 && flag_modulo_sched)
> +   {
> +     rtx s0;
> +     rtx bcomp;
> +     rtx loc_ref;
> +
> +     /* Only use this on innermost loops.  */
> +     if (INTVAL (operands[3]) > 1)
> +       FAIL;
> +     if (GET_MODE (operands[0]) != SImode)
> +       FAIL;
> +
> +     s0 = operands [0];
> +     emit_move_insn (s0, gen_rtx_PLUS (SImode, s0, GEN_INT (-1)));
> +     bcomp = gen_rtx_NE(SImode, s0, const0_rtx);
> +     loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [4]);
> +     emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
> +                                  gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
> +                                                        loc_ref,
pc_rtx)));
> +
> +     DONE;
> +   }
> + }")
> +
>  ;; convert between any two modes, avoiding any GCC assumptions
>  (define_expand "spu_convert"
>    [(set (match_operand 0 "spu_reg_operand" "")
>
> > [attachment "sms-1.c.txt" deleted by Ayal Zaks/Haifa/IBM]


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]