This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2. (re-submission)

From: Ayal Zaks <ZAKS at il dot ibm dot com>
To: Revital1 Eres <ERES at il dot ibm dot com>
Cc: abel at ispras dot ru, andrew_pinski at playstation dot sony dot com, dje at watson dot ibm dot com, gcc-patches at gcc dot gnu dot org, Mircea Namolaru <NAMOLARU at il dot ibm dot com>, Zdenek Dvorak <rakdver at kam dot mff dot cuni dot cz>, Trevor_Smigiel at playstation dot sony dot com
Date: Sun, 26 Aug 2007 02:10:19 +0300
Subject: Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2. (re-submission)
Revital1 Eres/Haifa/IBM wrote on 22/08/2007 17:12:42:

> Hello,
>
> Following the last submission -
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00943.html; This patch
> includes the comments made by Andrey, Ayal and Zdenek.
>
> Thanks again to Andrey for testing the patch on ia64; reporting no
> new regressions.
>
> This patch was bootstrapped and tested also on PPC and x86_64; with no
> new regressions, and make-check with and without the patch without new
> regressions for the C testsuite on spu.

> OK for mainline?
>


The doloop.c part is ok, having addressed Zdenek's comments (
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg01115.html).

The modulo-sched.c and ddg.c parts are ok, with minor typo's and comments
below.

Andrew/Trevor, is the spu.md part at the bottom ok? Following the earlier
discussion (http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01738.html, we'd
like to move forward as there are dependent patches pending, and we plan to
revisit and free sms's dependence on doloop_condition_get. Revital, please
make a note of it.

Ayal.

> Thanks,
> Revital
>
> 2007-08-22  Mircea Namolaru  <namolaru@il.ibm.com>
>             Vladimir Yanovsky  <yanov@il.ibm.com>
>             Revital Eres  <eres@il.ibm.com>
>             Andrey Belevantsev  <abel@ispras.ru>
>
>         * config/spu/spu.md: Recognize doloop pattern when -fmodulo-sched
>         is set.
>         * modulo-sched.c: Add documentation regarding do-loop.
>         (doloop_register_get): Change number of argumants to support
                                                  ^^^^^e^^^

>         the new do-loop pattern and check whether COUNT_REG has no other
>         occurences in the loop besides in the control part.
>         (duplicate_insns_of_cycles): Do not duplicate the insn which
>         changes count_reg as it is already adjusted.
>         (generate_prolog_epilog): New argument to support the new
>         do-loop pattern.  Change the subtract instruction to use
>         expand_simple_binop.  Call duplicate_insns_of_cycles with new
>         argument.
>         (sms_schedule): Call doloop_register_get and
>         generate_prolog_epilog with new argumant.  Do not handle loops
                                          ^^^^^e^^

>         with single sets insns with subreg in their lhs.
>         * loop-doloop.c (doloop_optimize): Support for the a do-loop
                                                         ^^^^^
                                                         another

>         pattern.
>         (doloop_condition_get): Gets an instruction instead of a pattern
>         and change the return condition when the do-loop pattern is
>         not parallel.
>         * ddg.c (create_ddg_dep_from_intra_loop_link): Handle only reg
>         deps when considering to not create edges.
>
>         * gcc.dg/sms-1.c: New test.
>



> Index: ddg.c
> ===================================================================
> --- ddg.c   (revision 127558)
> +++ ddg.c   (working copy)
> @@ -176,13 +176,17 @@
>        rtx set;
>
>        set = single_set (dest_node->insn);
> -      if (set)
> +      /* TODO: Handle regsiters that REG_P is not true for them, i.e.
                         ^^^is^^^^

> +         subregs and special registers.  */
> +      if (set && REG_P (SET_DEST (set)))
>          {
>            int regno = REGNO (SET_DEST (set));
> -          struct df_ref *first_def =
> -            df_bb_regno_first_def_find (g->bb, regno);
> +          struct df_ref *first_def;
>            struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
>
> +          first_def = df_bb_regno_first_def_find (g->bb, regno);
> +          gcc_assert (first_def);
> +
>            if (bitmap_bit_p (bb_info->gen, first_def->id))
>              return;
>          }
> Index: modulo-sched.c
> ===================================================================
> --- modulo-sched.c   (revision 127558)
> +++ modulo-sched.c   (working copy)
> @@ -82,8 +82,21 @@
>        perform modulo variable expansion for live ranges that span more
than
>        II cycles (i.e. use register copies to prevent a def from
overwriting
>        itself before reaching the use).
> -*/
>
> +    SMS works with countable loops (1) whose control part can be easily
> +    decoupled from the rest of the loop and (2) whose loop count can
> +    be easily adjusted.  This is because we peel a constant number of
> +    iterations into a prologue and epilogue for which we want to avoid
> +    emitting the control part, and a kernel which is to iterate that
> +    constant number of iterations less than the original loop.  So the
> +    control part should be a set of insns clearly identified and having
> +    its own iv, not otherwise used in the loop (at-least for now), which
> +    initializes a register before the loop to the number of iterations.
> +    Currently SMS relies on the do-loop pattern to recognize such loops,
> +    where (1) the control part comprises of all insns defining and/or
> +    using a certain 'count' register and (2) the loop count can be
> +    adjusted by modifying this register prior to the loop.
> +    TODO: Rely on cfgloop analysis instead.  */
>
>  /* This page defines partial-schedule structures and functions for
>     modulo scheduling.  */
> @@ -182,10 +195,11 @@
>  static void set_node_sched_params (ddg_ptr);
>  static partial_schedule_ptr sms_schedule_by_order (ddg_ptr, int, int,
int *);
>  static void permute_partial_schedule (partial_schedule_ptr ps, rtx
last);
> -static void generate_prolog_epilog (partial_schedule_ptr ,struct loop *
loop, rtx);
> +static void generate_prolog_epilog (partial_schedule_ptr, struct loop
*loop,
> +                                    rtx, rtx);
>  static void duplicate_insns_of_cycles (partial_schedule_ptr ps,
>                     int from_stage, int to_stage,
> -                   int is_prolog);
> +                   int is_prolog, rtx count_reg);
>
>  #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
>  #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
> @@ -261,20 +275,21 @@
>  };
>
>
> -/* Return the register decremented and tested in INSN,
> -   or zero if it is not a decrement-and-branch insn.  */
> -
> +/* Given HEAD and TAIL which are the first and last insns in a loop;
> +   return the register which controls the loop.  Return zero if it has
> +   more than one occurrence in the loop besides the control part or the
> +   do-loop pattern is not of the form we expect.  */
>  static rtx
> -doloop_register_get (rtx insn ATTRIBUTE_UNUSED)
> +doloop_register_get (rtx head ATTRIBUTE_UNUSED, rtx tail
ATTRIBUTE_UNUSED)
>  {
>  #ifdef HAVE_doloop_end
> -  rtx pattern, reg, condition;
> +  rtx reg, condition, insn;
> +  bool found = false;
>
> -  if (! JUMP_P (insn))
> +  if (!JUMP_P (tail))
>      return NULL_RTX;
>
> -  pattern = PATTERN (insn);
> -  condition = doloop_condition_get (pattern);
> +  condition = doloop_condition_get (tail);
>    if (! condition)
>      return NULL_RTX;
>
> @@ -286,6 +301,29 @@
>    else
>      gcc_unreachable ();
>
> +  /* Check that the COUNT_REG has no other occurrences in the loop
> +     until the decrement.  */

This assumes the control part consists of either a single (parallel)
branch-on-count or a (non-parallel) branch immediately preceded by a single
(decrement) insn. Please document this.
Suggest to combine the two checks, e.g. by:

+  first_insn_not_to_check = (GET_CODE (PATTERN (tail)) == PARALLEL ? tail
: PREV_INSN (tail));
+
+  for (insn = head; insn != first_insn_not_to_check; insn = NEXT_INSN
(insn))
...

> +  for (insn = head; insn != PREV_INSN (tail); insn = NEXT_INSN (insn))
> +    if ((found = reg_mentioned_p (reg, insn)) == true)
                                                 ^^^^^^^ redundant

> +      break;
> +  if (found)
> +    {
> +      if (dump_file)
> +        fprintf (dump_file, "SMS more than one occurrence"
                                    ^^^^^^^^^^^^^^^^^^^^^^^^
                                    count_reg found outside control
part\n");

> +                 " in the loop of count_reg\n");
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> +
> +      return NULL_RTX;
> +    }
> +  /* One last check in case the do-loop pattern is parallel.  */
> +  if (GET_CODE (PATTERN (tail)) == PARALLEL)
> +    if (reg_mentioned_p (reg, PREV_INSN (tail)))
> +      {
> +        if (dump_file)
> +          fprintf (dump_file, "SMS more than one occurrence"
> +                   " in the loop of count_reg\n");
> +
> +        return NULL_RTX;
> +      }
>    return reg;
>  #else
>    return NULL_RTX;
> @@ -583,7 +621,7 @@
>
>  static void
>  duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
> -            int to_stage, int for_prolog)
> +            int to_stage, int for_prolog, rtx count_reg)
>  {
>    int row;
>    ps_insn_ptr ps_ij;
> @@ -595,6 +633,13 @@
>     int j, i_reg_moves;
>     rtx reg_move = NULL_RTX;
>
> +        /* Do not duplicate the insn which changes count_reg as it is
                               any            refers to count_reg as it
belongs to the control part.

> +           already adjusted.
> +           TODO: This should be done by analyzing the control part of
> +           the loop.  */
> +        if (reg_mentioned_p (count_reg, u_node->insn))
> +          continue;
> +
>     if (for_prolog)
>       {
>         /* SCHED_STAGE (u_node) >= from_stage == 0.  Generate increasing
> @@ -643,23 +688,35 @@
>
>  /* Generate the instructions (including reg_moves) for prolog & epilog.
*/
>  static void
> -generate_prolog_epilog (partial_schedule_ptr ps, struct loop * loop, rtx
count_reg)
> +generate_prolog_epilog (partial_schedule_ptr ps, struct loop *loop,
> +                        rtx count_reg, rtx count_init)
>  {
>    int i;
>    int last_stage = PS_STAGE_COUNT (ps) - 1;
>    edge e;
> -
> +
>    /* Generate the prolog, inserting its insns on the loop-entry edge.
*/
>    start_sequence ();
>
> -  if (count_reg)
> -   /* Generate a subtract instruction at the beginning of the prolog to
> -      adjust the loop count by STAGE_COUNT.  */
> -   emit_insn (gen_sub2_insn (count_reg, GEN_INT (last_stage)));
> +  if (!count_init)
> +    {
> +      /* Generate a subtract instruction at the beginning of the prolog
to
                     ^^^^ instructions ^^^^

> +         adjust the loop count by STAGE_COUNT.  If loop count is
constant
> +         (count_init), this constant is adjusted by STAGE_COUNT in
> +         generate_prolog_epilog function.  */
> +      rtx sub_reg = NULL_RTX;
>
> +      sub_reg = expand_simple_binop (GET_MODE (count_reg), MINUS,
> +                                     count_reg, GEN_INT (last_stage),
> +                                     count_reg, 1, OPTAB_DIRECT);
> +      gcc_assert (REG_P (sub_reg));
> +      if (REGNO (sub_reg) != REGNO (count_reg))
> +        emit_move_insn (count_reg, sub_reg);
> +    }
> +
>    for (i = 0; i < last_stage; i++)
> -    duplicate_insns_of_cycles (ps, 0, i, 1);
> -
> +    duplicate_insns_of_cycles (ps, 0, i, 1, count_reg);
> +
>    /* Put the prolog on the entry edge.  */
>    e = loop_preheader_edge (loop);
>    split_edge_and_insert (e, get_insns ());
> @@ -670,8 +727,8 @@
>    start_sequence ();
>
>    for (i = 0; i < last_stage; i++)
> -    duplicate_insns_of_cycles (ps, i + 1, last_stage, 0);
> -
> +    duplicate_insns_of_cycles (ps, i + 1, last_stage, 0, count_reg);
> +
>    /* Put the epilogue on the exit edge.  */
>    gcc_assert (single_exit (loop));
>    e = single_exit (loop);
> @@ -907,21 +964,27 @@
>          }
>
>        /* Make sure this is a doloop.  */
> -      if ( !(count_reg = doloop_register_get (tail)))
> +      if ( !(count_reg = doloop_register_get (head, tail)))
>     continue;
>
>        /* Don't handle BBs with calls or barriers, or !single_set insns,
>           or auto-increment insns (to avoid creating invalid reg-moves
>           for the auto-increment insns).
>           ??? Should handle auto-increment insns.  */

Please add  ??? Should handle insns defining subregs.  */

> -      for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN
(insn))
> -   if (CALL_P (insn)
> -       || BARRIER_P (insn)
> -       || (INSN_P (insn) && !JUMP_P (insn)
> -      && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)
> -            || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0))
> -     break;
> +     for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN
(insn))
> +      {
> +         rtx set;
>
> +        if (CALL_P (insn)
> +            || BARRIER_P (insn)
> +            || (INSN_P (insn) && !JUMP_P (insn)
> +                && !single_set (insn) && GET_CODE (PATTERN (insn)) !=
USE)
> +            || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
> +            || (INSN_P (insn) && (set = single_set (insn))
> +                && GET_CODE (SET_DEST (set)) == SUBREG))
> +        break;
> +      }
> +
>        if (insn != NEXT_INSN (tail))
>     {
>       if (dump_file)
> @@ -932,8 +995,11 @@
>        fprintf (dump_file, "SMS loop-with-barrier\n");
>                else if (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
>                  fprintf (dump_file, "SMS reg inc\n");
> -         else
> -      fprintf (dump_file, "SMS loop-with-not-single-set\n");
> +              else if ((INSN_P (insn) && !JUMP_P (insn)
> +                && !single_set (insn) && GET_CODE (PATTERN (insn)) !=
USE))
> +                fprintf (dump_file, "SMS loop-with-not-single-set\n");
> +              else
> +               fprintf (dump_file, "SMS loop with subreg in lhs\n");
>           print_rtl_single (dump_file, insn);
>         }
>
> @@ -998,7 +1064,7 @@
>        /* In case of th loop have doloop register it gets special
>      handling.  */
>        count_init = NULL_RTX;
> -      if ((count_reg = doloop_register_get (tail)))
> +      if ((count_reg = doloop_register_get (head, tail)))
>     {
>       basic_block pre_header;
>
> @@ -1072,7 +1138,10 @@
>          the closing_branch was scheduled and should appear in the last
(ii-1)
>          row.  Otherwise, we are free to schedule the branch, and we let
nodes
>          that were scheduled at the first PS_MIN_CYCLE cycle appear in
the first
> -        row; this should reduce stage_count to minimum.  */
> +        row; this should reduce stage_count to minimum.
> +             TODO: Revisit the issue of scheduling the insns of the
> +             control part relative to the branch when the control part
> +             has more than one insn.  */
>       normalize_sched_times (ps);
>       rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
>       set_columns_for_ps (ps);
> @@ -1111,11 +1180,8 @@
>       if (dump_file)
>         print_node_sched_params (dump_file, g->num_nodes, g);
>       /* Generate prolog and epilog.  */
> -     if (count_reg && !count_init)
> -       generate_prolog_epilog (ps, loop, count_reg);
> -     else
> -       generate_prolog_epilog (ps, loop, NULL_RTX);
> -
> +          generate_prolog_epilog (ps, loop, count_reg, count_init);
> +
>       free_undo_replace_buff (reg_move_replaces);
>     }
>
> Index: loop-doloop.c
> ===================================================================
> --- loop-doloop.c   (revision 127558)
> +++ loop-doloop.c   (working copy)
> @@ -69,36 +69,58 @@
>     if it is not a decrement and branch jump insn.  */
>
>  rtx
> -doloop_condition_get (rtx pattern)
> +doloop_condition_get (rtx doloop_pat)
>  {
>    rtx cmp;
>    rtx inc;
>    rtx reg;
>    rtx inc_src;
>    rtx condition;
> +  rtx pattern;
>
> -  /* The canonical doloop pattern we expect is:
> +  /* The canonical doloop pattern we expect has one of the following
> +     forms:
>
> -     (parallel [(set (pc) (if_then_else (condition)
> -                                        (label_ref (label))
> -                                        (pc)))
> -                (set (reg) (plus (reg) (const_int -1)))
> -                (additional clobbers and uses)])
> +     1)  (parallel [(set (pc) (if_then_else (condition)
> +                          (label_ref (label))
> +                        (pc)))
> +                (set (reg) (plus (reg) (const_int -1)))
> +                (additional clobbers and uses)])
>
> -     Some targets (IA-64) wrap the set of the loop counter in
> -     an if_then_else too.
> +     The branch must be the first entry of the parallel (also required
> +     by jump.c), and the second entry of the parallel must be a set of
> +     the loop counter register.  Some targets (IA-64) wrap the set of
> +     the loop counter in an if_then_else too.
>
> -     In summary, the branch must be the first entry of the
> -     parallel (also required by jump.c), and the second
> -     entry of the parallel must be a set of the loop counter
> -     register.  */
> +     2)  (set (reg) (plus (reg) (const_int -1))
> +         (set (pc) (if_then_else (reg != 0)
> +                            (label_ref (label))
> +                  (pc))).  */
>
> +  pattern = PATTERN (doloop_pat);
> +
>    if (GET_CODE (pattern) != PARALLEL)
> -    return 0;
> +    {
> +      rtx cond;
>
> -  cmp = XVECEXP (pattern, 0, 0);
> -  inc = XVECEXP (pattern, 0, 1);

Please add here: /* We expect the decrement to immediately precede the
branch.  */

> +      if ((PREV_INSN (doloop_pat) == NULL_RTX)
> +          || !INSN_P (PREV_INSN (doloop_pat)))
> +        return 0;
>
> +      cmp = pattern;
> +      inc = PATTERN (PREV_INSN (doloop_pat));
> +      /* We expect the condition to be of the form (reg != 0)  */
> +      cond = XEXP (SET_SRC (cmp), 0);
> +      if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)
> +        return 0;
> +
> +    }
> +  else
> +    {
> +      cmp = XVECEXP (pattern, 0, 0);
> +      inc = XVECEXP (pattern, 0, 1);
> +    }
> +
>    /* Check for (set (reg) (something)).  */
>    if (GET_CODE (inc) != SET)
>      return 0;
> @@ -139,7 +161,29 @@
>    if ((XEXP (condition, 0) == reg)
>        || (GET_CODE (XEXP (condition, 0)) == PLUS
>           && XEXP (XEXP (condition, 0), 0) == reg))
> +   {
> +     if (GET_CODE (pattern) != PARALLEL)
> +     /*  The second form we expect:
> +
> +         (set (reg) (plus (reg) (const_int -1))
> +         (set (pc) (if_then_else (reg != 0)
> +                                 (label_ref (label))
> +                                 (pc))).
> +
> +         is equivalent to the following:
> +
> +         (parallel [(set (pc) (if_then_else (reg != 1)
> +                                            (label_ref (label))
> +                                            (pc)))
> +                     (set (reg) (plus (reg) (const_int -1)))
> +                     (additional clobbers and uses)])
> +
> +         So we return that form instead.
> +     */
> +        condition = gen_rtx_fmt_ee (NE, VOIDmode, inc_src, const1_rtx);
> +
>      return condition;
> +   }
>
>    /* ??? If a machine uses a funny comparison, we could return a
>       canonicalized form here.  */
> @@ -597,9 +641,7 @@
>      {
>        while (NEXT_INSN (doloop_pat) != NULL_RTX)
>     doloop_pat = NEXT_INSN (doloop_pat);
> -      if (JUMP_P (doloop_pat))
> -   doloop_pat = PATTERN (doloop_pat);
> -      else
> +      if (!JUMP_P (doloop_pat))
>     doloop_pat = NULL_RTX;
>      }
>
> Index: config/spu/spu.md
> ===================================================================
> --- config/spu/spu.md   (revision 127558)
> +++ config/spu/spu.md   (working copy)
> @@ -3887,6 +3887,48 @@
>    [(set_attr "type" "br")])
>
>
> +
> + ;; Define the subtract-one-and-jump insns so loop.c
> + ;; knows what to generate.
> + (define_expand "doloop_end"
> +   [(use (match_operand 0 "" ""))      ; loop pseudo
> +    (use (match_operand 1 "" ""))      ; iterations; zero if unknown
> +    (use (match_operand 2 "" ""))      ; max iterations
> +    (use (match_operand 3 "" ""))      ; loop level
> +    (use (match_operand 4 "" ""))]     ; label
> +   ""
> +   "
> + {
> +   /* Currently SMS relies on the do-loop pattern to recognize loops
> +      where (1) the control part comprises of all insns defining and/or
> +      using a certain 'count' register and (2) the loop count can be
> +      adjusted by modifying this register prior to the loop.
> +.     ??? Introducing such do-loop pattern may increase register
> +      pressure by adding an extra IV.  */
> +   if (optimize > 0 && flag_modulo_sched)
> +   {
> +     rtx s0;
> +     rtx bcomp;
> +     rtx loc_ref;
> +
> +     /* Only use this on innermost loops.  */
> +     if (INTVAL (operands[3]) > 1)
> +       FAIL;
> +     if (GET_MODE (operands[0]) != SImode)
> +       FAIL;
> +
> +     s0 = operands [0];
> +     emit_move_insn (s0, gen_rtx_PLUS (SImode, s0, GEN_INT (-1)));
> +     bcomp = gen_rtx_NE(SImode, s0, const0_rtx);
> +     loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [4]);
> +     emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
> +                                  gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
> +                                                        loc_ref,
pc_rtx)));
> +
> +     DONE;
> +   }
> + }")
> +
>  ;; convert between any two modes, avoiding any GCC assumptions
>  (define_expand "spu_convert"
>    [(set (match_operand 0 "spu_reg_operand" "")


> [attachment "patch_2_of_2_21.txt" deleted by Ayal Zaks/Haifa/IBM]
[attachment
> "sms-1.c.txt" deleted by Ayal Zaks/Haifa/IBM]
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]