This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2.
- From: Zdenek Dvorak <rakdver at kam dot mff dot cuni dot cz>
- To: Ayal Zaks <ZAKS at il dot ibm dot com>
- Cc: Revital1 Eres <ERES at il dot ibm dot com>, abel at ispras dot ru, andrew_pinski at playstation dot sony dot com, dje at watson dot ibm dot com, gcc-patches at gcc dot gnu dot org, Mircea Namolaru <NAMOLARU at il dot ibm dot com>
- Date: Sat, 18 Aug 2007 16:04:52 +0200
- Subject: Re: [PATCH] Modulo-scheduling improvements. Patch 2 of 2.
- References: <OFD551E32A.189B4CD3-ONC2257331.00526D01-C2257338.005EA442@LocalDomain> <OFED032723.E54FA73C-ONC2257338.00711120-C225733A.007F913F@il.ibm.com>
> Revital1 Eres/Haifa/IBM wrote on 15/08/2007 20:13:44:
>
> > Hello,
> >
> > This is a re-submission of patch 2 of 2
> > (http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01470.html).
> > It includes the comments received in -
> > (*) http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01508.html,
> > http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01582.html,
> > http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01747.html and
> > http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01674.html.
> >
> > (*) I've missed the last two comments made by Ayal regarding the
> > formatting of the code; If the rest of the patch is OK I would ask to
> > submit those in a follow-up patch.
> >
> > This patch also contains a fix to an ICE reported by Andrey Belevantsev
> > while testing sms on ia64. The ICE caused by mishandling of subregs
> > that appears in lhs of single set insns. It also appears on spu.
> > Such instructions can indicate that there is more than one def to a reg
> > and we choose to avoid those cases for now. Relevant testcase was added
> > for this bug.
> >
> > The documentation that was added to modulo-sched.c regarding do-loop
> > was taken from http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01774.html.
> >
> > Thanks to Andrey for testing the current version of trunk with sms flags
> > on ia64.
> >
> > Also thanks to Zdenek for his help.
> >
> > This patch was bootstrapped and tested on PPC and x86_64; with no new
> > regressions, and make-check with and without the patch without new
> > regressions for the C testsuite on spu.
> >
> > OK for mainline?
> >
>
> In addition to the changes requested by Andrey and Zdenek, here are my
> comments.
>
> Please add a note some place (e.g. when deciding where to put the branch)
> to revisit the issue of scheduling the insns of the control part relative
> to the branch, now that there's more to this part than a single parallel
> branch-on-count insn.
>
> Ayal.
>
>
> > Thanks,
> > Revital
> >
> > 2007-08-15 Mircea Namolaru <namolaru@il.ibm.com>
> > Vladimir Yanovsky <yanov@il.ibm.com>
> > Revital Eres <eres@il.ibm.com>
> >
> > * config/spu/spu.md: Recognize doloop pattern when -fmodulo-sched
> > is set.
> > * modulo-sched.c: Add documentation regarding do-loop.
> > (doloop_register_get): Support for the new doloop pattern.
> > (duplicate_insns_of_cycles): Do not duplicate the insn which
> > changes count_reg as it is already adjusted.
> > (generate_prolog_epilog): New argument to support the new
> > loop pattern.
> > (sms_schedule): Support for the new pattern and do not handle
> > loops single sets insns with subreg in their lhs or when
> COUNT_REG
> > has no other occurences in the loop until the decrement.
> > * loop-doloop.c (doloop_optimize): Support for the a doloop
> ^^^
>
> > pattern.
> > (doloop_condition_get): Gets an instruction instead of a pattern
> > and change the return condition when the doloop pattern is
> > not parallel.
> > * ddg.c (create_ddg_dep_from_intra_loop_link): Handle only reg
> > deps when considering to not create edges.
> >
> > * gcc.dg/sms-1.c: New test.
> >
>
> > Index: ddg.c
> > ===================================================================
> > --- ddg.c (revision 127390)
> > +++ ddg.c (working copy)
> > @@ -176,13 +176,16 @@
> > rtx set;
> >
> > set = single_set (dest_node->insn);
> > - if (set)
> > + /* TODO: Handle regsiters that REG_P is not true for them. */
> ^^^^^^^^^ , i.e.
> subregs. */
>
>
> >
> > + if (set && REG_P (SET_DEST (set)))
> > {
> > int regno = REGNO (SET_DEST (set));
> > - struct df_ref *first_def =
> > - df_bb_regno_first_def_find (g->bb, regno);
> > + struct df_ref *first_def;
> > struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
> >
> > + first_def = df_bb_regno_first_def_find (g->bb, regno);
> > + gcc_assert (first_def);
> > +
> > if (bitmap_bit_p (bb_info->gen, first_def->id))
> > return;
> > }
> > Index: modulo-sched.c
> > ===================================================================
> > --- modulo-sched.c (revision 127390)
> > +++ modulo-sched.c (working copy)
> > @@ -82,8 +82,20 @@
> > perform modulo variable expansion for live ranges that span more
> than
> > II cycles (i.e. use register copies to prevent a def from
> overwriting
> > itself before reaching the use).
> > -*/
> >
> > + SMS works with countable loops (1) whose control part can be easily
> > + decoupled from the rest of the loop and (2) whose loop count can
> > + be easily adjusted. This is because we peel a constant number of
> > + iterations into a prologue and epilogue for which we want to avoid
> > + emitting the control part, and a kernel which is to iterate that
> > + constant number of iterations less than the original loop. So the
> > + control part should be a set of insns clearly identified and having
> > + its own iv, not otherwise used in the loop (at-least for now), which
> > + initializes a register before the loop to the number of iterations.
> > + Currently SMS relay on the do-loop pattern to recognize such loops.
> ^^^^^
> relies
> , where (1) the control part comprises of all insns defining and/or using a
> certain 'count' register and (2) the loop count can be adjusted by
> modifying this register prior to the loop.
>
> > + TODO: Relay on cfgloop analysis instead. */
> ^^^^^
> Rely
>
> > +
> > +
> >
> > /* This page defines partial-schedule structures and functions for
> > modulo scheduling. */
> > @@ -182,10 +194,11 @@
> > static void set_node_sched_params (ddg_ptr);
> > static partial_schedule_ptr sms_schedule_by_order (ddg_ptr, int, int,
> int *);
> > static void permute_partial_schedule (partial_schedule_ptr ps, rtx
> last);
> > -static void generate_prolog_epilog (partial_schedule_ptr ,struct loop *
> loop, rtx);
> > +static void generate_prolog_epilog (partial_schedule_ptr, struct loop
> *loop,
> > + rtx, rtx);
> > static void duplicate_insns_of_cycles (partial_schedule_ptr ps,
> > int from_stage, int to_stage,
> > - int is_prolog);
> > + int is_prolog, rtx count_reg);
> >
> > #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
> > #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
> > @@ -268,13 +281,29 @@
> > doloop_register_get (rtx insn ATTRIBUTE_UNUSED)
> > {
> > #ifdef HAVE_doloop_end
> > - rtx pattern, reg, condition;
> > + rtx reg, condition;
> >
> > - if (! JUMP_P (insn))
> > + /* The canonical doloop pattern we expect has one of the following
> > + forms:
> > +
> > + 1) (parallel [(set (pc) (if_then_else (condition)
> > + (label_ref (label))
> > + (pc)))
> > + (set (reg) (plus (reg) (const_int -1)))
> > + (additional clobbers and uses)])
> > +
> > + 2) (set (reg) (plus (reg) (const_int -1))
> > + (set (pc) (if_then_else (reg != 0)
> > + (label_ref (label))
> > + (pc))). */
> > +
> > + if (!JUMP_P (insn))
> > return NULL_RTX;
> >
> > - pattern = PATTERN (insn);
> > - condition = doloop_condition_get (pattern);
> > + if (!INSN_P (PREV_INSN (insn)))
> > + return NULL_RTX;
> > +
>
> Why do we need to check the insn preceding the (non-parallel) jump? Looks
> like this check belongs inside doloop_condition_get.
>
>
> > + condition = doloop_condition_get (insn);
> > if (! condition)
> > return NULL_RTX;
> >
> > @@ -401,6 +430,7 @@
> > nreg_moves = ----------------------------------- + 1 - {
> dependence.
> > ii { 1 if not.
> > */
> > +
> > static struct undo_replace_buff_elem *
> > generate_reg_moves (partial_schedule_ptr ps, bool rescan)
> > {
> > @@ -583,7 +613,7 @@
> >
> > static void
> > duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
> > - int to_stage, int for_prolog)
> > + int to_stage, int for_prolog, rtx count_reg)
> > {
> > int row;
> > ps_insn_ptr ps_ij;
> > @@ -595,6 +625,12 @@
> > int j, i_reg_moves;
> > rtx reg_move = NULL_RTX;
> >
> > + /* Do not duplicate the insn which changes count_reg as it is
> > + already adjusted. */
> > + if (INSN_P (u_node->insn) && single_set (u_node->insn)
> > + && rtx_equal_p (count_reg, SET_DEST (single_set
> (u_node->insn))))
>
> Why not check if reg_mentioned_p? We want to avoid duplicating insns using
> count_reg, not only those defining it (as we avoid duplicating the latter).
> Also, the control part could more generally be composed of a
> decrement-compare-branch sequence of insns which define and use count_reg.
>
>
> > + continue;
> > +
> > if (for_prolog)
> > {
> > /* SCHED_STAGE (u_node) >= from_stage == 0. Generate increasing
> > @@ -643,23 +679,26 @@
> >
> > /* Generate the instructions (including reg_moves) for prolog & epilog.
> */
> > static void
> > -generate_prolog_epilog (partial_schedule_ptr ps, struct loop * loop, rtx
> count_reg)
> > +generate_prolog_epilog (partial_schedule_ptr ps, struct loop *loop,
> > + rtx count_reg, rtx count_init)
> > {
> > int i;
> > int last_stage = PS_STAGE_COUNT (ps) - 1;
> > edge e;
> > -
> > +
> > /* Generate the prolog, inserting its insns on the loop-entry edge.
> */
> > start_sequence ();
> >
> > - if (count_reg)
> > - /* Generate a subtract instruction at the beginning of the prolog to
> > - adjust the loop count by STAGE_COUNT. */
> > - emit_insn (gen_sub2_insn (count_reg, GEN_INT (last_stage)));
> > + if (count_reg && !count_init)
> ^^^^^^^^^
> This is redundant, as we assert earlier that count_reg is not NULL.
>
> >
> > + /* Generate a subtract instruction at the beginning of the prolog to
> > + adjust the loop count by STAGE_COUNT. */
>
> Also add to comment that if loop count is constant (count_init), this
> constant is adjusted by STAGE_COUNT elsewhere -- by the caller to
> generate_prolog_epilog.
>
> > + emit_move_insn (count_reg, gen_rtx_PLUS (GET_MODE (count_reg),
> > + count_reg,
> > + GEN_INT (-last_stage)));
> >
>
>
> Consider adjusting here the loop count also when it's a constant (instead
> of above suggested addition to comment), i.e.
>
> else
> /* count_init, set new constant iteration count of loop kernel. */
> SET_SRC (single_set (count_init)) = GEN_INT (loop_count - stage_count
> + 1);
>
>
> > for (i = 0; i < last_stage; i++)
> > - duplicate_insns_of_cycles (ps, 0, i, 1);
> > -
> > + duplicate_insns_of_cycles (ps, 0, i, 1, count_reg);
> > +
> > /* Put the prolog on the entry edge. */
> > e = loop_preheader_edge (loop);
> > split_edge_and_insert (e, get_insns ());
> > @@ -670,8 +709,8 @@
> > start_sequence ();
> >
> > for (i = 0; i < last_stage; i++)
> > - duplicate_insns_of_cycles (ps, i + 1, last_stage, 0);
> > -
> > + duplicate_insns_of_cycles (ps, i + 1, last_stage, 0, count_reg);
> > +
> > /* Put the epilogue on the exit edge. */
> > gcc_assert (single_exit (loop));
> > e = single_exit (loop);
> > @@ -910,18 +949,42 @@
> > if ( !(count_reg = doloop_register_get (tail)))
> > continue;
> >
> > + if (GET_CODE (PATTERN (tail)) != PARALLEL)
> > + {
> > + /* Check that the COUNT_REG has no other occurences in the loop
> > + until the decrement. */
> > + bool found = false;
> > +
> > + for (insn = head; insn != PREV_INSN (tail); insn = NEXT_INSN
> (insn))
> > + {
> > + if (!reg_mentioned_p (count_reg, insn))
> > + continue;
> > +
> > + found = true;
> > + break;
> > + }
>
> Can simplify to
> for (insn = head; insn != PREV_INSN (tail); insn = NEXT_INSN
> (insn))
> if (found = reg_mentioned_p (count_reg, insn))
> break;
>
> > + if (found)
> > + continue;
> > + }
> > +
>
> Fold the above into doloop_register_get(): it should return a valid
> count_reg iff the loop has a control part comprising of all insns defining
> and using this register, whose value dictates the number of loop
> iterations. Not sure why this check for no uses of count_reg inside loop
> applies only to non-parallel loop branch. Also please add a debug dump upon
> failure before continuing.
>
>
> > /* Don't handle BBs with calls or barriers, or !single_set insns,
> > or auto-increment insns (to avoid creating invalid reg-moves
> > for the auto-increment insns).
> > ??? Should handle auto-increment insns. */
> > - for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN
> (insn))
> > - if (CALL_P (insn)
> > - || BARRIER_P (insn)
> > - || (INSN_P (insn) && !JUMP_P (insn)
> > - && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)
> > - || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0))
> > - break;
> > + for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN
> (insn))
> > + {
> > + rtx set;
> >
> > + if (CALL_P (insn)
> > + || BARRIER_P (insn)
> > + || (INSN_P (insn) && !JUMP_P (insn)
> > + && !single_set (insn) && GET_CODE (PATTERN (insn)) !=
> USE)
> > + || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
> > + || (INSN_P (insn) && (set = single_set (insn))
> > + && GET_CODE (SET_DEST (set)) == SUBREG))
> > + break;
> > + }
> > +
> > if (insn != NEXT_INSN (tail))
> > {
> > if (dump_file)
> > @@ -932,8 +995,11 @@
> > fprintf (dump_file, "SMS loop-with-barrier\n");
> > else if (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
> > fprintf (dump_file, "SMS reg inc\n");
> > - else
> > - fprintf (dump_file, "SMS loop-with-not-single-set\n");
> > + else if ((INSN_P (insn) && !JUMP_P (insn)
> > + && !single_set (insn) && GET_CODE (PATTERN (insn)) !=
> USE))
> > + fprintf (dump_file, "SMS loop-with-not-single-set\n");
> > + else
> > + fprintf (dump_file, "SMS loop with subreg in lhs\n");
> > print_rtl_single (dump_file, insn);
> > }
> >
> > @@ -1112,9 +1178,9 @@
> > print_node_sched_params (dump_file, g->num_nodes, g);
> > /* Generate prolog and epilog. */
> >
> > if (count_reg && !count_init)
> > - generate_prolog_epilog (ps, loop, count_reg);
> > + generate_prolog_epilog (ps, loop, count_reg, NULL_RTX);
> > else
> > - generate_prolog_epilog (ps, loop, NULL_RTX);
> > + generate_prolog_epilog (ps, loop, count_reg, count_init);
>
> Why not simplify into a single:
> generate_prolog_epilog (ps, loop, count_reg, count_init);
>
>
> >
> > free_undo_replace_buff (reg_move_replaces);
> > }
> > Index: loop-doloop.c
> > ===================================================================
> > --- loop-doloop.c (revision 127390)
> > +++ loop-doloop.c (working copy)
> > @@ -69,36 +69,54 @@
> > if it is not a decrement and branch jump insn. */
> >
> > rtx
> > -doloop_condition_get (rtx pattern)
> > +doloop_condition_get (rtx doloop_pat)
> > {
> > rtx cmp;
> > rtx inc;
> > rtx reg;
> > rtx inc_src;
> > rtx condition;
> > + rtx pattern;
> >
> > - /* The canonical doloop pattern we expect is:
> > + /* The canonical doloop pattern we expect has one of the following
> > + forms:
> >
> > - (parallel [(set (pc) (if_then_else (condition)
> > - (label_ref (label))
> > - (pc)))
> > - (set (reg) (plus (reg) (const_int -1)))
> > - (additional clobbers and uses)])
> > + 1) (parallel [(set (pc) (if_then_else (condition)
> > + (label_ref (label))
> > + (pc)))
> > + (set (reg) (plus (reg) (const_int -1)))
> > + (additional clobbers and uses)])
> >
> > - Some targets (IA-64) wrap the set of the loop counter in
> > - an if_then_else too.
> > + The branch must be the first entry of the parallel (also required
> > + by jump.c), and the second entry of the parallel must be a set of
> > + the loop counter register. Some targets (IA-64) wrap the set of
> > + the loop counter in an if_then_else too.
> >
> > - In summary, the branch must be the first entry of the
> > - parallel (also required by jump.c), and the second
> > - entry of the parallel must be a set of the loop counter
> > - register. */
> > + 2) (set (reg) (plus (reg) (const_int -1))
> > + (set (pc) (if_then_else (reg != 0)
> > + (label_ref (label))
> > + (pc))). */
> >
> > + pattern = PATTERN (doloop_pat);
> > +
> > if (GET_CODE (pattern) != PARALLEL)
> > - return 0;
> > + {
> > + rtx cond;
> >
> > - cmp = XVECEXP (pattern, 0, 0);
> > - inc = XVECEXP (pattern, 0, 1);
> > + cmp = pattern;
> > + inc = PATTERN (PREV_INSN (doloop_pat));
>
> This is probably where we need to check that
> + if (INSN_P (PREV_INSN (doloop_pat)))
it won't harm (and also checking that PREV_INSN (doloop_pat) != NULL_RTX);
however, since doloop_pat comes from the machine description, we could
also just trust it.
Zdenek