This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Implementing Swing Modulo Scheduling in GCC



We addressed the comments below.  The troubled code (unrolling and
renaming) was removed and replaced by a direct dependence computation
using df.c.  We also added support for loops with unknown bounds.

Here is the revised patch relative to mainline.   Passed regression
and bootstrap on powerpc-apple-darwin7.2.0 target.

ChangeLog:

2004-04-15  Ayal Zaks  <zaks@il.ibm.com>
      Mostafa Hagog  <mustafa@il.ibm.com>

      * Makefile.in (modulo-sched.o, ddg.o): New.
      * ddg.h, ddg.c, modulo-sched.c: New files.
      * cfglayout.c (duplicate_insn_chain): Remove "static" and push
      internals to "dupicate_insn".
      (duplicate_insn): New function.
      * cfglayout.h (duplicate_insn_chain, duplicate_insn): New
declarations.
      * common.opt (fmodulo-sched): New flag.
      * df.c (df_bb_regno_last_use_find, df_bb_regno_first_def_find):
Remove
      static and forward declaration.
      (df_find_def, df_reg_used, df_bb_regno_last_def_find) : New
functions.
      * df.h (df_bb_regno_last_use_find, df_bb_regno_first_def_find,
      df_bb_regno_last_def_find, df_find_def, df_reg_used): New
declarations.
      * flags.h (flag_modulo_sched): New flag.
      * opts.c (case OPT_fmodulo_sched): Handle new flag.
      * params.def (max-sms-loop-number, sms-max-ii-factor,
sms-dfa-history,
      sms-loop-average-count-threshold): New parameters.
      * params.h (MAX-SMS-LOOP-NUMBER, SMS-MAX-II-FACTOR, SMS_DFA_HISTORY,
      SMS_LOOP_AVERAGE_COUNT_THRESHOLD): New parameters.
      * passes.c ("sms", "sms-vcg"): New dumps.
      (rest_of_handle_sched): Call sms_schedule.
      * rtl.h (sms_schedule): New declaration.
      * timevar.def (TV_SMS): New.
      * toplev.c (flag_modulo_sched): Initialize.
      ("modulo-sched"): Handle -f option.
      * docs/invoke.texi: Document -fmodulo-sched option.
      * docs/passes.texi: Document new SMS pass.


Mostafa and Ayal.




Vladimir Makarov <vmakarov@redhat.com> wrote on 26/03/2004 21:52:51:

>
> Hi, Ayal and Mostafa.
>
>   Sorry for the delay with review.  It took time because the patch
> is big and not trivial.  I still examined some patch code and probably
> will write more.
>
>
> > This patch includes the initial version of the modulo scheduler
> > (following http://gcc.gnu.org/ml/gcc/2004-02/msg00071.html).
> > It is implemented as a separate phase immediately preceding the
> > first schedule_insn pass, and is activated by -fmodulo-sched flag
> > (it is inactive by default).
> >
>
>   Have you tested compiler with this patch?  It would be reasonable to
> make bootstrap with the option is switched on by default.
>
>   It is interesting how gcc is slower when the option is on.
>
>   Also this is a machine-independent patch so it should be tested on
> some other platforms too.
>
>   It would be great to post some benchmark results of usage of this
> patch.  I think nobody expects improvements for SPEC because MS is too
> specific optimization oriented to scientific application.  But results
> for any benchmark (like sorting, fft, matrix multiplication or
> whetstone) could say about potential of the optimization.
>
> > The simple example
> >
> >   float dot (float* a, float* b, int n)
> >   {
> >     int i;
> >     float result = 0;
> >
> >     for (i=0; i<7; i++)
> >       result += a[i]*b[i];
> >
> >     return result;
> >   }
> >
> > is modulo-scheduled (on powerpc-apple-darwin6.4 with -O3
-fmodulo-sched)
> > into:
> >
> > _dot:
> >         mflr r6
> >         stw r31,-8(r1)
> >         li r2,0
> >         li r0,6         /* adjusting loop count */
> >         bcl 20,31,"L00000000001$pb"
> > "L00000000001$pb":
> >         mtctr r0
> >         slwi r5,r2,2
> >         addi r2,r2,1
> >         mflr r31
> >         stw r6,16(r1)
> >         lfsx f13,r5,r3  /* prolog */
> >         lfsx f0,r5,r4   /* prolog */
> >         addis r9,r31,ha16(LC0-"L00000000001$pb")
> >         lfs f1,lo16(LC0-"L00000000001$pb")(r9)
> > L8:
> >         slwi r7,r2,2
> >         fmadds f1,f13,f0,f1
> >         addi r2,r2,1
> >         lfsx f13,r7,r3
> >         lfsx f0,r7,r4
> >         bdnz L8
> >         lwz r3,16(r1)
> >         fmadds f1,f13,f0,f1  /* epilog */
> >         lwz r31,-8(r1)
> >         mtlr r3
> >         blr
> >
> > The following patch was regression tested and bootstrapped on
> > powerpc-apple-darwin7.2.0 platform.
> >
>
> Have you bootsrapped it with option switched on?
>
> > Ok for mainline?
> >
>
> I am sorry there are a lot of pitfalls in the patch which should be
> fixed.  The most serious one is register renaming.
>
> May be it is reasonable to create a branch for this optimization
> (people could start experimenting with the patch on other platforms
> and make the optimization more mature) but it is up to you.
>
>
> > 2004-03-12  Ayal Zaks  <zaks@il.ibm.com>
> >       Mostafa Hagog  <mustafa@il.ibm.com>
> >
> >       * Makefile.in: New files modulo-sched.o, ddg.o.
> >       * ddg.h: New file with interface of a Data Dependence Graph
> >       for simple loops.
> >       * ddg.c: New file with implementation of a Data Dependence Graph
> >       for simple loops.
> >       * modulo-sched.c: New file with implementation of SMS modulo
> >       scheduler.
> >       * cfglayout.c (duplicate_insn_chain): Remove "static" and push
> >       internals out to a separate new "dupicate_insn" function.
> >       (duplicate_insn): New function - the internals of the above.
> >       * common.opt (fmodulo-sched): New flag for activating modulo
> > scheduling.
> >       * flags.h (flag_modulo_sched): Same as above.
> >       * opts.c (flag_modulo_sched): Set to be default.
> >       * params.def (max-sms-loop-number): New flag limiting the modulo
> >       scheduler to a given number of loops, for debugging.
> >       (sms-max-ii-factor): New flag, for tuning when to quit modulo
> > scheduling.
> >       * params.h (MAX-SMS-LOOP-NUMBER): Same as above.
> >       (SMS-MAX-II-FACTOR): Same as above.
> >       * toplev.c: Add a phase calling sms_schedule, with -dm for dumps.
> >       * docs/invoke.texi: Document the new -fmodulo-sched option.
>
>   The changelog is inaccurate, entries for passes.c, rtl.h,
> timevar.def are absent.  The changelog entries are a bit verbose (this
> is a very minor drawback).  They should describe only changes.
>
>   Probably, you should describe this optimization in doc/passes.texi and
> describe the optimization parameters (at least sms-max-ii-factor) in
> the documentation.
>
> > *** Makefile.in   9 Mar 2004 01:53:24 -0000   1.1261
> > --- Makefile.in   11 Mar 2004 13:04:34 -0000
> > *************** OBJS-common = \
> > *** 857,862 ****
> > --- 857,863 ----
> >   insn-extract.o insn-opinit.o insn-output.o insn-peep.o insn-recog.o
\
> >   integrate.o intl.o jump.o  langhooks.o lcm.o lists.o local-alloc.o
\
> >   loop.o optabs.o options.o opts.o params.o postreload.o predict.o
\
> >+  modulo-sched.o ddg.o                     \
>
> File names are placed in alphabetic order.
>
> >   print-rtl.o print-tree.o value-prof.o var-tracking.o         \
> >   profile.o ra.o ra-build.o ra-colorize.o ra-debug.o ra-rewrite.o
\
> >   real.o recog.o reg-stack.o regclass.o regmove.o regrename.o         \
> > *************** alias.o : alias.c $(CONFIG_H) $(SYSTEM_H
> > *** 1807,1812 ****
> > --- 1808,1816 ----
> >   regmove.o : regmove.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H)
$(RTL_H)
> insn-config.h \
> >      $(RECOG_H) output.h $(REGS_H) hard-reg-set.h flags.h function.h \
> >      $(EXPR_H) $(BASIC_BLOCK_H) toplev.h $(TM_P_H) except.h reload.h
> > + ddg.o : ddg.c ddg.h $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(RTL_H)
> > + modulo-sched.o : modulo-sched.c ddg.h cfgloop.h $(CONFIG_H) \
> > +    $(SYSTEM_H) $(TM_H) $(RTL_H)
>
> The makefile entries for ddg.o and modulo-sched.o are incomplete.  A
> lot of files from which give tow files depend are missed.
>
> > diff -b -c -p -r1.54 cfglayout.c
> > *** cfglayout.c   25 Feb 2004 20:00:00 -0000   1.54
> > --- cfglayout.c   11 Mar 2004 13:04:35 -0000
> > *************** static void change_scope (rtx, tree, tre
> > *** 55,61 ****
> >
> >   void verify_insn_chain (void);
> >   static void fixup_fallthru_exit_predecessor (void);
> > ! static rtx duplicate_insn_chain (rtx, rtx);
> >   static tree insn_scope (rtx);
> >
> >   rtx
> > --- 55,62 ----
> >
> >   void verify_insn_chain (void);
> >   static void fixup_fallthru_exit_predecessor (void);
> > ! void duplicate_insn (rtx);
> > ! rtx duplicate_insn_chain (rtx, rtx);
> >   static tree insn_scope (rtx);
>
> It is a bad practice to use several external declarations of one
> object.  The external declarations should be moved into cfglayout.h
> (or at least to rtl.h).
>
> > diff -b -c -p -r1.38 params.def
> > *** params.def   3 Mar 2004 16:32:38 -0000   1.38
> > --- params.def   11 Mar 2004 13:04:35 -0000
> > *************** DEFPARAM(PARAM_MAX_GCSE_PASSES,
> > *** 131,136 ****
> > --- 131,149 ----
> >      "max-gcse-passes",
> >      "The maximum number of passes to make when doing GCSE",
> >      1)
> > + /* The maximum number of loops we want to perfrom SMS on them (we use
this
> > +    for debuggin).  */
> > + DEFPARAM(PARAM_MAX_SMS_LOOP_NUMBER,
> > +         "max-sms-loop-number",
> > +    "The maximum number of loops we want to perfrom SMS on them\
> > +          (we use this for debuggin).",
>                                      ^ typo
>
> It would be great if you check all code for other typos too.  I found
> some of them.  But it can be done later.
>
> > +    -1)
> > +
> > + /* This parameter is used to tune SMS MAX II calculations.  */
> > + DEFPARAM(PARAM_SMS_MAX_II_FACTOR,
> > +         "sms-max-ii-factor",
> > +    "This parameter is used to tune SMS MAX II calculations",
> > +    10)
> > .
>
> > *** passes.c   3 Mar 2004 16:32:38 -0000   2.3
> > --- passes.c   11 Mar 2004 13:04:35 -0000
> > *************** enum dump_file_index
> > *** 154,159 ****
> > --- 154,161 ----
> >     DFI_combine,
> >     DFI_ce2,
> >     DFI_regmove,
> > +   DFI_sms,
> > +   DFI_sms_vcg,
> >     DFI_sched,
> >     DFI_lreg,
> >     DFI_greg,
> > *************** enum dump_file_index
> > *** 178,184 ****
> >
> >      Remaining -d letters:
> >
> > !    "   e        m   q         "
> >      "          K   O Q     WXY "
> >   */
> >
> > --- 180,186 ----
> >
> >      Remaining -d letters:
> >
> > !    "   e            q         "
> >      "          K   O Q     WXY "
> >   */
> >
> > *************** static struct dump_file_info dump_file_t
> > *** 207,212 ****
> > --- 209,216 ----
> >     { "combine",   'c', 1, 0, 0 },
> >     { "ce2",   'C', 1, 0, 0 },
> >     { "regmove",   'N', 1, 0, 0 },
> > +   { "sms",      'm', 0, 0, 0 },
>                          ^1
> You could still permit to generate vcg file for all file.  But you can
> ignore this.
>
>
> > +   { "sms-vcg",  'm', 0, 0, 0 },
> >     { "sched",   'S', 1, 0, 0 },
> >     { "lreg",   'l', 1, 0, 0 },
> >     { "greg",   'g', 1, 0, 0 },
>
> > *************** rest_of_handle_reorder_blocks (tree decl
> > *** 740,745 ****
> > --- 744,775 ----
> >   static void
> >   rest_of_handle_sched (tree decl, rtx insns)
> >   {
> > +   timevar_push (TV_SMS);
> > +   if (optimize > 0 && flag_schedule_insns && flag_modulo_sched)
> > +     {
> > +        FILE *vcg_dump_file;
> > +
> > +        open_dump_file (DFI_sms_vcg, decl);
>
> This file is never closed.  It is dangerous because it is opened for
> each function.  I feel guilty because I asked about this feature.  You
> could restore dump_file after closing the standard one and call
> close_dump_file with 2 last parameters equal to NULL.
>
>
> > +        vcg_dump_file = dump_file;
> > +
> > +        /* Perform SMS module scheduling.  */
> > +        open_dump_file (DFI_sms, decl);
> > +
> > +        /* We want to be able to create new pseudos.  */
> > +        no_new_pseudos = 0;
> > +        sms_schedule (dump_file, vcg_dump_file);
> > +        close_dump_file (DFI_sms, print_rtl, get_insns ());
> > +
> > +        /* Update the life information, becuase we add pseudos.  */
> > +        max_regno = max_reg_num ();
> > +        allocate_reg_info (max_regno, FALSE, FALSE);
> > +        update_life_info_in_dirty_blocks (UPDATE_LIFE_GLOBAL_RM_NOTES,
> > +                                          (PROP_DEATH_NOTES
> > +                                           | PROP_KILL_DEAD_CODE
> > +                                           | PROP_SCAN_DEAD_CODE));
> > +        no_new_pseudos = 1;
> > +      }
> > +   timevar_pop (TV_SMS);
> >     timevar_push (TV_SCHED);
>
> > *** rtl.h   9 Mar 2004 17:06:21 -0000   1.465
> > --- rtl.h   11 Mar 2004 13:04:36 -0000
> > *************** extern void simplify_using_condition (rt
> > *** 2467,2470 ****
> > --- 2467,2472 ----
> >   /* In ra.c.  */
> >   extern void reg_alloc (void);
> >
> > + /* In modulo-sched.c.  */
> > + extern void sms_schedule (FILE*, FILE*);
> >   #endif /* ! GCC_RTL_H */
>
> Usually prototypes which contains FILE are by #ifdef BUFSIZ.
>
> > Index: doc/invoke.texi
> > ===================================================================
> > RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
> > retrieving revision 1.425
> > diff -b -c -p -r1.425 invoke.texi
> > *** doc/invoke.texi   10 Mar 2004 06:02:55 -0000   1.425
> > --- doc/invoke.texi   11 Mar 2004 13:04:39 -0000
> > *************** invoking @option{-O2} on programs that u
> > *** 3653,3659 ****
> >   @opindex O3
> >   Optimize yet more.  @option{-O3} turns on all optimizations specified
by
> >   @option{-O2} and also turns on the @option{-finline-functions},
> > ! @option{-fweb} and @option{-frename-registers} options.
> >
> >   @item -O0
> >   @opindex O0
> > --- 3653,3659 ----
> >   @opindex O3
> >   Optimize yet more.  @option{-O3} turns on all optimizations specified
by
> >   @option{-O2} and also turns on the @option{-finline-functions},
> > ! @option{-fweb} , @option{-fmodule-sched} and
@option{-frename-registers} options.
>                                     ^ typo.
> > *************** sense when scheduling before register al
> > *** 4084,4089 ****
> > --- 4084,4093 ----
> >   @opindex fsched-stalled-insns
> >   Define how many insns (if any) can be moved prematurely from the
queue
> >   of stalled insns into the ready list, during the second scheduling
pass.
> > +
> > + @item -fmodulo-sched
> > + @opindex fmodulo-sched
> > + Perform SMS modulo scheduling before the first scheduling pass.
> >
>
> You should add that the option requires -fschedule-insns.
>
> >   @item -fsched-stalled-insns-dep=@var{n}
> >   @opindex fsched-stalled-insns-dep
>
> -------------------------------------ddg.c----------------------------
>
> > #include "expr.h"  /* For gen_move_insn.  */
>
> The comment is not necessary.
>
> > #include "ddg.h"
> >
> > extern void duplicate_insn (rtx);  /* From cfglayout.c.  */
>
> It is a bad practice.  It should be moved into cfglayout.h.
>
> >
> > #define RENAMED_REG_P(x)   (((x)->aux.info && \
> >               GET_CODE ((rtx)(x)->aux.info) == REG) ? 1 : 0)
>
> Macros should have a comment.  According to GNU C standard the
> operator (&&) should be 1st on the new line.
>
> > #define RENAMED_REG(x)     ((x)->aux.info)
> >
> > #define ARTIFICIAL_P(x)    (((x)->aux.info && \
> >               GET_CODE ((rtx)(x)->aux.info) != REG) ? 1 : 0)
> > #define ART2ORIG(x)   ((x)->aux.info)
> >
> > /* Returns non-zero if this is a load insn.  */
> > #define LOAD_INSN_P(x) (single_set (x) && \
> >          (GET_CODE (SET_SRC (single_set (x)))) == MEM \
> >          ? 1 : 0)
> >
> > /* Returns non-zero if this is a store insn.  */
> > #define STORE_INSN_P(x) (single_set (x) && \
> >           (GET_CODE (SET_DEST (single_set (x)))) == MEM \
> >           ? 1 : 0)
> >
>
> This code incorrectly defines at least stores.  E.g. CISC architecture
> can have MEM <- OP (MEM, REG).  I understand this code is not
> important.  It is used only to gather statistics.  But it is better to
> rename the macros or more accurately determine load/store.
>
> Another thing, SUBREG could contain MEM to.
>
> And the last thing, you are calling single_set two times.  It could be
> avoided (e.g. by using a function).
>
> > /* Returns non-zero if this is a load or a store insn.  */
> > #define LOAD_OR_STORE_INSN_P(x) (LOAD_INSN_P (x) || STORE_INSN_P (x) \
> >              ? 1 : 0)
>
> This macro is not used anywhere.  It should be removed.
>
> >
> > enum edge_flag {NOT_IN_SCC=0, IN_SCC};
> >
>
> Global definition should have a comment.
>
> > /* This array holds the mapping for undoing the renaming.  */
> > static rtx *rename_reg_map = NULL;
> >
> > static unsigned max_rename_reg_num = 0;
>
> Global variable should have a comment.
>
> > static int
> > is_ldst_insn (rtx insn)
> > {
> >   rtx pat = single_set (insn);
> >
> >   return (pat
> >      && (GET_CODE (SET_DEST (pat)) == MEM
> >          || GET_CODE (SET_SRC (pat)) == MEM));
> > }
>
> Please, see the comment above for LOAD_INSN_P.  This code is more
> important because it is used to recognize a memory dependence.
>
> Generally speaking, the optimization implementation is oriented to
> RISC architecture.  Gcc supports other architectures too.
>
> > /* Computes the dependence parameters (latency, distance ect.), creates
>                                                              ^
>     There are a lot of typos in the code.
>
> >    a ddg_edge and adds it to the given DDG.  */
>
> > /* The following three functions build the data dependence graph
including
>
>               ^ I think it is obsolete.  There are more three
> function after you added register renaming.
>
> >    loop-carried dependences, by first creating an artificial copy of
the
> >    loop (by unrolling), then calculating intra-(extended)-block
dependences,
> >    and finally generating ddg_edges for such dependences and removing
the
> >    artificial copy.  */
>
>
> This function is most troubled code.
>
> > /* This function tries to keep the DEFs in a semi SSA form, when it
sees a
> >    def of an already existing live-range it renames it and changes all
its
> >    next uses. This is done after creating the artificial copies of the
BB
>
>                ^ Sentences should be separated by two blanks.  There
> are a lot of such places.
>
> >    insns - i.e. the dummy unrolling - in order to have correct alias
> >    analysis.
> >    Side effect : This function adds register ANTI_DEP arcs to the DDG
> >        before renaming the registers because the dep-analysis
> >        will not see these dependencies after renaming.  */
> > static void
> > rename_redefined_regs (ddg_ptr g)
> > {
> >   int i, j;
> >   int num_nodes = g->num_nodes;
> >   sbitmap def_to_rename = sbitmap_alloc (g->num_nodes);
> >
> >   sbitmap_zero (def_to_rename);
> >   max_rename_reg_num = max_reg_num ();
> >
> >   /* Scan all pairs of (i < j) = (insn, def) insns, checking if def
sets a reg
> >      used earlier by insn.  If so, record it in def_to_rename.  */
> >   for (i = 0; i < num_nodes - 1; i++)
> >     {
> >       rtx insn = g->nodes[i].insn;
> >       ddg_node_ptr use_node = &g->nodes[i];
> >
> >       for (j = i+1 ; j < num_nodes; j++)
>
>                   ^ Operator should be surrounded by blanks.  There
> are lot of such places.
>
> The algorithm is O(n**2) complexity and can be improved.  But probably
> it is ok because SP works on only small loops.
>
> >    {
> >      rtx reg;
> >      rtx def = g->nodes[j].insn;
> >      ddg_node_ptr def_node = &g->nodes[j];
> >
> >      if (! single_set (def)
> >          || GET_CODE (SET_DEST (PATTERN (def))) != REG)
> >        continue;
> >
>
> You could improve renaming (make it in more places) if you process
> more complex cases.  The definition could be SUBREG, ZERO_EXTRACT etc.
> It could be not only single_set.  In general code is oriented to
> rs6000.  It should be more machine-independent.
>
> You could look at df.c (reaching definitions analysis) and web.c
> (register renaming) using df how to make it.
>
> Another thing, you can make code worse for some architectures.  Insns
> can requre that two operands have the same register (please see
> regmove.c).  In this case renaming will result in generation of
> an additional move insn by reload.
>
> >      reg = SET_DEST (PATTERN (def));
>
> Are you going to rename hard registers too.  It is wrong.  They should
> never renamed.  They have special purposes like return/argument
> registers.
>
> >      if (reg_referenced_p (reg, PATTERN (insn))
> >          || (GET_CODE (insn) == CALL_INSN
> >         && find_reg_fusage (insn, USE, reg)))
> >        {
> >          SET_BIT (def_to_rename, j);
> >          max_rename_reg_num++;
> >
> >          /* Add ANTI dependence because the coming dep-analysis will
> >        not see it; if def is an artificial node we do not
> >        add the ANTI (or any other) dependence.  */
> >          if (! ARTIFICIAL_P (def_node))
> >       {
> >         ddg_edge_ptr e = create_ddg_edge (use_node, def_node,
> >                       ANTI_DEP, REG_DEP, 0, 0);
> >         add_edge_to_ddg (g, e);
> >
> >       }
> >        }
> >    }
> >     }
> >
> >     rename_reg_map = xcalloc (max_rename_reg_num, sizeof (rtx));
> >     /* Now rename defs marked in def_to_rename, along with their uses.
*/
> >     /* ??? Could use EXECUTE_IF_SET_IN_SBITMAP (def_to_rename, 0, i,.
*/
> >     for (i = 0; i < num_nodes; i++)
> >       {
> >    rtx reg, new_reg;
> >    int last_def = 1;
> >
> >    if (! TEST_BIT (def_to_rename, i))
> >      continue;
> >
> >    reg = SET_DEST (PATTERN (g->nodes[i].insn));
> >    new_reg = gen_reg_rtx (GET_MODE (reg));
>
> You should set up the same attributes for the new regs (please look at
> web.c for example).
>
> >    SET_DEST (PATTERN (g->nodes[i].insn)) = new_reg;
> >
> >    /* Now rename uses.  */
> >    for (j = i + 1; j < num_nodes; j++)
> >      {
> >        rtx use_insn = g->nodes[j].insn;
> >        rtx set = single_set (use_insn);
> >
> >        if (! set)
> >          continue;
> >
>
>   This is wrong.  After renaming definition, you should rename all
> occurrences of the register not only in some insns.  You should also
> rename register notes (like REG_EQUAL) and mention that register notes
> REG_DEAD and REG_UNUSED might be inaccurate and will be fixed by live
> analysis later.  It is strange that gcc did not crash or generate
> incorrect code.  Probably you were lucky.
>
> >        if (REGNO (new_reg) >= max_rename_reg_num)
> >          abort ();
> >
> >        rename_reg_map[REGNO (new_reg)] = reg;
> >
> >        /* TODO: If use_insn uses reg in an address computation (reg +
const)
> >           and def is an increment of reg (reg = reg + const'), then
> >           replace (reg + const) by (reg + (const + const')) instead of
> >           renaming it to (new_reg + const).  */
> >        SET_SRC (set) = replace_rtx (SET_SRC (set), reg, new_reg);
> >
> >        if (SET_DEST (set) == reg)
> >          {
> >       /* A subsequent def kills the live range.  */
> >       last_def = 0;
> >       break;
> >          }
> >
> >        SET_DEST (set) = replace_rtx (SET_DEST (set), reg, new_reg);
> >      }
> >    /* Generate a move from the last renamed register to the original
reg.  */
> >    if (last_def)
> >      emit_insn_before (gen_move_insn (reg, new_reg) ,
>                                                    ^ unecessary blank
> >              g->closing_branch->first_note);
> >
> >    if (!ARTIFICIAL_P (&g->nodes[i]))
> >      RENAMED_REG (&g->nodes[i]) = reg;
> >       }
> > }
>
>
> > int
> > longest_simple_path (struct ddg * g, int src, int dest, sbitmap nodes)
> > {
> >   while (change)
> >     {
> >       change = 0;
> >       sbitmap_copy (workset, tmp);
> >       sbitmap_zero (tmp);
> >       EXECUTE_IF_SET_IN_SBITMAP (workset, 0, u,
>
> It is better to put only small code in EXECUTE_IF_SET_IN_SBITMAP for
> debugging purposes.  But you can ignore it.
>
> >    {
> >      ddg_node_ptr u_node = &g->nodes[u];
>
> You should add a blank line here.  It is usual practice to separate
> definitions and statements.  This is not the single place.
>
> >      for (e = u_node->out; e; e = e->next_out)
> >        {
> >          ddg_node_ptr v_node = e->dest;
> >          v = v_node->cuid;
> >
>
> ------------------------------modulo-sched.c----------------------------
>
> > #include "expr.h"  /* For gen_move_insn.  */
> > #include "params.h"  /* For MAX_SMS_LOOP_NUMBER.  */
> > #include "gcov-io.h" /* For profile_info. */
>
> These comments are not necessary.
>
> You should put a reference to the article describing SMS and describe
> the algorithm in brief.
>
> > /* Perform signed modulo, always returning a non-negative value.  */
> > #define SMODULO(x,y) ((x)%(y) < 0 ? ((x)%(y) + (y)) : (x)%(y))
>                            ^ Blanks around operators.
>
> > /* Holds the partial schedule as an array of II rows. Each entry of the
> >    array points to a linked list of PS_INSNs, which represents the
> >    instructions that are scheduled for that row.  */
> > struct partial_schedule
> > {
> >   int ii;
> >   int history;  /* Threshold for conflict checking using DFA.  */
> >   ps_insn_ptr *rows;
> >   int min_cycle;
> >   int max_cycle;
>
> There are no comments for min_cycle/max_cycle and other members.
>
> >   ddg_ptr g;
> > };
>
>
> > extern int issue_rate;
>
> It is a bad practice.  Please put it sched-int.h add '#include'.
>
> > /* From cfglayout.c.  */
> > rtx duplicate_insn_chain (rtx, rtx);
>
> The same.  Please put it in cfglayout.h
>
> > /* The scheduling parameters held for each node include a lower-bound
> >    on it's scheduling cycle (asap); the absolute cycle we decided to
> >    schedule it (time; time >= asap); a pointer to the first register-
> >    move instruction added to handle the modulo-variable-expansion (mve)
> >    of the register defined by this node (first_reg_move; copies the
> >    original register defined by the node); the number of such register-
> >    move instructions generated (nreg_moves; they immediately preceed
> >    first_reg_move); the row and stage of the node (caching results of
> >    % and /) and the column of the node in the partial schedule.  */
>
> It is hard to read.  Please some formating.
>
> >   /* Check if condition = (ne (reg) (const_int 1)), which is more
> >      restrictive than the check in doloop_condition_get:
> >      if ((GET_CODE (condition) != GE && GET_CODE (condition) != NE)
> >     || GET_CODE (XEXP (condition, 1)) != CONST_INT).  */
>
> Why do you use more restrict condition?
>
> >   if (GET_CODE (condition) != NE
> >       || XEXP (condition, 1) != const1_rtx)
> >     return NULL_RTX;
>
> > static rtx
> > const_iteration_count (rtx count_reg, basic_block pre_header, int *
count)
>
> Count should be of HOST_WIDE_TYPE.  This is a type what INTVAL returns.
>
> > {
> >   rtx insn;
> >   rtx head, tail;
> >   get_block_head_tail (pre_header->index, &head, &tail);
> >
> >   for (insn = tail; insn != PREV_INSN (head); insn = PREV_INSN (insn))
> >     if (INSN_P (insn) && single_set (insn) &&
> >    rtx_equal_p (count_reg, SET_DEST (single_set (insn))))
> >       {
> >    rtx pat = single_set (insn);
> >
> >    if (GET_CODE (SET_SRC (pat)) == CONST_INT)
> >      {
> >        *count = INTVAL (SET_SRC (pat));
> >
>
> > /* A very simple resource-based lower bound on the initiation interval.
> >    Todo: improve the accuracy of this bound by considering the
> >    utilization of various units.  */
>
> ??? is used usually for to do in gcc code.
>
> > /* Calculate an upper bound for II. SMS should not schedule the loop if
it
>                                      ^ Two blanks.  There are lot of
> such places.
>
> >    requires more cycles than this bound. Currently set to the sum of
the
> >    longest latency edge for each node. Reset based on experiments.  */
> > static int
> > calculate_maxii (ddg_ptr g)
> > {
> >   int i;
> >   int maxii = 0;
> >
> >   for (i=0; i<g->num_nodes; i++)
>           ^    ^  Blanks, please.
> >     {
> >       ddg_node_ptr u = &g->nodes[i];
> >       ddg_edge_ptr e;
> >       int max_edge_latency = 0;
> >
> >       for (e = u->out; e ; e = e->next_out)
> >    max_edge_latency = MAX (max_edge_latency , e->latency);
>                                                 ^ unessary blank.
> >       /* Walk throught the DFA for the current row.  */
>                        ^ typo
>
>
> > /* Rotate the rows of PS such that insns scheduled at time
> >    START_CYCLE will appear in row 0.  Updates max/min_cycles.  */
> > void
> > rotate_partial_schedule (partial_schedule_ptr ps, int start_cycle)
> > {
> >   int i, row, backward_rotates;
> >   int last_row = ps->ii - 1;
> >
> >   if (start_cycle == 0)
> >     return;
> >
> >   backward_rotates = SMODULO (start_cycle, ps->ii);
> >
> >   /* Revisit later and optimize this into a single loop.  */
> >   for (i = 0; i < backward_rotates; i++)
> >     {
> >       ps_insn_ptr first_row = ps->rows[0];
> >
> >       for (row = 0; row < last_row; row++)
> >    ps->rows[row] = ps->rows[row+1];
> >
>
> It is an inefficient algorithm.  You could improve it using one loop.
> You could ignore it because the number iteration are small.  Although
> I know a company which compiler searches for II up to several
> hundreds.
>
> >       ps->rows[last_row] = first_row;
> >     }
> >
> >   ps->max_cycle -= start_cycle;
> >   ps->min_cycle -= start_cycle;
> > }
>
>
> > /* Main entry point, perform SMS scheduling on the loops of the
function
> >    that consist of single basic blocks.  */
> > void
> > sms_schedule (FILE *dump_file, FILE *vcg_dump_file)
> >
>
> I did not find that you check usage of DFA interface.  What does
> happen if there is no DFA description or it is switched off.
>
> Vlad
(See attached file: gcc3.5.patch) (See attached file: modulo-sched.c)(See
attached file: ddg.c)(See attached file: ddg.h)

Attachment: gcc3.5.patch
Description: Binary data

Attachment: modulo-sched.c
Description: Binary data

Attachment: ddg.c
Description: Binary data

Attachment: ddg.h
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]