This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
| Other format: | [Raw text] | |
We addressed the comments below. The troubled code (unrolling and
renaming) was removed and replaced by a direct dependence computation
using df.c. We also added support for loops with unknown bounds.
Here is the revised patch relative to mainline. Passed regression
and bootstrap on powerpc-apple-darwin7.2.0 target.
ChangeLog:
2004-04-15 Ayal Zaks <zaks@il.ibm.com>
Mostafa Hagog <mustafa@il.ibm.com>
* Makefile.in (modulo-sched.o, ddg.o): New.
* ddg.h, ddg.c, modulo-sched.c: New files.
* cfglayout.c (duplicate_insn_chain): Remove "static" and push
internals to "dupicate_insn".
(duplicate_insn): New function.
* cfglayout.h (duplicate_insn_chain, duplicate_insn): New
declarations.
* common.opt (fmodulo-sched): New flag.
* df.c (df_bb_regno_last_use_find, df_bb_regno_first_def_find):
Remove
static and forward declaration.
(df_find_def, df_reg_used, df_bb_regno_last_def_find) : New
functions.
* df.h (df_bb_regno_last_use_find, df_bb_regno_first_def_find,
df_bb_regno_last_def_find, df_find_def, df_reg_used): New
declarations.
* flags.h (flag_modulo_sched): New flag.
* opts.c (case OPT_fmodulo_sched): Handle new flag.
* params.def (max-sms-loop-number, sms-max-ii-factor,
sms-dfa-history,
sms-loop-average-count-threshold): New parameters.
* params.h (MAX-SMS-LOOP-NUMBER, SMS-MAX-II-FACTOR, SMS_DFA_HISTORY,
SMS_LOOP_AVERAGE_COUNT_THRESHOLD): New parameters.
* passes.c ("sms", "sms-vcg"): New dumps.
(rest_of_handle_sched): Call sms_schedule.
* rtl.h (sms_schedule): New declaration.
* timevar.def (TV_SMS): New.
* toplev.c (flag_modulo_sched): Initialize.
("modulo-sched"): Handle -f option.
* docs/invoke.texi: Document -fmodulo-sched option.
* docs/passes.texi: Document new SMS pass.
Mostafa and Ayal.
Vladimir Makarov <vmakarov@redhat.com> wrote on 26/03/2004 21:52:51:
>
> Hi, Ayal and Mostafa.
>
> Sorry for the delay with review. It took time because the patch
> is big and not trivial. I still examined some patch code and probably
> will write more.
>
>
> > This patch includes the initial version of the modulo scheduler
> > (following http://gcc.gnu.org/ml/gcc/2004-02/msg00071.html).
> > It is implemented as a separate phase immediately preceding the
> > first schedule_insn pass, and is activated by -fmodulo-sched flag
> > (it is inactive by default).
> >
>
> Have you tested compiler with this patch? It would be reasonable to
> make bootstrap with the option is switched on by default.
>
> It is interesting how gcc is slower when the option is on.
>
> Also this is a machine-independent patch so it should be tested on
> some other platforms too.
>
> It would be great to post some benchmark results of usage of this
> patch. I think nobody expects improvements for SPEC because MS is too
> specific optimization oriented to scientific application. But results
> for any benchmark (like sorting, fft, matrix multiplication or
> whetstone) could say about potential of the optimization.
>
> > The simple example
> >
> > float dot (float* a, float* b, int n)
> > {
> > int i;
> > float result = 0;
> >
> > for (i=0; i<7; i++)
> > result += a[i]*b[i];
> >
> > return result;
> > }
> >
> > is modulo-scheduled (on powerpc-apple-darwin6.4 with -O3
-fmodulo-sched)
> > into:
> >
> > _dot:
> > mflr r6
> > stw r31,-8(r1)
> > li r2,0
> > li r0,6 /* adjusting loop count */
> > bcl 20,31,"L00000000001$pb"
> > "L00000000001$pb":
> > mtctr r0
> > slwi r5,r2,2
> > addi r2,r2,1
> > mflr r31
> > stw r6,16(r1)
> > lfsx f13,r5,r3 /* prolog */
> > lfsx f0,r5,r4 /* prolog */
> > addis r9,r31,ha16(LC0-"L00000000001$pb")
> > lfs f1,lo16(LC0-"L00000000001$pb")(r9)
> > L8:
> > slwi r7,r2,2
> > fmadds f1,f13,f0,f1
> > addi r2,r2,1
> > lfsx f13,r7,r3
> > lfsx f0,r7,r4
> > bdnz L8
> > lwz r3,16(r1)
> > fmadds f1,f13,f0,f1 /* epilog */
> > lwz r31,-8(r1)
> > mtlr r3
> > blr
> >
> > The following patch was regression tested and bootstrapped on
> > powerpc-apple-darwin7.2.0 platform.
> >
>
> Have you bootsrapped it with option switched on?
>
> > Ok for mainline?
> >
>
> I am sorry there are a lot of pitfalls in the patch which should be
> fixed. The most serious one is register renaming.
>
> May be it is reasonable to create a branch for this optimization
> (people could start experimenting with the patch on other platforms
> and make the optimization more mature) but it is up to you.
>
>
> > 2004-03-12 Ayal Zaks <zaks@il.ibm.com>
> > Mostafa Hagog <mustafa@il.ibm.com>
> >
> > * Makefile.in: New files modulo-sched.o, ddg.o.
> > * ddg.h: New file with interface of a Data Dependence Graph
> > for simple loops.
> > * ddg.c: New file with implementation of a Data Dependence Graph
> > for simple loops.
> > * modulo-sched.c: New file with implementation of SMS modulo
> > scheduler.
> > * cfglayout.c (duplicate_insn_chain): Remove "static" and push
> > internals out to a separate new "dupicate_insn" function.
> > (duplicate_insn): New function - the internals of the above.
> > * common.opt (fmodulo-sched): New flag for activating modulo
> > scheduling.
> > * flags.h (flag_modulo_sched): Same as above.
> > * opts.c (flag_modulo_sched): Set to be default.
> > * params.def (max-sms-loop-number): New flag limiting the modulo
> > scheduler to a given number of loops, for debugging.
> > (sms-max-ii-factor): New flag, for tuning when to quit modulo
> > scheduling.
> > * params.h (MAX-SMS-LOOP-NUMBER): Same as above.
> > (SMS-MAX-II-FACTOR): Same as above.
> > * toplev.c: Add a phase calling sms_schedule, with -dm for dumps.
> > * docs/invoke.texi: Document the new -fmodulo-sched option.
>
> The changelog is inaccurate, entries for passes.c, rtl.h,
> timevar.def are absent. The changelog entries are a bit verbose (this
> is a very minor drawback). They should describe only changes.
>
> Probably, you should describe this optimization in doc/passes.texi and
> describe the optimization parameters (at least sms-max-ii-factor) in
> the documentation.
>
> > *** Makefile.in 9 Mar 2004 01:53:24 -0000 1.1261
> > --- Makefile.in 11 Mar 2004 13:04:34 -0000
> > *************** OBJS-common = \
> > *** 857,862 ****
> > --- 857,863 ----
> > insn-extract.o insn-opinit.o insn-output.o insn-peep.o insn-recog.o
\
> > integrate.o intl.o jump.o langhooks.o lcm.o lists.o local-alloc.o
\
> > loop.o optabs.o options.o opts.o params.o postreload.o predict.o
\
> >+ modulo-sched.o ddg.o \
>
> File names are placed in alphabetic order.
>
> > print-rtl.o print-tree.o value-prof.o var-tracking.o \
> > profile.o ra.o ra-build.o ra-colorize.o ra-debug.o ra-rewrite.o
\
> > real.o recog.o reg-stack.o regclass.o regmove.o regrename.o \
> > *************** alias.o : alias.c $(CONFIG_H) $(SYSTEM_H
> > *** 1807,1812 ****
> > --- 1808,1816 ----
> > regmove.o : regmove.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H)
$(RTL_H)
> insn-config.h \
> > $(RECOG_H) output.h $(REGS_H) hard-reg-set.h flags.h function.h \
> > $(EXPR_H) $(BASIC_BLOCK_H) toplev.h $(TM_P_H) except.h reload.h
> > + ddg.o : ddg.c ddg.h $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(RTL_H)
> > + modulo-sched.o : modulo-sched.c ddg.h cfgloop.h $(CONFIG_H) \
> > + $(SYSTEM_H) $(TM_H) $(RTL_H)
>
> The makefile entries for ddg.o and modulo-sched.o are incomplete. A
> lot of files from which give tow files depend are missed.
>
> > diff -b -c -p -r1.54 cfglayout.c
> > *** cfglayout.c 25 Feb 2004 20:00:00 -0000 1.54
> > --- cfglayout.c 11 Mar 2004 13:04:35 -0000
> > *************** static void change_scope (rtx, tree, tre
> > *** 55,61 ****
> >
> > void verify_insn_chain (void);
> > static void fixup_fallthru_exit_predecessor (void);
> > ! static rtx duplicate_insn_chain (rtx, rtx);
> > static tree insn_scope (rtx);
> >
> > rtx
> > --- 55,62 ----
> >
> > void verify_insn_chain (void);
> > static void fixup_fallthru_exit_predecessor (void);
> > ! void duplicate_insn (rtx);
> > ! rtx duplicate_insn_chain (rtx, rtx);
> > static tree insn_scope (rtx);
>
> It is a bad practice to use several external declarations of one
> object. The external declarations should be moved into cfglayout.h
> (or at least to rtl.h).
>
> > diff -b -c -p -r1.38 params.def
> > *** params.def 3 Mar 2004 16:32:38 -0000 1.38
> > --- params.def 11 Mar 2004 13:04:35 -0000
> > *************** DEFPARAM(PARAM_MAX_GCSE_PASSES,
> > *** 131,136 ****
> > --- 131,149 ----
> > "max-gcse-passes",
> > "The maximum number of passes to make when doing GCSE",
> > 1)
> > + /* The maximum number of loops we want to perfrom SMS on them (we use
this
> > + for debuggin). */
> > + DEFPARAM(PARAM_MAX_SMS_LOOP_NUMBER,
> > + "max-sms-loop-number",
> > + "The maximum number of loops we want to perfrom SMS on them\
> > + (we use this for debuggin).",
> ^ typo
>
> It would be great if you check all code for other typos too. I found
> some of them. But it can be done later.
>
> > + -1)
> > +
> > + /* This parameter is used to tune SMS MAX II calculations. */
> > + DEFPARAM(PARAM_SMS_MAX_II_FACTOR,
> > + "sms-max-ii-factor",
> > + "This parameter is used to tune SMS MAX II calculations",
> > + 10)
> > .
>
> > *** passes.c 3 Mar 2004 16:32:38 -0000 2.3
> > --- passes.c 11 Mar 2004 13:04:35 -0000
> > *************** enum dump_file_index
> > *** 154,159 ****
> > --- 154,161 ----
> > DFI_combine,
> > DFI_ce2,
> > DFI_regmove,
> > + DFI_sms,
> > + DFI_sms_vcg,
> > DFI_sched,
> > DFI_lreg,
> > DFI_greg,
> > *************** enum dump_file_index
> > *** 178,184 ****
> >
> > Remaining -d letters:
> >
> > ! " e m q "
> > " K O Q WXY "
> > */
> >
> > --- 180,186 ----
> >
> > Remaining -d letters:
> >
> > ! " e q "
> > " K O Q WXY "
> > */
> >
> > *************** static struct dump_file_info dump_file_t
> > *** 207,212 ****
> > --- 209,216 ----
> > { "combine", 'c', 1, 0, 0 },
> > { "ce2", 'C', 1, 0, 0 },
> > { "regmove", 'N', 1, 0, 0 },
> > + { "sms", 'm', 0, 0, 0 },
> ^1
> You could still permit to generate vcg file for all file. But you can
> ignore this.
>
>
> > + { "sms-vcg", 'm', 0, 0, 0 },
> > { "sched", 'S', 1, 0, 0 },
> > { "lreg", 'l', 1, 0, 0 },
> > { "greg", 'g', 1, 0, 0 },
>
> > *************** rest_of_handle_reorder_blocks (tree decl
> > *** 740,745 ****
> > --- 744,775 ----
> > static void
> > rest_of_handle_sched (tree decl, rtx insns)
> > {
> > + timevar_push (TV_SMS);
> > + if (optimize > 0 && flag_schedule_insns && flag_modulo_sched)
> > + {
> > + FILE *vcg_dump_file;
> > +
> > + open_dump_file (DFI_sms_vcg, decl);
>
> This file is never closed. It is dangerous because it is opened for
> each function. I feel guilty because I asked about this feature. You
> could restore dump_file after closing the standard one and call
> close_dump_file with 2 last parameters equal to NULL.
>
>
> > + vcg_dump_file = dump_file;
> > +
> > + /* Perform SMS module scheduling. */
> > + open_dump_file (DFI_sms, decl);
> > +
> > + /* We want to be able to create new pseudos. */
> > + no_new_pseudos = 0;
> > + sms_schedule (dump_file, vcg_dump_file);
> > + close_dump_file (DFI_sms, print_rtl, get_insns ());
> > +
> > + /* Update the life information, becuase we add pseudos. */
> > + max_regno = max_reg_num ();
> > + allocate_reg_info (max_regno, FALSE, FALSE);
> > + update_life_info_in_dirty_blocks (UPDATE_LIFE_GLOBAL_RM_NOTES,
> > + (PROP_DEATH_NOTES
> > + | PROP_KILL_DEAD_CODE
> > + | PROP_SCAN_DEAD_CODE));
> > + no_new_pseudos = 1;
> > + }
> > + timevar_pop (TV_SMS);
> > timevar_push (TV_SCHED);
>
> > *** rtl.h 9 Mar 2004 17:06:21 -0000 1.465
> > --- rtl.h 11 Mar 2004 13:04:36 -0000
> > *************** extern void simplify_using_condition (rt
> > *** 2467,2470 ****
> > --- 2467,2472 ----
> > /* In ra.c. */
> > extern void reg_alloc (void);
> >
> > + /* In modulo-sched.c. */
> > + extern void sms_schedule (FILE*, FILE*);
> > #endif /* ! GCC_RTL_H */
>
> Usually prototypes which contains FILE are by #ifdef BUFSIZ.
>
> > Index: doc/invoke.texi
> > ===================================================================
> > RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
> > retrieving revision 1.425
> > diff -b -c -p -r1.425 invoke.texi
> > *** doc/invoke.texi 10 Mar 2004 06:02:55 -0000 1.425
> > --- doc/invoke.texi 11 Mar 2004 13:04:39 -0000
> > *************** invoking @option{-O2} on programs that u
> > *** 3653,3659 ****
> > @opindex O3
> > Optimize yet more. @option{-O3} turns on all optimizations specified
by
> > @option{-O2} and also turns on the @option{-finline-functions},
> > ! @option{-fweb} and @option{-frename-registers} options.
> >
> > @item -O0
> > @opindex O0
> > --- 3653,3659 ----
> > @opindex O3
> > Optimize yet more. @option{-O3} turns on all optimizations specified
by
> > @option{-O2} and also turns on the @option{-finline-functions},
> > ! @option{-fweb} , @option{-fmodule-sched} and
@option{-frename-registers} options.
> ^ typo.
> > *************** sense when scheduling before register al
> > *** 4084,4089 ****
> > --- 4084,4093 ----
> > @opindex fsched-stalled-insns
> > Define how many insns (if any) can be moved prematurely from the
queue
> > of stalled insns into the ready list, during the second scheduling
pass.
> > +
> > + @item -fmodulo-sched
> > + @opindex fmodulo-sched
> > + Perform SMS modulo scheduling before the first scheduling pass.
> >
>
> You should add that the option requires -fschedule-insns.
>
> > @item -fsched-stalled-insns-dep=@var{n}
> > @opindex fsched-stalled-insns-dep
>
> -------------------------------------ddg.c----------------------------
>
> > #include "expr.h" /* For gen_move_insn. */
>
> The comment is not necessary.
>
> > #include "ddg.h"
> >
> > extern void duplicate_insn (rtx); /* From cfglayout.c. */
>
> It is a bad practice. It should be moved into cfglayout.h.
>
> >
> > #define RENAMED_REG_P(x) (((x)->aux.info && \
> > GET_CODE ((rtx)(x)->aux.info) == REG) ? 1 : 0)
>
> Macros should have a comment. According to GNU C standard the
> operator (&&) should be 1st on the new line.
>
> > #define RENAMED_REG(x) ((x)->aux.info)
> >
> > #define ARTIFICIAL_P(x) (((x)->aux.info && \
> > GET_CODE ((rtx)(x)->aux.info) != REG) ? 1 : 0)
> > #define ART2ORIG(x) ((x)->aux.info)
> >
> > /* Returns non-zero if this is a load insn. */
> > #define LOAD_INSN_P(x) (single_set (x) && \
> > (GET_CODE (SET_SRC (single_set (x)))) == MEM \
> > ? 1 : 0)
> >
> > /* Returns non-zero if this is a store insn. */
> > #define STORE_INSN_P(x) (single_set (x) && \
> > (GET_CODE (SET_DEST (single_set (x)))) == MEM \
> > ? 1 : 0)
> >
>
> This code incorrectly defines at least stores. E.g. CISC architecture
> can have MEM <- OP (MEM, REG). I understand this code is not
> important. It is used only to gather statistics. But it is better to
> rename the macros or more accurately determine load/store.
>
> Another thing, SUBREG could contain MEM to.
>
> And the last thing, you are calling single_set two times. It could be
> avoided (e.g. by using a function).
>
> > /* Returns non-zero if this is a load or a store insn. */
> > #define LOAD_OR_STORE_INSN_P(x) (LOAD_INSN_P (x) || STORE_INSN_P (x) \
> > ? 1 : 0)
>
> This macro is not used anywhere. It should be removed.
>
> >
> > enum edge_flag {NOT_IN_SCC=0, IN_SCC};
> >
>
> Global definition should have a comment.
>
> > /* This array holds the mapping for undoing the renaming. */
> > static rtx *rename_reg_map = NULL;
> >
> > static unsigned max_rename_reg_num = 0;
>
> Global variable should have a comment.
>
> > static int
> > is_ldst_insn (rtx insn)
> > {
> > rtx pat = single_set (insn);
> >
> > return (pat
> > && (GET_CODE (SET_DEST (pat)) == MEM
> > || GET_CODE (SET_SRC (pat)) == MEM));
> > }
>
> Please, see the comment above for LOAD_INSN_P. This code is more
> important because it is used to recognize a memory dependence.
>
> Generally speaking, the optimization implementation is oriented to
> RISC architecture. Gcc supports other architectures too.
>
> > /* Computes the dependence parameters (latency, distance ect.), creates
> ^
> There are a lot of typos in the code.
>
> > a ddg_edge and adds it to the given DDG. */
>
> > /* The following three functions build the data dependence graph
including
>
> ^ I think it is obsolete. There are more three
> function after you added register renaming.
>
> > loop-carried dependences, by first creating an artificial copy of
the
> > loop (by unrolling), then calculating intra-(extended)-block
dependences,
> > and finally generating ddg_edges for such dependences and removing
the
> > artificial copy. */
>
>
> This function is most troubled code.
>
> > /* This function tries to keep the DEFs in a semi SSA form, when it
sees a
> > def of an already existing live-range it renames it and changes all
its
> > next uses. This is done after creating the artificial copies of the
BB
>
> ^ Sentences should be separated by two blanks. There
> are a lot of such places.
>
> > insns - i.e. the dummy unrolling - in order to have correct alias
> > analysis.
> > Side effect : This function adds register ANTI_DEP arcs to the DDG
> > before renaming the registers because the dep-analysis
> > will not see these dependencies after renaming. */
> > static void
> > rename_redefined_regs (ddg_ptr g)
> > {
> > int i, j;
> > int num_nodes = g->num_nodes;
> > sbitmap def_to_rename = sbitmap_alloc (g->num_nodes);
> >
> > sbitmap_zero (def_to_rename);
> > max_rename_reg_num = max_reg_num ();
> >
> > /* Scan all pairs of (i < j) = (insn, def) insns, checking if def
sets a reg
> > used earlier by insn. If so, record it in def_to_rename. */
> > for (i = 0; i < num_nodes - 1; i++)
> > {
> > rtx insn = g->nodes[i].insn;
> > ddg_node_ptr use_node = &g->nodes[i];
> >
> > for (j = i+1 ; j < num_nodes; j++)
>
> ^ Operator should be surrounded by blanks. There
> are lot of such places.
>
> The algorithm is O(n**2) complexity and can be improved. But probably
> it is ok because SP works on only small loops.
>
> > {
> > rtx reg;
> > rtx def = g->nodes[j].insn;
> > ddg_node_ptr def_node = &g->nodes[j];
> >
> > if (! single_set (def)
> > || GET_CODE (SET_DEST (PATTERN (def))) != REG)
> > continue;
> >
>
> You could improve renaming (make it in more places) if you process
> more complex cases. The definition could be SUBREG, ZERO_EXTRACT etc.
> It could be not only single_set. In general code is oriented to
> rs6000. It should be more machine-independent.
>
> You could look at df.c (reaching definitions analysis) and web.c
> (register renaming) using df how to make it.
>
> Another thing, you can make code worse for some architectures. Insns
> can requre that two operands have the same register (please see
> regmove.c). In this case renaming will result in generation of
> an additional move insn by reload.
>
> > reg = SET_DEST (PATTERN (def));
>
> Are you going to rename hard registers too. It is wrong. They should
> never renamed. They have special purposes like return/argument
> registers.
>
> > if (reg_referenced_p (reg, PATTERN (insn))
> > || (GET_CODE (insn) == CALL_INSN
> > && find_reg_fusage (insn, USE, reg)))
> > {
> > SET_BIT (def_to_rename, j);
> > max_rename_reg_num++;
> >
> > /* Add ANTI dependence because the coming dep-analysis will
> > not see it; if def is an artificial node we do not
> > add the ANTI (or any other) dependence. */
> > if (! ARTIFICIAL_P (def_node))
> > {
> > ddg_edge_ptr e = create_ddg_edge (use_node, def_node,
> > ANTI_DEP, REG_DEP, 0, 0);
> > add_edge_to_ddg (g, e);
> >
> > }
> > }
> > }
> > }
> >
> > rename_reg_map = xcalloc (max_rename_reg_num, sizeof (rtx));
> > /* Now rename defs marked in def_to_rename, along with their uses.
*/
> > /* ??? Could use EXECUTE_IF_SET_IN_SBITMAP (def_to_rename, 0, i,.
*/
> > for (i = 0; i < num_nodes; i++)
> > {
> > rtx reg, new_reg;
> > int last_def = 1;
> >
> > if (! TEST_BIT (def_to_rename, i))
> > continue;
> >
> > reg = SET_DEST (PATTERN (g->nodes[i].insn));
> > new_reg = gen_reg_rtx (GET_MODE (reg));
>
> You should set up the same attributes for the new regs (please look at
> web.c for example).
>
> > SET_DEST (PATTERN (g->nodes[i].insn)) = new_reg;
> >
> > /* Now rename uses. */
> > for (j = i + 1; j < num_nodes; j++)
> > {
> > rtx use_insn = g->nodes[j].insn;
> > rtx set = single_set (use_insn);
> >
> > if (! set)
> > continue;
> >
>
> This is wrong. After renaming definition, you should rename all
> occurrences of the register not only in some insns. You should also
> rename register notes (like REG_EQUAL) and mention that register notes
> REG_DEAD and REG_UNUSED might be inaccurate and will be fixed by live
> analysis later. It is strange that gcc did not crash or generate
> incorrect code. Probably you were lucky.
>
> > if (REGNO (new_reg) >= max_rename_reg_num)
> > abort ();
> >
> > rename_reg_map[REGNO (new_reg)] = reg;
> >
> > /* TODO: If use_insn uses reg in an address computation (reg +
const)
> > and def is an increment of reg (reg = reg + const'), then
> > replace (reg + const) by (reg + (const + const')) instead of
> > renaming it to (new_reg + const). */
> > SET_SRC (set) = replace_rtx (SET_SRC (set), reg, new_reg);
> >
> > if (SET_DEST (set) == reg)
> > {
> > /* A subsequent def kills the live range. */
> > last_def = 0;
> > break;
> > }
> >
> > SET_DEST (set) = replace_rtx (SET_DEST (set), reg, new_reg);
> > }
> > /* Generate a move from the last renamed register to the original
reg. */
> > if (last_def)
> > emit_insn_before (gen_move_insn (reg, new_reg) ,
> ^ unecessary blank
> > g->closing_branch->first_note);
> >
> > if (!ARTIFICIAL_P (&g->nodes[i]))
> > RENAMED_REG (&g->nodes[i]) = reg;
> > }
> > }
>
>
> > int
> > longest_simple_path (struct ddg * g, int src, int dest, sbitmap nodes)
> > {
> > while (change)
> > {
> > change = 0;
> > sbitmap_copy (workset, tmp);
> > sbitmap_zero (tmp);
> > EXECUTE_IF_SET_IN_SBITMAP (workset, 0, u,
>
> It is better to put only small code in EXECUTE_IF_SET_IN_SBITMAP for
> debugging purposes. But you can ignore it.
>
> > {
> > ddg_node_ptr u_node = &g->nodes[u];
>
> You should add a blank line here. It is usual practice to separate
> definitions and statements. This is not the single place.
>
> > for (e = u_node->out; e; e = e->next_out)
> > {
> > ddg_node_ptr v_node = e->dest;
> > v = v_node->cuid;
> >
>
> ------------------------------modulo-sched.c----------------------------
>
> > #include "expr.h" /* For gen_move_insn. */
> > #include "params.h" /* For MAX_SMS_LOOP_NUMBER. */
> > #include "gcov-io.h" /* For profile_info. */
>
> These comments are not necessary.
>
> You should put a reference to the article describing SMS and describe
> the algorithm in brief.
>
> > /* Perform signed modulo, always returning a non-negative value. */
> > #define SMODULO(x,y) ((x)%(y) < 0 ? ((x)%(y) + (y)) : (x)%(y))
> ^ Blanks around operators.
>
> > /* Holds the partial schedule as an array of II rows. Each entry of the
> > array points to a linked list of PS_INSNs, which represents the
> > instructions that are scheduled for that row. */
> > struct partial_schedule
> > {
> > int ii;
> > int history; /* Threshold for conflict checking using DFA. */
> > ps_insn_ptr *rows;
> > int min_cycle;
> > int max_cycle;
>
> There are no comments for min_cycle/max_cycle and other members.
>
> > ddg_ptr g;
> > };
>
>
> > extern int issue_rate;
>
> It is a bad practice. Please put it sched-int.h add '#include'.
>
> > /* From cfglayout.c. */
> > rtx duplicate_insn_chain (rtx, rtx);
>
> The same. Please put it in cfglayout.h
>
> > /* The scheduling parameters held for each node include a lower-bound
> > on it's scheduling cycle (asap); the absolute cycle we decided to
> > schedule it (time; time >= asap); a pointer to the first register-
> > move instruction added to handle the modulo-variable-expansion (mve)
> > of the register defined by this node (first_reg_move; copies the
> > original register defined by the node); the number of such register-
> > move instructions generated (nreg_moves; they immediately preceed
> > first_reg_move); the row and stage of the node (caching results of
> > % and /) and the column of the node in the partial schedule. */
>
> It is hard to read. Please some formating.
>
> > /* Check if condition = (ne (reg) (const_int 1)), which is more
> > restrictive than the check in doloop_condition_get:
> > if ((GET_CODE (condition) != GE && GET_CODE (condition) != NE)
> > || GET_CODE (XEXP (condition, 1)) != CONST_INT). */
>
> Why do you use more restrict condition?
>
> > if (GET_CODE (condition) != NE
> > || XEXP (condition, 1) != const1_rtx)
> > return NULL_RTX;
>
> > static rtx
> > const_iteration_count (rtx count_reg, basic_block pre_header, int *
count)
>
> Count should be of HOST_WIDE_TYPE. This is a type what INTVAL returns.
>
> > {
> > rtx insn;
> > rtx head, tail;
> > get_block_head_tail (pre_header->index, &head, &tail);
> >
> > for (insn = tail; insn != PREV_INSN (head); insn = PREV_INSN (insn))
> > if (INSN_P (insn) && single_set (insn) &&
> > rtx_equal_p (count_reg, SET_DEST (single_set (insn))))
> > {
> > rtx pat = single_set (insn);
> >
> > if (GET_CODE (SET_SRC (pat)) == CONST_INT)
> > {
> > *count = INTVAL (SET_SRC (pat));
> >
>
> > /* A very simple resource-based lower bound on the initiation interval.
> > Todo: improve the accuracy of this bound by considering the
> > utilization of various units. */
>
> ??? is used usually for to do in gcc code.
>
> > /* Calculate an upper bound for II. SMS should not schedule the loop if
it
> ^ Two blanks. There are lot of
> such places.
>
> > requires more cycles than this bound. Currently set to the sum of
the
> > longest latency edge for each node. Reset based on experiments. */
> > static int
> > calculate_maxii (ddg_ptr g)
> > {
> > int i;
> > int maxii = 0;
> >
> > for (i=0; i<g->num_nodes; i++)
> ^ ^ Blanks, please.
> > {
> > ddg_node_ptr u = &g->nodes[i];
> > ddg_edge_ptr e;
> > int max_edge_latency = 0;
> >
> > for (e = u->out; e ; e = e->next_out)
> > max_edge_latency = MAX (max_edge_latency , e->latency);
> ^ unessary blank.
> > /* Walk throught the DFA for the current row. */
> ^ typo
>
>
> > /* Rotate the rows of PS such that insns scheduled at time
> > START_CYCLE will appear in row 0. Updates max/min_cycles. */
> > void
> > rotate_partial_schedule (partial_schedule_ptr ps, int start_cycle)
> > {
> > int i, row, backward_rotates;
> > int last_row = ps->ii - 1;
> >
> > if (start_cycle == 0)
> > return;
> >
> > backward_rotates = SMODULO (start_cycle, ps->ii);
> >
> > /* Revisit later and optimize this into a single loop. */
> > for (i = 0; i < backward_rotates; i++)
> > {
> > ps_insn_ptr first_row = ps->rows[0];
> >
> > for (row = 0; row < last_row; row++)
> > ps->rows[row] = ps->rows[row+1];
> >
>
> It is an inefficient algorithm. You could improve it using one loop.
> You could ignore it because the number iteration are small. Although
> I know a company which compiler searches for II up to several
> hundreds.
>
> > ps->rows[last_row] = first_row;
> > }
> >
> > ps->max_cycle -= start_cycle;
> > ps->min_cycle -= start_cycle;
> > }
>
>
> > /* Main entry point, perform SMS scheduling on the loops of the
function
> > that consist of single basic blocks. */
> > void
> > sms_schedule (FILE *dump_file, FILE *vcg_dump_file)
> >
>
> I did not find that you check usage of DFA interface. What does
> happen if there is no DFA description or it is switched off.
>
> Vlad
(See attached file: gcc3.5.patch) (See attached file: modulo-sched.c)(See
attached file: ddg.c)(See attached file: ddg.h)Attachment:
gcc3.5.patch
Description: Binary data
Attachment:
modulo-sched.c
Description: Binary data
Attachment:
ddg.c
Description: Binary data
Attachment:
ddg.h
Description: Binary data
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |