This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] insn priority adjustments in scheduler and rs6000 port
- From: "Vladimir N. Makarov" <vmakarov at redhat dot com>
- To: Dorit Naishlos <DORIT at il dot ibm dot com>
- Cc: Ayal Zaks <ZAKS at il dot ibm dot com>, David Edelsohn <dje at watson dot ibm dot com>,gcc-patches at gcc dot gnu dot org
- Date: Fri, 03 Oct 2003 18:55:18 -0400
- Subject: Re: [PATCH] insn priority adjustments in scheduler and rs6000 port
- References: <OF802890A2.E8F2C474-ONC2256DB3.004F48ED-C2256DB3.00591DE2@telaviv.ibm.com>
> > I like your new idea much better than the previous one. You could
> > implement premature_issue as a hook (if it is not defined, the value
> > should be zero) and print its value (if it is not zero in debugging
> > output). So if it is possible to get the same behaviour (and as I
> > understand to get the same improvement), could you rewrite the patch,
> > please. I'll review it as quick as possible. Please, don't forget
> > about the documentation/style/benchmarks.
> >
>
> Thanks. Attached is the revised patch. Still bootstrapping, and being
> tested for SPEC performance. It includes some style & documentation
> updates, and also the revision I described before (i.e, the update of the
> INSN_TICK for instructions that depend on prematurely scheduled insns).
>
You mentioned SPEC, but the results are absent from the message. As I
understand it, the results reported in
http://gcc.gnu.org/ml/gcc-patches/2003-09/msg02105.html
are for the first patch, 'insn priority adjustments in scheduler and
rs6000 port'. Is that true? Or were the results for the two patches?

It is better to send the patch (not the ChangeLog entry) as an
attachment. Otherwise, it is hard to apply the patch to gcc (because
tabs are transformed into blanks). I wanted to check that the
patch does not create problems for other ports (like x86). Although I
see that there should be no problem (if powerpc is compiled correctly
with the patch), it is better to check it. I hope you did.

The hook `is_costly_dependence' should be documented in tm.texi too.
In my opinion, you should describe when to use the hook (it is just for
the problem you are trying to solve for the OOO power4 processor).
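
For readers who want to see what such a hook decides, here is a minimal, self-contained model of the policy that the quoted rs6000 hunk implements. It is only a sketch: `dep_policy`, `dep_info` and `is_costly_dependence_model` are hypothetical names, and plain booleans and integers stand in for the rtx arguments of the real hook.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the -msched-costly-dep settings.  */
enum dep_policy
{
  DEP_NONE_COSTLY,          /* "no"  */
  DEP_ALL_COSTLY,           /* "all"  */
  DEP_TRUE_STORE_TO_LOAD,   /* "true_store_to_load"  */
  DEP_STORE_TO_LOAD,        /* "store_to_load"  */
  DEP_LATENCY               /* a number: latency threshold  */
};

/* Assumed summary of one dependence; the real hook inspects rtx.  */
struct dep_info
{
  bool producer_is_store;   /* already scheduled insn stores to memory  */
  bool consumer_is_load;    /* stalled insn loads from memory  */
  bool is_true_dep;         /* true (read-after-write) dependence  */
  int cost;                 /* latency estimated by the scheduler  */
  int distance;             /* cycles between the two insns  */
};

/* Return true if DEP should keep the stalled insn in the queue.  */
bool
is_costly_dependence_model (enum dep_policy policy, int latency_threshold,
                            const struct dep_info *dep)
{
  bool store_to_load = dep->producer_is_store && dep->consumer_is_load;

  switch (policy)
    {
    case DEP_NONE_COSTLY:
      return false;
    case DEP_ALL_COSTLY:
      return true;
    case DEP_STORE_TO_LOAD:
      return store_to_load;
    case DEP_TRUE_STORE_TO_LOAD:
      return store_to_load && dep->is_true_dep;
    case DEP_LATENCY:
      /* Remaining latency after DISTANCE cycles have elapsed.  */
      return dep->cost - dep->distance >= latency_threshold;
    }
  return false;
}
```

With the store_to_load policy, a load that depends on a store is held back whether or not the dependence is a true one; with true_store_to_load, only a true dependence holds it back.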

If you are changing the documentation, it is also reasonable to run `make
doc' to make sure that the documentation changes are ok, and then look
at the generated .dvi file to check how it looks. I cannot help you
with editing the documentation because English is not my native language
either.

I thought about the problem you are solving and did not find a
simpler solution. So in general your approach is ok because, as I
understand, the patch improves the code (at least for SPEC). The
classic approach to insn scheduling for OOO processors is to schedule
an insn in such a way that it will have no hazards (and stalls) after
its issue. I have not seen articles about other approaches. So I think
you are a pioneer of this approach (although I might be mistaken). It
would be interesting to try the patch on other OOO processors, but of
course that is out of the scope of your work.
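
To make the difference from the classic approach concrete, the sketch below models the INSN_TICK update from the quoted schedule_insn hunk with plain integers. The function names are hypothetical, and the arithmetic only mirrors the expression `MAX (INSN_TICK (next), clock + cost + premature_issue)` in the patch; it is not the scheduler's actual code.

```c
#include <assert.h>

/* Cycles by which an insn is issued before its own ready tick;
   zero when it is issued on time.  */
int
premature_issue_cycles (int insn_tick, int clock)
{
  return insn_tick > clock ? insn_tick - clock : 0;
}

/* Earliest tick of a dependent insn after its producer is issued at
   CLOCK.  A prematurely issued producer stalls first, so its
   dependents are pushed back by the same number of cycles.  */
int
updated_dependent_tick (int next_tick, int insn_tick, int clock, int cost)
{
  int candidate;

  assert (cost >= 0);
  candidate = clock + cost + premature_issue_cycles (insn_tick, clock);
  return next_tick > candidate ? next_tick : candidate;
}
```

For example, a producer with tick 7 issued at clock 5 is two cycles early, so a dependent insn with cost 3 becomes ready at tick 10 rather than 8.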

So could you send me a fixed version of the patch for the final
approval?

Vlad
>
> ChangeLog entry
> ---------------
>
> 2003-10-02 Dorit Naishlos <dorit@il.ibm.com>
>
> * haifa-sched.c (ok_for_early_schedule): New function.
> (early_queue_to_ready): New function.
> (schedule_block): allow early removal of insns from Q.
^ Capital letter should be used.
> (schedule_insn): update INSN_TICK in case of premature issue.
^ The same as above. There are a lot of the same mistakes
below.
> * common.opt (sched_stalled_insns): New flag.
> (sched_stalled_insns_dep): New flag.
> * flags.h: Same above flags.
> * opts.c: Same as above.
> * toplev.c: Same as above.
> * target.h (targetm.sched.is_costly_dependence): New
> hook.
> * target-def.h: same as above.
^ Capital letter
> * config/rs6000/rs6000.h: (rs6000_sched_costly_dep):
> support new flag -msched-costly-dep.
^ Capital letter.
> (DEFAULT_SCHED_COSTLY_DEP): Define.
> * config/rs6000/rs6000.c:
> (rs6000_is_costly_dependence): New function
^ Period should be here.
> (is_load_insn, is_store_insn): New functions
^ The same as above.
> (is_load_insn1, is_store_insn1, is_mem_ref): New
> functions.
> * doc/invoke.texi (fsched_stalled_insns)
^^^^^^^^^^^^^^^^^^^^^^ Obviously it should
be removed.
> (fsched_stalled_insns, msched-costly-dep): Document
> options.
>
>
>
> Index: gcc/common.opt
> ===================================================================
> RCS file: /cvsroot/gcc/gcc/gcc/common.opt,v
> retrieving revision 1.16
> diff -c -3 -p -r1.16 common.opt
> *** gcc/common.opt 3 Sep 2003 20:57:31 -0000 1.16
> --- gcc/common.opt 2 Oct 2003 15:31:50 -0000
> *************** fschedule-insns2
> *** 592,597 ****
> --- 592,613 ----
> Common
> Reschedule instructions after register allocation
>
> + fsched-stalled-insns
> + Common
> + Allow premature scheduling of queued insns
> +
> + fsched-stalled-insns=
> + Common RejectNegative Joined UInteger
> + -fsched-stalled-insns=<number> Set number of queued insns that can be prematurely scheduled
> +
> + fsched-stalled-insns-dep
> + Common
> + Set dependence distance checking in premature scheduling of queued insns
> +
> + fsched-stalled-insns-dep=
> + Common RejectNegative Joined UInteger
> + -fsched-stalled-insns-dep=<number> Set dependence distance checking in premature scheduling of queued insns
> +
> fshared-data
> Common
> Mark data as shared rather than private
> Index: gcc/flags.h
> ===================================================================
> RCS file: /cvsroot/gcc/gcc/gcc/flags.h,v
> retrieving revision 1.121
> diff -c -3 -p -r1.121 flags.h
> *** gcc/flags.h 28 Sep 2003 04:56:33 -0000 1.121
> --- gcc/flags.h 2 Oct 2003 15:31:50 -0000
> *************** extern int flag_schedule_speculative;
> *** 439,444 ****
> --- 439,458 ----
> extern int flag_schedule_speculative_load;
> extern int flag_schedule_speculative_load_dangerous;
>
> + /* The following flags have an effect during scheduling after register
> + allocation:
> +
> + sched_stalled_insns means that insns can be moved prematurely from the queue
> + of stalled insns into the ready list.
> +
> + sched_stalled_insns_dep controls how many recently scheduled cycles will
> + be examined for a dependency on a stalled insn that is candidate for
> + premature removal from the queue of stalled insns into the ready list (has
> + an effect only if the flag 'sched_stalled_insns' is set). */
> +
> + extern int flag_sched_stalled_insns;
> + extern int flag_sched_stalled_insns_dep;
> +
> /* flag_branch_on_count_reg means try to replace add-1,compare,branch tupple
> by a cheaper branch, on a count register. */
> extern int flag_branch_on_count_reg;
> Index: gcc/haifa-sched.c
> ===================================================================
> RCS file: /cvsroot/gcc/gcc/gcc/haifa-sched.c,v
> retrieving revision 1.229
> diff -c -3 -p -r1.229 haifa-sched.c
> *** gcc/haifa-sched.c 15 Sep 2003 18:52:33 -0000 1.229
> --- gcc/haifa-sched.c 2 Oct 2003 15:31:51 -0000
> *************** static void ready_sort (struct ready_lis
> *** 517,522 ****
> --- 517,523 ----
> static rtx ready_remove_first (struct ready_list *);
>
> static void queue_to_ready (struct ready_list *);
> + static int early_queue_to_ready (state_t, struct ready_list *);
>
> static void debug_ready_list (struct ready_list *);
>
> *************** schedule_insn (rtx insn, struct ready_li
> *** 1247,1252 ****
> --- 1248,1254 ----
> rtx link;
> int advance = 0;
> int unit = 0;
> + int premature_issue = 0;
>
> if (!targetm.sched.use_dfa_pipeline_interface
> || !(*targetm.sched.use_dfa_pipeline_interface) ())
> *************** schedule_insn (rtx insn, struct ready_li
> *** 1290,1301 ****
> return 0;
> }
>
> for (link = INSN_DEPEND (insn); link != 0; link = XEXP (link, 1))
> {
> rtx next = XEXP (link, 0);
> int cost = insn_cost (insn, link, next);
>
> ! INSN_TICK (next) = MAX (INSN_TICK (next), clock + cost);
>
> if ((INSN_DEP_COUNT (next) -= 1) == 0)
> {
> --- 1292,1310 ----
> return 0;
> }
>
> + if (INSN_TICK (insn) > clock)
> + {
> + /* 'insn' has been prematurely moved from the queue to the
> + ready list. */
> + premature_issue = INSN_TICK (insn) - clock;
> + }
> +
> for (link = INSN_DEPEND (insn); link != 0; link = XEXP (link, 1))
> {
> rtx next = XEXP (link, 0);
> int cost = insn_cost (insn, link, next);
>
> ! INSN_TICK (next) = MAX (INSN_TICK (next), clock + cost + premature_issue);
>
> if ((INSN_DEP_COUNT (next) -= 1) == 0)
> {
> *************** queue_to_ready (struct ready_list *ready
> *** 1809,1814 ****
> --- 1818,1974 ----
> }
> }
>
> + /* Used by early_queue_to_ready. Determines whether it is "ok" to
> + prematurely move INSN from the queue to the ready list. Currently,
> + if a target defines the hook 'is_costly_dependence', this function
> + uses the hook to check whether there exist any dependences which are
> + considered costly by the target, between INSN and other insns that
> + have already been scheduled. Dependences are checked up to Y cycles
> + back, with default Y=1; The flag -fsched-stalled-insns-dep=Y allows
> + controlling this value.
> + (Other considerations could be taken into account instead (or in
> + addition) depending on user flags and target hooks. */
> +
> + static bool
> + ok_for_early_queue_removal (rtx insn)
> + {
> + int n_cycles;
> + rtx prev_insn = last_scheduled_insn;
> +
> + if (targetm.sched.is_costly_dependence)
> + {
> + for (n_cycles = flag_sched_stalled_insns_dep; n_cycles; n_cycles--)
> + {
> + for ( ; prev_insn; prev_insn = PREV_INSN (prev_insn))
> + {
> + rtx dep_link = 0;
> + int dep_cost;
> +
> + if (GET_CODE (prev_insn) != NOTE)
> + {
> + dep_link = find_insn_list (insn, INSN_DEPEND (prev_insn));
> + if (dep_link)
> + {
> + dep_cost = insn_cost (prev_insn, dep_link, insn) ;
> + if (targetm.sched.is_costly_dependence (prev_insn, insn, dep_link, dep_cost,
> + flag_sched_stalled_insns_dep - n_cycles))
> + return false;
> + }
> + }
> +
> + if (GET_MODE (prev_insn) == TImode) /* end of dispatch group */
> + break;
> + }
> +
> + if (!prev_insn)
> + break;
> + prev_insn = PREV_INSN (prev_insn);
> + }
> + }
> +
> + return true;
> + }
> +
> +
> + /* Remove insns from the queue, before they become "ready" with respect
> + to FU latency considerations. */
> +
> + static int
> + early_queue_to_ready (state_t state, struct ready_list *ready)
> + {
> + rtx insn;
> + rtx link;
> + rtx next_link;
> + rtx prev_link;
> + bool move_to_ready;
> + int cost;
> + state_t temp_state = alloca (dfa_state_size);
> + int stalls;
> + int insns_removed = 0;
> +
> + /*
> + Flag '-fsched-stalled-insns=X' determines the aggressiveness of this function:
It is better to split lines longer than 80 columns.
> +
> + X == 0: There is no limit on how many queued insns can be removed prematurely.
> + (flag_sched_stalled_insns = -1).
> +
> + X >= 1: Only X queued insns can be removed prematurely in each invocation.
> + (flag_sched_stalled_insns = X).
> +
> + Otherwise: Early queue removal is disabled.
> + (flag_sched_stalled_insns = 0)
> + */
> +
> + if (! flag_sched_stalled_insns)
> + return 0;
> +
> + for (stalls = 0; stalls <= MAX_INSN_QUEUE_INDEX; stalls++)
> + {
> + if ((link = insn_queue[NEXT_Q_AFTER (q_ptr, stalls)]))
> + {
> + if (sched_verbose > 6)
> + fprintf (sched_dump, ";; look at index %d + %d\n", q_ptr, stalls);
> +
> + prev_link = 0;
> + while (link)
> + {
> + next_link = XEXP (link, 1);
> + insn = XEXP (link, 0);
> + if (insn && sched_verbose > 6)
> + print_rtl_single (sched_dump, insn);
> +
> + memcpy (temp_state, state, dfa_state_size);
> + if (recog_memoized (insn) < 0)
> + /* non-negative to indicate that it's not ready
> + to avoid infinite Q->R->Q->R... */
> + cost = 0;
> + else
> + cost = state_transition (temp_state, insn);
> +
> + if (sched_verbose >= 6)
> + fprintf (sched_dump, "transition cost = %d\n", cost);
> +
> + move_to_ready = false;
> + if (cost < 0)
> + {
> + move_to_ready = ok_for_early_queue_removal (insn);
> + if (move_to_ready == true)
> + {
> + /* move from Q to R */
> + q_size -= 1;
> + ready_add (ready, insn);
> +
> + if (prev_link)
> + XEXP (prev_link, 1) = next_link;
> + else
> + insn_queue[NEXT_Q_AFTER (q_ptr, stalls)] = next_link;
> +
> + free_INSN_LIST_node (link);
> +
> + if (sched_verbose >= 2)
> + fprintf (sched_dump, ";;\t\tEarly Q-->Ready: insn %s\n",
> + (*current_sched_info->print_insn) (insn, 0));
> +
> + insns_removed++;
> + if (insns_removed == flag_sched_stalled_insns)
> + /* remove only one insn from Q at a time */
> + return insns_removed;
> + }
> + }
> +
> + if (move_to_ready == false)
> + prev_link = link;
> +
> + link = next_link;
> + } /* while link */
> + } /* if link */
> +
> + } /* for stalls.. */
> +
> + return insns_removed;
> + }
> +
> +
> /* Print the ready list for debugging purposes. Callable from debugger. */
>
> static void
> *************** schedule_block (int b, int rgn_n_insns)
> *** 2251,2256 ****
> --- 2411,2430 ----
> }
> else
> {
> + if (ready.n_ready == 0
> + && can_issue_more
> + && reload_completed)
> + {
> + /* Allow scheduling insns directly from the queue in case
> + there's nothing better to do (ready list is empty) but
> + there are still vacant dispatch slots in the current cycle. */
> + if (sched_verbose >= 6)
> + fprintf(sched_dump,";;\t\tSecond chance\n");
> + memcpy (temp_state, curr_state, dfa_state_size);
> + if (early_queue_to_ready (temp_state, &ready))
> + ready_sort (&ready);
> + }
> +
> if (ready.n_ready == 0 || !can_issue_more
> || state_dead_lock_p (curr_state)
> || !(*current_sched_info->schedule_more_p) ())
> Index: gcc/opts.c
> ===================================================================
> RCS file: /cvsroot/gcc/gcc/gcc/opts.c,v
> retrieving revision 1.38
> diff -c -3 -p -r1.38 opts.c
> *** gcc/opts.c 5 Sep 2003 05:36:47 -0000 1.38
> --- gcc/opts.c 2 Oct 2003 15:31:51 -0000
> *************** common_handle_option (size_t scode, cons
> *** 1264,1269 ****
> --- 1264,1287 ----
> flag_schedule_insns_after_reload = value;
> break;
>
> + case OPT_fsched_stalled_insns:
> + flag_sched_stalled_insns = value;
> + break;
> +
> + case OPT_fsched_stalled_insns_:
> + flag_sched_stalled_insns = value;
> + if (flag_sched_stalled_insns == 0)
> + flag_sched_stalled_insns = -1;
> + break;
> +
> + case OPT_fsched_stalled_insns_dep:
> + flag_sched_stalled_insns_dep = 1;
> + break;
> +
> + case OPT_fsched_stalled_insns_dep_:
> + flag_sched_stalled_insns_dep = value;
> + break;
> +
> case OPT_fshared_data:
> flag_shared_data = value;
> break;
> Index: gcc/target-def.h
> ===================================================================
> RCS file: /cvsroot/gcc/gcc/gcc/target-def.h,v
> retrieving revision 1.56
> diff -c -3 -p -r1.56 target-def.h
> *** gcc/target-def.h 23 Sep 2003 19:17:42 -0000 1.56
> --- gcc/target-def.h 2 Oct 2003 15:31:52 -0000
> *************** Foundation, 59 Temple Place - Suite 330,
> *** 230,235 ****
> --- 230,236 ----
> #define TARGET_SCHED_DFA_NEW_CYCLE 0
> #define TARGET_SCHED_INIT_DFA_BUBBLES 0
> #define TARGET_SCHED_DFA_BUBBLE 0
> + #define TARGET_SCHED_IS_COSTLY_DEPENDENCE 0
>
> #define TARGET_SCHED \
> {TARGET_SCHED_ADJUST_COST, \
> *************** Foundation, 59 Temple Place - Suite 330,
> *** 250,256 ****
> TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD, \
> TARGET_SCHED_DFA_NEW_CYCLE, \
> TARGET_SCHED_INIT_DFA_BUBBLES, \
> ! TARGET_SCHED_DFA_BUBBLE}
>
> /* In tree.c. */
> #define TARGET_MERGE_DECL_ATTRIBUTES merge_decl_attributes
> --- 251,258 ----
> TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD, \
> TARGET_SCHED_DFA_NEW_CYCLE, \
> TARGET_SCHED_INIT_DFA_BUBBLES, \
> ! TARGET_SCHED_DFA_BUBBLE, \
> ! TARGET_SCHED_IS_COSTLY_DEPENDENCE}
>
> /* In tree.c. */
> #define TARGET_MERGE_DECL_ATTRIBUTES merge_decl_attributes
> Index: gcc/target.h
> ===================================================================
> RCS file: /cvsroot/gcc/gcc/gcc/target.h,v
> retrieving revision 1.63
> diff -c -3 -p -r1.63 target.h
> *** gcc/target.h 23 Sep 2003 19:17:42 -0000 1.63
> --- gcc/target.h 2 Oct 2003 15:31:52 -0000
> *************** struct gcc_target
> *** 251,256 ****
> --- 251,268 ----
> scheduling. */
> void (* init_dfa_bubbles) (void);
> rtx (* dfa_bubble) (int);
> + /* The following member value is a pointer to a function called
> + by the insn scheduler. It should return true if there exists a
> + dependence which is considered costly by the target, between
> + the insn passed as the first parameter, and the insn passed as
> + the second parameter. The third parameter is the INSN_DEPEND
> + link that represents the dependence between the two insns. The
> + fourth argument is the cost of the dependence as estimated by
> + the scheduler. The last argument is the distance in cycles
> + between the already scheduled insn (first parameter) and the
> + the second insn (second parameter).
> + */
> + bool (* is_costly_dependence) PARAMS ((rtx, rtx, rtx, int, int));
> } sched;
>
> /* Given two decls, merge their attributes and return the result. */
> Index: gcc/toplev.c
> ===================================================================
> RCS file: /cvsroot/gcc/gcc/gcc/toplev.c,v
> retrieving revision 1.830
> diff -c -3 -p -r1.830 toplev.c
> *** gcc/toplev.c 28 Sep 2003 04:56:33 -0000 1.830
> --- gcc/toplev.c 2 Oct 2003 15:31:54 -0000
> *************** int flag_schedule_speculative = 1;
> *** 826,831 ****
> --- 826,845 ----
> int flag_schedule_speculative_load = 0;
> int flag_schedule_speculative_load_dangerous = 0;
>
> + /* The following flags have an effect during scheduling after register
> + allocation:
> +
> + flag_sched_stalled_insns means that insns can be moved prematurely from the queue
> + of stalled insns into the ready list.
> +
> + flag_sched_stalled_insns_dep controls how many insn groups will be examined
> + for a dependency on a stalled insn that is candidate for premature removal
> + from the queue of stalled insns into the ready list (has an effect only if
> + the flag 'sched_stalled_insns' is set). */
> +
> + int flag_sched_stalled_insns = 0;
> + int flag_sched_stalled_insns_dep = 1;
> +
> int flag_single_precision_constant;
>
> /* flag_branch_on_count_reg means try to replace add-1,compare,branch tupple
> *************** static const lang_independent_options f_
> *** 1069,1074 ****
> --- 1083,1090 ----
> {"sched-spec",&flag_schedule_speculative, 1 },
> {"sched-spec-load",&flag_schedule_speculative_load, 1 },
> {"sched-spec-load-dangerous",&flag_schedule_speculative_load_dangerous, 1 },
> + {"sched-stalled-insns", &flag_sched_stalled_insns, 0 },
> + {"sched-stalled-insns-dep", &flag_sched_stalled_insns_dep, 1 },
> {"sched2-use-superblocks", &flag_sched2_use_superblocks, 1 },
> {"sched2-use-traces", &flag_sched2_use_traces, 1 },
> {"branch-count-reg",&flag_branch_on_count_reg, 1 },
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> RCS file: /cvsroot/gcc/gcc/gcc/config/rs6000/rs6000.c,v
> retrieving revision 1.521
> diff -c -3 -p -r1.521 rs6000.c
> *** gcc/config/rs6000/rs6000.c 23 Sep 2003 21:37:31 -0000 1.521
> --- gcc/config/rs6000/rs6000.c 2 Oct 2003 15:32:00 -0000
> *************** struct rs6000_cpu_select rs6000_select[3
> *** 80,85 ****
> --- 80,89 ----
> { (const char *)0, "-mtune=", 1, 0 },
> };
>
> + /* Support for -msched-costly-dep option. */
> + const char *rs6000_sched_costly_dep_str;
> + enum rs6000_dependence_cost rs6000_sched_costly_dep;
> +
> /* Size of long double */
> const char *rs6000_long_double_size_string;
> int rs6000_long_double_type_size;
> *************** static bool rs6000_rtx_costs (rtx, int,
> *** 270,275 ****
> --- 274,280 ----
> static int rs6000_adjust_cost (rtx, rtx, rtx, int);
> static int rs6000_adjust_priority (rtx, int);
> static int rs6000_issue_rate (void);
> + static bool rs6000_is_costly_dependence (rtx, rtx, rtx, int, int);
> static int rs6000_use_sched_lookahead (void);
>
> static void rs6000_init_builtins (void);
> *************** static const char alt_reg_names[][8] =
> *** 457,462 ****
> --- 462,469 ----
> #define TARGET_SCHED_ADJUST_COST rs6000_adjust_cost
> #undef TARGET_SCHED_ADJUST_PRIORITY
> #define TARGET_SCHED_ADJUST_PRIORITY rs6000_adjust_priority
> + #undef TARGET_SCHED_IS_COSTLY_DEPENDENCE
> + #define TARGET_SCHED_IS_COSTLY_DEPENDENCE rs6000_is_costly_dependence
>
> #undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD
> #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD rs6000_use_sched_lookahead
> *************** rs6000_override_options (const char *def
> *** 820,825 ****
> --- 827,847 ----
> rs6000_default_long_calls = (base[0] != 'n');
> }
>
> + /* Handle -msched-costly-dep option. */
> + rs6000_sched_costly_dep = DEFAULT_SCHED_COSTLY_DEP;
> + if (rs6000_sched_costly_dep_str)
> + {
> + if (! strcmp (rs6000_sched_costly_dep_str, "no"))
> + rs6000_sched_costly_dep = no_dep_costly;
> + else if (! strcmp (rs6000_sched_costly_dep_str, "all"))
> + rs6000_sched_costly_dep = all_deps_costly;
> + else if (! strcmp (rs6000_sched_costly_dep_str, "true_store_to_load"))
> + rs6000_sched_costly_dep = true_store_to_load_dep_costly;
> + else if (! strcmp (rs6000_sched_costly_dep_str, "store_to_load"))
> + rs6000_sched_costly_dep = store_to_load_dep_costly;
> + else rs6000_sched_costly_dep = atoi (rs6000_sched_costly_dep_str);
> + }
> +
> #ifdef TARGET_REGNAMES
> /* If the user desires alternate register names, copy in the
> alternate names now. */
> *************** rs6000_use_sched_lookahead (void)
> *** 13129,13134 ****
> --- 13151,13293 ----
> return 4;
> return 0;
> }
> +
> + /* Determine is PAT refers to memory. */
> +
> + static bool
> + is_mem_ref (rtx pat)
> + {
> + const char * fmt;
> + int i, j;
> + bool ret = false;
> +
> + if (GET_CODE (pat) == MEM)
> + return true;
> +
> + /* Recursively process the pattern. */
> + fmt = GET_RTX_FORMAT (GET_CODE (pat));
> +
> + for (i = GET_RTX_LENGTH (GET_CODE (pat)) - 1; i >= 0 && !ret; i--)
> + {
> + if (fmt[i] == 'e')
> + ret |= is_mem_ref (XEXP (pat, i));
> + else if (fmt[i] == 'E')
> + for (j = XVECLEN (pat, i) - 1; j >= 0; j--)
> + ret |= is_mem_ref (XVECEXP (pat, i, j));
> + }
> +
> + return ret;
> + }
> +
> + /* Determine if PAT is a PATTERN of a load insn. */
> +
> + static bool
> + is_load_insn1 (rtx pat)
> + {
> + if (!pat || pat == NULL_RTX)
> + return false;
> +
> + if (GET_CODE (pat) == SET)
> + return is_mem_ref (SET_SRC (pat));
> +
> + if (GET_CODE (pat) == PARALLEL)
> + {
> + int i;
Please, add a blank line here.
> + for (i = 0; i < XVECLEN (pat, 0); i++)
> + if (is_load_insn1 (XVECEXP (pat, 0, i)))
> + return true;
> + }
> +
> + return false;
> + }
> +
> + /* Determine if INSN loads from memory. */
> +
> + static bool
> + is_load_insn (rtx insn)
> + {
> + if (!insn || !INSN_P (insn))
> + return false;
> +
> + if (GET_CODE (insn) == CALL_INSN)
> + return false;
> +
> + return is_load_insn1 (PATTERN (insn));
> + }
> +
> + /* Determine if PAT is a PATTERN of a store insn. */
> +
> + static bool
> + is_store_insn1 (rtx pat)
> + {
> + if (!pat || pat == NULL_RTX)
> + return false;
> +
> + if (GET_CODE (pat) == SET)
> + return is_mem_ref (SET_DEST (pat));
> +
> + if (GET_CODE (pat) == PARALLEL)
> + {
> + int i;
Blank line, please.
> + for (i = 0; i < XVECLEN (pat, 0); i++)
> + if (is_store_insn1 (XVECEXP (pat, 0, i)))
> + return true;
> + }
> +
> + return false;
> + }
> +
> + /* Determine if INSN stores to memory. */
> +
> + static bool
> + is_store_insn (rtx insn)
> + {
> + if (!insn || !INSN_P (insn))
> + return false;
> +
> + return is_store_insn1 (PATTERN (insn));
> + }
> +
> + /* Returns whether the dependence between INSN and NEXT is considered
> + costly by the given target. */
> +
> + static bool
> + rs6000_is_costly_dependence (rtx insn, rtx next, rtx link, int cost, int distance)
> + {
> + /* If the flag is not enbled - no dependence is considered costly;
> + allow all dependent insns in the same group.
> + This is the most aggressive option. */
> + if (rs6000_sched_costly_dep == no_dep_costly)
> + return false;
> +
> + /* If the flag is set to 1 - a dependence is always considered costly;
> + do not allow dependent instructions in the same group.
> + This is the most conservative option. */
> + if (rs6000_sched_costly_dep == all_deps_costly)
> + return true;
> +
> + if (rs6000_sched_costly_dep == store_to_load_dep_costly
> + && is_load_insn (next)
> + && is_store_insn (insn))
> + /* Prevent load after store in the same group. */
> + return true;
> +
> + if (rs6000_sched_costly_dep == true_store_to_load_dep_costly
> + && is_load_insn (next)
> + && is_store_insn (insn)
> + && (!link || (int) REG_NOTE_KIND (link) == 0))
> + /* Prevent load after store in the same group if it is a true dependence. */
> + return true;
> +
> + /* The flag is set to X; dependences with latency >= X are considered costly,
> + and will not be scheduled in the same group. */
> + if (rs6000_sched_costly_dep <= max_dep_latency
> + && ((cost - distance) >= (int)rs6000_sched_costly_dep))
> + return true;
> +
> + return false;
> + }
> +
>
>
> /* Length in units of the trampoline for entering a nested function. */
> Index: gcc/config/rs6000/rs6000.h
> ===================================================================
> RCS file: /cvsroot/gcc/gcc/gcc/config/rs6000/rs6000.h,v
> retrieving revision 1.286
> diff -c -3 -p -r1.286 rs6000.h
> *** gcc/config/rs6000/rs6000.h 21 Jul 2003 20:18:52 -0000 1.286
> --- gcc/config/rs6000/rs6000.h 2 Oct 2003 15:32:01 -0000
> *************** extern enum processor_type rs6000_cpu;
> *** 376,381 ****
> --- 376,391 ----
> and the old mnemonics are dialect zero. */
> #define ASSEMBLER_DIALECT (TARGET_NEW_MNEMONICS ? 1 : 0)
>
> + /* Types of costly dependences. */
> + enum rs6000_dependence_cost
> + {
> + max_dep_latency = 1000,
> + no_dep_costly,
> + all_deps_costly,
> + true_store_to_load_dep_costly,
> + store_to_load_dep_costly
> + };
> +
> /* This is meant to be overridden in target specific files. */
> #define SUBTARGET_OPTIONS
>
> *************** extern enum processor_type rs6000_cpu;
> *** 402,407 ****
> --- 412,419 ----
> {"longcall", &rs6000_longcall_switch, \
> N_("Avoid all range limits on call instructions"), 0}, \
> {"no-longcall", &rs6000_longcall_switch, "", 0}, \
> + {"sched-costly-dep=", &rs6000_sched_costly_dep_str, \
> + N_("determine which dependences between insns are considered costly"), 0}, \
> {"align-", &rs6000_alignment_string, \
> N_("Specify alignment of structure fields default/natural"), 0}, \
> SUBTARGET_OPTIONS \
> *************** extern const char *rs6000_longcall_switc
> *** 457,462 ****
> --- 469,477 ----
> extern int rs6000_default_long_calls;
> extern const char* rs6000_alignment_string;
> extern int rs6000_alignment_flags;
> + extern const char *rs6000_sched_costly_dep_str;
> + extern enum rs6000_dependence_cost rs6000_sched_costly_dep;
> +
>
> /* Alignment options for fields in structures for sub-targets following
> AIX-like ABI.
> *************** extern int rs6000_alignment_flags;
> *** 474,479 ****
> --- 489,499 ----
> #else
> #define TARGET_ALIGN_NATURAL 0
> #endif
> +
> + /* Set a default value for DEFAULT_SCHED_COSTLY_DEP used by target hook
> + is_costly_dependence. */
> + #define DEFAULT_SCHED_COSTLY_DEP \
> + (rs6000_cpu == PROCESSOR_POWER4 ? store_to_load_dep_costly : no_dep_costly)
>
> /* Define TARGET_MFCRF if the target assembler supports the optional
> field operand for mfcr and the target processor supports the
> Index: gcc/doc/invoke.texi
> ===================================================================
> RCS file: /cvsroot/gcc/gcc/gcc/doc/invoke.texi,v
> retrieving revision 1.339
> diff -c -3 -p -r1.339 invoke.texi
> *** gcc/doc/invoke.texi 25 Sep 2003 01:25:52 -0000 1.339
> --- gcc/doc/invoke.texi 2 Oct 2003 15:32:05 -0000
> *************** in the following sections.
> *** 287,293 ****
> -frerun-cse-after-loop -frerun-loop-opt @gol
> -frounding-math -fschedule-insns -fschedule-insns2 @gol
> -fno-sched-interblock -fno-sched-spec -fsched-spec-load @gol
> ! -fsched-spec-load-dangerous -fsched2-use-superblocks @gol
> -fsched2-use-traces -fsignaling-nans @gol
> -fsingle-precision-constant -fssa -fssa-ccp -fssa-dce @gol
> -fstrength-reduce -fstrict-aliasing -ftracer -fthread-jumps @gol
> --- 287,295 ----
> -frerun-cse-after-loop -frerun-loop-opt @gol
> -frounding-math -fschedule-insns -fschedule-insns2 @gol
> -fno-sched-interblock -fno-sched-spec -fsched-spec-load @gol
> ! -fsched-spec-load-dangerous @gol
> ! -fsched-stalled-insns=@var{n} -sched-stalled-insns-dep=@var{n} @gol
> ! -fsched2-use-superblocks @gol
> -fsched2-use-traces -fsignaling-nans @gol
> -fsingle-precision-constant -fssa -fssa-ccp -fssa-dce @gol
> -fstrength-reduce -fstrict-aliasing -ftracer -fthread-jumps @gol
> *************** in the following sections.
> *** 431,436 ****
> --- 433,439 ----
> -mno-relocatable -mrelocatable-lib -mno-relocatable-lib @gol
> -mtoc -mno-toc -mlittle -mlittle-endian -mbig -mbig-endian @gol
> -mdynamic-no-pic @gol
> + -msched-costly-dep=@var{dependence_type} @gol
> -mcall-sysv -mcall-netbsd @gol
> -maix-struct-return -msvr4-struct-return @gol
> -mabi=altivec -mabi=no-altivec @gol
> *************** Allow speculative motion of more load in
> *** 4114,4119 ****
> --- 4117,4134 ----
> sense when scheduling before register allocation, i.e.@: with
> @option{-fschedule-insns} or at @option{-O2} or higher.
>
> + @item -fsched-stalled-insns=@var{n}
> + @opindex fsched-stalled-insns
> + Define how many insns (if any) can be moved prematurely from the queue
> + of stalled insns into the ready list, during the second scheduling pass.
> +
> + @item -fsched-stalled-insns-dep=@var{n}
> + @opindex fsched-stalled-insns-dep
> + Define how many insn groups (cycles) will be examined for a dependency
> + on a stalled insn that is candidate for premature removal from the queue
^ a
> + of stalled insns. Has an effect only suring the second scheduling pass,
It has ... ^during
> + and only if the flag sched_stalled_insns is set.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@option{-fsched-stalled-insns} is used and its value is not zero
> +
> @item -fsched2-use-superblocks
> @opindex fsched2-use-superblocks
> When scheduling after register allocation, do use superblock scheduling
> *************** relocatable, but that its external refer
> *** 7526,7531 ****
> --- 7541,7557 ----
> resulting code is suitable for applications, but not shared
> libraries.
>
> + @item -msched-costly-dep=@var{dependence_type}
> + @opindex msched-costly-dep
> + This option controls which dependences are considered costly
> + by the target during intruction scheduling. The argument
^ additional blank
> + @var{dependence_type} takes one of the following values:
> + @var{no}: no dependence is costly,
> + @var{all}: all dependences are costly,
> + @var{true_store_to_load}: a true dependence from store to load is costly,
> + @var{store_to_load}: any dependence from store to load is costly,
> + @var{number}: any dependence which latency >= @var{number} is costly.
> +
> @item -mcall-sysv
> @opindex mcall-sysv
> On System V.4 and embedded PowerPC systems compile code using calling
>
>
>
>
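
The comment in early_queue_to_ready and the opts.c hunk together define how -fsched-stalled-insns=X limits early queue removal. The sketch below restates that mapping with plain integers; both function names are hypothetical, and the loop is a simplified model of one invocation, not the patch's queue traversal.

```c
/* Internal flag value for -fsched-stalled-insns=X, following the
   opts.c hunk: X == 0 is stored as -1, meaning "no limit".  */
int
stalled_insns_flag_from_option (int x)
{
  return x == 0 ? -1 : x;
}

/* Number of insns moved from the queue to the ready list in one
   invocation, given CANDIDATES insns that pass all other checks.  */
int
insns_removed_in_one_call (int flag, int candidates)
{
  int removed = 0;

  if (flag == 0)            /* Early queue removal is disabled.  */
    return 0;

  while (removed < candidates)
    {
      removed++;
      if (removed == flag)  /* Never true when flag is -1.  */
        break;
    }

  return removed;
}
```

Because the count only goes up from zero, it can never equal -1, which is how the "no limit" case falls out of the same equality test that enforces a positive limit.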