This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
cfg merge part 18 - superblock formation pass
- From: Jan Hubicka <jh at suse dot cz>
- To: gcc-patches at gcc dot gnu dot org, rth at cygnus dot com,gcc-pdo at atrey dot karlin dot mff dot cuni dot cz, aj at suse dot de
- Date: Fri, 24 May 2002 15:46:35 +0200
- Subject: cfg merge part 18 - superblock formation pass
Hi,
this patch implements superblock formation pass. According Andreas
Jaeger testing, currently it brings about 0.5% improvement to SPECint on
Athlon with profile feedback. There is one important slowdown for
crafty (4.3%), without this problem it should be more. I attrbute this
to fact that crafty are just at the corner of code cache and any code
growth is problem for these. For the comparison -O3 versus -O3 -ftracer
the slowdown does not happend. The speedups are comming from
perl(2.5%)/gcc(3.8%)/vortex(1.3%) benchmarks, as expected since all
these tests have complicated but relatively well predictable control
flow. The results are relatively consistent with ones reported by DEC
team about 1.2% on SPEC95.
The benefits can be also increased twice or almost three times by adding
the webizer pass (or SSA/deSSA) just after tracing, since the code
duplication now commonly turns local pseudos to global causing bad
results from CSE/register allocation. Aditionally on non-x86
arhcitectures I believe there should be considerably benefit from using
trace scheduling that already works on the branch.
But even currently I think the effectivity compares well to code
inlining (1% speedup) and unrolling (1.3% speedup) given the complexity
of the pass and runtime/code size costs (the code growth varies about
0.1-5% with 2.5% average growth.
The compilation slows down by about 2%, not because tracer is costy, but
because it temporarily increases the code size considerably (about 30%)
and relies on crossjumping to collapse it again, so the cost of CSE2
(exponential in superblock size) and register allocation increases
noticeably.
The main drawback of tracing is the dificulty to tune it, since at time
code is specialized, tracer has no idea whether it worth it. There are
couple of --param options to play around values and I didn't paid any
important care to tune them, only to keep code size under controll.
Regtested/bootstrapped mainline with -O2 -ftracer, OK for mainline? The
results reported by Andreas Jaeger (non PDO and SPECfp results come later, but
it can be expected to get less benefits there, tracer is not pass helping
to numeric code and relies quite heavily on the profile).
Base Compiler: GCC CVS from 2002-05-23 plus tracer
Peak Compiler: GCC CVS from 2002-05-23 plus tracer
cflags base: -O2 -march=athlon-xp -malign-double
cflags peak: -O2 -march=athlon-xp -malign-double -ftracer
Iterations: 1
Running on: macintyre
PDO for base and peak: Yes
This run is: SPECint
Total time for base compilation: 550 s
Total time for peak compilation: 562 s
Estimated Estimated
Base Base Base Peak Peak Peak
Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio
------------ -------- -------- -------- -------- -------- --------
164.gzip 1400 247 566* 1400 248 565*
175.vpr 1400 453 309* 1400 455 308*
176.gcc 1100 265 415* 1100 255 431*
181.mcf 1800 886 203* 1800 886 203*
186.crafty 1000 140 713* 1000 146 683*
197.parser 1800 472 381* 1800 469 383*
252.eon X X
253.perlbmk 1800 288 626* 1800 280 642*
254.gap 1100 220 501* 1100 217 508*
255.vortex 1900 362 524* 1900 358 531*
256.bzip2 1500 391 384* 1500 387 388*
300.twolf 3000 862 348* 3000 862 348*
Est. SPECint_base2000 428
Est. SPECint2000 430
Honza
Fri May 24 15:26:12 CEST 2002 Jan Hubicka <jh@suse.cz>
* Makefile.in (tracer.o): New.
* params.def (TRACER_*): New options.
* rtl.h (tracer): Declare.
* timevar.def (TV_TRACER): New.
* toplev.c (dump_file_index): Add DFI_tracer.
(dump_file_info): Add tracer.
(flag_tracer): New.
(lang_indepdenent_options): Add tracer.
(rest_of_compilation): Call tracer.
* tracer.c: New file.
* invoke.texi (-ftracer): Document.
(--param tracer-*): Document.
Index: gcc/Makefile.in
===================================================================
RCS file: /cvs/gcc/egcs/gcc/Makefile.in,v
retrieving revision 1.879
diff -c -3 -p -r1.879 Makefile.in
*** gcc/Makefile.in 23 May 2002 18:13:25 -0000 1.879
--- gcc/Makefile.in 24 May 2002 13:23:27 -0000
*************** OBJS = alias.o bb-reorder.o bitmap.o bui
*** 736,742 ****
reload.o reload1.o reorg.o resource.o rtl.o rtlanal.o rtl-error.o \
sbitmap.o sched-deps.o sched-ebb.o sched-rgn.o sched-vis.o sdbout.o \
sibcall.o simplify-rtx.o ssa.o ssa-ccp.o ssa-dce.o stmt.o \
! stor-layout.o stringpool.o timevar.o toplev.o tree.o tree-dump.o \
tree-inline.o unroll.o varasm.o varray.o version.o vmsdbgout.o xcoffout.o \
$(GGC) $(out_object_file) $(EXTRA_OBJS)
--- 736,742 ----
reload.o reload1.o reorg.o resource.o rtl.o rtlanal.o rtl-error.o \
sbitmap.o sched-deps.o sched-ebb.o sched-rgn.o sched-vis.o sdbout.o \
sibcall.o simplify-rtx.o ssa.o ssa-ccp.o ssa-dce.o stmt.o \
! stor-layout.o stringpool.o timevar.o toplev.o tracer.o tree.o tree-dump.o \
tree-inline.o unroll.o varasm.o varray.o version.o vmsdbgout.o xcoffout.o \
$(GGC) $(out_object_file) $(EXTRA_OBJS)
*************** predict.o: predict.c $(CONFIG_H) $(SYSTE
*** 1597,1602 ****
--- 1597,1605 ----
lists.o: lists.c $(CONFIG_H) $(SYSTEM_H) toplev.h $(RTL_H) $(GGC_H)
bb-reorder.o : bb-reorder.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) $(TREE_H) \
flags.h $(BASIC_BLOCK_H) hard-reg-set.h output.h cfglayout.h $(TARGET_H)
+ tracer.o : tracer.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) $(TREE_H) \
+ $(BASIC_BLOCK_H) hard-reg-set.h output.h cfglayout.h flags.h \
+ $(PARAMS_H) profile.h
cfglayout.o : cfglayout.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) $(TREE_H) \
insn-config.h $(BASIC_BLOCK_H) hard-reg-set.h output.h function.h \
cfglayout.h
Index: gcc/params.def
===================================================================
RCS file: /cvs/gcc/egcs/gcc/params.def,v
retrieving revision 1.14
diff -c -3 -p -r1.14 params.def
*** gcc/params.def 15 May 2002 09:00:04 -0000 1.14
--- gcc/params.def 24 May 2002 13:23:27 -0000
*************** DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
*** 159,164 ****
--- 159,188 ----
"hot-bb-frequency-fraction",
"Select fraction of the maximal frequency of executions of basic block in function given basic block needs to have to be considered hot",
1000)
+ DEFPARAM(TRACER_DYNAMIC_COVERAGE_FEEDBACK,
+ "tracer-dynamic-coverage-feedback",
+ "How large fraction should be covered by trace formation weighted by the execution frequencies (in percents). Used when profile feedback is available",
+ 95)
+ DEFPARAM(TRACER_DYNAMIC_COVERAGE,
+ "tracer-dynamic-coverage",
+ "How large fraction should be covered by trace formation weighted by the execution frequencies (in percents). Used when profile feedback is not available.",
+ 75)
+ DEFPARAM(TRACER_MAX_CODE_GROWTH,
+ "tracer-max-code-growth",
+ "Maximal code growth caused by tail duplication (in percents)",
+ 100)
+ DEFPARAM(TRACER_MIN_BRANCH_RATIO,
+ "tracer-min-branch-ratio",
+ "Stop reverse growth if the reverse probability of best edge is less htan this threshold (in percents)",
+ 10)
+ DEFPARAM(TRACER_MIN_BRANCH_PROBABILITY_FEEDBACK,
+ "tracer-min-branch-probability-feedback",
+ "Stop forward growth if the probability of best edge is less htan this threshold (in percents). Used when profile feedback is available",
+ 30)
+ DEFPARAM(TRACER_MIN_BRANCH_PROBABILITY,
+ "tracer-min-branch-probability",
+ "Stop forward growth if the probability of best edge is less htan this threshold (in percents). Used when profile feedback is not available",
+ 50)
/*
Local variables:
mode:c
Index: gcc/rtl.h
===================================================================
RCS file: /cvs/gcc/egcs/gcc/rtl.h,v
retrieving revision 1.348
diff -c -3 -p -r1.348 rtl.h
*** gcc/rtl.h 21 May 2002 15:39:31 -0000 1.348
--- gcc/rtl.h 24 May 2002 13:23:28 -0000
*************** extern void if_convert PARAMS ((int));
*** 2282,2285 ****
--- 2282,2287 ----
/* In predict.c */
extern void invert_br_probabilities PARAMS ((rtx));
extern bool expensive_function_p PARAMS ((int));
+ /* In tracer.c */
+ extern void tracer PARAMS ((void));
#endif /* ! GCC_RTL_H */
Index: gcc/timevar.def
===================================================================
RCS file: /cvs/gcc/egcs/gcc/timevar.def,v
retrieving revision 1.13
diff -c -3 -p -r1.13 timevar.def
*** gcc/timevar.def 6 Mar 2002 10:17:23 -0000 1.13
--- gcc/timevar.def 24 May 2002 13:23:28 -0000
*************** DEFTIMEVAR (TV_JUMP , "
*** 58,63 ****
--- 58,64 ----
DEFTIMEVAR (TV_CSE , "CSE")
DEFTIMEVAR (TV_GCSE , "global CSE")
DEFTIMEVAR (TV_LOOP , "loop analysis")
+ DEFTIMEVAR (TV_TRACER , "tracer")
DEFTIMEVAR (TV_CSE2 , "CSE 2")
DEFTIMEVAR (TV_BRANCH_PROB , "branch prediction")
DEFTIMEVAR (TV_FLOW , "flow analysis")
Index: gcc/toplev.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/toplev.c,v
retrieving revision 1.634
diff -c -3 -p -r1.634 toplev.c
*** gcc/toplev.c 23 May 2002 23:36:56 -0000 1.634
--- gcc/toplev.c 24 May 2002 13:23:29 -0000
*************** enum dump_file_index
*** 223,228 ****
--- 223,229 ----
DFI_loop,
DFI_cfg,
DFI_bp,
+ DFI_tracer,
DFI_cse2,
DFI_life,
DFI_combine,
*************** static struct dump_file_info dump_file[D
*** 270,275 ****
--- 271,277 ----
{ "loop", 'L', 1, 0, 0 },
{ "cfg", 'f', 1, 0, 0 },
{ "bp", 'b', 1, 0, 0 },
+ { "tracer", 'T', 1, 0, 0 },
{ "cse2", 't', 1, 0, 0 },
{ "life", 'f', 1, 0, 0 }, /* Yes, duplicate enable switch. */
{ "combine", 'c', 1, 0, 0 },
*************** int flag_merge_constants = 1;
*** 865,870 ****
--- 867,876 ----
one, unconditionally renumber instruction UIDs. */
int flag_renumber_insns = 1;
+ /* Nonzero if we perform superblock formation. */
+
+ int flag_tracer = 0;
+
/* Values of the -falign-* flags: how much to align labels in code.
0 means `use default', 1 means `don't align'.
For each variable, there is an _log variant which is the power
*************** static const lang_independent_options f_
*** 976,981 ****
--- 982,989 ----
N_("When possible do not generate stack frames") },
{"optimize-sibling-calls", &flag_optimize_sibling_calls, 1,
N_("Optimize sibling and tail recursive calls") },
+ {"tracer", &flag_tracer, 1,
+ N_("Perform superblock formation via tail duplication") },
{"cse-follow-jumps", &flag_cse_follow_jumps, 1,
N_("When running CSE, follow jumps to their targets") },
{"cse-skip-blocks", &flag_cse_skip_blocks, 1,
*************** rest_of_compilation (decl)
*** 2957,2962 ****
--- 2965,2986 ----
flow_loops_free (&loops);
close_dump_file (DFI_bp, print_rtl_with_bb, insns);
timevar_pop (TV_BRANCH_PROB);
+ }
+ if (flag_tracer)
+ {
+ timevar_push (TV_TRACER);
+ open_dump_file (DFI_tracer, decl);
+ if (rtl_dump_file)
+ dump_flow_info (rtl_dump_file);
+ cleanup_cfg (CLEANUP_EXPENSIVE);
+ tracer ();
+ cleanup_cfg (CLEANUP_EXPENSIVE);
+ #ifdef ENABLE_CHECKING
+ verify_flow_info ();
+ #endif
+ close_dump_file (DFI_tracer, print_rtl_with_bb, get_insns ());
+ timevar_pop (TV_TRACER);
+ reg_scan (get_insns (), max_reg_num (), 0);
}
if (optimize > 0)
Index: gcc/tracer.c
===================================================================
RCS file: gcc/tracer.c
diff -N gcc/tracer.c
*** /dev/null 1 Jan 1970 00:00:00 -0000
--- gcc/tracer.c 24 May 2002 13:23:29 -0000
***************
*** 0 ****
--- 1,371 ----
+ /* The tracer pass for the GNU compiler.
+ Contributed by Jan Hubicka, SuSE Labs.
+ Copyright (C) 2001 Free Software Foundation, Inc.
+
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2, or (at your option)
+ any later version.
+
+ GCC is distributed in the hope that it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
+ License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING. If not, write to the Free
+ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
+ 02111-1307, USA. */
+
+ /* This pass performs the tail duplication needed for superblock formation.
+ For more information see:
+
+ Design and Analysis of Profile-Based Optimization in Compaq's
+ Compilation Tools for Alpha; Journal of Instruction-Level
+ Parallelism 3 (2000) 1-25
+
+ Unlike Compaq's implementation we don't do the loop peeling as most
+ probably a better job can be done by a special pass and we don't
+ need to worry too much about the code size implications as the tail
+ duplicates are crossjumped again if optimizations are not
+ performed. */
+
+
+ #include "config.h"
+ #include "system.h"
+ #include "tree.h"
+ #include "rtl.h"
+ #include "hard-reg-set.h"
+ #include "basic-block.h"
+ #include "output.h"
+ #include "cfglayout.h"
+ #include "fibheap.h"
+ #include "flags.h"
+ #include "params.h"
+ #include "profile.h"
+
+ static int count_insns PARAMS ((basic_block));
+ static bool ignore_bb_p PARAMS ((basic_block));
+ static bool better_p PARAMS ((edge, edge));
+ static int find_trace PARAMS ((basic_block, basic_block *));
+ static void tail_duplicate PARAMS ((void));
+ static void layout_superblocks PARAMS ((void));
+ static bool ignore_bb_p PARAMS ((basic_block));
+
+ /* Return true if BB has been seen - it is connected to some trace
+ already. */
+
+ #define seen(bb) (RBI (bb)->visited || RBI (bb)->next)
+
+ /* Return true if we should ignore the basic block for purposes of tracing. */
+ static bool
+ ignore_bb_p (basic_block bb)
+ {
+ if (bb->index < 0)
+ return true;
+ if (!maybe_hot_bb_p (bb))
+ return true;
+ return false;
+ }
+
+ /* Return number of instructions in the block. */
+
+ static int
+ count_insns (bb)
+ basic_block bb;
+ {
+ rtx insn;
+ int n = 0;
+
+ for (insn = bb->head; insn != NEXT_INSN (bb->end); insn = NEXT_INSN (insn))
+ if (active_insn_p (insn))
+ n++;
+ return n;
+ }
+
+ /* Return true if E1 is more frequent than E2. */
+ static bool
+ better_p (e1, e2)
+ edge e1, e2;
+ {
+ if (e1->count != e2->count)
+ return e1->count > e2->count;
+ if (e1->src->frequency * e1->probability !=
+ e2->src->frequency * e2->probability)
+ return (e1->src->frequency * e1->probability
+ > e2->src->frequency * e2->probability);
+ /* This is needed to avoid changes in the decision after
+ CFG is modified. */
+ if (e1->src != e2->src)
+ return e1->src->index > e2->src->index;
+ return e1->dest->index > e2->dest->index;
+ }
+
+ /* Return most frequent successor of basic block BB. */
+
+ static edge
+ find_best_successor (basic_block bb)
+ {
+ edge e;
+ edge best = NULL;
+
+ for (e = bb->succ; e; e = e->succ_next)
+ if (!best || better_p (e, best))
+ best = e;
+ if (!best || ignore_bb_p (best->dest))
+ return NULL;
+ if (best->probability <= ((int) (REG_BR_PROB_BASE / 100
+ * (profile_info.count_profiles_merged
+ && flag_branch_probabilities
+ ? PARAM_VALUE
+ (TRACER_MIN_BRANCH_PROBABILITY_FEEDBACK)
+ : PARAM_VALUE
+ (TRACER_MIN_BRANCH_PROBABILITY)))))
+ return NULL;
+ return best;
+ }
+
+ /* Return most frequent predecessor of basic block BB. */
+
+ static edge
+ find_best_predecessor (basic_block bb)
+ {
+ edge e;
+ edge best = NULL;
+
+ for (e = bb->pred; e; e = e->pred_next)
+ if (!best || better_p (e, best))
+ best = e;
+ if (!best || ignore_bb_p (best->src))
+ return NULL;
+ if (EDGE_FREQUENCY (best) * REG_BR_PROB_BASE
+ < bb->frequency * (int) (REG_BR_PROB_BASE / 100
+ * PARAM_VALUE (TRACER_MIN_BRANCH_RATIO)))
+ return NULL;
+ return best;
+ }
+
+ /* Find the trace using bb and record it in the TRACE array.
+ Return number of basic blocks recorded. */
+
+ static int
+ find_trace (bb, trace)
+ basic_block bb;
+ basic_block *trace;
+ {
+ int i = 0;
+ edge e;
+
+ if (rtl_dump_file)
+ fprintf (rtl_dump_file, "Trace seed %i [%i]", bb->index, bb->frequency);
+
+ while ((e = find_best_predecessor (bb)) != NULL)
+ {
+ basic_block bb2 = e->src;
+ if (find_best_successor (bb2) != e
+ || seen (bb2)
+ || (e->flags & (EDGE_DFS_BACK | EDGE_COMPLEX)))
+ break;
+ if (rtl_dump_file)
+ fprintf (rtl_dump_file, ",%i [%i]", bb->index, bb->frequency);
+ bb = bb2;
+ }
+ if (rtl_dump_file)
+ fprintf (rtl_dump_file, " forward %i [%i]", bb->index, bb->frequency);
+ trace [i++] = bb;
+
+ /* Follow the trace in forward direction. */
+ while ((e = find_best_successor (bb)) != NULL)
+ {
+ bb = e->dest;
+ if (find_best_predecessor (bb) != e
+ || seen (bb)
+ || (e->flags & (EDGE_DFS_BACK | EDGE_COMPLEX)))
+ break;
+ if (rtl_dump_file)
+ fprintf (rtl_dump_file, ",%i [%i]", bb->index, bb->frequency);
+ trace [i++] = bb;
+ }
+ if (rtl_dump_file)
+ fprintf (rtl_dump_file, "\n");
+ return i;
+ }
+
+ /* Look for basic blocks in frequency order, construct traces and tail duplicate
+ if profitable. */
+
+ static void
+ tail_duplicate ()
+ {
+ fibnode_t *blocks = xcalloc (n_basic_blocks, sizeof (fibnode_t));
+ basic_block *trace = xmalloc (sizeof (basic_block) * n_basic_blocks);
+ int ninsns = 0, nduplicated = 0;
+ gcov_type weighted_insns = 0, traced_insns = 0;
+ fibheap_t heap = fibheap_new ();
+ int i;
+ gcov_type cover_insns;
+ int max_dup_insns;
+
+ for (i = 0; i < n_basic_blocks; i++)
+ if (!ignore_bb_p (BASIC_BLOCK (i)))
+ blocks[i] = fibheap_insert (heap, -BASIC_BLOCK (i)->frequency,
+ BASIC_BLOCK (i));
+
+ for (i = 0; i < n_basic_blocks; i++)
+ {
+ int n = count_insns (BASIC_BLOCK (i));
+
+ ninsns += n;
+ weighted_insns += n * BASIC_BLOCK (i)->frequency;
+ }
+
+ if (profile_info.count_profiles_merged
+ && flag_branch_probabilities)
+ cover_insns = ((weighted_insns
+ * PARAM_VALUE (TRACER_DYNAMIC_COVERAGE_FEEDBACK) + 50) / 100);
+ else
+ cover_insns = ((weighted_insns * PARAM_VALUE (TRACER_DYNAMIC_COVERAGE) + 50)
+ / 100);
+ max_dup_insns = (ninsns * PARAM_VALUE (TRACER_MAX_CODE_GROWTH) + 50) / 100;
+
+ for (; traced_insns < cover_insns
+ && nduplicated < max_dup_insns
+ && !fibheap_empty (heap);)
+ {
+ basic_block bb = fibheap_extract_min (heap);
+ int n, pos;
+
+ if (!bb)
+ break;
+
+ blocks[bb->index] = NULL;
+
+ if (ignore_bb_p (bb))
+ continue;
+ if (seen (bb))
+ abort ();
+
+ n = find_trace (bb, trace);
+
+ bb = trace[0];
+ traced_insns += bb->frequency * count_insns (bb);
+ if (blocks[bb->index])
+ {
+ fibheap_delete_node (heap, blocks[bb->index]);
+ blocks[bb->index] = NULL;
+ }
+
+ for (pos = 1; pos < n; pos++)
+ {
+ basic_block bb2 = trace[pos];
+
+ if (blocks[bb2->index])
+ {
+ fibheap_delete_node (heap, blocks[bb2->index]);
+ blocks[bb2->index] = NULL;
+ }
+ if (bb2->pred && bb2->pred->pred_next
+ && cfg_layout_can_duplicate_bb_p (bb2))
+ {
+ edge e = bb2->pred;
+ basic_block old = bb2;
+
+ while (e->src != bb)
+ e = e->pred_next;
+ bb2 = cfg_layout_duplicate_bb (bb2, e);
+ nduplicated += count_insns (bb2);
+
+ /* Reconsider the original copy of block we've duplicated.
+ Removing the most common predecesor may make it to be
+ head. */
+ blocks[old->index] = fibheap_insert (heap, -old->frequency, old);
+
+ if (rtl_dump_file)
+ fprintf (rtl_dump_file, "Duplicated %i as %i [%i]\n",
+ old->index, bb2->index, bb2->frequency);
+ }
+ RBI (bb)->next = bb2;
+ RBI (bb2)->visited = 1;
+ traced_insns += bb2->frequency * count_insns (bb2);
+ bb = bb2;
+ /* In case the trace became infrequent, stop duplicating. */
+ if (ignore_bb_p (bb))
+ break;
+ }
+ if (rtl_dump_file)
+ fprintf (rtl_dump_file, " covered now %.1f\n\n",
+ traced_insns * 100.0 / weighted_insns);
+ }
+ if (rtl_dump_file)
+ fprintf (rtl_dump_file, "Duplicated %i insns (%i%%)\n", nduplicated,
+ nduplicated * 100 / ninsns);
+
+ free (blocks);
+ free (trace);
+ fibheap_delete (heap);
+ }
+ /* Connect the superblocks into linear seuqence. At the moment we attempt to keep
+ the original order as much as possible, but the algorithm may be made smarter
+ later if needed. BB reordering pass should void most of the benefits of such
+ change though. */
+
+ static void
+ layout_superblocks ()
+ {
+ basic_block end = BASIC_BLOCK (0);
+ int i = 1;
+
+ while (i < n_basic_blocks)
+ {
+ edge e, best = NULL;
+ while (RBI (end)->next)
+ end = RBI (end)->next;
+
+ for (e = end->succ; e; e = e->succ_next)
+ if (e->dest != EXIT_BLOCK_PTR
+ && e->dest != BASIC_BLOCK (0)
+ && ! RBI (e->dest)->visited
+ && (!best || EDGE_FREQUENCY (e) > EDGE_FREQUENCY (best)))
+ best = e;
+
+ if (best)
+ {
+ RBI (end)->next = best->dest;
+ RBI (best->dest)->visited = 1;
+ }
+ else
+ for (; i < n_basic_blocks; i++)
+ {
+ basic_block bb = BASIC_BLOCK (i);
+
+ if (!RBI (bb)->visited)
+ {
+ RBI (end)->next = bb;
+ RBI (bb)->visited = 1;
+ break;
+ }
+ }
+ }
+ }
+
+ /* Main entry point to this file. */
+
+ void
+ tracer ()
+ {
+ if (n_basic_blocks <= 1)
+ return;
+ cfg_layout_initialize ();
+ mark_dfs_back_edges ();
+ if (rtl_dump_file)
+ dump_flow_info (rtl_dump_file);
+ tail_duplicate ();
+ layout_superblocks ();
+ if (rtl_dump_file)
+ dump_flow_info (rtl_dump_file);
+ cfg_layout_finalize ();
+ /* Merge basic blocks in duplicated traces. */
+ cleanup_cfg (CLEANUP_EXPENSIVE);
+ }
Index: gcc/doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/egcs/gcc/doc/invoke.texi,v
retrieving revision 1.148
diff -c -3 -p -r1.148 invoke.texi
*** gcc/doc/invoke.texi 23 May 2002 15:48:00 -0000 1.148
--- gcc/doc/invoke.texi 24 May 2002 13:23:37 -0000
*************** in the following sections.
*** 282,288 ****
-frerun-cse-after-loop -frerun-loop-opt @gol
-fschedule-insns -fschedule-insns2 @gol
-fsingle-precision-constant -fssa -fssa-ccp -fssa-dce @gol
! -fstrength-reduce -fstrict-aliasing -fthread-jumps -ftrapv @gol
-funroll-all-loops -funroll-loops @gol
--param @var{name}=@var{value}
-O -O0 -O1 -O2 -O3 -Os}
--- 282,288 ----
-frerun-cse-after-loop -frerun-loop-opt @gol
-fschedule-insns -fschedule-insns2 @gol
-fsingle-precision-constant -fssa -fssa-ccp -fssa-dce @gol
! -fstrength-reduce -fstrict-aliasing -ftracer -fthread-jumps -ftrapv @gol
-funroll-all-loops -funroll-loops @gol
--param @var{name}=@var{value}
-O -O0 -O1 -O2 -O3 -Os}
*************** those which have no call-preserved regis
*** 3637,3642 ****
--- 3637,3648 ----
For all machines, optimization level 2 and higher enables this flag by
default.
+ @item -ftracer
+ @opindex ftracer
+ Perform tail duplication to enlarge superblock size. These transformations
+ simplify control structure of programs allowing other optimizations to do
+ better job.
+
@item -funroll-loops
@opindex funroll-loops
Unroll loops whose number of iterations can be determined at compile
*************** given basic block needs to have to be co
*** 3941,3946 ****
--- 3947,3985 ----
@item hot-bb-frequency-fraction
Select fraction of the maximal frequency of executions of basic block in
function given basic block needs to have to be considered hot
+
+ @item tracer-dynamic-coverage
+ @item tracer-dynamic-coverage-feedback
+
+ This value is used to limit superblock formation once given percentage of
+ executed instructions is covered. This limits unnecesary code size expansion.
+
+ The @option{tracer-dynamic-coverage-feedback} is used only when profile
+ feedback is available. The real profiles (as opposed to statically estimated
+ ones) are much less balanced allowing the threshold to be larger value.
+
+ @item tracer-max-code-growth
+ Stop tail duplication once code growth has reached given percentage. This is
+ rather hokey argument, as most of the duplicates will be elliminated later in
+ cross jumping, so it may be set to much higher values than is the desired code
+ growth.
+
+ @item tracer-min-branch-ratio
+
+ Stop reverse growth when the reverse probability of best edge is less than this
+ threshold (in percent).
+
+ @item tracer-min-branch-ratio
+ @item tracer-min-branch-ratio-feedback
+
+ Stop forward growth if the best edge do have probability lower than this
+ threshold.
+
+ Similary to @option{tracer-dynamic-coverage} two values are present, one for
+ compilation for profile feedback and one for compilation without. The value
+ for compilation with profile feedback needs to be more conservative (higher) in
+ order to make tracer effective.
+
@end table
@end table