This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[patch] TSP based bb reordering
- From: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
- To: gcc-patches at gcc dot gnu dot org
- Date: Wed, 31 Mar 2004 09:12:12 +0200
- Subject: [patch] TSP based bb reordering
Hello,
this patch implements basic block reordering based on the reduction
to the traveling salesman problem as described in
www.research.att.com/~dsj/papers/PLDI.ps. I have done the testing
on i686 (-march=athlon), so the original motivation (optimization for
the static branch prediction) was not useful, but using it just to
minimize the number of executed unconditional jumps turned out to
also have benefits (mostly on sizes of binaries, see test results
below). The time of compilation is increased by about 3% when using
it. Bootstrapped (with -ftsp-ordering) & regtested on i686.
Zdenek
Peak is with -ftsp-ordering.
Without profile feedback:
Base Base Base Peak Peak Peak
Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio
------------ -------- -------- -------- -------- -------- --------
164.gzip 1400 190 736 * 1400 188 743 *
175.vpr 1400 379 370 * 1400 374 375 *
176.gcc 1100 -- X 1100 -- X
181.mcf 1800 733 246 * 1800 733 245 *
186.crafty 1000 114 878 * 1000 114 874 *
197.parser 1800 396 454 * 1800 398 453 *
252.eon 1300 -- X 1300 -- X
253.perlbmk 1800 199 906 * 1800 199 903 *
254.gap 1100 178 618 * 1100 175 629 *
255.vortex 1900 238 797 * 1900 240 791 *
256.bzip2 1500 326 459 * 1500 328 458 *
300.twolf 3000 781 384 * 3000 773 388 *
Sizes of binaries:
53142 gzip_base.default-linux
52822 gzip_peak.default-linux
152806 vpr_base.default-linux
151142 vpr_peak.default-linux
24619 mcf_base.default-linux
24619 mcf_peak.default-linux
256204 crafty_base.default-linux
254060 crafty_peak.default-linux
120386 parser_base.default-linux
118082 parser_peak.default-linux
573441 perlbmk_base.default-linux
564577 perlbmk_peak.default-linux
477278 gap_base.default-linux
470174 gap_peak.default-linux
664688 vortex_base.default-linux
653936 vortex_peak.default-linux
46364 bzip2_base.default-linux
46332 bzip2_peak.default-linux
212799 twolf_base.default-linux
209855 twolf_peak.default-linux
With profile feedback:
Base Base Base Peak Peak Peak
Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio
------------ -------- -------- -------- -------- -------- --------
164.gzip 1400 186 752 * 1400 187 751 *
175.vpr 1400 379 370 * 1400 376 373 *
176.gcc 1100 -- X 1100 -- X
181.mcf 1800 732 246 * 1800 734 245 *
186.crafty 1000 111 902 * 1000 111 899 *
197.parser 1800 396 455 * 1800 394 456 *
252.eon 1300 -- X 1300 -- X
253.perlbmk 1800 189 951 * 1800 190 949 *
254.gap 1100 174 633 * 1100 174 634 *
255.vortex 1900 223 852 * 1900 214 886 *
256.bzip2 1500 322 466 * 1500 323 464 *
300.twolf 3000 786 382 * 3000 792 379 *
Sizes of binaries:
52126 gzip_base.FDO
51774 gzip_peak.FDO
149358 vpr_base.FDO
148846 vpr_peak.FDO
24755 mcf_base.FDO
24755 mcf_peak.FDO
255444 crafty_base.FDO
253364 crafty_peak.FDO
117450 parser_base.FDO
116426 parser_peak.FDO
553193 perlbmk_base.FDO
545737 perlbmk_peak.FDO
457606 gap_base.FDO
453158 gap_peak.FDO
641336 vortex_base.FDO
632024 vortex_peak.FDO
45732 bzip2_base.FDO
45316 bzip2_peak.FDO
208839 twolf_base.FDO
205255 twolf_peak.FDO
* tsp.c: New file.
* Makefile.in (tsp.o): New.
* bb-reorder.c (JUMP_COST, HIT_TAKEN_COST, MISS_TAKEN_COST,
MISS_FALLTHRU_COST): New.
(reorder_using_tsp, cost_of_uncond_jump, cost_of_cond_jump,
cost_of_cond_branch, cost_b_after_a, record_edges): New functions.
(reorder_basic_blocks): Call reorder_using_tsp.
* cfglayout.h (struct vertex): New.
(MAX_TSP_SIZE): New.
(solve_tsp): Declare.
* common.opt (ftsp-ordering): New.
* flags.h (flag_tsp_ordering): Declare.
* opts.c (decode_options, common_handle_option): Handle
flag_tsp_ordering.
* toplev.c (flag_tsp_ordering): New.
(process_options): Enable flag_reorder_blocks if flag_tsp_ordering is
set.
* doc/invoke.texi: Document -ftsp-ordering.
* doc/passes.texi: Document tsp based bb reordering pass.
Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1266
diff -c -3 -p -r1.1266 Makefile.in
*** Makefile.in 24 Mar 2004 18:03:15 -0000 1.1266
--- Makefile.in 31 Mar 2004 07:02:05 -0000
*************** C_OBJS = c-parse.o c-lang.o c-pretty-pri
*** 846,852 ****
OBJS-common = \
alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o \
! cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o \
cfgloopanal.o cfgloopmanip.o loop-init.o loop-unswitch.o loop-unroll.o \
cfgrtl.o combine.o conflict.o convert.o coverage.o cse.o cselib.o \
dbxout.o debug.o df.o diagnostic.o dojump.o dominance.o loop-doloop.o \
--- 846,852 ----
OBJS-common = \
alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o \
! cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o tsp.o cfgloop.o \
cfgloopanal.o cfgloopmanip.o loop-init.o loop-unswitch.o loop-unroll.o \
cfgrtl.o combine.o conflict.o convert.o coverage.o cse.o cselib.o \
dbxout.o debug.o df.o diagnostic.o dojump.o dominance.o loop-doloop.o \
*************** tracer.o : tracer.c $(CONFIG_H) $(SYSTEM
*** 1849,1854 ****
--- 1849,1856 ----
cfglayout.o : cfglayout.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
$(RTL_H) $(TREE_H) insn-config.h $(BASIC_BLOCK_H) hard-reg-set.h output.h \
function.h cfglayout.h cfgloop.h $(TARGET_H) gt-cfglayout.h $(GGC_H)
+ tsp.o: tsp.c $(CONFIG_H) $(SYSTEM_H) cfglayout.h $(RTL_H) $(BASIC_BLOCK_H) \
+ output.h coretypes.h $(TM_H)
timevar.o : timevar.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TIMEVAR_H) flags.h \
intl.h toplev.h
regrename.o : regrename.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
Index: bb-reorder.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/bb-reorder.c,v
retrieving revision 1.66
diff -c -3 -p -r1.66 bb-reorder.c
*** bb-reorder.c 20 Mar 2004 16:50:30 -0000 1.66
--- bb-reorder.c 31 Mar 2004 07:02:05 -0000
*************** static int exec_threshold[N_ROUNDS] = {5
*** 91,96 ****
--- 91,121 ----
block the edge destination is not duplicated while connecting traces. */
#define DUPLICATION_THRESHOLD 100
+ /* The parameters of the cost function for the tsp based reordering. These
+ can be redefined by targets as needed. The values here correspond simply
+ to minimising the total number of jumps executed, and a slight preference
+ for the taken branch to be the less frequent one, which is usually a good
+ thing to do. Making the value of HIT_TAKEN_COST and MISS_TAKEN_COST
+ different makes sense only on targets with static branch prediction. */
+
+ /* Cost of an unconditional jump. */
+ #define JUMP_COST 100
+
+ /* The cost of correctly predicted branch that is taken. */
+ #ifndef HIT_TAKEN_COST
+ #define HIT_TAKEN_COST 3
+ #endif
+
+ /* The cost of incorrectly predicted branch that is taken. */
+ #ifndef MISS_TAKEN_COST
+ #define MISS_TAKEN_COST 3
+ #endif
+
+ /* The cost of incorrectly predicted branch that is not taken. */
+ #ifndef MISS_FALLTHRU_COST
+ #define MISS_FALLTHRU_COST 0
+ #endif
+
/* Length of unconditional jump instruction. */
static int uncond_jump_length;
*************** static bool better_edge_p (basic_block,
*** 153,158 ****
--- 178,189 ----
static void connect_traces (int, struct trace *);
static bool copy_bb_p (basic_block, int);
static int get_uncond_jump_length (void);
+ static void reorder_using_tsp (void);
+ static int cost_of_uncond_jump (edge);
+ static int cost_of_cond_jump (edge);
+ static int cost_of_cond_branch (basic_block);
+ static int cost_b_after_a (basic_block, basic_block);
+ static void record_edges (basic_block, struct vertex *, int last);
/* Find the traces for Software Trace Cache. Chain each trace through
RBI()->next. Store the number of traces to N_TRACES and description of
*************** get_uncond_jump_length (void)
*** 1063,1068 ****
--- 1094,1345 ----
return length;
}
+ /* Returns a cost of unconditional jump corresponding to edge E. */
+
+ static int
+ cost_of_uncond_jump (edge e)
+ {
+ if (e->dest == EXIT_BLOCK_PTR)
+ return 0;
+
+ if (optimize_size)
+ return 1;
+
+ /* If the area is cold, do flat optimization for size, with low
+ priority. */
+ if (probably_cold_bb_p (e->src))
+ return JUMP_COST;
+
+ return JUMP_COST * e->src->frequency;
+ }
+
+ /* Returns a cost of conditional jump corresponding to edge E. */
+
+ static int
+ cost_of_cond_jump (edge e)
+ {
+ int cost = 0;
+
+ if (optimize_size
+ || probably_cold_bb_p (e->src))
+ return 0;
+
+ /* We assume the more probable branch is predicted. */
+ if (2 * e->probability > REG_BR_PROB_BASE)
+ {
+ /* E is predicted to be taken. */
+ cost += EDGE_FREQUENCY (e) * HIT_TAKEN_COST;
+ cost += (e->src->frequency - EDGE_FREQUENCY (e)) * MISS_FALLTHRU_COST;
+ }
+ else
+ {
+ /* E is predicted not to be taken. */
+ cost += EDGE_FREQUENCY (e) * MISS_TAKEN_COST;
+
+ /* HIT_FALLTHRU has no cost. */
+ }
+
+ return cost;
+ }
+
+ /* Return a cost of jumps from basic block BB to two locations not adjacent
+ to it. */
+ static int
+ cost_of_cond_branch (basic_block bb)
+ {
+ int more_freq, less_freq, tmp, cost = 0;
+
+ if (optimize_size)
+ return 1;
+
+ /* If the area is cold, do flat optimization for size, with low
+ priority. */
+ if (probably_cold_bb_p (bb))
+ return JUMP_COST;
+
+ /* It is assumed to be transformed into
+
+ bb -- helper
+ \ \
+ \ \
+ more less */
+
+ more_freq = EDGE_FREQUENCY (bb->succ);
+ less_freq = EDGE_FREQUENCY (bb->succ->succ_next);
+
+ if (less_freq > more_freq)
+ {
+ tmp = less_freq;
+ less_freq = more_freq;
+ more_freq = tmp;
+ }
+
+ cost += more_freq * HIT_TAKEN_COST;
+ cost += less_freq * (MISS_FALLTHRU_COST + JUMP_COST);
+
+ return cost;
+ }
+
+ /* Returns cost for placing B immediately after A. */
+ static int
+ cost_b_after_a (basic_block a, basic_block b)
+ {
+ if (!a->succ)
+ return 0;
+
+ if (!a->succ->succ_next)
+ return a->succ->dest == b ? 0 : cost_of_uncond_jump (a->succ);
+
+ if (!any_condjump_p (BB_END (a)))
+ return 0;
+
+ if (a->succ->succ_next->succ_next)
+ abort ();
+
+ if (a->succ->dest == b)
+ return cost_of_cond_jump (a->succ->succ_next);
+
+ if (a->succ->succ_next->dest == b)
+ return cost_of_cond_jump (a->succ);
+
+ return cost_of_cond_branch (a);
+ }
+
+ /* Record costs for edges coming out of FROM to VERTEX; LAST is true
+ if it is the tail of the considered segment. */
+ static void
+ record_edges (basic_block from, struct vertex *vertex, int last)
+ {
+ edge e;
+ basic_block tgt;
+
+ vertex->n_spec = 0;
+
+ if (last
+ || !from->succ
+ || (from->succ->succ_next
+ && !any_condjump_p (BB_END (from))))
+ {
+ vertex->def_cost = 0;
+ return;
+ }
+
+ for (e = from->succ; e; e = e->succ_next)
+ {
+ tgt = e->dest;
+ if (tgt == EXIT_BLOCK_PTR)
+ {
+ if (vertex->n_spec != 0)
+ abort ();
+ vertex->def_cost = 0;
+ return;
+ }
+
+ /* We indeed want <= here, as edges to the head of the current segment
+ are irrelevant. */
+ if (tgt->rbi->visited <= 0)
+ continue;
+
+ if (tgt == from)
+ continue;
+
+ vertex->spec_tgt[vertex->n_spec] = tgt->rbi->visited;
+ vertex->spec_cost[vertex->n_spec] = cost_b_after_a (from, tgt);
+ vertex->n_spec++;
+ }
+ if (vertex->n_spec > 2)
+ abort ();
+
+ vertex->def_cost = (from->succ->succ_next
+ ? cost_of_cond_branch (from)
+ : cost_of_uncond_jump (from->succ));
+ }
+
+ /* Reorder blocks using tsp solver. */
+
+ static void
+ reorder_using_tsp (void)
+ {
+ basic_block block[MAX_TSP_SIZE + 2];
+ struct vertex graph[MAX_TSP_SIZE + 2];
+ int tour[MAX_TSP_SIZE + 2];
+ basic_block start, old_start;
+ int n, i, a;
+ basic_block bb;
+
+ if (n_basic_blocks <= 1)
+ return;
+
+ FOR_EACH_BB (bb)
+ {
+ bb->rbi->visited = -1;
+ }
+
+ #define NEXT_BLOCK(BB) \
+ ((BB) == EXIT_BLOCK_PTR \
+ ? NULL \
+ : ((BB)->rbi->next == NULL \
+ ? EXIT_BLOCK_PTR \
+ : (BB)->rbi->next))
+ start = ENTRY_BLOCK_PTR->next_bb;
+ while (1)
+ {
+ old_start = start;
+
+ if (dump_file)
+ fprintf (dump_file, "Old order:");
+ for (n = 0;
+ start && n < MAX_TSP_SIZE + 2;
+ n++, start = NEXT_BLOCK (start))
+ {
+ block[n] = start;
+ start->rbi->visited = n;
+ if (dump_file)
+ fprintf (dump_file, " %d", start->index);
+ }
+ if (dump_file)
+ fprintf (dump_file, "\n");
+
+ for (i = 0; i < n; i++)
+ {
+ record_edges (block[i], graph + i, i == n - 1);
+ tour[i] = i + 1;
+ }
+ solve_tsp (n, tour, graph, optimize >= 3);
+
+ if (block[n - 1] == EXIT_BLOCK_PTR)
+ block[n - 1] = NULL;
+
+ if (dump_file)
+ fprintf (dump_file, "New order:");
+ for (i = 0, a = 0; i < n - 1; i++, a = tour[a])
+ {
+ block[a]->rbi->visited = -1;
+ block[a]->rbi->next = block[tour[a]];
+
+ if (dump_file)
+ fprintf (dump_file, " %d", block[a]->index);
+ }
+ if (block[a])
+ {
+ block[a]->rbi->visited = -1;
+ if (dump_file)
+ fprintf (dump_file, " %d\n", block[a]->index);
+ }
+ else if (dump_file)
+ fprintf (dump_file, " EXIT\n");
+
+ if (!start)
+ break;
+
+ /* Ensure some overlap between the instances. */
+ start = old_start;
+ for (; n > 0; n -= 2)
+ start = NEXT_BLOCK (start);
+ }
+ #undef NEXT_BLOCK
+ }
+
/* Reorder basic blocks. The main entry point to this file. */
void
*************** reorder_basic_blocks (void)
*** 1110,1115 ****
--- 1387,1395 ----
if (dump_file)
dump_flow_info (dump_file);
+
+ if (flag_tsp_ordering)
+ reorder_using_tsp ();
cfg_layout_finalize ();
Index: cfglayout.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfglayout.h,v
retrieving revision 1.10
diff -c -3 -p -r1.10 cfglayout.h
*** cfglayout.h 30 Dec 2003 10:40:51 -0000 1.10
--- cfglayout.h 31 Mar 2004 07:02:05 -0000
*************** typedef struct reorder_block_def
*** 35,40 ****
--- 35,49 ----
extern rtx cfg_layout_function_footer;
+ /* A type for vertices in tsp solver. */
+ struct vertex
+ {
+ int n_spec; /* Number of special edges. */
+ int spec_tgt[2]; /* Special edge targets. */
+ int spec_cost[2]; /* Special edge costs. */
+ int def_cost; /* Cost of remaining edges. */
+ };
+
extern void cfg_layout_initialize (void);
extern void cfg_layout_finalize (void);
extern bool cfg_layout_can_duplicate_bb_p (basic_block);
*************** extern bool can_copy_bbs_p (basic_block
*** 45,47 ****
--- 54,62 ----
extern void copy_bbs (basic_block *, unsigned, basic_block *,
edge *, unsigned, edge *, struct loop *);
extern void cfg_layout_initialize_rbi (basic_block);
+
+ /* Maximum number of vertices of tsp instance solved (in fact 2 less, as source
+ and target are not included). */
+ #define MAX_TSP_SIZE 200
+
+ extern void solve_tsp (int, int *, struct vertex *, int);
Index: common.opt
===================================================================
RCS file: /cvs/gcc/gcc/gcc/common.opt,v
retrieving revision 1.30
diff -c -3 -p -r1.30 common.opt
*** common.opt 10 Mar 2004 06:02:50 -0000 1.30
--- common.opt 31 Mar 2004 07:02:05 -0000
*************** ftrapv
*** 704,709 ****
--- 704,713 ----
Common
Trap for signed overflow in addition, subtraction and multiplication
+ ftsp-ordering
+ Common
+ Reorder basic blocks to improve code placement using reduction to tsp
+
funit-at-a-time
Common
Compile whole compilation unit at a time
Index: flags.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/flags.h,v
retrieving revision 1.134
diff -c -3 -p -r1.134 flags.h
*** flags.h 10 Mar 2004 06:02:51 -0000 1.134
--- flags.h 31 Mar 2004 07:02:05 -0000
*************** extern int flag_branch_probabilities;
*** 210,215 ****
--- 210,219 ----
extern int flag_reorder_blocks;
+ /* Nonzero if blocks should be reordered using tsp. */
+
+ extern int flag_tsp_ordering;
+
/* Nonzero if functions should be reordered. */
extern int flag_reorder_functions;
Index: opts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/opts.c,v
retrieving revision 1.62
diff -c -3 -p -r1.62 opts.c
*** opts.c 14 Mar 2004 22:26:06 -0000 1.62
--- opts.c 31 Mar 2004 07:02:05 -0000
*************** decode_options (unsigned int argc, const
*** 570,575 ****
--- 570,576 ----
if (optimize >= 3)
{
+ flag_tsp_ordering = 1;
flag_inline_functions = 1;
flag_rename_registers = 1;
flag_unswitch_loops = 1;
*************** common_handle_option (size_t scode, cons
*** 1418,1423 ****
--- 1419,1428 ----
case OPT_ftrapv:
flag_trapv = value;
+ break;
+
+ case OPT_ftsp_ordering:
+ flag_tsp_ordering = value;
break;
case OPT_funit_at_a_time:
Index: toplev.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/toplev.c,v
retrieving revision 1.889
diff -c -3 -p -r1.889 toplev.c
*** toplev.c 29 Mar 2004 23:00:27 -0000 1.889
--- toplev.c 31 Mar 2004 07:02:05 -0000
*************** int flag_branch_probabilities = 0;
*** 249,254 ****
--- 249,258 ----
int flag_reorder_blocks = 0;
+ /* Nonzero if basic blocks should be reordered using tsp. */
+
+ int flag_tsp_ordering = 0;
+
/* Nonzero if functions should be reordered. */
int flag_reorder_functions = 0;
*************** process_options (void)
*** 2276,2281 ****
--- 2280,2288 ----
if (flag_value_profile_transformations)
flag_profile_values = 1;
+
+ if (flag_tsp_ordering)
+ flag_reorder_blocks = 1;
/* Warn about options that are not supported on this machine. */
#ifndef INSN_SCHEDULING
Index: tsp.c
===================================================================
RCS file: tsp.c
diff -N tsp.c
*** /dev/null 1 Jan 1970 00:00:00 -0000
--- tsp.c 31 Mar 2004 07:02:05 -0000
***************
*** 0 ****
--- 1,540 ----
+ /* A simple aproximate ATSP solver.
+ Copyright (C) 2004 Free Software Foundation, Inc.
+
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify it under
+ the terms of the GNU General Public License as published by the Free
+ Software Foundation; either version 2, or (at your option) any later
+ version.
+
+ GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING. If not, write to the Free
+ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
+ 02111-1307, USA. */
+
+ #include "config.h"
+ #include "system.h"
+ #include "coretypes.h"
+ #include "tm.h"
+ #include "rtl.h"
+ #include "basic-block.h"
+ #include "cfglayout.h"
+ #include "output.h"
+
+ /* A simple assymetric tsp solver. It uses the following heuristic (referred
+ as OPT-3 + HyperOPT in literature):
+
+ (Tour is made so that it always starts in 0 and ends in n - 1).
+
+ All improving 3-swaps (reordering of tour ABCD into ACBD, where A, B,
+ C and D are parts of the tour) are performed. To decrease the amount
+ of work done, we use a small observation -- there are at most two cheap
+ edges coming from each vertex (all other edges out of the vertex
+ have the same cost) and at least one such edge must obviously be
+ involved in the swap; moreover, it must be one of the newly added edges.
+
+ For every two non-overlapping segments of length 3, relink them in
+ an optimal manner. */
+
+ #define HOPT_H 3
+ #define HOPT_H_NORDS 6
+ #define HOPT_H_VERT (2 * HOPT_H - 2)
+ #define ORDS_PER_EDGE (HOPT_H_NORDS / (HOPT_H_VERT - 1))
+
+ /* All orders HyperOPT heuristics tries. */
+ static int hopt_h_ords[HOPT_H_NORDS][HOPT_H_VERT] =
+ {
+ {0, 1, 2, 3},
+ {0, 1, 3, 2},
+ {0, 2, 1, 3},
+ {0, 2, 3, 1},
+ {0, 3, 1, 2},
+ {0, 3, 2, 1}
+ };
+
+ /* For each edge indices of sums in the HyperOPT heuristics to that the
+ weight of the edge should be added. */
+ static int hopt_h_sums[HOPT_H_VERT][HOPT_H_VERT][ORDS_PER_EDGE] =
+ {
+ { /* (0, *) */
+ {HOPT_H_NORDS, HOPT_H_NORDS}, /* (0, 0) */
+ {0, 1}, /* (0, 1) */
+ {2, 3}, /* (0, 2) */
+ {4, 5} /* (0, 3) */
+ },
+ { /* (1, *) */
+ {3, 5}, /* (1, 0) */
+ {HOPT_H_NORDS, HOPT_H_NORDS}, /* (1, 1) */
+ {0, 4}, /* (1, 2) */
+ {1, 2} /* (1, 3) */
+ },
+ { /* (2, *) */
+ {1, 4}, /* (2, 0) */
+ {2, 5}, /* (2, 1) */
+ {HOPT_H_NORDS, HOPT_H_NORDS}, /* (2, 2) */
+ {0, 3} /* (2, 3) */
+ },
+ { /* (3, *) */
+ {0, 2}, /* (3, 0) */
+ {3, 4}, /* (3, 1) */
+ {1, 5}, /* (3, 2) */
+ {HOPT_H_NORDS, HOPT_H_NORDS} /* (3, 3) */
+ }
+ };
+
+ /* Weights of edges are cached here. */
+
+ static int graph[MAX_TSP_SIZE + 2][MAX_TSP_SIZE + 2];
+
+ /* Returns weight of edge A --> B. */
+
+ #define edge_weight(A, B) graph[A][B]
+
+ static void optimize_tsp (int, int *, struct vertex *, int);
+ static int tsp_value (int, int *);
+ static int three_swap_delta (int *, int, int, int);
+ void dump_tour (FILE *, int *, int);
+ static int hopt_delta (int [], int [], int [], sbitmap, int *, struct vertex *);
+ static void best_three_swap (int, int, int *, int *, int *, int *, int, int *);
+ static void hopt_perform (int *, int, int, int);
+
+ /* Solves a tsp instance of size N. WEIGHTS is a matrix of weights of
+ edges. Result is retunrned in TOUR; the initial tour is also obtained
+ from there. */
+
+ void
+ solve_tsp (int n, int *tour, struct vertex *weights, int iterate)
+ {
+ int tspvalue;
+ int i, j;
+
+ if (dump_file)
+ fprintf (dump_file, "TSP:\n");
+
+ for (i = 0; i < n; i++)
+ {
+ for (j = 0; j < n; j++)
+ graph[i][j] = weights[i].def_cost;
+ for (j = 0; j < weights[i].n_spec; j++)
+ graph[i][weights[i].spec_tgt[j]] = weights[i].spec_cost[j];
+ }
+
+ tspvalue = tsp_value (n, tour);
+
+ if (dump_file)
+ {
+ dump_tour (dump_file, tour, n);
+ fprintf (dump_file, "%d\n", tspvalue);
+ }
+
+ if (tspvalue)
+ optimize_tsp (n, tour, weights, iterate);
+
+ if (dump_file)
+ {
+ dump_tour (dump_file, tour, n);
+ fprintf (dump_file, "FINAL: %d\n", tsp_value (n, tour));
+ }
+ }
+
+ /* Dumps TOUR of length N to FILE. */
+
+ void
+ dump_tour (FILE *file, int *tour, int n)
+ {
+ int i, a;
+
+ for (i = 0, a = 0; i < n - 1; i++, a = tour[a])
+ fprintf (file, "%d ", a);
+ fprintf (file, "%d\n", a);
+ }
+
+ /* Return value of a tsp TOUR on N vertices. */
+
+ static int
+ tsp_value (int n, int *tour)
+ {
+ int i, act, sum = 0;
+
+ act = 0;
+ for (i = 0; i < n - 1; i++, act = tour[act])
+ sum += edge_weight (act, tour[act]);
+
+ return sum;
+ }
+
+ /* Returns gain of performing a 3-swap on TOUR in graph WEIGHTS at
+ positions I1, I2 and I3. */
+
+ static int
+ three_swap_delta (int *tour, int i1, int i2, int i3)
+ {
+ int old, new;
+
+ old = (edge_weight (i1, tour[i1])
+ + edge_weight (i2, tour[i2])
+ + edge_weight (i3, tour[i3]));
+
+ new = (edge_weight (i1, tour[i2])
+ + edge_weight (i2, tour[i3])
+ + edge_weight (i3, tour[i1]));
+
+ return new - old;
+ }
+
+ /* Returns gain of performing a HyperOPT optimization on segments stored in V1
+ and V2. Index of the best permutation is returned in BEST_PERM. WEIGHTS
+ is the graph in that we work. V2S is a bitmap of members of V2. TOUR is
+ the current tour. */
+
+ static int
+ hopt_delta (int tour[], int v1[], int v2[], sbitmap v2s, int *best_perm,
+ struct vertex *weights)
+ {
+ int i, j, old, best, bi, a, av, ew;
+ int sums[HOPT_H_NORDS + 1];
+
+ for (i = 0; i < HOPT_H_VERT; i++)
+ for (j = 0; j < weights[v1[i]].n_spec; j++)
+ {
+ a = weights[v1[i]].spec_tgt[j];
+ if (a != tour[v1[i]] && TEST_BIT (v2s, a))
+ goto cont;
+ }
+ return 0;
+
+ cont: ;
+
+ /* Loop is manually unrolled here based on assumption that noone will ever
+ increase the value of HOPT_H. Just make sure that if someone insane
+ enough does it, he notices this place. */
+ #if (ORDS_PER_EDGE != 2)
+ abort ();
+ #endif
+
+ memset (sums, 0, sizeof (sums));
+ for (i = 0; i < HOPT_H_VERT; i++)
+ {
+ av = v1[i];
+ for (j = 0; j < HOPT_H_VERT; j++)
+ {
+ ew = edge_weight (av, v2[j]);
+ sums[hopt_h_sums[i][j][0]] += ew;
+ sums[hopt_h_sums[i][j][1]] += ew;
+ }
+ }
+
+ bi = 0;
+ old = best = sums[0];
+ for (i = 1; i < HOPT_H_NORDS; i++)
+ {
+ a = sums[i];
+
+ if (a >= best)
+ continue;
+
+ best = a;
+ bi = i;
+ }
+
+ *best_perm = bi;
+ return best - old;
+ }
+
+ /* Perform PERMth HyperOPT swap on TOUR at positions I1 and I2. */
+
+ static void
+ hopt_perform (int *tour, int i1, int i2, int perm)
+ {
+ int i, j, a;
+ int v1[HOPT_H_VERT], v2[HOPT_H_VERT];
+
+ for (a = i1, i = 0; i < HOPT_H - 1; i++, a = tour[a])
+ v1[i] = a;
+ for (a = i2; i < HOPT_H_VERT; i++, a = tour[a])
+ v1[i] = a;
+
+ for (a = tour[i1], i = 1; i < HOPT_H; i++, a = tour[a])
+ v2[i] = a;
+ for (a = tour[i2]; i < HOPT_H_VERT; i++, a = tour[a])
+ v2[i] = a;
+ v2[0] = a;
+
+ a = hopt_h_ords[perm][0];
+ for (i = 1; i < HOPT_H_VERT; i++, a = j)
+ {
+ j = hopt_h_ords[perm][i];
+ tour[v1[a]] = v2[j];
+ }
+ tour[v1[a]] = v2[hopt_h_ords[perm][0]];
+ }
+
+ /* Find the best 3-swap that introduces I1 --> I2 into TOUR in graph WEIGHTS
+ on N vertices. Endpoints of the segments are returned in BI1, BI2 and BI3,
+ value in BEST_3OPT. */
+
+ static void
+ best_three_swap (int i1, int i2, int *bi1, int *bi2, int *bi3, int *best_3opt,
+ int n, int *tour)
+ {
+ int a, seen_i1 = false, m, v;
+
+ if (tour[i1] == i2)
+ return;
+
+ for (a = 0; tour[a] != i2; a = tour[a])
+ if (a == i1)
+ seen_i1 = true;
+
+ if (!seen_i1)
+ {
+ /* Backward edge. The middle region must be between i1 and i2. */
+ for (m = i2; m != i1; m = tour[m])
+ {
+ v = three_swap_delta (tour, a, m, i1);
+ if (v >= *best_3opt)
+ continue;
+
+ *bi1 = a;
+ *bi2 = m;
+ *bi3 = i1;
+ *best_3opt = v;
+ }
+ }
+ else
+ {
+ /* Either this is the arc from A to C... */
+ for (m = i2; tour[m] != n; m = tour[m])
+ {
+ v = three_swap_delta (tour, i1, a, m);
+ if (v >= *best_3opt)
+ continue;
+
+ *bi1 = i1;
+ *bi2 = a;
+ *bi3 = m;
+ *best_3opt = v;
+ }
+
+ /* ...or from B to D. */
+ for (m = 0; m != i1; m = tour[m])
+ {
+ v = three_swap_delta (tour, m, i1, a);
+ if (v >= *best_3opt)
+ continue;
+
+ *bi1 = m;
+ *bi2 = i1;
+ *bi3 = a;
+ *best_3opt = v;
+ }
+ }
+ }
+
+ /* Improves TOUR using the heuristics. N is number of vertices, the graph is
+ stored in WEIGHTS. ITERATE is true if we should iterate until convergence
+ rather than making just one pass. */
+
+ static void
+ optimize_tsp (int n, int *tour, struct vertex *weights, int iterate)
+ {
+ int i1, i2, best_3opt, bi1, bi2, bi3, a, best_hopt, hi2, hperm, aperm, f, i;
+ int changed = true;
+ int temp;
+ int value = 0, new_value;
+ int work[MAX_TSP_SIZE + 2];
+ int wbot, wtop;
+ sbitmap in_work, v2s;
+ int v1[HOPT_H_VERT], v2[HOPT_H_VERT];
+
+ #ifndef ENABLE_CHECKING
+ if (dump_file)
+ {
+ #endif
+ value = tsp_value (n, tour);
+ #ifndef ENABLE_CHECKING
+ }
+ #endif
+
+ in_work = sbitmap_alloc (n);
+ v2s = sbitmap_alloc (n);
+ sbitmap_zero (v2s);
+
+ #define PUT(X) \
+ do \
+ { \
+ if (!TEST_BIT (in_work, (X))) \
+ { \
+ SET_BIT (in_work, (X)); \
+ work[wtop++] = (X); \
+ if (wtop == MAX_TSP_SIZE + 2) \
+ wtop = 0; \
+ } \
+ } \
+ while (0)
+
+ while (changed)
+ {
+ changed = false;
+
+ for (i1 = 0; i1 < n - 1; i1++)
+ work[i1] = i1;
+ sbitmap_ones (in_work);
+ wbot = 0;
+ wtop = n - 1;
+
+ while (wbot != wtop)
+ {
+ i1 = work[wbot++];
+ if (wbot == MAX_TSP_SIZE + 2)
+ wbot = 0;
+ RESET_BIT (in_work, i1);
+
+ /* Find a best 3-OPT move starting with I1. */
+ best_3opt = 0;
+ bi1 = bi2 = bi3 = 0;
+
+ for (a = 0; a < weights[i1].n_spec; a++)
+ {
+ i2 = weights[i1].spec_tgt[a];
+
+ best_three_swap (i1, i2, &bi1, &bi2, &bi3, &best_3opt,
+ n, tour);
+ }
+
+ /* Find a best HyperOPT move starting with I1. */
+ best_hopt = 0;
+ hperm = 0;
+ hi2 = 0;
+ for (i2 = i1, a = 0; i2 != n && a < HOPT_H - 1; a++)
+ i2 = tour[i2];
+ for (f = i2, a = 0; f != n && a < HOPT_H - 1; a++)
+ f = tour[f];
+
+ if (f != n)
+ {
+ for (a = i1, i = 0; i < HOPT_H - 1; i++, a = tour[a])
+ v1[i] = a;
+ for (a = i2; i < HOPT_H_VERT; i++, a = tour[a])
+ v1[i] = a;
+
+ for (a = tour[i1], i = 1; i < HOPT_H; i++, a = tour[a])
+ {
+ v2[i] = a;
+ SET_BIT (v2s, a);
+ }
+ for (a = tour[i2]; i < HOPT_H_VERT; i++, a = tour[a])
+ {
+ v2[i] = a;
+ SET_BIT (v2s, a);
+ }
+ v2[0] = a;
+ SET_BIT (v2s, a);
+
+ while (1)
+ {
+ a = hopt_delta (tour, v1, v2, v2s, &aperm, weights);
+ if (a < best_hopt)
+ {
+ hi2 = i2;
+ best_hopt = a;
+ hperm = aperm;
+ }
+
+ i2 = tour[i2];
+ f = tour[f];
+ if (f == n)
+ break;
+
+ RESET_BIT (v2s, i2);
+ SET_BIT (v2s, f);
+
+ for (a = i2, i = HOPT_H - 1; i < HOPT_H_VERT; i++, a = tour[a])
+ v1[i] = a;
+ for (a = tour[i2], i = HOPT_H; i < HOPT_H_VERT; i++, a = tour[a])
+ v2[i] = a;
+ v2[0] = a;
+ }
+
+ for (i = 0; i < HOPT_H_VERT; i++)
+ RESET_BIT (v2s, v2[i]);
+ }
+
+ if (best_3opt < best_hopt)
+ {
+ temp = tour[bi1];
+ tour[bi1] = tour[bi2];
+ tour[bi2] = tour[bi3];
+ tour[bi3] = temp;
+
+ #ifndef ENABLE_CHECKING
+ if (dump_file)
+ {
+ #endif
+ new_value = value + best_3opt;
+ value = tsp_value (n, tour);
+ #ifndef ENABLE_CHECKING
+ }
+ #endif
+
+ PUT (bi1);
+ PUT (bi2);
+ PUT (bi3);
+
+ if (dump_file)
+ {
+ fprintf (dump_file,
+ "3 swap %d %d %d -- new tour value %d\n",
+ bi1, bi2, bi3, value);
+ dump_tour (dump_file, tour, n);
+ }
+ }
+ else if (best_hopt < 0)
+ {
+ for (a = i1, i2 = 0; i2 < HOPT_H; i2++, a = tour[a])
+ PUT (a);
+ for (a = hi2, i2 = 0; i2 < HOPT_H; i2++, a = tour[a])
+ PUT (a);
+
+ hopt_perform (tour, i1, hi2, hperm);
+
+ #ifndef ENABLE_CHECKING
+ if (dump_file)
+ {
+ #endif
+ new_value = value + best_hopt;
+ value = tsp_value (n, tour);
+ #ifndef ENABLE_CHECKING
+ }
+ #endif
+
+ if (dump_file)
+ {
+ fprintf (dump_file,
+ "hopt %d %d -- new tour value %d\n", i1, hi2, value);
+ dump_tour (dump_file, tour, n);
+ }
+ }
+ else
+ continue;
+
+ #ifdef ENABLE_CECKING
+ if (value != new_value)
+ abort ();
+ #endif
+ changed = true;
+ }
+
+ if (!iterate)
+ break;
+ }
+
+ sbitmap_free (in_work);
+ sbitmap_free (v2s);
+ }
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.438
diff -c -3 -p -r1.438 invoke.texi
*** doc/invoke.texi 25 Mar 2004 17:43:24 -0000 1.438
--- doc/invoke.texi 31 Mar 2004 07:02:05 -0000
*************** in the following sections.
*** 284,290 ****
-foptimize-sibling-calls -fprefetch-loop-arrays @gol
-fprofile-generate -fprofile-use @gol
-freduce-all-givs -fregmove -frename-registers @gol
! -freorder-blocks -freorder-functions @gol
-frerun-cse-after-loop -frerun-loop-opt @gol
-frounding-math -fschedule-insns -fschedule-insns2 @gol
-fno-sched-interblock -fno-sched-spec -fsched-spec-load @gol
--- 284,290 ----
-foptimize-sibling-calls -fprefetch-loop-arrays @gol
-fprofile-generate -fprofile-use @gol
-freduce-all-givs -fregmove -frename-registers @gol
! -freorder-blocks -ftsp-ordering -freorder-functions @gol
-frerun-cse-after-loop -frerun-loop-opt @gol
-frounding-math -fschedule-insns -fschedule-insns2 @gol
-fno-sched-interblock -fno-sched-spec -fsched-spec-load @gol
*************** invoking @option{-O2} on programs that u
*** 3655,3661 ****
@opindex O3
Optimize yet more. @option{-O3} turns on all optimizations specified by
@option{-O2} and also turns on the @option{-finline-functions},
! @option{-fweb} and @option{-frename-registers} options.
@item -O0
@opindex O0
--- 3655,3661 ----
@opindex O3
Optimize yet more. @option{-O3} turns on all optimizations specified by
@option{-O2} and also turns on the @option{-finline-functions},
! @option{-ftsp-ordering}, @option{-fweb} and @option{-frename-registers} options.
@item -O0
@opindex O0
*************** Reorder basic blocks in the compiled fun
*** 4194,4199 ****
--- 4194,4208 ----
taken branches and improve code locality.
Enabled at levels @option{-O2}, @option{-O3}.
+
+ @item -ftsp-ordering
+ @opindex ftsp-ordering
+ In addition to heuristic used at @option{-freorder-blocks}, use also a more
+ powerful method based on the reduction to the Travelling Salesman Problem
+ to determine the good order of basic blocks in the function. This generally
+ produces smaller and faster programs, at the expense of longer compile times.
+
+ Enabled at level @option{-O3}.
@item -freorder-functions
@opindex freorder-functions
Index: doc/passes.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/passes.texi,v
retrieving revision 1.31
diff -c -3 -p -r1.31 passes.texi
*** doc/passes.texi 7 Feb 2004 14:14:54 -0000 1.31
--- doc/passes.texi 31 Mar 2004 07:02:05 -0000
*************** Basic block reordering. This pass imple
*** 499,507 ****
positioning. If profile information is not available, various types of
static analysis are performed to make the predictions normally coming
from the profile feedback (IE execution frequency, branch probability,
! etc). It is implemented in the file @file{bb-reorder.c}, and the
! various prediction routines are in @file{predict.c}.
!
@opindex dB
The option @option{-dB} causes a debugging dump of the RTL code after
this pass. This dump file's name is made by appending @samp{.bbro} to
--- 499,510 ----
positioning. If profile information is not available, various types of
static analysis are performed to make the predictions normally coming
from the profile feedback (IE execution frequency, branch probability,
! etc). To determine the ordering, an algorithm of Software Trace Cache
! is used. In addition, it is also possible to use the method based on
! reduction to the Travelling Salesman Problem. The core of the pass is
! implemented in the file @file{bb-reorder.c}, various prediction routines
! are in @file{predict.c} and the TSP solver is in @file{tsp.c}.
!
@opindex dB
The option @option{-dB} causes a debugging dump of the RTL code after
this pass. This dump file's name is made by appending @samp{.bbro} to