This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
[patch] TSP based bb reordering

From: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
To: gcc-patches at gcc dot gnu dot org
Date: Wed, 31 Mar 2004 09:12:12 +0200
Subject: [patch] TSP based bb reordering
Hello,

this patch implements basic block reordering based on the reduction
to the traveling salesman problem as described in
www.research.att.com/~dsj/papers/PLDI.ps.  I have done the testing
on i686 (-march=athlon), so the original motivation (optimization for
the static branch prediction) was not useful, but using it just to
minimize the number of executed unconditional jumps turned out to
also have benefits (mostly on sizes of binaries, see test results
below).  The time of compilation is increased by about 3% when using
it.  Bootstrapped (with -ftsp-ordering) & regtested on i686.

Zdenek

Peak is with -ftsp-ordering.

Without profile feedback:

                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   164.gzip          1400   190       736    *     1400   188       743    *
   175.vpr           1400   379       370    *     1400   374       375    *
   176.gcc           1100        --          X     1100        --          X
   181.mcf           1800   733       246    *     1800   733       245    *
   186.crafty        1000   114       878    *     1000   114       874    *
   197.parser        1800   396       454    *     1800   398       453    *
   252.eon           1300        --          X     1300        --          X
   253.perlbmk       1800   199       906    *     1800   199       903    *
   254.gap           1100   178       618    *     1100   175       629    *
   255.vortex        1900   238       797    *     1900   240       791    *
   256.bzip2         1500   326       459    *     1500   328       458    *
   300.twolf         3000   781       384    *     3000   773       388    *

Sizes of binaries:

  53142 gzip_base.default-linux
  52822 gzip_peak.default-linux

 152806 vpr_base.default-linux
 151142 vpr_peak.default-linux

  24619 mcf_base.default-linux
  24619 mcf_peak.default-linux

 256204 crafty_base.default-linux
 254060 crafty_peak.default-linux

 120386 parser_base.default-linux
 118082 parser_peak.default-linux

 573441 perlbmk_base.default-linux
 564577 perlbmk_peak.default-linux

 477278 gap_base.default-linux
 470174 gap_peak.default-linux

 664688 vortex_base.default-linux
 653936 vortex_peak.default-linux

  46364 bzip2_base.default-linux
  46332 bzip2_peak.default-linux

 212799 twolf_base.default-linux
 209855 twolf_peak.default-linux

With profile feedback:

                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   164.gzip          1400   186       752    *     1400   187       751    *
   175.vpr           1400   379       370    *     1400   376       373    *
   176.gcc           1100        --          X     1100        --          X
   181.mcf           1800   732       246    *     1800   734       245    *
   186.crafty        1000   111       902    *     1000   111       899    *
   197.parser        1800   396       455    *     1800   394       456    *
   252.eon           1300        --          X     1300        --          X
   253.perlbmk       1800   189       951    *     1800   190       949    *
   254.gap           1100   174       633    *     1100   174       634    *
   255.vortex        1900   223       852    *     1900   214       886    *
   256.bzip2         1500   322       466    *     1500   323       464    *
   300.twolf         3000   786       382    *     3000   792       379    *

Sizes of binaries:

  52126 gzip_base.FDO
  51774 gzip_peak.FDO

 149358 vpr_base.FDO
 148846 vpr_peak.FDO

  24755 mcf_base.FDO
  24755 mcf_peak.FDO

 255444 crafty_base.FDO
 253364 crafty_peak.FDO

 117450 parser_base.FDO
 116426 parser_peak.FDO

 553193 perlbmk_base.FDO
 545737 perlbmk_peak.FDO

 457606 gap_base.FDO
 453158 gap_peak.FDO

 641336 vortex_base.FDO
 632024 vortex_peak.FDO

  45732 bzip2_base.FDO
  45316 bzip2_peak.FDO

 208839 twolf_base.FDO
 205255 twolf_peak.FDO

	* tsp.c: New file.
	* Makefile.in (tsp.o): New.
	* bb-reorder.c (JUMP_COST, HIT_TAKEN_COST, MISS_TAKEN_COST,
	MISS_FALLTHRU_COST): New.
	(reorder_using_tsp, cost_of_uncond_jump, cost_of_cond_jump,
	cost_of_cond_branch, cost_b_after_a, record_edges): New functions.
	(reorder_basic_blocks): Call reorder_using_tsp.
	* cfglayout.h (struct vertex): New.
	(MAX_TSP_SIZE): New.
	(solve_tsp): Declare.
	* common.opt (ftsp-ordering): New.
	* flags.h (flag_tsp_ordering): Declare.
	* opts.c (decode_options, common_handle_option): Handle
	flag_tsp_ordering.
	* toplev.c (flag_tsp_ordering): New.
	(process_options): Enable flag_reorder_blocks if flag_tsp_ordering is
	set.
	* doc/invoke.texi: Document -ftsp-ordering.
	* doc/passes.texi: Document tsp based bb reordering pass.

Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1266
diff -c -3 -p -r1.1266 Makefile.in
*** Makefile.in	24 Mar 2004 18:03:15 -0000	1.1266
--- Makefile.in	31 Mar 2004 07:02:05 -0000
*************** C_OBJS = c-parse.o c-lang.o c-pretty-pri
*** 846,852 ****
  
  OBJS-common = \
   alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o	  	   \
!  cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o		   \
   cfgloopanal.o cfgloopmanip.o loop-init.o loop-unswitch.o loop-unroll.o	   \
   cfgrtl.o combine.o conflict.o convert.o coverage.o cse.o cselib.o 	   \
   dbxout.o debug.o df.o diagnostic.o dojump.o dominance.o loop-doloop.o	   \
--- 846,852 ----
  
  OBJS-common = \
   alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o	  	   \
!  cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o tsp.o cfgloop.o	   \
   cfgloopanal.o cfgloopmanip.o loop-init.o loop-unswitch.o loop-unroll.o	   \
   cfgrtl.o combine.o conflict.o convert.o coverage.o cse.o cselib.o 	   \
   dbxout.o debug.o df.o diagnostic.o dojump.o dominance.o loop-doloop.o	   \
*************** tracer.o : tracer.c $(CONFIG_H) $(SYSTEM
*** 1849,1854 ****
--- 1849,1856 ----
  cfglayout.o : cfglayout.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
     $(RTL_H) $(TREE_H) insn-config.h $(BASIC_BLOCK_H) hard-reg-set.h output.h \
     function.h cfglayout.h cfgloop.h $(TARGET_H) gt-cfglayout.h $(GGC_H)
+ tsp.o: tsp.c $(CONFIG_H) $(SYSTEM_H) cfglayout.h $(RTL_H) $(BASIC_BLOCK_H) \
+    output.h coretypes.h $(TM_H)
  timevar.o : timevar.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TIMEVAR_H) flags.h \
     intl.h toplev.h
  regrename.o : regrename.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
Index: bb-reorder.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/bb-reorder.c,v
retrieving revision 1.66
diff -c -3 -p -r1.66 bb-reorder.c
*** bb-reorder.c	20 Mar 2004 16:50:30 -0000	1.66
--- bb-reorder.c	31 Mar 2004 07:02:05 -0000
*************** static int exec_threshold[N_ROUNDS] = {5
*** 91,96 ****
--- 91,121 ----
     block the edge destination is not duplicated while connecting traces.  */
  #define DUPLICATION_THRESHOLD 100
  
+ /* The parameters of the cost function for the tsp based reordering.  These
+    can be redefined by targets as needed.  The values here correspond simply
+    to minimising the total number of jumps executed, and a slight preference
+    for the taken branch to be the less frequent one, which is usually a good
+    thing to do.  Making the value of HIT_TAKEN_COST and MISS_TAKEN_COST
+    different makes sense only on targets with static branch prediction.  */
+ 
+ /* Cost of an unconditional jump.  */
+ #define JUMP_COST 100
+ 
+ /* The cost of correctly predicted branch that is taken.  */
+ #ifndef HIT_TAKEN_COST
+ #define HIT_TAKEN_COST 3
+ #endif
+ 
+ /* The cost of incorrectly predicted branch that is taken.  */
+ #ifndef MISS_TAKEN_COST
+ #define MISS_TAKEN_COST 3
+ #endif
+ 
+ /* The cost of incorrectly predicted branch that is not taken.  */
+ #ifndef MISS_FALLTHRU_COST
+ #define MISS_FALLTHRU_COST 0
+ #endif
+ 
  /* Length of unconditional jump instruction.  */
  static int uncond_jump_length;
  
*************** static bool better_edge_p (basic_block, 
*** 153,158 ****
--- 178,189 ----
  static void connect_traces (int, struct trace *);
  static bool copy_bb_p (basic_block, int);
  static int get_uncond_jump_length (void);
+ static void reorder_using_tsp (void);
+ static int cost_of_uncond_jump (edge);
+ static int cost_of_cond_jump (edge);
+ static int cost_of_cond_branch (basic_block);
+ static int cost_b_after_a (basic_block, basic_block);
+ static void record_edges (basic_block, struct vertex *, int last);
  
  /* Find the traces for Software Trace Cache.  Chain each trace through
     RBI()->next.  Store the number of traces to N_TRACES and description of
*************** get_uncond_jump_length (void)
*** 1063,1068 ****
--- 1094,1345 ----
    return length;
  }
  
+ /* Returns a cost of unconditional jump corresponding to edge E.  */
+ 
+ static int
+ cost_of_uncond_jump (edge e)
+ {
+   if (e->dest == EXIT_BLOCK_PTR)
+     return 0;
+ 
+   if (optimize_size)
+     return 1;
+ 
+   /* If the area is cold, do flat optimization for size, with low
+      priority.  */
+   if (probably_cold_bb_p (e->src))
+     return JUMP_COST;
+ 
+   return JUMP_COST * e->src->frequency;
+ }
+ 
+ /* Returns a cost of conditional jump corresponding to edge E.  */
+ 
+ static int
+ cost_of_cond_jump (edge e)
+ {
+   int cost = 0;
+ 
+   if (optimize_size
+       || probably_cold_bb_p (e->src))
+     return 0;
+ 
+   /* We assume the more probable branch is predicted.  */
+   if (2 * e->probability > REG_BR_PROB_BASE)
+     {
+       /* E is predicted to be taken.  */
+       cost += EDGE_FREQUENCY (e) * HIT_TAKEN_COST;
+       cost += (e->src->frequency - EDGE_FREQUENCY (e)) * MISS_FALLTHRU_COST;
+     }
+   else
+     {
+       /* E is predicted not to be taken.  */
+       cost += EDGE_FREQUENCY (e) * MISS_TAKEN_COST;
+ 
+       /* HIT_FALLTHRU has no cost.  */
+     }
+ 
+   return cost;
+ }
+ 
+ /* Return a cost of jumps from basic block BB to two locations not adjacent
+    to it.  */
+ static int
+ cost_of_cond_branch (basic_block bb)
+ {
+   int more_freq, less_freq, tmp, cost = 0;
+ 
+   if (optimize_size)
+     return 1;
+ 
+   /* If the area is cold, do flat optimization for size, with low
+      priority.  */
+   if (probably_cold_bb_p (bb))
+     return JUMP_COST;
+ 
+   /* It is assumed to be transformed into
+ 
+      bb  -- helper
+       \           \
+        \           \
+         more        less  */
+ 
+   more_freq = EDGE_FREQUENCY (bb->succ);
+   less_freq = EDGE_FREQUENCY (bb->succ->succ_next);
+ 
+   if (less_freq > more_freq)
+     {
+       tmp = less_freq;
+       less_freq = more_freq;
+       more_freq = tmp;
+     }
+ 
+   cost += more_freq * HIT_TAKEN_COST;
+   cost += less_freq * (MISS_FALLTHRU_COST + JUMP_COST);
+ 
+   return cost;
+ }
+ 
+ /* Returns cost for placing B immediately after A.  */
+ static int
+ cost_b_after_a (basic_block a, basic_block b)
+ {
+   if (!a->succ)
+     return 0;
+ 
+   if (!a->succ->succ_next)
+     return a->succ->dest == b ? 0 : cost_of_uncond_jump (a->succ);
+ 
+   if (!any_condjump_p (BB_END (a)))
+     return 0;
+ 
+   if (a->succ->succ_next->succ_next)
+     abort ();
+ 
+   if (a->succ->dest == b)
+     return cost_of_cond_jump (a->succ->succ_next);
+   
+   if (a->succ->succ_next->dest == b)
+     return cost_of_cond_jump (a->succ);
+ 
+   return cost_of_cond_branch (a);
+ }
+ 
+ /* Record costs for edges coming out of FROM to VERTEX; LAST is true
+    if it is the tail of the considered segment.  */
+ static void
+ record_edges (basic_block from, struct vertex *vertex, int last)
+ {
+   edge e;
+   basic_block tgt;
+ 
+   vertex->n_spec = 0;
+ 
+   if (last
+       || !from->succ
+       || (from->succ->succ_next
+ 	  && !any_condjump_p (BB_END (from))))
+     {
+       vertex->def_cost = 0;
+       return;
+     }
+ 
+   for (e = from->succ; e; e = e->succ_next)
+     {
+       tgt = e->dest;
+       if (tgt == EXIT_BLOCK_PTR)
+ 	{
+ 	  if (vertex->n_spec != 0)
+ 	    abort ();
+ 	  vertex->def_cost = 0;
+ 	  return;
+ 	}
+ 
+       /* We indeed want <= here, as edges to the head of the current segment
+ 	 are irrelevant.  */
+       if (tgt->rbi->visited <= 0)
+ 	continue;
+ 
+       if (tgt == from)
+ 	continue;
+ 
+       vertex->spec_tgt[vertex->n_spec] = tgt->rbi->visited;
+       vertex->spec_cost[vertex->n_spec] = cost_b_after_a (from, tgt);
+       vertex->n_spec++;
+     }
+   if (vertex->n_spec > 2)
+     abort ();
+ 
+   vertex->def_cost = (from->succ->succ_next
+ 		      ? cost_of_cond_branch (from)
+ 		      : cost_of_uncond_jump (from->succ));
+ }
+ 
+ /* Reorder blocks using tsp solver.  */
+ 
+ static void
+ reorder_using_tsp (void)
+ {
+   basic_block block[MAX_TSP_SIZE + 2];
+   struct vertex graph[MAX_TSP_SIZE + 2];
+   int tour[MAX_TSP_SIZE + 2];
+   basic_block start, old_start;
+   int n, i, a;
+   basic_block bb;
+ 
+   if (n_basic_blocks <= 1)
+     return;
+ 
+   FOR_EACH_BB (bb)
+     {
+       bb->rbi->visited = -1;
+     }
+ 
+ #define NEXT_BLOCK(BB)			\
+   ((BB) == EXIT_BLOCK_PTR		\
+    ? NULL				\
+    : ((BB)->rbi->next == NULL		\
+       ? EXIT_BLOCK_PTR			\
+       : (BB)->rbi->next))
+   start = ENTRY_BLOCK_PTR->next_bb;
+   while (1)
+     {
+       old_start = start;
+ 
+       if (dump_file)
+ 	fprintf (dump_file, "Old order:");
+       for (n = 0;
+ 	   start && n < MAX_TSP_SIZE + 2;
+ 	   n++, start = NEXT_BLOCK (start))
+ 	{
+ 	  block[n] = start;
+ 	  start->rbi->visited = n;
+ 	  if (dump_file)
+ 	    fprintf (dump_file, " %d", start->index);
+ 	}
+       if (dump_file)
+ 	fprintf (dump_file, "\n");
+ 
+       for (i = 0; i < n; i++)
+ 	{
+ 	  record_edges (block[i], graph + i, i == n - 1);
+ 	  tour[i] = i + 1;
+ 	}
+       solve_tsp (n, tour, graph, optimize >= 3);
+       
+       if (block[n - 1] == EXIT_BLOCK_PTR)
+ 	block[n - 1] = NULL;
+ 
+       if (dump_file)
+ 	fprintf (dump_file, "New order:");
+       for (i = 0, a = 0; i < n - 1; i++, a = tour[a])
+ 	{
+ 	  block[a]->rbi->visited = -1;
+ 	  block[a]->rbi->next = block[tour[a]];
+ 
+ 	  if (dump_file)
+ 	    fprintf (dump_file, " %d", block[a]->index);
+ 	}
+       if (block[a])
+ 	{
+ 	  block[a]->rbi->visited = -1;
+ 	  if (dump_file)
+ 	    fprintf (dump_file, " %d\n", block[a]->index);
+ 	}
+       else if (dump_file)
+ 	fprintf (dump_file, " EXIT\n");
+ 
+       if (!start)
+ 	break;
+ 
+       /* Ensure some overlap between the instances.  */
+       start = old_start;
+       for (; n > 0; n -= 2)
+ 	start = NEXT_BLOCK (start);
+     }
+ #undef NEXT_BLOCK
+ }
+ 
  /* Reorder basic blocks.  The main entry point to this file.  */
  
  void
*************** reorder_basic_blocks (void)
*** 1110,1115 ****
--- 1387,1395 ----
  
    if (dump_file)
      dump_flow_info (dump_file);
+ 
+   if (flag_tsp_ordering)
+     reorder_using_tsp ();
  
    cfg_layout_finalize ();
  
Index: cfglayout.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfglayout.h,v
retrieving revision 1.10
diff -c -3 -p -r1.10 cfglayout.h
*** cfglayout.h	30 Dec 2003 10:40:51 -0000	1.10
--- cfglayout.h	31 Mar 2004 07:02:05 -0000
*************** typedef struct reorder_block_def
*** 35,40 ****
--- 35,49 ----
  
  extern rtx cfg_layout_function_footer;
  
+ /* A type for vertices in tsp solver.  */
+ struct vertex
+ {
+   int n_spec;			/* Number of special edges.  */
+   int spec_tgt[2];		/* Special edge targets.  */
+   int spec_cost[2];		/* Special edge costs.  */
+   int def_cost;			/* Cost of remaining edges.  */
+ };
+ 
  extern void cfg_layout_initialize (void);
  extern void cfg_layout_finalize (void);
  extern bool cfg_layout_can_duplicate_bb_p (basic_block);
*************** extern bool can_copy_bbs_p (basic_block 
*** 45,47 ****
--- 54,62 ----
  extern void copy_bbs (basic_block *, unsigned, basic_block *,
  		      edge *, unsigned, edge *, struct loop *);
  extern void cfg_layout_initialize_rbi (basic_block);
+ 
+ /* Maximum number of vertices of tsp instance solved (in fact 2 less, as source
+    and target are not included).  */
+ #define MAX_TSP_SIZE 200
+ 
+ extern void solve_tsp (int, int *, struct vertex *, int);
Index: common.opt
===================================================================
RCS file: /cvs/gcc/gcc/gcc/common.opt,v
retrieving revision 1.30
diff -c -3 -p -r1.30 common.opt
*** common.opt	10 Mar 2004 06:02:50 -0000	1.30
--- common.opt	31 Mar 2004 07:02:05 -0000
*************** ftrapv
*** 704,709 ****
--- 704,713 ----
  Common
  Trap for signed overflow in addition, subtraction and multiplication
  
+ ftsp-ordering
+ Common
+ Reorder basic blocks to improve code placement using reduction to tsp
+ 
  funit-at-a-time
  Common
  Compile whole compilation unit at a time
Index: flags.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/flags.h,v
retrieving revision 1.134
diff -c -3 -p -r1.134 flags.h
*** flags.h	10 Mar 2004 06:02:51 -0000	1.134
--- flags.h	31 Mar 2004 07:02:05 -0000
*************** extern int flag_branch_probabilities;
*** 210,215 ****
--- 210,219 ----
  
  extern int flag_reorder_blocks;
  
+ /* Nonzero if blocks should be reordered using tsp.  */
+ 
+ extern int flag_tsp_ordering;
+ 
  /* Nonzero if functions should be reordered.  */
  
  extern int flag_reorder_functions;
Index: opts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/opts.c,v
retrieving revision 1.62
diff -c -3 -p -r1.62 opts.c
*** opts.c	14 Mar 2004 22:26:06 -0000	1.62
--- opts.c	31 Mar 2004 07:02:05 -0000
*************** decode_options (unsigned int argc, const
*** 570,575 ****
--- 570,576 ----
  
    if (optimize >= 3)
      {
+       flag_tsp_ordering = 1;
        flag_inline_functions = 1;
        flag_rename_registers = 1;
        flag_unswitch_loops = 1;
*************** common_handle_option (size_t scode, cons
*** 1418,1423 ****
--- 1419,1428 ----
  
      case OPT_ftrapv:
        flag_trapv = value;
+       break;
+ 
+     case OPT_ftsp_ordering:
+       flag_tsp_ordering = value;
        break;
  
      case OPT_funit_at_a_time:
Index: toplev.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/toplev.c,v
retrieving revision 1.889
diff -c -3 -p -r1.889 toplev.c
*** toplev.c	29 Mar 2004 23:00:27 -0000	1.889
--- toplev.c	31 Mar 2004 07:02:05 -0000
*************** int flag_branch_probabilities = 0;
*** 249,254 ****
--- 249,258 ----
  
  int flag_reorder_blocks = 0;
  
+ /* Nonzero if basic blocks should be reordered using tsp.  */
+ 
+ int flag_tsp_ordering = 0;
+ 
  /* Nonzero if functions should be reordered.  */
  
  int flag_reorder_functions = 0;
*************** process_options (void)
*** 2276,2281 ****
--- 2280,2288 ----
  
    if (flag_value_profile_transformations)
      flag_profile_values = 1;
+ 
+   if (flag_tsp_ordering)
+     flag_reorder_blocks = 1;
  
    /* Warn about options that are not supported on this machine.  */
  #ifndef INSN_SCHEDULING
Index: tsp.c
===================================================================
RCS file: tsp.c
diff -N tsp.c
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- tsp.c	31 Mar 2004 07:02:05 -0000
***************
*** 0 ****
--- 1,540 ----
+ /* A simple aproximate ATSP solver.
+    Copyright (C) 2004 Free Software Foundation, Inc.
+ 
+ This file is part of GCC.
+ 
+ GCC is free software; you can redistribute it and/or modify it under
+ the terms of the GNU General Public License as published by the Free
+ Software Foundation; either version 2, or (at your option) any later
+ version.
+ 
+ GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+ 
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING.  If not, write to the Free
+ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
+ 02111-1307, USA.  */
+ 
+ #include "config.h"
+ #include "system.h"
+ #include "coretypes.h"
+ #include "tm.h"
+ #include "rtl.h"
+ #include "basic-block.h"
+ #include "cfglayout.h"
+ #include "output.h"
+ 
+ /* A simple assymetric tsp solver.  It uses the following heuristic (referred
+    as OPT-3 + HyperOPT in literature):
+ 
+    (Tour is made so that it always starts in 0 and ends in n - 1).
+    
+    All improving 3-swaps (reordering of tour ABCD into ACBD, where A, B,
+       C and D are parts of the tour) are performed.  To decrease the amount
+       of work done, we use a small observation -- there are at most two cheap
+       edges coming from each vertex (all other edges out of the vertex
+       have the same cost) and at least one such edge must obviously be
+       involved in the swap; moreover, it must be one of the newly added edges.
+    
+    For every two non-overlapping segments of length 3, relink them in
+    an optimal manner.  */
+ 
+ #define HOPT_H 3
+ #define HOPT_H_NORDS 6
+ #define HOPT_H_VERT (2 * HOPT_H - 2)
+ #define ORDS_PER_EDGE (HOPT_H_NORDS / (HOPT_H_VERT - 1))
+ 
+ /* All orders HyperOPT heuristics tries.  */
+ static int hopt_h_ords[HOPT_H_NORDS][HOPT_H_VERT] =
+ {
+   {0, 1, 2, 3},
+   {0, 1, 3, 2},
+   {0, 2, 1, 3},
+   {0, 2, 3, 1},
+   {0, 3, 1, 2},
+   {0, 3, 2, 1}
+ };
+ 
+ /* For each edge indices of sums in the HyperOPT heuristics to that the
+    weight of the edge should be added.  */
+ static int hopt_h_sums[HOPT_H_VERT][HOPT_H_VERT][ORDS_PER_EDGE] =
+ {
+   {					/* (0, *) */
+     {HOPT_H_NORDS, HOPT_H_NORDS},	/* (0, 0) */
+     {0, 1},				/* (0, 1) */
+     {2, 3},				/* (0, 2) */
+     {4, 5}				/* (0, 3) */
+   },
+   {					/* (1, *) */
+     {3, 5},				/* (1, 0) */
+     {HOPT_H_NORDS, HOPT_H_NORDS}, 	/* (1, 1) */
+     {0, 4},				/* (1, 2) */
+     {1, 2}				/* (1, 3) */
+   },
+   {					/* (2, *) */
+     {1, 4},				/* (2, 0) */
+     {2, 5},				/* (2, 1) */
+     {HOPT_H_NORDS, HOPT_H_NORDS}, 	/* (2, 2) */
+     {0, 3}				/* (2, 3) */
+   },
+   {					/* (3, *) */
+     {0, 2},				/* (3, 0) */
+     {3, 4},				/* (3, 1) */
+     {1, 5},				/* (3, 2) */
+     {HOPT_H_NORDS, HOPT_H_NORDS}	/* (3, 3) */
+   }
+ };
+ 
+ /* Weights of edges are cached here.  */
+ 
+ static int graph[MAX_TSP_SIZE + 2][MAX_TSP_SIZE + 2];
+ 
+ /* Returns weight of edge A --> B.  */
+ 
+ #define edge_weight(A, B) graph[A][B]
+ 
+ static void optimize_tsp (int, int *, struct vertex *, int);
+ static int tsp_value (int, int *);
+ static int three_swap_delta (int *, int, int, int);
+ void dump_tour (FILE *, int *, int);
+ static int hopt_delta (int [], int [], int [], sbitmap, int *, struct vertex *);
+ static void best_three_swap (int, int, int *, int *, int *, int *, int, int *);
+ static void hopt_perform (int *, int, int, int);
+ 
+ /* Solves a tsp instance of size N.  WEIGHTS is a matrix of weights of
+    edges.  Result is retunrned in TOUR; the initial tour is also obtained
+    from there.  */
+ 
+ void
+ solve_tsp (int n, int *tour, struct vertex *weights, int iterate)
+ {
+   int tspvalue;
+   int i, j;
+ 
+   if (dump_file)
+     fprintf (dump_file, "TSP:\n");
+ 
+   for (i = 0; i < n; i++)
+     {
+       for (j = 0; j < n; j++)
+ 	graph[i][j] = weights[i].def_cost;
+       for (j = 0; j < weights[i].n_spec; j++)
+ 	graph[i][weights[i].spec_tgt[j]] = weights[i].spec_cost[j];
+     }
+ 
+   tspvalue = tsp_value (n, tour);
+ 
+   if (dump_file)
+     {
+       dump_tour (dump_file, tour, n);
+       fprintf (dump_file, "%d\n", tspvalue);
+     }
+ 
+   if (tspvalue)
+     optimize_tsp (n, tour, weights, iterate);
+ 
+   if (dump_file)
+     {
+       dump_tour (dump_file, tour, n);
+       fprintf (dump_file, "FINAL: %d\n", tsp_value (n, tour));
+     }
+ }
+ 
+ /* Dumps TOUR of length N to FILE.  */
+ 
+ void
+ dump_tour (FILE *file, int *tour, int n)
+ {
+   int i, a;
+ 
+   for (i = 0, a = 0; i < n - 1; i++, a = tour[a])
+     fprintf (file, "%d ", a);
+   fprintf (file, "%d\n", a);
+ }
+ 
+ /* Return value of a tsp TOUR on N vertices.  */
+ 
+ static int
+ tsp_value (int n, int *tour)
+ {
+   int i, act, sum = 0;
+ 
+   act = 0;
+   for (i = 0; i < n - 1; i++, act = tour[act])
+     sum += edge_weight (act, tour[act]);
+ 
+   return sum;
+ }
+ 
+ /* Returns gain of performing a 3-swap on TOUR in graph WEIGHTS at
+    positions I1, I2 and I3.  */
+ 
+ static int
+ three_swap_delta (int *tour, int i1, int i2, int i3)
+ {
+   int old, new;
+ 
+   old = (edge_weight (i1, tour[i1])
+ 	 + edge_weight (i2, tour[i2])
+ 	 + edge_weight (i3, tour[i3]));
+ 
+   new = (edge_weight (i1, tour[i2])
+ 	 + edge_weight (i2, tour[i3])
+ 	 + edge_weight (i3, tour[i1]));
+ 
+   return new - old;
+ }
+ 
+ /* Returns gain of performing a HyperOPT optimization on segments stored in V1 
+    and V2.  Index of the best permutation is returned in BEST_PERM.  WEIGHTS
+    is the graph in that we work.  V2S is a bitmap of members of V2.  TOUR is
+    the current tour.  */
+ 
+ static int
+ hopt_delta (int tour[], int v1[], int v2[], sbitmap v2s, int *best_perm,
+ 	    struct vertex *weights)
+ {
+   int i, j, old, best, bi, a, av, ew;
+   int sums[HOPT_H_NORDS + 1];
+ 
+   for (i = 0; i < HOPT_H_VERT; i++)
+     for (j = 0; j < weights[v1[i]].n_spec; j++)
+       {
+ 	a = weights[v1[i]].spec_tgt[j];
+ 	if (a != tour[v1[i]] && TEST_BIT (v2s, a))
+ 	  goto cont;
+       }
+   return 0;
+ 
+ cont: ;
+ 
+   /* Loop is manually unrolled here based on assumption that noone will ever
+      increase the value of HOPT_H.  Just make sure that if someone insane
+      enough does it, he notices this place.  */
+ #if (ORDS_PER_EDGE != 2)
+   abort ();
+ #endif
+ 
+   memset (sums, 0, sizeof (sums));
+   for (i = 0; i < HOPT_H_VERT; i++)
+     {
+       av = v1[i];
+       for (j = 0; j < HOPT_H_VERT; j++)
+ 	{
+ 	  ew = edge_weight (av, v2[j]);
+ 	  sums[hopt_h_sums[i][j][0]] += ew;
+ 	  sums[hopt_h_sums[i][j][1]] += ew;
+ 	}
+     }
+ 
+   bi = 0;
+   old = best = sums[0];
+   for (i = 1; i < HOPT_H_NORDS; i++)
+     {
+       a = sums[i];
+ 
+       if (a >= best)
+ 	continue;
+ 
+       best = a;
+       bi = i;
+     }
+ 
+   *best_perm = bi;
+   return best - old;
+ }
+ 
+ /* Perform PERMth HyperOPT swap on TOUR at positions I1 and I2.  */
+ 
+ static void
+ hopt_perform (int *tour, int i1, int i2, int perm)
+ {
+   int i, j, a;
+   int v1[HOPT_H_VERT], v2[HOPT_H_VERT];
+ 
+   for (a = i1, i = 0; i < HOPT_H - 1; i++, a = tour[a])
+     v1[i] = a;
+   for (a = i2; i < HOPT_H_VERT; i++, a = tour[a])
+     v1[i] = a;
+ 
+   for (a = tour[i1], i = 1; i < HOPT_H; i++, a = tour[a])
+     v2[i] = a;
+   for (a = tour[i2]; i < HOPT_H_VERT; i++, a = tour[a])
+     v2[i] = a;
+   v2[0] = a;
+ 
+   a = hopt_h_ords[perm][0];
+   for (i = 1; i < HOPT_H_VERT; i++, a = j)
+     {
+       j = hopt_h_ords[perm][i];
+       tour[v1[a]] = v2[j];
+     }
+   tour[v1[a]] = v2[hopt_h_ords[perm][0]];
+ }
+ 
+ /* Find the best 3-swap that introduces I1 --> I2 into TOUR in graph WEIGHTS
+    on N vertices.  Endpoints of the segments are returned in BI1, BI2 and BI3,
+    value in BEST_3OPT.  */
+ 
+ static void
+ best_three_swap (int i1, int i2, int *bi1, int *bi2, int *bi3, int *best_3opt,
+ 		 int n, int *tour)
+ {
+   int a, seen_i1 = false, m, v;
+ 
+   if (tour[i1] == i2)
+     return;
+ 
+   for (a = 0; tour[a] != i2; a = tour[a])
+     if (a == i1)
+       seen_i1 = true;
+ 
+   if (!seen_i1)
+     {
+       /* Backward edge.  The middle region must be between i1 and i2.  */
+       for (m = i2; m != i1; m = tour[m])
+ 	{
+ 	  v = three_swap_delta (tour, a, m, i1);
+ 	  if (v >= *best_3opt)
+ 	    continue;
+ 
+ 	  *bi1 = a;
+ 	  *bi2 = m;
+ 	  *bi3 = i1;
+ 	  *best_3opt = v;
+ 	}
+     }
+   else
+     {
+       /* Either this is the arc from A to C...  */
+       for (m = i2; tour[m] != n; m = tour[m])
+ 	{
+ 	  v = three_swap_delta (tour, i1, a, m);
+ 	  if (v >= *best_3opt)
+ 	    continue;
+ 
+ 	  *bi1 = i1;
+ 	  *bi2 = a;
+ 	  *bi3 = m;
+ 	  *best_3opt = v;
+ 	}
+ 
+       /* ...or from B to D.  */
+       for (m = 0; m != i1; m = tour[m])
+ 	{
+ 	  v = three_swap_delta (tour, m, i1, a);
+ 	  if (v >= *best_3opt)
+ 	    continue;
+ 
+ 	  *bi1 = m;
+ 	  *bi2 = i1;
+ 	  *bi3 = a;
+ 	  *best_3opt = v;
+ 	}
+     }
+ }
+ 
+ /* Improves TOUR using the heuristics.  N is number of vertices, the graph is
+    stored in WEIGHTS.  ITERATE is true if we should iterate until convergence
+    rather than making just one pass.  */
+ 
+ static void
+ optimize_tsp (int n, int *tour, struct vertex *weights, int iterate)
+ {
+   int i1, i2, best_3opt, bi1, bi2, bi3, a, best_hopt, hi2, hperm, aperm, f, i;
+   int changed = true;
+   int temp;
+   int value = 0, new_value;
+   int work[MAX_TSP_SIZE + 2];
+   int wbot, wtop;
+   sbitmap in_work, v2s;
+   int v1[HOPT_H_VERT], v2[HOPT_H_VERT];
+ 
+ #ifndef ENABLE_CHECKING
+   if (dump_file)
+     {
+ #endif
+       value = tsp_value (n, tour);
+ #ifndef ENABLE_CHECKING
+     }
+ #endif
+ 
+   in_work = sbitmap_alloc (n);
+   v2s = sbitmap_alloc (n);
+   sbitmap_zero (v2s);
+ 
+ #define PUT(X)					\
+   do						\
+     {						\
+       if (!TEST_BIT (in_work, (X)))		\
+ 	{					\
+ 	  SET_BIT (in_work, (X));		\
+ 	  work[wtop++] = (X);			\
+ 	  if (wtop == MAX_TSP_SIZE + 2)		\
+ 	    wtop = 0;				\
+ 	}					\
+     }						\
+   while (0)
+ 
+   while (changed)
+     {
+       changed = false;
+ 
+       for (i1 = 0; i1 < n - 1; i1++)
+ 	work[i1] = i1;
+       sbitmap_ones (in_work);
+       wbot = 0;
+       wtop = n - 1;
+ 
+       while (wbot != wtop)
+ 	{
+ 	  i1 = work[wbot++];
+ 	  if (wbot == MAX_TSP_SIZE + 2)
+ 	    wbot = 0;
+ 	  RESET_BIT (in_work, i1);
+ 	  
+ 	  /* Find a best 3-OPT move starting with I1.  */
+ 	  best_3opt = 0;
+ 	  bi1 = bi2 = bi3 = 0;
+ 
+ 	  for (a = 0; a < weights[i1].n_spec; a++)
+ 	    {
+ 	      i2 = weights[i1].spec_tgt[a];
+ 
+ 	      best_three_swap (i1, i2, &bi1, &bi2, &bi3, &best_3opt,
+ 			       n, tour);
+ 	    }
+ 
+ 	  /* Find a best HyperOPT move starting with I1.  */
+ 	  best_hopt = 0;
+ 	  hperm = 0;
+ 	  hi2 = 0;
+ 	  for (i2 = i1, a = 0; i2 != n && a < HOPT_H - 1; a++)
+ 	    i2 = tour[i2];
+ 	  for (f = i2, a = 0; f != n && a < HOPT_H - 1; a++)
+ 	    f = tour[f];
+ 
+ 	  if (f != n)
+ 	    {
+ 	      for (a = i1, i = 0; i < HOPT_H - 1; i++, a = tour[a])
+ 		v1[i] = a;
+ 	      for (a = i2; i < HOPT_H_VERT; i++, a = tour[a])
+ 		v1[i] = a;
+ 
+ 	      for (a = tour[i1], i = 1; i < HOPT_H; i++, a = tour[a])
+ 		{
+ 		  v2[i] = a;
+ 		  SET_BIT (v2s, a);
+ 		}
+ 	      for (a = tour[i2]; i < HOPT_H_VERT; i++, a = tour[a])
+ 		{
+ 		  v2[i] = a;
+ 		  SET_BIT (v2s, a);
+ 		}
+ 	      v2[0] = a;
+ 	      SET_BIT (v2s, a);
+ 
+ 	      while (1)
+ 		{
+ 		  a = hopt_delta (tour, v1, v2, v2s, &aperm, weights);
+ 		  if (a < best_hopt)
+ 		    {
+ 		      hi2 = i2;
+ 		      best_hopt = a;
+ 		      hperm = aperm;
+ 		    }
+ 
+ 		  i2 = tour[i2];
+ 		  f = tour[f];
+ 		  if (f == n)
+ 		    break;
+ 
+ 		  RESET_BIT (v2s, i2);
+ 		  SET_BIT (v2s, f);
+ 
+ 		  for (a = i2, i = HOPT_H - 1; i < HOPT_H_VERT; i++, a = tour[a])
+ 		    v1[i] = a;
+ 		  for (a = tour[i2], i = HOPT_H; i < HOPT_H_VERT; i++, a = tour[a])
+ 		    v2[i] = a;
+ 		  v2[0] = a;
+ 		}
+ 
+ 	      for (i = 0; i < HOPT_H_VERT; i++)
+ 		RESET_BIT (v2s, v2[i]);
+ 	    }
+ 
+ 	  if (best_3opt < best_hopt)
+ 	    {
+ 	      temp = tour[bi1];
+ 	      tour[bi1] = tour[bi2];
+ 	      tour[bi2] = tour[bi3];
+ 	      tour[bi3] = temp;
+ 
+ #ifndef ENABLE_CHECKING
+ 	      if (dump_file)
+ 		{
+ #endif
+     		  new_value = value + best_3opt;
+     		  value = tsp_value (n, tour);
+ #ifndef ENABLE_CHECKING
+ 		}
+ #endif
+ 
+ 	      PUT (bi1);
+ 	      PUT (bi2);
+ 	      PUT (bi3);
+ 
+ 	      if (dump_file)
+ 		{
+ 		  fprintf (dump_file,
+ 			   "3 swap %d %d %d -- new tour value %d\n",
+ 			   bi1, bi2, bi3, value);
+ 		  dump_tour (dump_file, tour, n);
+ 		}
+ 	    }
+ 	  else if (best_hopt < 0)
+ 	    {
+ 	      for (a = i1, i2 = 0; i2 < HOPT_H; i2++, a = tour[a])
+ 		PUT (a);
+ 	      for (a = hi2, i2 = 0; i2 < HOPT_H; i2++, a = tour[a])
+ 		PUT (a);
+ 
+ 	      hopt_perform (tour, i1, hi2, hperm);
+ 
+ #ifndef ENABLE_CHECKING
+ 	      if (dump_file)
+ 		{
+ #endif
+     		  new_value = value + best_hopt;
+     		  value = tsp_value (n, tour);
+ #ifndef ENABLE_CHECKING
+ 		}
+ #endif
+ 
+ 	      if (dump_file)
+ 		{
+ 		  fprintf (dump_file,
+ 			   "hopt %d %d -- new tour value %d\n", i1, hi2, value);
+ 		  dump_tour (dump_file, tour, n);
+ 		}
+ 	    }
+ 	  else
+ 	    continue;
+ 
+ #ifdef ENABLE_CECKING
+ 	  if (value != new_value)
+ 	    abort ();
+ #endif
+ 	  changed = true;
+ 	}
+ 
+       if (!iterate)
+ 	break;
+     }
+ 
+   sbitmap_free (in_work);
+   sbitmap_free (v2s);
+ }
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.438
diff -c -3 -p -r1.438 invoke.texi
*** doc/invoke.texi	25 Mar 2004 17:43:24 -0000	1.438
--- doc/invoke.texi	31 Mar 2004 07:02:05 -0000
*************** in the following sections.
*** 284,290 ****
  -foptimize-sibling-calls  -fprefetch-loop-arrays @gol
  -fprofile-generate -fprofile-use @gol
  -freduce-all-givs  -fregmove  -frename-registers @gol
! -freorder-blocks  -freorder-functions @gol
  -frerun-cse-after-loop  -frerun-loop-opt @gol
  -frounding-math -fschedule-insns  -fschedule-insns2 @gol
  -fno-sched-interblock  -fno-sched-spec  -fsched-spec-load @gol
--- 284,290 ----
  -foptimize-sibling-calls  -fprefetch-loop-arrays @gol
  -fprofile-generate -fprofile-use @gol
  -freduce-all-givs  -fregmove  -frename-registers @gol
! -freorder-blocks -ftsp-ordering -freorder-functions @gol
  -frerun-cse-after-loop  -frerun-loop-opt @gol
  -frounding-math -fschedule-insns  -fschedule-insns2 @gol
  -fno-sched-interblock  -fno-sched-spec  -fsched-spec-load @gol
*************** invoking @option{-O2} on programs that u
*** 3655,3661 ****
  @opindex O3
  Optimize yet more.  @option{-O3} turns on all optimizations specified by
  @option{-O2} and also turns on the @option{-finline-functions},
! @option{-fweb} and @option{-frename-registers} options.
  
  @item -O0
  @opindex O0
--- 3655,3661 ----
  @opindex O3
  Optimize yet more.  @option{-O3} turns on all optimizations specified by
  @option{-O2} and also turns on the @option{-finline-functions},
! @option{-ftsp-ordering}, @option{-fweb} and @option{-frename-registers} options.
  
  @item -O0
  @opindex O0
*************** Reorder basic blocks in the compiled fun
*** 4194,4199 ****
--- 4194,4208 ----
  taken branches and improve code locality.
  
  Enabled at levels @option{-O2}, @option{-O3}.
+ 
+ @item -ftsp-ordering
+ @opindex ftsp-ordering
+ In addition to heuristic used at @option{-freorder-blocks}, use also a more
+ powerful method based on the reduction to the Travelling Salesman Problem
+ to determine the good order of basic blocks in the function.  This generally
+ produces smaller and faster programs, at the expense of longer compile times.
+ 
+ Enabled at level @option{-O3}.
  
  @item -freorder-functions
  @opindex freorder-functions
Index: doc/passes.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/passes.texi,v
retrieving revision 1.31
diff -c -3 -p -r1.31 passes.texi
*** doc/passes.texi	7 Feb 2004 14:14:54 -0000	1.31
--- doc/passes.texi	31 Mar 2004 07:02:05 -0000
*************** Basic block reordering.  This pass imple
*** 499,507 ****
  positioning.  If profile information is not available, various types of
  static analysis are performed to make the predictions normally coming
  from the profile feedback (IE execution frequency, branch probability,
! etc).  It is implemented in the file @file{bb-reorder.c}, and the
! various prediction routines are in @file{predict.c}.
! 
  @opindex dB
  The option @option{-dB} causes a debugging dump of the RTL code after
  this pass.  This dump file's name is made by appending @samp{.bbro} to
--- 499,510 ----
  positioning.  If profile information is not available, various types of
  static analysis are performed to make the predictions normally coming
  from the profile feedback (IE execution frequency, branch probability,
! etc).  To determine the ordering, an algorithm of Software Trace Cache
! is used.  In addition, it is also possible to use the method based on
! reduction to the Travelling Salesman Problem.  The core of the pass is
! implemented in the file @file{bb-reorder.c}, various prediction routines
! are in @file{predict.c} and the TSP solver is in @file{tsp.c}.
!   
  @opindex dB
  The option @option{-dB} causes a debugging dump of the RTL code after
  this pass.  This dump file's name is made by appending @samp{.bbro} to
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]