This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[hammer] TSP based bb reordering


Hello,

I have implemented bb reordering pass based on reduction to the
Travelling Salesman Problem (see
http://citeseer.nj.nec.com/young97nearoptimal.html). I am commiting it
to hammer branch for now, since it is obviously too late for mainline.

Some numbers:

Compile time costs -- the time to compile combine.c is increased by
  about 1-2% with -ftsp-ordering (about 3% when we iterate the tsp
  solver heuristics until convergence, almost unmeasurable when HyperOPT
  heuristics is disabled)

Tested on Athlon.  Binaries are a bit smaller (by 0.5-2%), the specint
  2000 results see below.  The parameters for the optimization are set
  so that it basically just minimizes the number of executions of
  unconditional jumps, since Athlon's branch prediction is mostly
  unafected by branch layout.

Zdenek

                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   164.gzip          1400   189       742    *     1400   188       744    *
   175.vpr           1400   372       377    *     1400   374       375    *
   176.gcc           1100   225       488    *     1100   227       485    *
   181.mcf           1800   731       246    *     1800   732       246    *
   186.crafty        1000   109       913    *     1000   109       918    *
   197.parser        1800   402       448    *     1800   398       452    *
   252.eon           1300        --          X     1300        --          X
   253.perlbmk       1800   203       888    *     1800   204       881    *
   254.gap           1100   179       616    *     1100   175       628    *
   255.vortex        1900   233       814    *     1900   231       822    *
   256.bzip2         1500   328       458    *     1500   330       454    *
   300.twolf         3000   785       382    *     3000   780       385    *

	* tsp.c: New file.
	* Makefile.in (tsp.o): Add.
	(bb-reorder.o): Add PARAMS_H dependency.
	* bb-reorder.c: Include params.h.
	(JUMP_COST, HIT_TAKEN_COST, MISS_TAKEN_COST, MISS_FALLTHRU_COST): New
	macros.
	(reorder_using_tsp, cost_of_uncond_jump, cost_of_cond_jump,
	cost_of_cond_branch, cost_b_after_a, record_edges): New functions.
	(reorder_basic_blocks): Call reorder_using_tsp.
	* cfglayout.h (struct vertex): New.
	(MAX_TSP_SIZE): New macro.
	(solve_tsp): Declare.
	* flags.h (flag_tsp_ordering): Declare.
	* params.def (PARAM_HIT_TAKEN_COST, PARAM_MISS_TAKEN_COST,
	PARAM_MISS_FALLTHRU_COST): New.
	* toplev.c (dump_file): Fix order of bbro and vartrack dumps.
	(flag_tsp_ordering): New.
	(f_options): Add -ftsp-ordering.
	* doc/invoke.texi (-ftsp-ordering): Document.
	* doc/passes.texi (tsp.c): Document.


/* A simple aproximate ATSP solver.
   Copyright (C) 2003 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING.  If not, write to the Free
Software Foundation, 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.  */

#include "config.h"
#include "system.h"
#include "rtl.h"
#include "basic-block.h"
#include "cfglayout.h"
#include "output.h"

/* A simple assymetric tsp solver.  It uses the following heuristic (referred
   as OPT-3 + HyperOPT in literature):

   (Tour is made so that it always starts in 0 and ends in n - 1).
   
   All improving 3-swaps (reordering of tour ABCD into ACBD, where A, B,
      C and D are parts of the tour) are performed.  To decrease the amount
      of work done, we use a small observation -- there are at most two cheap
      edges coming from each vertex (all other edges out of the vertex
      have the same cost) and at least one such edge must obviously be
      involved in the swap; moreover, it must be one of the newly added edges.
   
   For every two non-overlapping segments of length 3, relink them in
   an optimal manner.  */

#define HOPT_H 3
#define HOPT_H_NORDS 6
#define HOPT_H_VERT (2 * HOPT_H - 2)
#define ORDS_PER_EDGE (HOPT_H_NORDS / (HOPT_H_VERT - 1))

/* All orders HyperOPT heuristics tries.  */
static int hopt_h_ords[HOPT_H_NORDS][HOPT_H_VERT] =
{
  {0, 1, 2, 3},
  {0, 1, 3, 2},
  {0, 2, 1, 3},
  {0, 2, 3, 1},
  {0, 3, 1, 2},
  {0, 3, 2, 1}
};

/* For each edge indices of sums in the HyperOPT heuristics to that the
   weight of the edge should be added.  */
static int hopt_h_sums[HOPT_H_VERT][HOPT_H_VERT][ORDS_PER_EDGE] =
{
  {					/* (0, *) */
    {HOPT_H_NORDS, HOPT_H_NORDS},	/* (0, 0) */
    {0, 1},				/* (0, 1) */
    {2, 3},				/* (0, 2) */
    {4, 5}				/* (0, 3) */
  },
  {					/* (1, *) */
    {3, 5},				/* (1, 0) */
    {HOPT_H_NORDS, HOPT_H_NORDS}, 	/* (1, 1) */
    {0, 4},				/* (1, 2) */
    {1, 2}				/* (1, 3) */
  },
  {					/* (2, *) */
    {1, 4},				/* (2, 0) */
    {2, 5},				/* (2, 1) */
    {HOPT_H_NORDS, HOPT_H_NORDS}, 	/* (2, 2) */
    {0, 3}				/* (2, 3) */
  },
  {					/* (3, *) */
    {0, 2},				/* (3, 0) */
    {3, 4},				/* (3, 1) */
    {1, 5},				/* (3, 2) */
    {HOPT_H_NORDS, HOPT_H_NORDS}	/* (3, 3) */
  }
};

/* Weights of edges are cached here.  */

static int graph[MAX_TSP_SIZE + 2][MAX_TSP_SIZE + 2];

/* Returns weight of edge A --> B.  */

#define edge_weight(A, B) graph[A][B]

static void optimize_tsp (int, int *, struct vertex *, int);
static int tsp_value (int, int *);
static int three_swap_delta (int *, int, int, int);
void dump_tour (FILE *, int *, int);
static int hopt_delta (int [], int [], int [], sbitmap, int *, struct vertex *);
static void best_three_swap (int, int, int *, int *, int *, int *, int, int *);
static void hopt_perform (int *, int, int, int);

/* Solves a tsp instance of size N.  WEIGHTS is a matrix of weights of
   edges.  Result is retunrned in TOUR; the initial tour is also obtained
   from there.  */

void
solve_tsp (int n, int *tour, struct vertex *weights, int iterate)
{
  int tspvalue;
  int i, j;

  if (rtl_dump_file)
    fprintf (rtl_dump_file, "TSP:\n");

  for (i = 0; i < n; i++)
    {
      for (j = 0; j < n; j++)
	graph[i][j] = weights[i].def_cost;
      for (j = 0; j < weights[i].n_spec; j++)
	graph[i][weights[i].spec_tgt[j]] = weights[i].spec_cost[j];
    }

  tspvalue = tsp_value (n, tour);

  if (rtl_dump_file)
    {
      dump_tour (rtl_dump_file, tour, n);
      fprintf (rtl_dump_file, "%d\n", tspvalue);
    }

  if (tspvalue)
    optimize_tsp (n, tour, weights, iterate);

  if (rtl_dump_file)
    {
      dump_tour (rtl_dump_file, tour, n);
      fprintf (rtl_dump_file, "FINAL: %d\n", tsp_value (n, tour));
    }
}

/* Dumps TOUR of length N to FILE.  */

void
dump_tour (FILE *file, int *tour, int n)
{
  int i, a;

  for (i = 0, a = 0; i < n - 1; i++, a = tour[a])
    fprintf (file, "%d ", a);
  fprintf (file, "%d\n", a);
}

/* Return value of a tsp TOUR on N vertices.  */

static int
tsp_value (int n, int *tour)
{
  int i, act, sum = 0;

  act = 0;
  for (i = 0; i < n - 1; i++, act = tour[act])
    sum += edge_weight (act, tour[act]);

  return sum;
}

/* Returns gain of performing a 3-swap on TOUR in graph WEIGHTS at
   positions I1, I2 and I3.  */

static int
three_swap_delta (int *tour, int i1, int i2, int i3)
{
  int old, new;

  old = (edge_weight (i1, tour[i1])
	 + edge_weight (i2, tour[i2])
	 + edge_weight (i3, tour[i3]));

  new = (edge_weight (i1, tour[i2])
	 + edge_weight (i2, tour[i3])
	 + edge_weight (i3, tour[i1]));

  return new - old;
}

/* Returns gain of performing a HyperOPT optimization on segments stored in V1 
   and V2.  Index of the best permutation is returned in BEST_PERM.  WEIGHTS
   is the graph in that we work.  V2S is a bitmap of members of V2.  TOUR is
   the current tour.  */

static int
hopt_delta (int tour[], int v1[], int v2[], sbitmap v2s, int *best_perm,
	    struct vertex *weights)
{
  int i, j, old, best, bi, a, av, ew;
  int sums[HOPT_H_NORDS + 1];

  for (i = 0; i < HOPT_H_VERT; i++)
    for (j = 0; j < weights[v1[i]].n_spec; j++)
      {
	a = weights[v1[i]].spec_tgt[j];
	if (a != tour[v1[i]] && TEST_BIT (v2s, a))
	  goto cont;
      }
  return 0;

cont: ;

  /* Loop is manually unrolled here based on assumption that noone will ever
     increase the value of HOPT_H.  Just make sure that if someone insane
     enough does it, he notices this place.  */
#if (ORDS_PER_EDGE != 2)
  abort ();
#endif

  memset (sums, 0, sizeof (sums));
  for (i = 0; i < HOPT_H_VERT; i++)
    {
      av = v1[i];
      for (j = 0; j < HOPT_H_VERT; j++)
	{
	  ew = edge_weight (av, v2[j]);
	  sums[hopt_h_sums[i][j][0]] += ew;
	  sums[hopt_h_sums[i][j][1]] += ew;
	}
    }

  bi = 0;
  old = best = sums[0];
  for (i = 1; i < HOPT_H_NORDS; i++)
    {
      a = sums[i];

      if (a >= best)
	continue;

      best = a;
      bi = i;
    }

  *best_perm = bi;
  return best - old;
}

/* Perform PERMth HyperOPT swap on TOUR at positions I1 and I2.  */

static void
hopt_perform (int *tour, int i1, int i2, int perm)
{
  int i, j, a;
  int v1[HOPT_H_VERT], v2[HOPT_H_VERT];

  for (a = i1, i = 0; i < HOPT_H - 1; i++, a = tour[a])
    v1[i] = a;
  for (a = i2; i < HOPT_H_VERT; i++, a = tour[a])
    v1[i] = a;

  for (a = tour[i1], i = 1; i < HOPT_H; i++, a = tour[a])
    v2[i] = a;
  for (a = tour[i2]; i < HOPT_H_VERT; i++, a = tour[a])
    v2[i] = a;
  v2[0] = a;

  a = hopt_h_ords[perm][0];
  for (i = 1; i < HOPT_H_VERT; i++, a = j)
    {
      j = hopt_h_ords[perm][i];
      tour[v1[a]] = v2[j];
    }
  tour[v1[a]] = v2[hopt_h_ords[perm][0]];
}

/* Find the best 3-swap that introduces I1 --> I2 into TOUR in graph WEIGHTS
   on N vertices.  Endpoints of the segments are returned in BI1, BI2 and BI3,
   value in BEST_3OPT.  */

static void
best_three_swap (int i1, int i2, int *bi1, int *bi2, int *bi3, int *best_3opt,
		 int n, int *tour)
{
  int a, seen_i1 = false, m, v;

  if (tour[i1] == i2)
    return;

  for (a = 0; tour[a] != i2; a = tour[a])
    if (a == i1)
      seen_i1 = true;

  if (!seen_i1)
    {
      /* Backward edge.  The middle region must be between i1 and i2.  */
      for (m = i2; m != i1; m = tour[m])
	{
	  v = three_swap_delta (tour, a, m, i1);
	  if (v >= *best_3opt)
	    continue;

	  *bi1 = a;
	  *bi2 = m;
	  *bi3 = i1;
	  *best_3opt = v;
	}
    }
  else
    {
      /* Either this is the arc from A to C...  */
      for (m = i2; tour[m] != n; m = tour[m])
	{
	  v = three_swap_delta (tour, i1, a, m);
	  if (v >= *best_3opt)
	    continue;

	  *bi1 = i1;
	  *bi2 = a;
	  *bi3 = m;
	  *best_3opt = v;
	}

      /* ...or from B to D.  */
      for (m = 0; m != i1; m = tour[m])
	{
	  v = three_swap_delta (tour, m, i1, a);
	  if (v >= *best_3opt)
	    continue;

	  *bi1 = m;
	  *bi2 = i1;
	  *bi3 = a;
	  *best_3opt = v;
	}
    }
}

/* Improves TOUR using the heuristics.  N is number of vertices, the graph is
   stored in WEIGHTS.  ITERATE is true if we should iterate until convergence
   rather than making just one pass.  */

static void
optimize_tsp (int n, int *tour, struct vertex *weights, int iterate)
{
  int i1, i2, best_3opt, bi1, bi2, bi3, a, best_hopt, hi2, hperm, aperm, f, i;
  int changed = true;
  int temp;
  int value = 0, new_value;
  int work[MAX_TSP_SIZE + 2];
  int wbot, wtop;
  sbitmap in_work, v2s;
  int v1[HOPT_H_VERT], v2[HOPT_H_VERT];

#ifndef ENABLE_CHECKING
  if (rtl_dump_file)
    {
#endif
      value = tsp_value (n, tour);
#ifndef ENABLE_CHECKING
    }
#endif

  in_work = sbitmap_alloc (n);
  v2s = sbitmap_alloc (n);
  sbitmap_zero (v2s);

#define PUT(X)					\
  do						\
    {						\
      if (!TEST_BIT (in_work, (X)))		\
	{					\
	  SET_BIT (in_work, (X));		\
	  work[wtop++] = (X);			\
	  if (wtop == MAX_TSP_SIZE + 2)		\
	    wtop = 0;				\
	}					\
    }						\
  while (0)

  while (changed)
    {
      changed = false;

      for (i1 = 0; i1 < n - 1; i1++)
	work[i1] = i1;
      sbitmap_ones (in_work);
      wbot = 0;
      wtop = n - 1;

      while (wbot != wtop)
	{
	  i1 = work[wbot++];
	  if (wbot == MAX_TSP_SIZE + 2)
	    wbot = 0;
	  RESET_BIT (in_work, i1);
	  
	  /* Find a best 3-OPT move starting with I1.  */
	  best_3opt = 0;
	  bi1 = bi2 = bi3 = 0;

	  for (a = 0; a < weights[i1].n_spec; a++)
	    {
	      i2 = weights[i1].spec_tgt[a];

	      best_three_swap (i1, i2, &bi1, &bi2, &bi3, &best_3opt,
			       n, tour);
	    }

	  /* Find a best HyperOPT move starting with I1.  */
	  best_hopt = 0;
	  hperm = 0;
	  hi2 = 0;
	  for (i2 = i1, a = 0; i2 != n && a < HOPT_H - 1; a++)
	    i2 = tour[i2];
	  for (f = i2, a = 0; f != n && a < HOPT_H - 1; a++)
	    f = tour[f];

	  if (f != n)
	    {
	      for (a = i1, i = 0; i < HOPT_H - 1; i++, a = tour[a])
		v1[i] = a;
	      for (a = i2; i < HOPT_H_VERT; i++, a = tour[a])
		v1[i] = a;

	      for (a = tour[i1], i = 1; i < HOPT_H; i++, a = tour[a])
		{
		  v2[i] = a;
		  SET_BIT (v2s, a);
		}
	      for (a = tour[i2]; i < HOPT_H_VERT; i++, a = tour[a])
		{
		  v2[i] = a;
		  SET_BIT (v2s, a);
		}
	      v2[0] = a;
	      SET_BIT (v2s, a);

	      while (1)
		{
		  a = hopt_delta (tour, v1, v2, v2s, &aperm, weights);
		  if (a < best_hopt)
		    {
		      hi2 = i2;
		      best_hopt = a;
		      hperm = aperm;
		    }

		  i2 = tour[i2];
		  f = tour[f];
		  if (f == n)
		    break;

		  RESET_BIT (v2s, i2);
		  SET_BIT (v2s, f);

		  for (a = i2, i = HOPT_H - 1; i < HOPT_H_VERT; i++, a = tour[a])
		    v1[i] = a;
		  for (a = tour[i2], i = HOPT_H; i < HOPT_H_VERT; i++, a = tour[a])
		    v2[i] = a;
		  v2[0] = a;
		}

	      for (i = 0; i < HOPT_H_VERT; i++)
		RESET_BIT (v2s, v2[i]);
	    }

	  if (best_3opt < best_hopt)
	    {
	      temp = tour[bi1];
	      tour[bi1] = tour[bi2];
	      tour[bi2] = tour[bi3];
	      tour[bi3] = temp;

#ifndef ENABLE_CHECKING
	      if (rtl_dump_file)
		{
#endif
    		  new_value = value + best_3opt;
    		  value = tsp_value (n, tour);
#ifndef ENABLE_CHECKING
		}
#endif

	      PUT (bi1);
	      PUT (bi2);
	      PUT (bi3);

	      if (rtl_dump_file)
		{
		  fprintf (rtl_dump_file,
			   "3 swap %d %d %d -- new tour value %d\n",
			   bi1, bi2, bi3, value);
		  dump_tour (rtl_dump_file, tour, n);
		}
	    }
	  else if (best_hopt < 0)
	    {
	      for (a = i1, i2 = 0; i2 < HOPT_H; i2++, a = tour[a])
		PUT (a);
	      for (a = hi2, i2 = 0; i2 < HOPT_H; i2++, a = tour[a])
		PUT (a);

	      hopt_perform (tour, i1, hi2, hperm);

#ifndef ENABLE_CHECKING
	      if (rtl_dump_file)
		{
#endif
    		  new_value = value + best_hopt;
    		  value = tsp_value (n, tour);
#ifndef ENABLE_CHECKING
		}
#endif

	      if (rtl_dump_file)
		{
		  fprintf (rtl_dump_file,
			   "hopt %d %d -- new tour value %d\n", i1, hi2, value);
		  dump_tour (rtl_dump_file, tour, n);
		}
	    }
	  else
	    continue;

#ifdef ENABLE_CECKING
	  if (value != new_value)
	    abort ();
#endif
	  changed = true;
	}

      if (!iterate)
	break;
    }

  sbitmap_free (in_work);
  sbitmap_free (v2s);
}

Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.955.2.28
diff -c -3 -p -r1.955.2.28 Makefile.in
*** Makefile.in	11 Oct 2003 19:43:55 -0000	1.955.2.28
--- Makefile.in	21 Oct 2003 08:39:34 -0000
*************** C_OBJS = c-parse.o c-lang.o c-pretty-pri
*** 747,753 ****
  # Language-independent object files.
  
  OBJS = alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o	   \
!  cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o cfgloop.o		   \
   cfgrtl.o combine.o conflict.o convert.o cse.o cselib.o dbxout.o	   \
   cfgloopanal.o cfgloopmanip.o loop-init.o loop-unswitch.o loop-unroll.o	   \
   debug.o df.o diagnostic.o doloop.o dominance.o		                   \
--- 747,753 ----
  # Language-independent object files.
  
  OBJS = alias.o bb-reorder.o bitmap.o builtins.o caller-save.o calls.o	   \
!  cfg.o cfganal.o cfgbuild.o cfgcleanup.o cfglayout.o tsp.o cfgloop.o	   \
   cfgrtl.o combine.o conflict.o convert.o cse.o cselib.o dbxout.o	   \
   cfgloopanal.o cfgloopmanip.o loop-init.o loop-unswitch.o loop-unroll.o	   \
   debug.o df.o diagnostic.o doloop.o dominance.o		                   \
*************** predict.o: predict.c $(CONFIG_H) $(SYSTE
*** 1678,1684 ****
  lists.o: lists.c $(CONFIG_H) $(SYSTEM_H) toplev.h $(RTL_H) $(GGC_H)
  bb-reorder.o : bb-reorder.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) $(TREE_H) \
     flags.h $(BASIC_BLOCK_H) hard-reg-set.h output.h cfglayout.h $(TARGET_H) \
!    profile.h $(FIBHEAP_H)
  var-tracking.o : var-tracking.c $(CONFIG_H) $(SYSTEM_H) flags.h \
     $(RTL_H) $(TREE_H) hard-reg-set.h insn-config.h reload.h $(BASIC_BLOCK_H) \
     output.h sbitmap.h alloc-pool.h $(FIBHEAP_H) $(HASHTAB_H)
--- 1678,1684 ----
  lists.o: lists.c $(CONFIG_H) $(SYSTEM_H) toplev.h $(RTL_H) $(GGC_H)
  bb-reorder.o : bb-reorder.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) $(TREE_H) \
     flags.h $(BASIC_BLOCK_H) hard-reg-set.h output.h cfglayout.h $(TARGET_H) \
!    profile.h $(FIBHEAP_H) $(PARAMS_H)
  var-tracking.o : var-tracking.c $(CONFIG_H) $(SYSTEM_H) flags.h \
     $(RTL_H) $(TREE_H) hard-reg-set.h insn-config.h reload.h $(BASIC_BLOCK_H) \
     output.h sbitmap.h alloc-pool.h $(FIBHEAP_H) $(HASHTAB_H)
*************** tracer.o : tracer.c $(CONFIG_H) $(SYSTEM
*** 1691,1696 ****
--- 1691,1698 ----
  cfglayout.o : cfglayout.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) $(TREE_H) \
     insn-config.h $(BASIC_BLOCK_H) hard-reg-set.h output.h function.h \
     cfglayout.h cfgloop.h $(TARGET_H)
+ tsp.o: tsp.c $(CONFIG_H) $(SYSTEM_H) cfglayout.h $(RTL_H) $(BASIC_BLOCK_H) \
+    output.h
  timevar.o : timevar.c $(CONFIG_H) $(SYSTEM_H) $(TIMEVAR_H) flags.h intl.h
  regrename.o : regrename.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) insn-config.h \
     $(BASIC_BLOCK_H) $(REGS_H) hard-reg-set.h output.h $(RECOG_H) function.h \
Index: bb-reorder.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/bb-reorder.c,v
retrieving revision 1.50.12.9
diff -c -3 -p -r1.50.12.9 bb-reorder.c
*** bb-reorder.c	24 Jul 2003 17:11:54 -0000	1.50.12.9
--- bb-reorder.c	21 Oct 2003 08:39:35 -0000
***************
*** 77,82 ****
--- 77,83 ----
  #include "fibheap.h"
  #include "target.h"
  #include "profile.h"
+ #include "params.h"
  
  /* The number of rounds.  */
  #define N_ROUNDS 4
*************** static int exec_threshold[N_ROUNDS] = {0
*** 91,96 ****
--- 92,109 ----
     block the edge destination is not duplicated while connecting traces.  */
  #define DUPLICATION_THRESHOLD 100
  
+ /* Cost of an unconditional jump.  */
+ #define JUMP_COST 100
+ 
+ /* The cost of correctly predicted branch that is taken.  */
+ #define HIT_TAKEN_COST (PARAM_VALUE (PARAM_HIT_TAKEN_COST))
+ 
+ /* The cost of incorrectly predicted branch that is taken.  */
+ #define MISS_TAKEN_COST (PARAM_VALUE (PARAM_MISS_TAKEN_COST))
+ 
+ /* The cost of incorrectly predicted branch that is not taken.  */
+ #define MISS_FALLTHRU_COST (PARAM_VALUE (PARAM_MISS_FALLTHRU_COST))
+ 
  /* Length of unconditional jump instruction.  */
  static int uncond_jump_length;
  
*************** static bool better_edge_p		PARAMS ((basi
*** 145,150 ****
--- 158,169 ----
  static void connect_traces		PARAMS ((int, struct trace *));
  static bool copy_bb_p			PARAMS ((basic_block, int));
  static int get_uncond_jump_length	PARAMS ((void));
+ static void reorder_using_tsp (void);
+ static int cost_of_uncond_jump (edge);
+ static int cost_of_cond_jump (edge);
+ static int cost_of_cond_branch (basic_block);
+ static int cost_b_after_a (basic_block, basic_block);
+ static void record_edges (basic_block, struct vertex *, int last);
  
  /* Find the traces for Software Trace Cache.  Chain each trace through
     RBI()->next.  Store the number of traces to N_TRACES and description of
*************** get_uncond_jump_length ()
*** 1070,1075 ****
--- 1089,1339 ----
    return length;
  }
  
+ /* Returns a cost of unconditional jump corresponding to edge E.  */
+ 
+ static int
+ cost_of_uncond_jump (edge e)
+ {
+   if (e->dest == EXIT_BLOCK_PTR)
+     return 0;
+ 
+   if (optimize_size)
+     return 1;
+ 
+   /* If the area is cold, do flat optimization for size, with low
+      priority.  */
+   if (probably_cold_bb_p (e->src))
+     return JUMP_COST;
+ 
+   return JUMP_COST * e->src->frequency;
+ }
+ 
+ /* Returns a cost of conditional jump corresponding to edge E.  */
+ 
+ static int
+ cost_of_cond_jump (edge e)
+ {
+   int cost = 0;
+ 
+   if (optimize_size
+       || probably_cold_bb_p (e->src))
+     return 0;
+ 
+   /* We assume the more probable branch is predicted.  */
+   if (2 * e->probability > REG_BR_PROB_BASE)
+     {
+       /* E is predicted to be taken.  */
+       cost += EDGE_FREQUENCY (e) * HIT_TAKEN_COST;
+       cost += (e->src->frequency - EDGE_FREQUENCY (e)) * MISS_FALLTHRU_COST;
+     }
+   else
+     {
+       /* E is predicted not to be taken.  */
+       cost += EDGE_FREQUENCY (e) * MISS_TAKEN_COST;
+ 
+       /* HIT_FALLTHRU has no cost.  */
+     }
+ 
+   return cost;
+ }
+ 
+ /* Return a cost of jumps from basic block BB to two locations not adjacent
+    to it.  */
+ static int
+ cost_of_cond_branch (basic_block bb)
+ {
+   int more_freq, less_freq, tmp, cost = 0;
+ 
+   if (optimize_size)
+     return 1;
+ 
+   /* If the area is cold, do flat optimization for size, with low
+      priority.  */
+   if (probably_cold_bb_p (bb))
+     return JUMP_COST;
+ 
+   /* It is assumed to be transformed into
+ 
+      bb  -- helper
+       \           \
+        \           \
+         more        less  */
+ 
+   more_freq = EDGE_FREQUENCY (bb->succ);
+   less_freq = EDGE_FREQUENCY (bb->succ->succ_next);
+ 
+   if (less_freq > more_freq)
+     {
+       tmp = less_freq;
+       less_freq = more_freq;
+       more_freq = tmp;
+     }
+ 
+   cost += more_freq * HIT_TAKEN_COST;
+   cost += less_freq * (MISS_FALLTHRU_COST + JUMP_COST);
+ 
+   return cost;
+ }
+ 
+ /* Returns cost for placing B immediately after A.  */
+ static int
+ cost_b_after_a (basic_block a, basic_block b)
+ {
+   if (!a->succ)
+     return 0;
+ 
+   if (!a->succ->succ_next)
+     return a->succ->dest == b ? 0 : cost_of_uncond_jump (a->succ);
+ 
+   if (!any_condjump_p (a->end))
+     return 0;
+ 
+   if (a->succ->succ_next->succ_next)
+     abort ();
+ 
+   if (a->succ->dest == b)
+     return cost_of_cond_jump (a->succ->succ_next);
+   
+   if (a->succ->succ_next->dest == b)
+     return cost_of_cond_jump (a->succ);
+ 
+   return cost_of_cond_branch (a);
+ }
+ 
+ /* Record costs for edges coming out of FROM to VERTEX; LAST is true
+    if it is the tail of the considered segment.  */
+ static void
+ record_edges (basic_block from, struct vertex *vertex, int last)
+ {
+   edge e;
+   basic_block tgt;
+ 
+   vertex->n_spec = 0;
+ 
+   if (last
+       || !from->succ
+       || (from->succ->succ_next
+ 	  && !any_condjump_p (from->end)))
+     {
+       vertex->def_cost = 0;
+       return;
+     }
+ 
+   for (e = from->succ; e; e = e->succ_next)
+     {
+       tgt = e->dest;
+       if (tgt == EXIT_BLOCK_PTR)
+ 	{
+ 	  if (vertex->n_spec != 0)
+ 	    abort ();
+ 	  vertex->def_cost = 0;
+ 	  return;
+ 	}
+ 
+       /* We indeed want <= here, as edges to the head of the current segment
+ 	 are irrelevant.  */
+       if (RBI (tgt)->visited <= 0)
+ 	continue;
+ 
+       if (tgt == from)
+ 	continue;
+ 
+       vertex->spec_tgt[vertex->n_spec] = RBI (tgt)->visited;
+       vertex->spec_cost[vertex->n_spec] = cost_b_after_a (from, tgt);
+       vertex->n_spec++;
+     }
+   if (vertex->n_spec > 2)
+     abort ();
+ 
+   vertex->def_cost = (from->succ->succ_next
+ 		      ? cost_of_cond_branch (from)
+ 		      : cost_of_uncond_jump (from->succ));
+ }
+ 
+ /* Reorder blocks using tsp solver.  */
+ 
+ static void
+ reorder_using_tsp ()
+ {
+   basic_block block[MAX_TSP_SIZE + 2];
+   struct vertex graph[MAX_TSP_SIZE + 2];
+   int tour[MAX_TSP_SIZE + 2];
+   basic_block start, old_start;
+   int n, i, a;
+   basic_block bb;
+ 
+   FOR_EACH_BB (bb)
+     {
+       RBI (bb)->visited = -1;
+     }
+   if (n_basic_blocks <= 1)
+     return;
+ 
+ #define NEXT_BLOCK(BB)			\
+   ((BB) == EXIT_BLOCK_PTR		\
+    ? NULL				\
+    : (RBI (BB)->next == NULL		\
+       ? EXIT_BLOCK_PTR			\
+       : RBI (BB)->next))
+   start = ENTRY_BLOCK_PTR->next_bb;
+   while (1)
+     {
+       old_start = start;
+ 
+       if (rtl_dump_file)
+ 	fprintf (rtl_dump_file, "Old order:");
+       for (n = 0;
+ 	   start && n < MAX_TSP_SIZE + 2;
+ 	   n++, start = NEXT_BLOCK (start))
+ 	{
+ 	  block[n] = start;
+ 	  RBI (start)->visited = n;
+ 	  if (rtl_dump_file)
+ 	    fprintf (rtl_dump_file, " %d", start->index);
+ 	}
+       if (rtl_dump_file)
+ 	fprintf (rtl_dump_file, "\n");
+ 
+       for (i = 0; i < n; i++)
+ 	{
+ 	  record_edges (block[i], graph + i, i == n - 1);
+ 	  tour[i] = i + 1;
+ 	}
+       solve_tsp (n, tour, graph, optimize >= 3);
+       
+       if (block[n - 1] == EXIT_BLOCK_PTR)
+ 	block[n - 1] = NULL;
+ 
+       if (rtl_dump_file)
+ 	fprintf (rtl_dump_file, "New order:");
+       for (i = 0, a = 0; i < n - 1; i++, a = tour[a])
+ 	{
+ 	  RBI (block[a])->visited = -1;
+ 	  RBI (block[a])->next = block[tour[a]];
+ 
+ 	  if (rtl_dump_file)
+ 	    fprintf (rtl_dump_file, " %d", block[a]->index);
+ 	}
+       if (block[a])
+ 	{
+ 	  RBI (block[a])->visited = -1;
+ 	  if (rtl_dump_file)
+ 	    fprintf (rtl_dump_file, " %d\n", block[a]->index);
+ 	}
+       else if (rtl_dump_file)
+ 	fprintf (rtl_dump_file, " EXIT\n");
+ 
+       if (!start)
+ 	break;
+ 
+       /* Ensure some overlap between the instances.  */
+       start = old_start;
+       for (; n > 0; n -= 2)
+ 	start = NEXT_BLOCK (start);
+     }
+ #undef NEXT_BLOCK
+ }
+ 
  /* Reorder basic blocks.  The main entry point to this file.  */
  
  void
*************** reorder_basic_blocks ()
*** 1098,1103 ****
--- 1362,1370 ----
  
    if (rtl_dump_file)
      dump_flow_info (rtl_dump_file);
+ 
+   if (flag_tsp_ordering)
+     reorder_using_tsp ();
  
    cfg_layout_finalize ();
  }
Index: cfglayout.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cfglayout.h,v
retrieving revision 1.3.20.3
diff -c -3 -p -r1.3.20.3 cfglayout.h
*** cfglayout.h	2 Jul 2003 22:16:08 -0000	1.3.20.3
--- cfglayout.h	21 Oct 2003 08:39:35 -0000
*************** typedef struct reorder_block_def
*** 35,40 ****
--- 35,49 ----
  
  #define RBI(BB)	((reorder_block_def) (BB)->aux)
  
+ /* A type for vertices in tsp solver.  */
+ struct vertex
+ {
+   int n_spec;			/* Number of special edges.  */
+   int spec_tgt[2];		/* Special edge targets.  */
+   int spec_cost[2];		/* Special edge costs.  */
+   int def_cost;			/* Cost of remaining edges.  */
+ };
+ 
  extern void cfg_layout_initialize	PARAMS ((struct loops *));
  extern void cfg_layout_finalize		PARAMS ((void));
  extern bool cfg_layout_can_duplicate_bb_p PARAMS ((basic_block));
*************** extern bool can_copy_bbs_p		PARAMS ((bas
*** 47,49 ****
--- 56,64 ----
  extern void copy_bbs	PARAMS ((basic_block *, unsigned, basic_block *,
  				 edge *, unsigned, edge *, struct loop *,
  				 struct loops *));
+ 
+ /* Maximum number of vertices of tsp instance solved (in fact 2 less, as source
+    and target are not included).  */
+ #define MAX_TSP_SIZE 200
+ 
+ extern void solve_tsp (int, int *, struct vertex *, int);
Index: flags.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/flags.h,v
retrieving revision 1.93.2.6
diff -c -3 -p -r1.93.2.6 flags.h
*** flags.h	24 Jun 2003 22:02:03 -0000	1.93.2.6
--- flags.h	21 Oct 2003 08:39:35 -0000
*************** extern int flag_branch_probabilities;
*** 217,222 ****
--- 217,226 ----
  
  extern int flag_reorder_blocks;
  
+ /* Nonzero if blocks should be reordered using tsp.  */
+ 
+ extern int flag_tsp_ordering;
+ 
  /* Nonzero if functions should be reordered.  */
  
  extern int flag_reorder_functions;
Index: params.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/params.def,v
retrieving revision 1.16.2.8
diff -c -3 -p -r1.16.2.8 params.def
*** params.def	7 Aug 2003 12:17:36 -0000	1.16.2.8
--- params.def	21 Oct 2003 08:39:35 -0000
*************** DEFPARAM(PARAM_MAX_CROSSJUMP_EDGES,
*** 270,275 ****
--- 270,299 ----
  	 "The maximum number of incoming edges to consider for crossjumping",
  	 100)
  
+ /* The cost of correctly predicted branch that is taken, relatively to
+    unconditional jump cost (100).  */
+    
+ DEFPARAM(PARAM_HIT_TAKEN_COST,
+ 	 "hit-taken-cost",
+ 	 "The cost of correctly predicted branch that is taken",
+ 	 3)
+ 
+ /* The cost of incorrectly predicted branch that is taken, relatively to
+    unconditional jump cost (100).  */
+    
+ DEFPARAM(PARAM_MISS_TAKEN_COST,
+ 	 "miss-taken-cost",
+ 	 "The cost of incorrectly predicted branch that is taken",
+ 	 3)
+ 
+ /* The cost of incorrectly predicted branch that is not taken, relatively to
+    unconditional jump cost (100).  */
+    
+ DEFPARAM(PARAM_MISS_FALLTHRU_COST,
+ 	 "miss-fallthru-cost",
+ 	 "The cost of incorrectly predicted branch that is not taken",
+ 	 0)
+ 
  #ifdef ENABLE_GC_ALWAYS_COLLECT
  # define GGC_MIN_EXPAND_DEFAULT 0
  # define GGC_MIN_HEAPSIZE_DEFAULT 0
Index: toplev.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/toplev.c,v
retrieving revision 1.688.2.33
diff -c -3 -p -r1.688.2.33 toplev.c
*** toplev.c	11 Oct 2003 19:44:06 -0000	1.688.2.33
--- toplev.c	21 Oct 2003 08:39:35 -0000
*************** static struct dump_file_info dump_file[D
*** 310,317 ****
    { "ce3",	'E', 1, 0, 0 },
    { "sched2",	'R', 1, 0, 0 },
    { "stack",	'k', 1, 0, 0 },
-   { "vartrack",	'V', 1, 0, 0 },
    { "bbro",	'B', 1, 0, 0 },
    { "mach",	'M', 1, 0, 0 },
    { "dbr",	'd', 0, 0, 0 },
  };
--- 310,317 ----
    { "ce3",	'E', 1, 0, 0 },
    { "sched2",	'R', 1, 0, 0 },
    { "stack",	'k', 1, 0, 0 },
    { "bbro",	'B', 1, 0, 0 },
+   { "vartrack",	'V', 1, 0, 0 },
    { "mach",	'M', 1, 0, 0 },
    { "dbr",	'd', 0, 0, 0 },
  };
*************** int flag_speculative_prefetching = 0;
*** 414,419 ****
--- 414,423 ----
  
  int flag_reorder_blocks = 0;
  
+ /* Nonzero if basic blocks should be reordered using tsp.  */
+ 
+ int flag_tsp_ordering = 0;
+ 
  /* Nonzero if functions should be reordered.  */
  
  int flag_reorder_functions = 0;
*************** static const lang_independent_options f_
*** 1178,1183 ****
--- 1182,1189 ----
     N_("Enable basic program profiling code") },
    {"reorder-blocks", &flag_reorder_blocks, 1,
     N_("Reorder basic blocks to improve code placement") },
+   {"tsp-ordering", &flag_tsp_ordering, 1,
+    N_("Reorder basic blocks using tsp") },
    {"reorder-functions", &flag_reorder_functions, 1,
     N_("Reorder functions to improve code placement") },
    {"rename-registers", &flag_rename_registers, 1,
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.198.2.32
diff -c -3 -p -r1.198.2.32 invoke.texi
*** doc/invoke.texi	10 Oct 2003 05:55:44 -0000	1.198.2.32
--- doc/invoke.texi	21 Oct 2003 08:46:33 -0000
*************** in the following sections.
*** 280,286 ****
  -fsched-spec-load-dangerous  -fsignaling-nans @gol
  -fsingle-precision-constant  -fssa  -fssa-ccp  -fssa-dce -fvrp @gol
  -fstrength-reduce  -fstrict-aliasing  -ftracer  -fthread-jumps @gol
! -funroll-all-loops  -funroll-loops  @gol
  --param @var{name}=@var{value} @gol
  -O  -O0  -O1  -O2  -O3  -Os}
  
--- 280,286 ----
  -fsched-spec-load-dangerous  -fsignaling-nans @gol
  -fsingle-precision-constant  -fssa  -fssa-ccp  -fssa-dce -fvrp @gol
  -fstrength-reduce  -fstrict-aliasing  -ftracer  -fthread-jumps @gol
! -ftsp-ordering -funroll-all-loops  -funroll-loops  @gol
  --param @var{name}=@var{value} @gol
  -O  -O0  -O1  -O2  -O3  -Os}
  
*************** Reorder basic blocks in the compiled fun
*** 3963,3968 ****
--- 3963,3975 ----
  taken branches and improve code locality.
  
  Enabled at levels @option{-O2}, @option{-O3}.
+ 
+ @item -ftsp-ordering
+ @opindex ftsp-ordering
+ In addition to heuristic used at @option{-freorder-blocks}, use also a more
+ powerful method based on the reduction to the Travelling Salesman Problem
+ to determine the good order of basic blocks in the function.  This generally
+ produces smaller and faster programs, at the expense of longer compile times.
  
  @item -freorder-functions
  @opindex freorder-functions
Index: doc/passes.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/passes.texi,v
retrieving revision 1.10.12.8
diff -c -3 -p -r1.10.12.8 passes.texi
*** doc/passes.texi	10 Oct 2003 05:55:44 -0000	1.10.12.8
--- doc/passes.texi	21 Oct 2003 08:46:33 -0000
*************** Basic block reordering.  This pass imple
*** 567,574 ****
  positioning.  If profile information is not available, various types of
  static analysis are performed to make the predictions normally coming
  from the profile feedback (IE execution frequency, branch probability,
! etc).  It is implemented in the file @file{bb-reorder.c}, and the
! various prediction routines are in @file{predict.c}.
  
  @opindex dB
  The option @option{-dB} causes a debugging dump of the RTL code after
--- 567,577 ----
  positioning.  If profile information is not available, various types of
  static analysis are performed to make the predictions normally coming
  from the profile feedback (IE execution frequency, branch probability,
! etc).  To determine the ordering, an algorithm of Software Trace Cache
! is used.  In addition, it is also possible to use the method based on
! reduction to the Travelling Salesman Problem.  The core of the pass is
! implemented in the file @file{bb-reorder.c}, various prediction routines
! are in @file{predict.c} and the TSP solver is in @file{tsp.c}.
  
  @opindex dB
  The option @option{-dB} causes a debugging dump of the RTL code after


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]