This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH][RFC] Add new ipa-reorder pass


Hi.

Function reordering has been around for quite some time and a naive
implementation was also part of my diploma thesis some time ago.
Currently, the GCC can reorder function based on first execution, which
happens with PGO and LTO of course. Known limitation is that the order
is preserved only partially as various symbols go into different LTRANS partitions.

There has been some research in the area and I would point out the Facebook paper
([1]) and Sony presentation ([2]). Based on that, I decided to make a new implementation
in the GCC that does the same (in a proper way). First part of the enablement are patches
to ld.bfd and ld.gold that come up with a new section .text.sorted, that is always sorted.
There's a snippet from the modified default linker script:

  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(SORT(.text.sorted.*))
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf.em.  */
    *(.gnu.warning)
  }

Second part is the reordering pass where I implemented both approaches:
the existing one (-freorder-functions-algorithm=first-run) and the new one
based on the Facebook paper (-freorder-functions-algorithm=call-chain-clustering).
The implementation is straightforward and utilizes the newly introduced text subsection.

Results:
I made a PGO LTO bootstrap of the GCC with the patch and -freorder-functions-algorithm=call-chain-clustering.
The pass makes sorting order for about 7500 functions and from these about 7300 are really put in the desired
order. The remaining are the functions that are later observed as cold and are eventually put into .text.unlikely
section. I verified the order with nm (sorted by address) and my -fdump-ipa-reorder dump. The aforementioned
sorted function occupy 5.0MB (out of 18.3MB) of the final binary of cc1plus. I made a simple benchmark of:

./cc1plus /home/marxin/Programming/tramp3d/tramp3d-v4.ii -O2 -quiet and run it 20x before and after my patch:
run #020: old:15.033 s, new: 14.975 s, cmp: 99.61%

The numbers are stable and present a speed up of about 0.4%. To verify that I also measured iTLB misses
before:

18,475,504      iTLB-load-misses:u
and after:
13,779,347      iTLB-load-misses:u

Both these numbers are very stable based on my measurements. That said, I believe the patch is correct
and can help in huge binaries that have a flat profile.

Notes and limitations:
- The call-chain-clustering algorithm requires to fit as many as possible functions into page size (4K).
  Based on my measurements that should correspond to ~1000 GIMPLE statements (IPA inliner size). I can
  make it a param in the future.
- One needs modified binutils and I that would probably require a configure detection. The only way
  which I see is based on ld --version. I'm planning to make the binutils submission soon.
- Right now, I prefer to put functions into .text.sorted.* rather than .text.hot. That's required by the
  algorithm to put hot function calls next to each other.
- Observation: Very hot calls are likely inlined and so that the clustering reached quite big functions.

Thoughts? I would definitely welcome any interesting measurement on a bigger load.

Martin

[1] https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
[2] https://llvm.org/devmtg/2018-10/slides/Spencer-Profile%20Guided%20Function%20Layout%20in%20LLVM%20and%20LLD.pdf
>From 69aa0fd381d8a1362f68812a70c9ccdeaa1e51b0 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Wed, 18 Sep 2019 13:23:34 +0200
Subject: [PATCH] Introduce new .text.sorted.* sections.

---
 gold/layout.cc                        | 9 ++++++---
 ld/scripttempl/arclinux.sc            | 1 +
 ld/scripttempl/elf.sc                 | 1 +
 ld/scripttempl/elf64bpf.sc            | 1 +
 ld/scripttempl/nds32elf.sc            | 1 +
 ld/testsuite/ld-arm/arm-no-rel-plt.ld | 1 +
 ld/testsuite/ld-arm/fdpic-main.ld     | 1 +
 ld/testsuite/ld-arm/fdpic-shared.ld   | 1 +
 8 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gold/layout.cc b/gold/layout.cc
index fc7cdf8b8b..b3a668544f 100644
--- a/gold/layout.cc
+++ b/gold/layout.cc
@@ -1131,12 +1131,15 @@ Layout::special_ordering_of_input_section(const char* name)
     ".text.hot"
   };
 
-  for (size_t i = 0;
-       i < sizeof(text_section_sort) / sizeof(text_section_sort[0]);
-       i++)
+  unsigned scount = sizeof(text_section_sort) / sizeof(text_section_sort[0]);
+  for (size_t i = 0; i < scount; i++)
     if (is_prefix_of(text_section_sort[i], name))
       return i;
 
+  int order;
+  if (sscanf (name, ".text.sorted.%d", &order) == 1)
+    return order + scount;
+
   return -1;
 }
 
diff --git a/ld/scripttempl/arclinux.sc b/ld/scripttempl/arclinux.sc
index e13969ede0..41e8ccdf1b 100644
--- a/ld/scripttempl/arclinux.sc
+++ b/ld/scripttempl/arclinux.sc
@@ -491,6 +491,7 @@ cat <<EOF
     ${RELOCATING+*(.text.exit .text.exit.*)}
     ${RELOCATING+*(.text.startup .text.startup.*)}
     ${RELOCATING+*(.text.hot .text.hot.*)}
+    ${RELOCATING+*(SORT(.text.sorted.*))}
     *(.text .stub${RELOCATING+ .text.* .gnu.linkonce.t.*})
     /* .gnu.warning sections are handled specially by elf.em.  */
     *(.gnu.warning)
diff --git a/ld/scripttempl/elf.sc b/ld/scripttempl/elf.sc
index c3ad467bff..0d61881185 100644
--- a/ld/scripttempl/elf.sc
+++ b/ld/scripttempl/elf.sc
@@ -514,6 +514,7 @@ cat <<EOF
     ${RELOCATING+*(.text.exit .text.exit.*)}
     ${RELOCATING+*(.text.startup .text.startup.*)}
     ${RELOCATING+*(.text.hot .text.hot.*)}
+    ${RELOCATING+*(SORT(.text.sorted.*))}
     *(.text .stub${RELOCATING+ .text.* .gnu.linkonce.t.*})
     /* .gnu.warning sections are handled specially by elf.em.  */
     *(.gnu.warning)
diff --git a/ld/scripttempl/elf64bpf.sc b/ld/scripttempl/elf64bpf.sc
index de73775349..7937a41d2c 100644
--- a/ld/scripttempl/elf64bpf.sc
+++ b/ld/scripttempl/elf64bpf.sc
@@ -512,6 +512,7 @@ cat <<EOF
     ${RELOCATING+*(.text.exit .text.exit.*)}
     ${RELOCATING+*(.text.startup .text.startup.*)}
     ${RELOCATING+*(.text.hot .text.hot.*)}
+    ${RELOCATING+*(SORT(.text.sorted.*))}
     *(.text .stub${RELOCATING+ .text.* .gnu.linkonce.t.*})
     /* .gnu.warning sections are handled specially by elf.em.  */
     *(.gnu.warning)
diff --git a/ld/scripttempl/nds32elf.sc b/ld/scripttempl/nds32elf.sc
index 065c984f70..8d8d6e3f74 100644
--- a/ld/scripttempl/nds32elf.sc
+++ b/ld/scripttempl/nds32elf.sc
@@ -438,6 +438,7 @@ cat <<EOF
     ${RELOCATING+*(.text.exit .text.exit.*)}
     ${RELOCATING+*(.text.startup .text.startup.*)}
     ${RELOCATING+*(.text.hot .text.hot.*)}
+    ${RELOCATING+*(SORT(.text.sorted.*))}
     *(.text .stub${RELOCATING+ .text.* .gnu.linkonce.t.*})
     /* .gnu.warning sections are handled specially by elf.em.  */
     *(.gnu.warning)
diff --git a/ld/testsuite/ld-arm/arm-no-rel-plt.ld b/ld/testsuite/ld-arm/arm-no-rel-plt.ld
index d56e8203ad..14c1aeb439 100644
--- a/ld/testsuite/ld-arm/arm-no-rel-plt.ld
+++ b/ld/testsuite/ld-arm/arm-no-rel-plt.ld
@@ -65,6 +65,7 @@ SECTIONS
     *(.text.exit .text.exit.*)
     *(.text.startup .text.startup.*)
     *(.text.hot .text.hot.*)
+    *(SORT(.text.sorted.*))
     *(.text .stub .text.* .gnu.linkonce.t.*)
     /* .gnu.warning sections are handled specially by elf.em.  */
     *(.gnu.warning)
diff --git a/ld/testsuite/ld-arm/fdpic-main.ld b/ld/testsuite/ld-arm/fdpic-main.ld
index d19a589d6c..b01b630fea 100644
--- a/ld/testsuite/ld-arm/fdpic-main.ld
+++ b/ld/testsuite/ld-arm/fdpic-main.ld
@@ -76,6 +76,7 @@ SECTIONS
     *(.text.exit .text.exit.*)
     *(.text.startup .text.startup.*)
     *(.text.hot .text.hot.*)
+    *(SORT(.text.sorted.*))
     *(.text .stub .text.* .gnu.linkonce.t.*)
     /* .gnu.warning sections are handled specially by elf.em.  */
     *(.gnu.warning)
diff --git a/ld/testsuite/ld-arm/fdpic-shared.ld b/ld/testsuite/ld-arm/fdpic-shared.ld
index b1e262d829..b710ffa745 100644
--- a/ld/testsuite/ld-arm/fdpic-shared.ld
+++ b/ld/testsuite/ld-arm/fdpic-shared.ld
@@ -67,6 +67,7 @@ SECTIONS
     *(.text.exit .text.exit.*)
     *(.text.startup .text.startup.*)
     *(.text.hot .text.hot.*)
+    *(SORT(.text.sorted.*))
     *(.text .stub .text.* .gnu.linkonce.t.*)
     /* .gnu.warning sections are handled specially by elf.em.  */
     *(.gnu.warning)
-- 
2.23.0

>From d9c81529ae2a66bd51c49d83e19edb58b19cbf97 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Thu, 5 Sep 2019 13:32:41 +0200
Subject: [PATCH] Add new ipa-reorder pass.

---
 gcc/Makefile.in         |   1 +
 gcc/cgraph.c            |   2 +
 gcc/cgraph.h            |   2 +
 gcc/cgraphclones.c      |   1 +
 gcc/common.opt          |  14 ++
 gcc/flag-types.h        |   8 +
 gcc/ipa-reorder.c       | 333 ++++++++++++++++++++++++++++++++++++++++
 gcc/lto-cgraph.c        |   2 +
 gcc/lto/lto-partition.c |  18 ---
 gcc/passes.def          |   1 +
 gcc/timevar.def         |   1 +
 gcc/tree-pass.h         |   1 +
 gcc/varasm.c            |   9 ++
 13 files changed, 375 insertions(+), 18 deletions(-)
 create mode 100644 gcc/ipa-reorder.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 152df9fa9b3..c0efef626c5 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1368,6 +1368,7 @@ OBJS = \
 	init-regs.o \
 	internal-fn.o \
 	ipa-cp.o \
+	ipa-reorder.o \
 	ipa-devirt.o \
 	ipa-fnsummary.o \
 	ipa-polymorphic-call.o \
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 843891e9e56..a0f350c26c0 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2041,6 +2041,8 @@ cgraph_node::dump (FILE *f)
     }
   if (tp_first_run > 0)
     fprintf (f, " first_run:%i", tp_first_run);
+  if (text_sorted_order > 0)
+    fprintf (f, " text_sorted_order:%i", text_sorted_order);
   if (origin)
     fprintf (f, " nested in:%s", origin->asm_name ());
   if (gimple_has_body_p (decl))
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 4c54210123a..24c0b2be321 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1446,6 +1446,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   unsigned int profile_id;
   /* Time profiler: first run of function.  */
   int tp_first_run;
+  /* Order in .text.sorted.* section.  */
+  int text_sorted_order;
 
   /* Set when decl is an abstract function pointed to by the
      ABSTRACT_DECL_ORIGIN of a reachable function.  */
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index fa753697c78..4d47b3e7574 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -462,6 +462,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   new_node->rtl = rtl;
   new_node->frequency = frequency;
   new_node->tp_first_run = tp_first_run;
+  new_node->text_sorted_order = text_sorted_order;
   new_node->tm_clone = tm_clone;
   new_node->icf_merged = icf_merged;
   new_node->merged_comdat = merged_comdat;
diff --git a/gcc/common.opt b/gcc/common.opt
index 1b9e0f3c802..e68f28f6401 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2252,6 +2252,20 @@ freorder-functions
 Common Report Var(flag_reorder_functions) Optimization
 Reorder functions to improve code placement.
 
+freorder-functions-algorithm=
+Common Joined RejectNegative Enum(reorder_functions_algorithm) Var(flag_reorder_functions_algorithm) Init(REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING) Optimization
+-freorder-functions-algorithm=[first-run|call-chain-clustering]	Set the used function reordering algorithm.
+
+Enum
+Name(reorder_functions_algorithm) Type(enum reorder_functions_algorithm) UnknownError(unknown function reordering algorithm %qs)
+
+EnumValue
+Enum(reorder_functions_algorithm) String(first-run) Value(REORDER_FUNCTIONS_ALGORITHM_FIRST_RUN)
+
+EnumValue
+Enum(reorder_functions_algorithm) String(call-chain-clustering) Value(REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING)
+
+
 frerun-cse-after-loop
 Common Report Var(flag_rerun_cse_after_loop) Optimization
 Add a common subexpression elimination pass after loop optimizations.
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index a2103282d46..9658aa99ea5 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -138,6 +138,14 @@ enum reorder_blocks_algorithm
   REORDER_BLOCKS_ALGORITHM_STC
 };
 
+/* The algorithm used for function reordering.  */
+enum reorder_functions_algorithm
+{
+  REORDER_FUNCTIONS_ALGORITHM_FIRST_RUN,
+  REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING
+};
+
+
 /* The algorithm used for the integrated register allocator (IRA).  */
 enum ira_algorithm
 {
diff --git a/gcc/ipa-reorder.c b/gcc/ipa-reorder.c
new file mode 100644
index 00000000000..d9ddc468968
--- /dev/null
+++ b/gcc/ipa-reorder.c
@@ -0,0 +1,333 @@
+/* Reorder functions based on profile.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "symbol-summary.h"
+#include "tree-vrp.h"
+#include "alloc-pool.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
+#include "fibonacci_heap.h"
+#include <limits>
+
+using namespace std;
+
+namespace {
+
+#define C3_CLUSTER_THRESHOLD 1024
+
+struct cluster_edge;
+
+/* Cluster is set of functions considered in C3 algorithm.  */
+
+struct cluster
+{
+  cluster (cgraph_node *node, int size, sreal time):
+    m_functions (), m_callers (), m_size (size), m_time (time)
+  {
+    m_functions.safe_push (node);
+  }
+
+  vec<cgraph_node *> m_functions;
+  hash_map <cluster *, cluster_edge *> m_callers;
+  int m_size;
+  sreal m_time;
+};
+
+/* Cluster edge is an oriented edge in between two clusters.  */
+
+struct cluster_edge
+{
+  cluster_edge (cluster *caller, cluster *callee, uint32_t count):
+    m_caller (caller), m_callee (callee), m_count (count), m_heap_node (NULL)
+  {}
+
+
+  uint32_t inverted_count ()
+  {
+    return numeric_limits<uint32_t>::max () - m_count;
+  }
+
+  cluster *m_caller;
+  cluster *m_callee;
+  uint32_t m_count;
+  fibonacci_node<uint32_t, cluster_edge> *m_heap_node;
+};
+
+/* Sort functions based of first execution of the function.  */
+
+static void
+sort_functions_by_first_run (void)
+{
+  cgraph_node *node;
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (node->tp_first_run && !node->alias)
+      node->text_sorted_order = node->tp_first_run;
+}
+
+/* Compare clusters by density after that are established.  */
+
+static int
+cluster_cmp (const void *a_p, const void *b_p)
+{
+  const cluster *a = *(cluster * const *)a_p;
+  const cluster *b = *(cluster * const *)b_p;
+
+  unsigned fncounta = a->m_functions.length ();
+  unsigned fncountb = b->m_functions.length ();
+  if (fncounta <= 1 || fncountb <= 1)
+    return fncountb - fncounta;
+
+  sreal r = b->m_time * a->m_size - a->m_time * b->m_size;
+  return (r < 0) ? -1 : ((r > 0) ? 1 : 0);
+}
+
+/* Sort functions based on call chain clustering, which is an algorithm
+   mentioned in the following article:
+   https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
+   .  */
+
+static void
+sort_functions_by_c3 (void)
+{
+  cgraph_node *node;
+  auto_vec<cluster *> clusters;
+
+  /* Create a cluster for each function.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (!node->alias
+	&& !node->global.inlined_to)
+      {
+	ipa_fn_summary *summary = ipa_fn_summaries->get (node);
+	cluster *c = new cluster (node, summary->size, summary->time);
+	node->aux = c;
+	clusters.safe_push (c);
+      }
+
+  auto_vec<cluster_edge *> edges;
+
+  /* Insert edges between clusters that have a profile.  */
+  for (unsigned i = 0; i < clusters.length (); i++)
+    {
+      cgraph_node *node = clusters[i]->m_functions[0];
+      for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller)
+	if (cs->count.reliable_p ()
+	    && cs->count.to_gcov_type () > 0)
+	  {
+	    cluster *caller = (cluster *)cs->caller->aux;
+	    cluster *callee = (cluster *)cs->callee->aux;
+	    gcov_type count = cs->count.to_gcov_type ();
+
+	    cluster_edge **cedge = callee->m_callers.get (caller);
+	    if (cedge != NULL)
+	      (*cedge)->m_count += count;
+	    else
+	      {
+		cluster_edge *cedge = new cluster_edge (caller, callee, count);
+		edges.safe_push (cedge);
+		callee->m_callers.put (caller, cedge);
+	      }
+	  }
+    }
+
+  /* Now insert all created edges into a heap.  */
+  fibonacci_heap <uint32_t, cluster_edge> heap (0);
+
+  for (unsigned i = 0; i < clusters.length (); i++)
+    {
+      cluster *c = clusters[i];
+      for (hash_map<cluster *, cluster_edge *>::iterator it
+	   = c->m_callers.begin (); it != c->m_callers.end (); ++it)
+	{
+	  cluster_edge *cedge = (*it).second;
+	  cedge->m_heap_node = heap.insert (cedge->inverted_count (), cedge);
+	}
+    }
+
+  while (!heap.empty ())
+    {
+      cluster_edge *cedge = heap.extract_min ();
+      cluster *caller = cedge->m_caller;
+      cluster *callee = cedge->m_callee;
+      cedge->m_heap_node = NULL;
+
+      if (caller == callee)
+	continue;
+      if (caller->m_size + callee->m_size <= C3_CLUSTER_THRESHOLD)
+	{
+	  caller->m_size += callee->m_size;
+	  caller->m_time += callee->m_time;
+
+	  /* Append all cgraph_nodes from callee to caller.  */
+	  for (unsigned i = 0; i < callee->m_functions.length (); i++)
+	    caller->m_functions.safe_push (callee->m_functions[i]);
+
+	  callee->m_functions.truncate (0);
+
+	  /* Iterate all cluster_edges of callee and add them to the caller.  */
+	  for (hash_map<cluster *, cluster_edge *>::iterator it
+	       = callee->m_callers.begin (); it != callee->m_callers.end ();
+	       ++it)
+	    {
+	      (*it).second->m_callee = caller;
+	      cluster_edge **ce = caller->m_callers.get ((*it).first);
+
+	      if (ce != NULL)
+		{
+		  (*ce)->m_count += (*it).second->m_count;
+		  if ((*ce)->m_heap_node != NULL)
+		    heap.decrease_key ((*ce)->m_heap_node,
+				       (*ce)->inverted_count ());
+		}
+	      else
+		caller->m_callers.put ((*it).first, (*it).second);
+	    }
+	}
+    }
+
+  /* Sort the candidate clusters.  */
+  clusters.qsort (cluster_cmp);
+
+  /* Dump clusters.  */
+  if (dump_file)
+    {
+      for (unsigned i = 0; i < clusters.length (); i++)
+	{
+	  cluster *c = clusters[i];
+	  if (c->m_functions.length () <= 1)
+	    continue;
+
+	  fprintf (dump_file, "Cluster %d with functions: %d, size: %d,"
+		   " density: %f\n", i, c->m_functions.length (), c->m_size,
+		   (c->m_time / c->m_size).to_double ());
+	  fprintf (dump_file, "  functions: ");
+	  for (unsigned j = 0; j < c->m_functions.length (); j++)
+	    fprintf (dump_file, "%s ", c->m_functions[j]->dump_name ());
+	  fprintf (dump_file, "\n");
+	}
+      fprintf (dump_file, "\n");
+    }
+
+  /* Assign .text.sorted.* section names.  */
+  int counter = 1;
+  for (unsigned i = 0; i < clusters.length (); i++)
+    {
+      cluster *c = clusters[i];
+      if (c->m_functions.length () <= 1)
+	continue;
+
+      for (unsigned j = 0; j < c->m_functions.length (); j++)
+	{
+	  cgraph_node *node = c->m_functions[j];
+
+	  if (dump_file)
+	    fprintf (dump_file, "setting: %d for %s with size:%d\n",
+		     counter, node->dump_asm_name (),
+		     ipa_fn_summaries->get (node)->size);
+	  node->text_sorted_order = counter++;
+	}
+    }
+
+  /* Release memory.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (!node->alias)
+      node->aux = NULL;
+
+  for (unsigned i = 0; i < clusters.length (); i++)
+    delete clusters[i];
+
+  for (unsigned i = 0; i < edges.length (); i++)
+    delete edges[i];
+}
+
+/* The main function for function sorting.  */
+
+static unsigned int
+ipa_reorder (void)
+{
+  switch (flag_reorder_functions_algorithm)
+    {
+    case REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING:
+      sort_functions_by_c3 ();
+      break;
+    case REORDER_FUNCTIONS_ALGORITHM_FIRST_RUN:
+      sort_functions_by_first_run ();
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  return 0;
+}
+
+const pass_data pass_data_ipa_reorder =
+{
+  IPA_PASS, /* type */
+  "reorder", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_IPA_REORDER, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_ipa_reorder : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_reorder (gcc::context *ctxt)
+    : ipa_opt_pass_d (pass_data_ipa_reorder, ctxt,
+		      NULL, /* generate_summary */
+		      NULL, /* write_summary */
+		      NULL, /* read_summary */
+		      NULL, /* write_optimization_summary */
+		      NULL, /* read_optimization_summary */
+		      NULL, /* stmt_fixup */
+		      0, /* function_transform_todo_flags_start */
+		      NULL, /* function_transform */
+		      NULL) /* variable_transform */
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *) { return ipa_reorder (); }
+
+}; // class pass_ipa_reorder
+
+bool
+pass_ipa_reorder::gate (function *)
+{
+  return flag_profile_reorder_functions && flag_profile_use && flag_wpa;
+}
+
+} // anon namespace
+
+ipa_opt_pass_d *
+make_pass_ipa_reorder (gcc::context *ctxt)
+{
+  return new pass_ipa_reorder (ctxt);
+}
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index bc0f0107333..0a424a2ad12 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -505,6 +505,7 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
     section = "";
 
   streamer_write_hwi_stream (ob->main_stream, node->tp_first_run);
+  streamer_write_hwi_stream (ob->main_stream, node->text_sorted_order);
 
   bp = bitpack_create (ob->main_stream);
   bp_pack_value (&bp, node->local.local, 1);
@@ -1277,6 +1278,7 @@ input_node (struct lto_file_decl_data *file_data,
 		    "node with uid %d", node->get_uid ());
 
   node->tp_first_run = streamer_read_uhwi (ib);
+  node->text_sorted_order = streamer_read_uhwi (ib);
 
   bp = streamer_read_bitpack (ib);
 
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 6972e6e9ef3..8937e64ef72 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -381,24 +381,6 @@ node_cmp (const void *pa, const void *pb)
   const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
   const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
 
-  /* Profile reorder flag enables function reordering based on first execution
-     of a function. All functions with profile are placed in ascending
-     order at the beginning.  */
-
-  if (flag_profile_reorder_functions)
-  {
-    /* Functions with time profile are sorted in ascending order.  */
-    if (a->tp_first_run && b->tp_first_run)
-      return a->tp_first_run != b->tp_first_run
-	? a->tp_first_run - b->tp_first_run
-        : a->order - b->order;
-
-    /* Functions with time profile are sorted before the functions
-       that do not have the profile.  */
-    if (a->tp_first_run || b->tp_first_run)
-      return b->tp_first_run - a->tp_first_run;
-  }
-
   return b->order - a->order;
 }
 
diff --git a/gcc/passes.def b/gcc/passes.def
index 93879223a8b..6caf2ac5cda 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -153,6 +153,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_fn_summary);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
+  NEXT_PASS (pass_ipa_reorder);
   NEXT_PASS (pass_ipa_free_fn_summary, false /* small_p */);
   NEXT_PASS (pass_ipa_reference);
   /* This pass needs to be scheduled after any IP code duplication.   */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 3551a433b1f..068c902424f 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -76,6 +76,7 @@ DEFTIMEVAR (TV_IPA_INHERITANCE       , "ipa inheritance graph")
 DEFTIMEVAR (TV_IPA_VIRTUAL_CALL      , "ipa virtual call target")
 DEFTIMEVAR (TV_IPA_DEVIRT	     , "ipa devirtualization")
 DEFTIMEVAR (TV_IPA_CONSTANT_PROP     , "ipa cp")
+DEFTIMEVAR (TV_IPA_REORDER	     , "ipa reorder")
 DEFTIMEVAR (TV_IPA_INLINING          , "ipa inlining heuristics")
 DEFTIMEVAR (TV_IPA_FNSPLIT           , "ipa function splitting")
 DEFTIMEVAR (TV_IPA_COMDATS	     , "ipa comdats")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 7106eba54af..10bc4cc125d 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -500,6 +500,7 @@ extern ipa_opt_pass_d *make_pass_ipa_inline (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_free_lang_data (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_free_fn_summary (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_reorder (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
diff --git a/gcc/varasm.c b/gcc/varasm.c
index a7c22523f9f..498b9ed59fd 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -601,6 +601,15 @@ default_function_section (tree decl, enum node_frequency freq,
   if (exit && freq != NODE_FREQUENCY_UNLIKELY_EXECUTED)
     return get_named_text_section (decl, ".text.exit", NULL);
 
+  cgraph_node *node = cgraph_node::get (decl);
+  if (node->text_sorted_order > 0 && freq != NODE_FREQUENCY_UNLIKELY_EXECUTED)
+    {
+      char section_name[64];
+      sprintf (section_name, ".text.sorted.%010d",
+	       node->text_sorted_order);
+      return get_named_text_section (decl, section_name, NULL);
+    }
+
   /* Group cold functions together, similarly for hot code.  */
   switch (freq)
     {
-- 
2.23.0


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]