This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] profile feedback: -fprofile-use= and -fprofile-correction, correctness fixes and option semantic changes.
- From: "Seongbae Park (박성배, 朴成培)" <seongbae dot park at gmail dot com>
- To: gcc-patches <gcc-patches at gcc dot gnu dot org>, "Jan Hubicka" <hubicka at ucw dot cz>
- Date: Thu, 20 Mar 2008 13:58:31 -0700
- Subject: [PATCH] profile feedback: -fprofile-use= and -fprofile-correction, correctness fixes and option semantic changes.
Hi profile feedback maintainer(s?),
The following two patches try to make profile feedback directed optimization
more usable in different situations.
The first patch (profile-use) adds two new options and changes the
existing behavior
regarding the feedback datafile reading.
The first new option -fprofile-use= allows profile feedback data files
(gcda files) to be picked up from different directories than where
object files are.
This makes it more convenient to use FDO
for programs that are not run on the same system as the build system
and also makes it easier to maintain feedback datafiles separately
from the object files (which are usually placed in some scratch area).
The other new option -fprofile-datafile-relative-path prevents
gcc from adding getpwd() to the baked gcda file path in each object file.
This has two benefits: one is that the content of the object files are no more
dependent on the absolute path of the object files - hence
we have one less file location dependent contents in the object
making the compiler more "purely functional" - i.e. the compiler output
don't change unless any input files change.
Another benefit is that you don't need to use path dependent
GCOV_PREFIX/GCOV_PREFIX_STRIP options at the profile collection run time
to redirect the profile data files somewhere other than the original
object paths, and thus makes it easier to share the same profile
collected from object built from one particular path to be shared
by builds running on different paths.
Combined with the new -fprofile-use=,
this option makes it easier to switch between pre-collected feedback data files.
The patch also changes the behavior regarding when we read gcda files.
Currently, we *always* read gcda file if it exists where we expect it to be.
The patch makes it so that we'll read gcda files
if and only if -fprofile-use option is explicitly used.
This avoids the compiler from trying to access another file that
usually does not exist,
and even when exists, users may NOT want to use it.
Lastly, the patch disables writing of .gcno files unless
-ftest-coverage option is used.
The second patch (profile-correction) adds a new option -fprofile-correction
which will essentially smooth out any small inconsistencies caused by
missed counter updates in multithreaded profile collection.
Using atomic updates for counter increment adds significant overhead,
especially as the core count grows.
Making counters TLS adds more memory overhead
and more complicated interaction with threading system.
This ad-hoc correction algorithm is an interim solution
until we have more proper algorithm implemented
(based on variants of minimum cost network flow algorithms),
which we'll add for sampled profile support some of us are working on.
The patch ended up with more code change than I anticipated,
because of some refactoring, to avoid certain code duplication.
This patch also contains a fix for "Dead histogram" error
reported by visit_hist()/verify_histograms() in value-prof.c.
When a call with a value histogram got optimized by fold_stmt
and subsequently replaced by set_rhs(), the associated histogram
was not removed, leaving a histogram without a home statement.
The simplest fix is to remove the histogram whenever set_rhs()
replaces the whole statement - I think there are very few, if any,
meaningful and successful fold_stmt/set_rhs where keeping the existing
histogram will be useful. The alternative is to do more complicated
analysis of whether it's worth keeping the histogram and
move the histogram over to the new statement,
but I think that's not worth the effort.
After these two patches, I expect users to do:
BUILD PROFILE GENERATE: gcc -fprofile-generate
-fprofile-datafile-relative-path ...
RUN GENERATE: cd /somewhere/i/want/feedback/stored/; run-the-executable
PROFILE USE: gcc -fprofile-use=/somewhere/i/want/feedback/stored/
-fprofile-datafile-relative-path [optionally -fprofile-correction]
Profiledbootstrap'ed and regtested on i686.
Ok for mainline ?
Seongbae
Changelog entries for profile-use patch:
ChangeLog:
2008-03-20 Seongbae Park <seongbae.park@gmail.com>
* Makefile.tpl (.NOTPARALLEL): Serialize stageprofile libiberty.
gcc/ChangeLog:
2008-03-20 Seongbae Park <seongbae.park@gmail.com>
* common.opt (fprofile-datafile-relative-path, fprofile-use=):
New options
(fprofile-use): Add var flag_profile_use
* coverage.c (coverage_begin_output): Do not open a gcno file for output
unless we're doing a test coverage.
(build_gcov_info): Check the new flag
flag_profile_datafile_relative_path.
(coverage_init): Use profile_data_prefix.
Read profile counter only if flag_profile_use is set.
* opts.c (common_handle_option): New option fprofile-use=
* toplev.c (profile_data_prefix): New variable definition.
* toplev.h (profile_data_prefix): New declaration.
* doc/invoke.tex (Option Summary, Optimization Options):
Add new options.
gcc/testsuite/ChanceLog:
2008-03-20 Seongbae Park <seongbae.park@gmail.com>
* g++.db/bprob/bprob.exp: Do not check gcno files.
Use -fprofile-use for profile use.
* gcc.misc-tests/bprob.exp: Ditto.
* g++.dg/tree-pro/tree-prof.exp: Do not check gcno files.
* gcc.dg/matrix/matrix.exp: Ditto.
* gcc.dg/struct/struct-reorg.exp: Ditto.
* gcc.dg/tree-prof/tree-prof.exp: Ditto.
* gcc.dg/profile-datafile-relative-path-1.c: New test.
* gcc.dg/profile-datafile-relative-path-2.c: New test.
Changelog entries for profile-correction patch:
2008-03-20 Seongbae Park <seongbae.park@gmail.com>
* common.opt (fprofile-correction): New option.
* profile.c (bb_info): Add new fields is_pushed,
block, order and next_bi.
(sum_edge_counts, propagate_through_basic_block,
sort_edge_by_order, make_baic_block_flow_consistent,
make_flow_consistent, initialize_blok_aux_as_bb_info,
compute_all_counts, verify_consistency):
New functions.
(compute_branch_probabilities): Refactored.
* tree-profile.c (tree_gen_ic_func_profiler):
Clear __gcov_indirect_call_callee to 0 after the call
to ic profiler to prevent mis-attribution.
* tree-ssa-propagate.c (value-prof.h): New include.
(set_rhs): Remove the histogram when replacing a whole statement.
* value-prof.c (check_counter): Fix the counter
if flag_profile_correction is true.
(tree_divmod_fixed_value_transform, tree_mod_pow2_value_transform,
tree_mod_subtract_transform):
Follow check_counter parameter change.
diff -r 15f766bddc43 Makefile.tpl
--- a/Makefile.tpl Fri Mar 14 17:35:41 2008 -0700
+++ b/Makefile.tpl Wed Mar 19 14:42:50 2008 -0700
@@ -398,6 +398,8 @@ PICFLAG_FOR_TARGET =
# The first rule in the file had better be this one. Don't put any above it.
# This lives here to allow makefile fragments to contain dependencies.
all:
+
+.NOTPARALLEL: all-stageprofile-libiberty
#### host and target specific makefile fragments come in here.
@target_makefile_frag@
diff -r 15f766bddc43 gcc/common.opt
--- a/gcc/common.opt Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/common.opt Wed Mar 19 14:42:50 2008 -0700
@@ -815,9 +815,20 @@ Common
Common
Enable common options for generating profile info for profile feedback directed optimizations
+fprofile-datafile-relative-path
+Common Report Var(flag_profile_datafile_relative_path)
+Use the same path as the object file for the profile/coverage data file output paths.
+When set to no (default), if the path specified in the -o doesn't
+start with "/", the current directory will be prepended to the object file path.
+
fprofile-use
-Common
+Common Var(flag_profile_use)
Enable common options for performing profile feedback directed optimizations
+
+fprofile-use=
+Common Joined RejectNegative
+Enable common options for performing profile feedback directed optimizations.
+Pick up the profile feedback data from the given path.
fprofile-values
Common Report Var(flag_profile_values)
diff -r 15f766bddc43 gcc/coverage.c
--- a/gcc/coverage.c Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/coverage.c Wed Mar 19 14:42:50 2008 -0700
@@ -545,7 +545,9 @@ int
int
coverage_begin_output (void)
{
- if (no_coverage)
+ /* We don't need to output .gcno file unless we're under -ftest-coverage
+ (e.g. -fprofile-arcs/generate/use don't need .gcno to work). */
+ if (no_coverage || !flag_test_coverage)
return 0;
if (!bbg_function_announced)
@@ -842,7 +844,7 @@ build_gcov_info (void)
field = build_decl (FIELD_DECL, NULL_TREE, string_type);
TREE_CHAIN (field) = fields;
fields = field;
- filename = getpwd ();
+ filename = (flag_profile_datafile_relative_path) ? NULL : getpwd ();
filename = (filename && da_file_name[0] != '/'
? concat (filename, "/", da_file_name, NULL)
: da_file_name);
@@ -972,6 +974,8 @@ create_coverage (void)
cgraph_build_static_cdtor ('I', body, DEFAULT_INIT_PRIORITY);
}
+
+
/* Perform file-level initialization. Read in data file, generate name
of graph file. */
@@ -979,10 +983,22 @@ coverage_init (const char *filename)
coverage_init (const char *filename)
{
int len = strlen (filename);
+ /* + 1 for extra '/', in case prefix doesn't end with /. */
+ int prefix_len = (profile_data_prefix) ? strlen (profile_data_prefix) + 1 : 0;
/* Name of da file. */
- da_file_name = XNEWVEC (char, len + strlen (GCOV_DATA_SUFFIX) + 1);
- strcpy (da_file_name, filename);
+ da_file_name = XNEWVEC (char, len + strlen (GCOV_DATA_SUFFIX)
+ + prefix_len + 1);
+
+ if (profile_data_prefix)
+ {
+ strcpy (da_file_name, profile_data_prefix);
+ da_file_name[prefix_len - 1] = '/';
+ da_file_name[prefix_len] = 0;
+ }
+ else
+ da_file_name[0] = 0;
+ strcat (da_file_name, filename);
strcat (da_file_name, GCOV_DATA_SUFFIX);
/* Name of bbg file. */
@@ -990,7 +1006,8 @@ coverage_init (const char *filename)
strcpy (bbg_file_name, filename);
strcat (bbg_file_name, GCOV_NOTE_SUFFIX);
- read_counts_file ();
+ if (flag_profile_use)
+ read_counts_file ();
}
/* Performs file-level cleanup. Close graph file, generate coverage
diff -r 15f766bddc43 gcc/doc/invoke.texi
--- a/gcc/doc/invoke.texi Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/doc/invoke.texi Wed Mar 19 14:42:50 2008 -0700
@@ -341,8 +341,9 @@ Objective-C and Objective-C++ Dialects}.
-fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol
-fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls @gol
-fpeel-loops -fpredictive-commoning -fprefetch-loop-arrays @gol
--fprofile-generate -fprofile-use -fprofile-values -freciprocal-math @gol
--fregmove -frename-registers -freorder-blocks @gol
+-fprofile-datafile-relative-path -fprofile-generate @gol
+-fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
+-freciprocal-math -fregmove -frename-registers -freorder-blocks @gol
-freorder-blocks-and-partition -freorder-functions @gol
-frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol
-frounding-math -frtl-abstract-sequences -fsched2-use-superblocks @gol
@@ -6364,6 +6365,17 @@ and occasionally eliminate the copy.
Enabled at levels @option{-O}, @option{-O2}, @option{-O3}, @option{-Os}.
+@item -fprofile-datafile-relative-path
+@opindex fprofie-datafile-relative-path
+
+Use the object file path as the gcda file output path as is.
+When this is off and the specified path does not begin with '/',
+then the compiler will prepend the current directory to the object file path.
+Enabling this flag will disable that behavior.
+The user has to make sure that the object file paths
+in a single executable are all unique,
+otherwise the output gcda files with the same paths will conflict.
+
@item -fprofile-generate
@opindex fprofile-generate
@@ -6375,6 +6387,7 @@ The following options are enabled: @code
The following options are enabled: @code{-fprofile-arcs}, @code{-fprofile-values}, @code{-fvpt}.
@item -fprofile-use
+@itemx -fprofile-use=@var{path}
@opindex fprofile-use
Enable profile feedback directed optimizations, and optimizations
generally profitable only with profile feedback available.
@@ -6386,6 +6399,9 @@ match the source code. This error can b
match the source code. This error can be turned into a warning by using
@option{-Wcoverage-mismatch}. Note this may result in poorly optimized
code.
+
+If @var{path} is specified, GCC will look at the @var{path} to find
+the profile feedback data files.
@end table
The following options control compiler behavior regarding floating
diff -r 15f766bddc43 gcc/opts.c
--- a/gcc/opts.c Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/opts.c Wed Mar 19 14:42:50 2008 -0700
@@ -1716,6 +1716,12 @@ common_handle_option (size_t scode, cons
flag_inline_functions_set = true;
break;
+ case OPT_fprofile_use_:
+ profile_data_prefix = xstrdup (arg);
+ flag_profile_use = true;
+ value = true;
+ /* No break - same flag processing as -fprofile-use. */
+
case OPT_fprofile_use:
if (!flag_branch_probabilities_set)
flag_branch_probabilities = value;
diff -r 15f766bddc43 gcc/testsuite/g++.dg/bprob/bprob.exp
--- a/gcc/testsuite/g++.dg/bprob/bprob.exp Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/testsuite/g++.dg/bprob/bprob.exp Wed Mar 19 14:42:50 2008 -0700
@@ -27,7 +27,7 @@ if { ![check_profiling_available "-fprof
# The procedures in profopt.exp need these parameters.
set tool g++
-set prof_ext "gcda gcno"
+set prof_ext "gcda"
if $tracelevel then {
strace $tracelevel
@@ -51,7 +51,7 @@ load_lib profopt.exp
load_lib profopt.exp
set profile_options "-fprofile-arcs"
-set feedback_options "-fbranch-probabilities"
+set feedback_options "-fprofile-use"
# Main loop.
foreach profile_option $profile_options feedback_option $feedback_options {
diff -r 15f766bddc43 gcc/testsuite/g++.dg/tree-prof/tree-prof.exp
--- a/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp Wed Mar 19 14:42:50 2008 -0700
@@ -26,7 +26,7 @@ if { ![check_profiling_available ""] } {
# The procedures in profopt.exp need these parameters.
set tool g++
-set prof_ext "gcda gcno"
+set prof_ext "gcda"
# Override the list defined in profopt.exp.
set PROFOPT_OPTIONS [list {}]
diff -r 15f766bddc43 gcc/testsuite/gcc.dg/matrix/matrix.exp
--- a/gcc/testsuite/gcc.dg/matrix/matrix.exp Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/testsuite/gcc.dg/matrix/matrix.exp Wed Mar 19 14:42:51 2008 -0700
@@ -36,7 +36,7 @@ if { ![check_profiling_available ""] } {
# The procedures in profopt.exp need these parameters.
set tool gcc
-set prof_ext "gcda gcno"
+set prof_ext "gcda"
# Override the list defined in profopt.exp.
set PROFOPT_OPTIONS [list {}]
diff -r 15f766bddc43 gcc/testsuite/gcc.dg/profile-datafile-relative-path-1.c
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/gcc/testsuite/gcc.dg/profile-datafile-relative-path-1.c Wed Mar 19 14:42:51 2008 -0700
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fprofile-generate -fprofile-datafile-relative-path" } */
+/* { dg-final { scan-assembler "\"profile-datafile-relative-path-1.gcda\"" } } */
+
+
+int
+main(void)
+{
+ return 0;
+}
+
+/* { dg-final { cleanup-coverage-files } } */
diff -r 15f766bddc43 gcc/testsuite/gcc.dg/profile-datafile-relative-path-2.c
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/gcc/testsuite/gcc.dg/profile-datafile-relative-path-2.c Wed Mar 19 14:42:51 2008 -0700
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fprofile-generate" } */
+/* { dg-final { scan-assembler "/profile-datafile-relative-path-2.gcda" } } */
+
+int
+main(void)
+{
+ return 0;
+}
+
+/* { dg-final { cleanup-coverage-files } } */
diff -r 15f766bddc43 gcc/testsuite/gcc.dg/struct/struct-reorg.exp
--- a/gcc/testsuite/gcc.dg/struct/struct-reorg.exp Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/testsuite/gcc.dg/struct/struct-reorg.exp Wed Mar 19 14:42:51 2008 -0700
@@ -36,7 +36,7 @@ if { ![check_profiling_available ""] } {
# The procedures in profopt.exp need these parameters.
set tool gcc
-set prof_ext "gcda gcno"
+set prof_ext "gcda"
# Override the list defined in profopt.exp.
set PROFOPT_OPTIONS [list {}]
diff -r 15f766bddc43 gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp
--- a/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp Wed Mar 19 14:42:51 2008 -0700
@@ -26,7 +26,7 @@ if { ![check_profiling_available ""] } {
# The procedures in profopt.exp need these parameters.
set tool gcc
-set prof_ext "gcda gcno"
+set prof_ext "gcda"
# Override the list defined in profopt.exp.
set PROFOPT_OPTIONS [list {}]
diff -r 15f766bddc43 gcc/testsuite/gcc.misc-tests/bprob.exp
--- a/gcc/testsuite/gcc.misc-tests/bprob.exp Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/testsuite/gcc.misc-tests/bprob.exp Wed Mar 19 14:42:51 2008 -0700
@@ -27,7 +27,7 @@ if { ![check_profiling_available "-fprof
# The procedures in profopt.exp need these parameters.
set tool gcc
-set prof_ext "gcda gcno"
+set prof_ext "gcda"
set perf_ext tim
# Override the list defined in profopt.exp.
@@ -48,7 +48,7 @@ load_lib profopt.exp
load_lib profopt.exp
set profile_options "-fprofile-arcs"
-set feedback_options "-fbranch-probabilities"
+set feedback_options "-fprofile-use"
foreach profile_option $profile_options feedback_option $feedback_options {
foreach src [lsort [glob -nocomplain $srcdir/$subdir/bprob-*.c]] {
diff -r 15f766bddc43 gcc/toplev.c
--- a/gcc/toplev.c Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/toplev.c Wed Mar 19 14:42:51 2008 -0700
@@ -151,6 +151,9 @@ const char *dump_base_name;
/* Name to use as a base for auxiliary output files. */
const char *aux_base_name;
+
+/* Prefix for profile data files */
+const char *profile_data_prefix;
/* A mask of target_flags that includes bit X if X was set or cleared
on the command line. */
diff -r 15f766bddc43 gcc/toplev.h
--- a/gcc/toplev.h Fri Mar 14 17:35:41 2008 -0700
+++ b/gcc/toplev.h Wed Mar 19 14:42:51 2008 -0700
@@ -111,6 +111,7 @@ extern const char *dump_base_name;
extern const char *dump_base_name;
extern const char *aux_base_name;
extern const char *aux_info_file_name;
+extern const char *profile_data_prefix;
extern const char *asm_file_name;
extern bool exit_after_options;
diff -r 986583ed535e gcc/common.opt
--- a/gcc/common.opt Wed Mar 19 14:42:54 2008 -0700
+++ b/gcc/common.opt Thu Mar 20 10:01:56 2008 -0700
@@ -811,6 +811,10 @@ Common Report Var(profile_arc_flag)
Common Report Var(profile_arc_flag)
Insert arc-based program profiling code
+fprofile-correction
+Common Report Var(flag_profile_correction)
+Enable correction of flow inconsistent profile data input
+
fprofile-generate
Common
Enable common options for generating profile info for profile feedback directed optimizations
diff -r 986583ed535e gcc/profile.c
--- a/gcc/profile.c Wed Mar 19 14:42:54 2008 -0700
+++ b/gcc/profile.c Thu Mar 20 10:01:56 2008 -0700
@@ -85,8 +85,10 @@ struct edge_info {
};
struct bb_info {
- unsigned int count_valid : 1;
+ unsigned int count_valid : 1; /* 1 iff the count is valid. */
+ int order;
+ basic_block block;
/* Number of successor and predecessor edges. */
gcov_type succ_count;
gcov_type pred_count;
@@ -279,22 +281,283 @@ get_exec_counts (void)
}
-/* Compute the branch probabilities for the various branches.
- Annotate them accordingly. */
+
+/* Return the sum of count of all edges of the given "edges" set */
+
+static gcov_type
+sum_edge_counts (VEC(edge,gc) *edges)
+{
+ edge e;
+ edge_iterator ei;
+ gcov_type total = 0;
+ FOR_EACH_EDGE (e, ei, edges)
+ {
+ if (!EDGE_INFO (e)->ignore)
+ {
+ total += e->count;
+ gcc_assert ( EDGE_INFO (e)->count_valid );
+ }
+ }
+ return total;
+}
+
+
+/* Propagate the count to one edge.
+ BB is the basic block to propagate the flow.
+ FROM_EDGES is a set of edges where all elements have valid counters
+ - hence it's either BB->succs or BB->preds.
+ TO_EDGES is the opposite of FROM_EDGES
+ - if FROM_EDGES is BB->succs, TO_EDGES has to be BB->preds and vice versa.
+
+ If TO_EDGES has only one edge that doesn't have a valid counter,
+ then we calculate the count of that edge by the conservation rule
+ and make the edge valid.
+
+ Return 1 if the newly calculated edge count was negative,
+ in which case the count is reset to 0 before returning 1.
+ Otherwise return 0.
+
+ When the edge is newly marked valid, then succ_count and pred_count
+ of src and dest blocks of the edge is decremented. */
+
+static int
+propagate_through_basic_block (basic_block bb,
+ VEC(edge,gc) *from_edges,
+ VEC(edge,gc) *to_edges)
+{
+ edge e;
+ edge_iterator ei;
+ edge single;
+ gcov_type out_flow = 0;
+ gcov_type in_flow;
+ int count = 0;
+
+ in_flow = sum_edge_counts (from_edges);
+
+ /* One of the counts will be invalid, but it is zero,
+ so adding it here doesn't hurt. */
+ FOR_EACH_EDGE (e, ei, to_edges)
+ {
+ out_flow += e->count;
+ }
+
+ /* Set SINGLE to the only edge that doesn't have a valid counter. */
+ single = 0;
+ count = 0;
+ FOR_EACH_EDGE (e, ei, to_edges)
+ {
+ if (! EDGE_INFO (e)->count_valid && ! EDGE_INFO (e)->ignore)
+ {
+ single = e;
+ count++;
+ }
+ }
+
+ bb->count = in_flow;
+ if (count == 1)
+ {
+ /* Calculate count for remaining edge by conservation. */
+ single->count = in_flow - out_flow;
+ EDGE_INFO (single)->count_valid = 1;
+ BB_INFO (single->src)->succ_count--;
+ BB_INFO (single->dest)->pred_count--;
+
+ if (single->count < 0)
+ {
+ single->count = 0;
+ /* Blocks with vfork/setjump calls have effectively
+ an alternative entry into the block, but we don't have
+ extra incoming edges but outgoing fake edges to EXIT
+ in those cases, which means we could end up with
+ negative edge count on the fake edge.
+ When that happens, we simply put 0 as the count
+ and leave this inconsistent. */
+ if (single->dest == EXIT_BLOCK_PTR && block_ends_with_call_p (bb))
+ return 0;
+
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+
+/* Helper for qsort to sort edge by predecessor's order. */
+
+static int
+sort_edge_by_pred_order (const void *a, const void *b)
+{
+ edge ea = *(const edge*)a;
+ edge eb = *(const edge*)b;
+
+ return BB_INFO (ea->src)->order - BB_INFO (eb->src)->order;
+}
+
+
+/* Make the basic block and its cfg edges to have count
+ that are flow-consistent, i.e.
+ basic block count
+ == sum of counts of all incoming edges
+ == sum of counts of all outgoing edges.
+
+ To do that, we assume the current count is correct,
+ and adjust the counts of TO_EDGES to make it consistent. */
static void
-compute_branch_probabilities (void)
+make_basic_block_flow_consistent (basic_block bb, VEC(edge,gc) *to_edges)
+{
+ edge e;
+ edge_iterator ei;
+ edge *edges;
+ gcov_type sum = 0;
+ int n = 0;
+ int i;
+
+ n = VEC_length(edge, to_edges);
+ edges = xmalloc (sizeof (edge) * n);
+ i = 0;
+ FOR_EACH_EDGE (e, ei, to_edges)
+ {
+ edges[i++] = e;
+ }
+ qsort (edges, n, sizeof (edge), sort_edge_by_pred_order);
+
+ for (i = 0; i < n - 1; i++)
+ {
+ edge e = edges[i];
+ if (sum + e->count > bb->count)
+ {
+ break;
+ }
+ sum += e->count;
+ }
+
+ /* I points to either the last edge of edges array
+ or the edge whose count will make the outflow exceeds the inflow.
+ EDGES[I] is assigned the remainder of count.
+ However, if there are multiple edges remaining,
+ different distribution should be possible as well (e.g. equal distribution
+ or weighted based on the original weight of edges).
+ Since this is inexact anyway, just do the simple
+ - assign all remaining count to the first edge and 0 to the rest. */
+ edges[i++]->count = bb->count - sum;
+ for(; i < n; i++)
+ edges[i]->count = 0;
+
+ free (edges);
+}
+
+
+/* Make the block/edge counts of the entire CFG flow-consistent.
+ This is one super hack. The problem is difficult to solve well
+ but we rely on compute_all_counts()
+ to leave no negative number and repeatedly apply the backward propagation
+ of counts, until everything's consistent. */
+
+static void
+make_flow_consistent (void)
+{
+ int consistent = 0;
+ int i;
+ int *block_postorder = XNEWVEC (int, last_basic_block);
+ int n_blocks;
+ basic_block *blocks = xmalloc (n_basic_blocks * sizeof (basic_block));
+ basic_block bb;
+
+ n_blocks = post_order_compute (block_postorder, 1, 0);
+
+ /* Sanity check. */
+ gcc_assert (n_blocks == n_basic_blocks);
+
+ for (i = 0; i < n_blocks; i++)
+ BB_INFO (BASIC_BLOCK (block_postorder[i]))->order = i;
+
+ while (consistent == 0)
+ {
+ consistent = 1;
+ for (i = 0; i < n_blocks; i++)
+ {
+ basic_block bb = BASIC_BLOCK (block_postorder[i]);
+ gcov_type succsum;
+
+ succsum = sum_edge_counts (bb->succs);
+ if (bb != EXIT_BLOCK_PTR && bb->count != succsum)
+ {
+ bb->count = succsum;
+ consistent = 0;
+ }
+ if (bb != ENTRY_BLOCK_PTR && bb->count != sum_edge_counts (bb->preds))
+ {
+ make_basic_block_flow_consistent (bb, bb->preds);
+ consistent = 0;
+ }
+ }
+ }
+
+ EXIT_BLOCK_PTR->count = sum_edge_counts (EXIT_BLOCK_PTR->preds);
+
+ free (blocks);
+
+ FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb,
+ EXIT_BLOCK_PTR->prev_bb, next_bb)
+ {
+ gcc_assert (sum_edge_counts (bb->preds) == bb->count);
+ gcc_assert (sum_edge_counts (bb->succs) == bb->count);
+ }
+ gcc_assert (sum_edge_counts (ENTRY_BLOCK_PTR->succs)
+ == ENTRY_BLOCK_PTR->count);
+ gcc_assert (sum_edge_counts (EXIT_BLOCK_PTR->preds)
+ == EXIT_BLOCK_PTR->count);
+}
+
+
+/* Allocate and initialize bb_info for each basic block. */
+
+static void
+initialize_block_aux_as_bb_info (void)
{
basic_block bb;
- int i;
+
+ alloc_aux_for_blocks (sizeof (struct bb_info));
+
+ FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
+ {
+ edge e;
+ edge_iterator ei;
+
+ FOR_EACH_EDGE (e, ei, bb->succs)
+ if (!EDGE_INFO (e)->ignore)
+ BB_INFO (bb)->succ_count++;
+ FOR_EACH_EDGE (e, ei, bb->preds)
+ if (!EDGE_INFO (e)->ignore)
+ BB_INFO (bb)->pred_count++;
+
+ BB_INFO (bb)->block = bb;
+ BB_INFO (bb)->order = 0;
+ BB_INFO (bb)->count_valid = 0;
+ }
+
+ /* Avoid predicting entry on exit nodes. */
+ BB_INFO (EXIT_BLOCK_PTR)->succ_count = 2;
+ BB_INFO (ENTRY_BLOCK_PTR)->pred_count = 2;
+}
+
+
+/* Compute the execution count of edges and blocks
+ based on the profile data.
+ Return non-zero if there's any flow-inconsistency in the profile data. */
+
+static int
+compute_all_counts (void)
+{
+ basic_block bb;
+ VEC (basic_block, heap) *bb_queue = NULL;
+ int inconsistent = 0;
+ int passes = 0;
int num_edges = 0;
- int changes;
- int passes;
- int hist_br_prob[20];
- int num_never_executed;
- int num_branches;
+ int exec_counts_pos = 0;
gcov_type *exec_counts = get_exec_counts ();
- int exec_counts_pos = 0;
/* Very simple sanity checks so we catch bugs in our profiling code. */
if (profile_info)
@@ -312,32 +575,8 @@ compute_branch_probabilities (void)
}
}
- /* Attach extra info block to each bb. */
-
- alloc_aux_for_blocks (sizeof (struct bb_info));
- FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
- {
- edge e;
- edge_iterator ei;
-
- FOR_EACH_EDGE (e, ei, bb->succs)
- if (!EDGE_INFO (e)->ignore)
- BB_INFO (bb)->succ_count++;
- FOR_EACH_EDGE (e, ei, bb->preds)
- if (!EDGE_INFO (e)->ignore)
- BB_INFO (bb)->pred_count++;
- }
-
- /* Avoid predicting entry on exit nodes. */
- BB_INFO (EXIT_BLOCK_PTR)->succ_count = 2;
- BB_INFO (ENTRY_BLOCK_PTR)->pred_count = 2;
-
- /* For each edge not on the spanning tree, set its execution count from
- the .da file. */
-
/* The first count in the .da file is the number of times that the function
was entered. This is the exec_count for block zero. */
-
FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
{
edge e;
@@ -346,22 +585,41 @@ compute_branch_probabilities (void)
FOR_EACH_EDGE (e, ei, bb->succs)
if (!EDGE_INFO (e)->ignore && !EDGE_INFO (e)->on_tree)
{
+ gcov_type new_count;
+
num_edges++;
- if (exec_counts)
- {
- e->count = exec_counts[exec_counts_pos++];
- if (e->count > profile_info->sum_max)
- {
- error ("corrupted profile info: edge from %i to %i exceeds maximal count",
- bb->index, e->dest->index);
- }
- }
- else
- e->count = 0;
+ /* Get the count from the profile data */
+ new_count = (exec_counts) ? exec_counts[exec_counts_pos++] : 0;
+ if (profile_info != NULL && new_count > profile_info->sum_max)
+ error ("corrupted profile info: edge from %i to %i exceeds maximal count",
+ bb->index, e->dest->index);
- EDGE_INFO (e)->count_valid = 1;
- BB_INFO (bb)->succ_count--;
- BB_INFO (e->dest)->pred_count--;
+ /* Set the count and mark the edge count valid */
+ if (EDGE_INFO (e)->count_valid == 0)
+ {
+ EDGE_INFO (e)->count_valid = 1;
+ BB_INFO (e->src)->succ_count--;
+ BB_INFO (e->dest)->pred_count--;
+ }
+ else if (e->count != new_count)
+ inconsistent = 1;
+
+ e->count = new_count;
+
+ /* Propagate the count if possible */
+ if (BB_INFO (e->src)->succ_count == 0)
+ {
+ inconsistent |=
+ propagate_through_basic_block (e->src, e->src->succs,
+ e->src->preds);
+ }
+ if (BB_INFO (e->dest)->pred_count == 0)
+ {
+ inconsistent |=
+ propagate_through_basic_block (e->dest, e->dest->preds,
+ e->dest->succs);
+ }
+
if (dump_file)
{
fprintf (dump_file, "\nRead edge from %i to %i, count:",
@@ -370,123 +628,45 @@ compute_branch_probabilities (void)
(HOST_WIDEST_INT) e->count);
}
}
+
+ if (BB_INFO (bb)->count_valid == 0)
+ {
+ /* Add the block to the work list
+ only if the block doesn't have a valid count. */
+ VEC_safe_insert (basic_block, heap, bb_queue, 0, bb);
+ }
}
if (dump_file)
fprintf (dump_file, "\n%d edge counts read\n", num_edges);
- /* For every block in the file,
- - if every exit/entrance edge has a known count, then set the block count
- - if the block count is known, and every exit/entrance edge but one has
- a known execution count, then set the count of the remaining edge
+ /* Visit blocks by BFS order (through work queue)
+ * and propagate the count throughout the entire CFG. */
+ passes = 0;
+ while (VEC_length (basic_block, bb_queue) > 0)
+ {
+ struct bb_info *bi;
+ bb = VEC_pop (basic_block, bb_queue);
+ bi = BB_INFO (bb);
+ passes++;
- As edge counts are set, decrement the succ/pred count, but don't delete
- the edge, that way we can easily tell when all edges are known, or only
- one edge is unknown. */
+ if (bi->count_valid == 0 && bi->succ_count == 0)
+ {
+ inconsistent |=
+ propagate_through_basic_block (bb, bb->succs, bb->preds);
+ }
+ else if (bi->count_valid == 0 && bi->pred_count == 0)
+ {
+ inconsistent |=
+ propagate_through_basic_block (bb, bb->preds, bb->succs);
+ }
- /* The order that the basic blocks are iterated through is important.
- Since the code that finds spanning trees starts with block 0, low numbered
- edges are put on the spanning tree in preference to high numbered edges.
- Hence, most instrumented edges are at the end. Graph solving works much
- faster if we propagate numbers from the end to the start.
+ /* If we are not done yet, push it to the queue. */
+ if ((bb != EXIT_BLOCK_PTR && bi->succ_count > 0)
+ || (bb != ENTRY_BLOCK_PTR && bi->pred_count > 0))
+ VEC_safe_insert (basic_block, heap, bb_queue, 0, bb);
+ }
- This takes an average of slightly more than 3 passes. */
-
- changes = 1;
- passes = 0;
- while (changes)
- {
- passes++;
- changes = 0;
- FOR_BB_BETWEEN (bb, EXIT_BLOCK_PTR, NULL, prev_bb)
- {
- struct bb_info *bi = BB_INFO (bb);
- if (! bi->count_valid)
- {
- if (bi->succ_count == 0)
- {
- edge e;
- edge_iterator ei;
- gcov_type total = 0;
-
- FOR_EACH_EDGE (e, ei, bb->succs)
- total += e->count;
- bb->count = total;
- bi->count_valid = 1;
- changes = 1;
- }
- else if (bi->pred_count == 0)
- {
- edge e;
- edge_iterator ei;
- gcov_type total = 0;
-
- FOR_EACH_EDGE (e, ei, bb->preds)
- total += e->count;
- bb->count = total;
- bi->count_valid = 1;
- changes = 1;
- }
- }
- if (bi->count_valid)
- {
- if (bi->succ_count == 1)
- {
- edge e;
- edge_iterator ei;
- gcov_type total = 0;
-
- /* One of the counts will be invalid, but it is zero,
- so adding it in also doesn't hurt. */
- FOR_EACH_EDGE (e, ei, bb->succs)
- total += e->count;
-
- /* Seedgeh for the invalid edge, and set its count. */
- FOR_EACH_EDGE (e, ei, bb->succs)
- if (! EDGE_INFO (e)->count_valid && ! EDGE_INFO (e)->ignore)
- break;
-
- /* Calculate count for remaining edge by conservation. */
- total = bb->count - total;
-
- gcc_assert (e);
- EDGE_INFO (e)->count_valid = 1;
- e->count = total;
- bi->succ_count--;
-
- BB_INFO (e->dest)->pred_count--;
- changes = 1;
- }
- if (bi->pred_count == 1)
- {
- edge e;
- edge_iterator ei;
- gcov_type total = 0;
-
- /* One of the counts will be invalid, but it is zero,
- so adding it in also doesn't hurt. */
- FOR_EACH_EDGE (e, ei, bb->preds)
- total += e->count;
-
- /* Search for the invalid edge, and set its count. */
- FOR_EACH_EDGE (e, ei, bb->preds)
- if (!EDGE_INFO (e)->count_valid && !EDGE_INFO (e)->ignore)
- break;
-
- /* Calculate count for remaining edge by conservation. */
- total = bb->count - total + e->count;
-
- gcc_assert (e);
- EDGE_INFO (e)->count_valid = 1;
- e->count = total;
- bi->pred_count--;
-
- BB_INFO (e->src)->succ_count--;
- changes = 1;
- }
- }
- }
- }
if (dump_file)
dump_flow_info (dump_file, dump_flags);
@@ -494,11 +674,73 @@ compute_branch_probabilities (void)
if (dump_file)
fprintf (dump_file, "Graph solving took %d passes.\n\n", passes);
- /* If the graph has been correctly solved, every block will have a
- succ and pred count of zero. */
+ return inconsistent;
+}
+
+
+/* Check consistency.
+ Return non-zero if inconsistency is found. */
+
+static int
+verify_consistency (void)
+{
+ int inconsistent = 0;
+ basic_block bb;
FOR_EACH_BB (bb)
{
+ /* If the graph has been correctly solved, every block will have a
+ succ and pred count of zero. */
gcc_assert (!BB_INFO (bb)->succ_count && !BB_INFO (bb)->pred_count);
+
+ if ( bb->count != sum_edge_counts (bb->preds)
+ || (bb->count != sum_edge_counts (bb->succs) &&
+ !(find_edge (bb, EXIT_BLOCK_PTR) != NULL &&
+ block_ends_with_call_p (bb))))
+ inconsistent = 1;
+ }
+
+ return inconsistent;
+}
+
+
+/* Compute the branch probabilities for the various branches.
+ Annotate them accordingly. */
+
+static void
+compute_branch_probabilities (void)
+{
+ basic_block bb;
+ int i;
+ int hist_br_prob[20];
+ int num_never_executed;
+ int num_branches;
+ int inconsistent = 0;
+
+ /* Attach extra info block to each bb. */
+ initialize_block_aux_as_bb_info ();
+
+ /* For each edge not on the spanning tree, set its execution count from
+ the .da file. */
+ inconsistent = compute_all_counts ();
+
+ /* Verify consistency */
+ inconsistent = verify_consistency () || inconsistent;
+
+ if (inconsistent)
+ {
+ if (flag_profile_correction)
+ {
+ /* Inconsistency detected. Make it flow-consistent. */
+ static int informed = 0;
+ if (informed == 0)
+ {
+ informed = 1;
+ inform ("correcting inconsistent profile data");
+ }
+ make_flow_consistent ();
+ }
+ else
+ error ("corrupted profile info: profile data is not flow-consistent");
}
/* For every edge, calculate its branch probability and add a reg_note
@@ -1251,4 +1493,3 @@ tree_register_profile_hooks (void)
gcc_assert (current_ir_type () == IR_GIMPLE);
profile_hooks = &tree_profile_hooks;
}
-
diff -r 986583ed535e gcc/tree-profile.c
--- a/gcc/tree-profile.c Wed Mar 19 14:42:54 2008 -0700
+++ b/gcc/tree-profile.c Thu Mar 20 10:01:56 2008 -0700
@@ -308,7 +308,7 @@ tree_gen_ic_func_profiler (void)
edge e;
basic_block bb;
edge_iterator ei;
- tree stmt1;
+ tree stmt1, stmt2;
tree tree_uid, cur_func;
if (flag_unit_at_a_time)
@@ -321,8 +321,11 @@ tree_gen_ic_func_profiler (void)
FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR->succs)
{
+ tree void0;
+
bb = split_edge (e);
bsi = bsi_start (bb);
+
cur_func = force_gimple_operand_bsi (&bsi,
build_addr (current_function_decl,
current_function_decl),
@@ -335,6 +338,16 @@ tree_gen_ic_func_profiler (void)
cur_func,
ic_void_ptr_var);
bsi_insert_after (&bsi, stmt1, BSI_NEW_STMT);
+
+ gcc_assert (EDGE_COUNT (bb->succs) == 1);
+ bb = split_edge (EDGE_I (bb->succs, 0));
+ bsi = bsi_start (bb);
+ /* Set __gcov_indirect_call_callee to 0,
+ so that calls from other modules won't get misattributed
+ to the last caller of the current callee. */
+ void0 = build_int_cst (build_pointer_type (void_type_node), 0);
+ stmt2 = build_gimple_modify_stmt (ic_void_ptr_var, void0);
+ bsi_insert_after (&bsi, stmt2, BSI_NEW_STMT);
}
}
diff -r 986583ed535e gcc/tree-ssa-propagate.c
--- a/gcc/tree-ssa-propagate.c Wed Mar 19 14:42:54 2008 -0700
+++ b/gcc/tree-ssa-propagate.c Thu Mar 20 10:01:56 2008 -0700
@@ -40,6 +40,7 @@
#include "langhooks.h"
#include "varray.h"
#include "vec.h"
+#include "value-prof.h"
/* This file implements a generic value propagation engine based on
the same propagation used by the SSA-CCP algorithm [1].
@@ -734,6 +735,7 @@ set_rhs (tree *stmt_p, tree expr)
/* Replace the whole statement with EXPR. If EXPR has no side
effects, then replace *STMT_P with an empty statement. */
ann = stmt_ann (stmt);
+ gimple_remove_stmt_histograms (cfun, stmt);
*stmt_p = TREE_SIDE_EFFECTS (expr) ? expr : build_empty_stmt ();
(*stmt_p)->base.ann = (tree_ann_t) ann;
diff -r 986583ed535e gcc/value-prof.c
--- a/gcc/value-prof.c Wed Mar 19 14:42:54 2008 -0700
+++ b/gcc/value-prof.c Thu Mar 20 10:01:56 2008 -0700
@@ -423,17 +423,31 @@ free_histograms (void)
of basic block. Report it as error rather than internal error as it might
mean that user has misused the profile somehow. */
static bool
-check_counter (tree stmt, const char * name, gcov_type all, gcov_type bb_count)
+check_counter (tree stmt, const char * name,
+ gcov_type *count, gcov_type *all, gcov_type bb_count)
{
- if (all != bb_count)
+ if (*all != bb_count || *count > *all)
{
location_t * locus;
locus = (stmt != NULL && EXPR_HAS_LOCATION (stmt)
? EXPR_LOCUS (stmt)
: &DECL_SOURCE_LOCATION (current_function_decl));
- error ("%HCorrupted value profile: %s profiler overall count (%d) does not match BB count (%d)",
- locus, name, (int)all, (int)bb_count);
- return true;
+ if (flag_profile_correction)
+ {
+ inform ("%HCorrecting inconsistent value profile: "
+ "%s profiler count is inconsistent",
+ locus, name);
+ *all = bb_count;
+ if (*count > *all) *count = *all;
+ return false;
+ }
+ else
+ {
+ error ("%HCorrupted value profile: "
+ "%s profiler count is inconsistent",
+ locus, name);
+ return true;
+ }
}
return false;
}
@@ -627,7 +641,7 @@ tree_divmod_fixed_value_transform (tree
|| !maybe_hot_bb_p (bb_for_stmt (stmt)))
return false;
- if (check_counter (stmt, "value", all, bb_for_stmt (stmt)->count))
+ if (check_counter (stmt, "value", &count, &all, bb_for_stmt (stmt)->count))
return false;
/* Compute probability of taking the optimal path. */
@@ -794,7 +808,7 @@ tree_mod_pow2_value_transform (tree stmt
/* Compute probability of taking the optimal path. */
all = count + wrong_values;
- if (check_counter (stmt, "pow2", all, bb_for_stmt (stmt)->count))
+ if (check_counter (stmt, "pow2", &count, &all, bb_for_stmt (stmt)->count))
return false;
prob = (count * REG_BR_PROB_BASE + all / 2) / all;
@@ -969,11 +983,16 @@ tree_mod_subtract_transform (tree stmt)
count2 = histogram->hvalue.counters[1];
/* Compute probability of taking the optimal path. */
- if (check_counter (stmt, "interval", all, bb_for_stmt (stmt)->count))
+ if (check_counter (stmt, "interval", &count1, &all, bb_for_stmt (stmt)->count))
{
gimple_remove_histogram_value (cfun, stmt, histogram);
return false;
}
+
+ if (flag_profile_correction && count1 + count2 > all)
+ all = count1 + count2;
+
+ gcc_assert (count1 + count2 <= all);
/* We require that we use just subtractions in at least 50% of all
evaluations. */
@@ -1153,7 +1172,7 @@ tree_ic_transform (tree stmt)
tree_ic_transform (tree stmt)
{
histogram_value histogram;
- gcov_type val, count, all;
+ gcov_type val, count, all, bb_all;
int prob;
tree call, callee, modify;
struct cgraph_node *direct_call;
@@ -1178,6 +1197,14 @@ tree_ic_transform (tree stmt)
gimple_remove_histogram_value (cfun, stmt, histogram);
if (4 * count <= 3 * all)
+ return false;
+
+ bb_all = bb_for_stmt (stmt)->count;
+ /* The order of CHECK_COUNTER calls is important -
+ since check_counter can correct the third parameter
+ and we want to make count <= all <= bb_all. */
+ if ( check_counter (stmt, "ic", &all, &bb_all, bb_all)
+ || check_counter (stmt, "ic", &count, &all, all))
return false;
prob = (count * REG_BR_PROB_BASE + all / 2) / all;
@@ -1375,7 +1402,7 @@ tree_stringops_transform (block_stmt_ite
at least 80% of time. */
if ((6 * count / 5) < all || !maybe_hot_bb_p (bb_for_stmt (stmt)))
return false;
- if (check_counter (stmt, "value", all, bb_for_stmt (stmt)->count))
+ if (check_counter (stmt, "value", &count, &all, bb_for_stmt (stmt)->count))
return false;
prob = (count * REG_BR_PROB_BASE + all / 2) / all;
dest = CALL_EXPR_ARG (call, 0);
@@ -1705,4 +1732,3 @@ value_profile_transformations (void)
return (value_prof_hooks->value_profile_transformations) ();
}
-