Compute precise counter histogram at LTO

Teresa Johnson tejohnson@google.com
Fri Apr 5 06:42:00 GMT 2013


On Mon, Apr 1, 2013 at 2:27 PM, Rong Xu <xur@google.com> wrote:
>
>
>
> On Fri, Mar 29, 2013 at 1:06 PM, Teresa Johnson <tejohnson@google.com>
> wrote:
>>
>> On Fri, Mar 29, 2013 at 11:16 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > Hi,
>> > currently we use Theresa's code to determine hot/cold decisions based on
>> > counter.
>> > Histogram of gcov counters is computed and then threshold is identified
>> > so
>> > 99.9% of counter increments gets done in hot region.  There are some
>> > problems
>> > with this method, most importantly the non-precise way the counters are
>> > merged
>> > across train runs.
>> >
>> > This patch implements histogram construction at LTO time, where we can
>> > do it wihhout
>> > help from libgcov and thus we get more precise results.
>> > Per-compilation unit histograms are computed at compilation stage and
>> > then streamed
>> > into profile section. At WPA time we combine all histograms (this time
>> > precisely)
>> > and determine the cutoff.
>> >
>> > I added dumping facility comparing libgcov based decisions with the new
>> > implementation
>> > and tested it only on cc1 binary and tramp3d. For tramp3d the results
>> > seems to agree
>> > (more or less) after fixing the overflow problem I submitted yesterday.
>> > With cc1 there
>> > is noticeabl difference:
>> >
>> > GCOV min count: 262829 Time:98.97% Size:11.54%
>> > Determined min count: 22990 Time:99.90% Size:20.43%
>> >
>> > i.e. the gcov min count seems 10 times higher than intended and accounts
>> > about
>> > half of the code that should be hot based on current 99.9% threshold.
>>
>> That's a larger error than I had expected from the merging, although
>> as you note it is an approximation so there is going to be some amount
>> of error. If the error is that large then maybe there is a better
>> merging algorithm, since in the non-LTO case we won't be able to
>> recompute it exactly. For cc1, what was your test case -
>> profiledbootstrap or something simpler? I can try to reproduce this
>> and see if it is another bug or just due to the approximation.

I've been using Rong's tool to compute the exactly merged histogram
from the gcov-merged histogram for perlbmk. I tried a couple test
cases - with the 5 train inputs, and with the 3 ref inputs. In both
cases I am seeing up to 200% or so difference in some of the working
set min counter values, although the error is not as high for the
higher working set percentages. But large enough to cause a potential
performance issue nonetheless.

It looks like the histogram min counters are always off in the larger
direction with the gcov merged histogram, so one possibility is to
reduce the min counter value by some divisor when the number of runs
is >1. Unfortunately, at least in the perlbmk case, the magnitude of
the error doesn't seem to correlate obviously with the # runs (in the
5 run case the error is actually a little less than in the 3 run case
for perlbench). Honza, how many runs were merged in your 10x error
case above?

One thing I noticed in the perlbmk case was that there were a number
of large counter values sharing histogram buckets at the high end of
the histogram in some of the individual run profiles, so there would
be a large loss of precision when merging, since the range of counter
values sharing a single histogram is larger at the high end of the
histogram. I'll experiment with increasing the size of the histogram
to see how much that would reduce the error.

Teresa

>>
>
> Current approximation assumes the N-th hottest from profile1 merge with N-th
> hottest count in profile2.  If two profiles have very different hot spots,
> the result min_count for each bucket can off quite a bit.
>
>>
>> >
>> > I guess we need to collect data over more applications to see if there
>> > is anohter
>> > implementation bug in the gcov code or if we want to get different
>> > threshold on LTO
>> > and during libgcov driven decisions.
>>
>> I'll try some test cases. It would probably also be straightforward to
>> read in all the gcda files with an offline tool and recompute the
>> histogram exactly and do the comparison. That way we wouldn't need to
>> use LTO to find the error. Rong Xu from my team is developing an
>> offline profile merging tool that may already have the functionality
>> to do this.
>
>
> My tool reads the complete profile into memory so it can rebuild histogram.
> It's ready now and you can use it the error margin of the on-line merge.
>
>>
>>
>> Thanks,
>> Teresa
>>
>> >
>> > Profiledbootstraped/regtested x86_64-linux, committed.
>> >
>> >         * lto-cgraph.c (output_profile_summary, input_profile_summary):
>> > Use
>> >         gcov streaming; stream hot bb threshold to ltrans.
>> >         * predict.c (get_hot_bb_threshold): Break out from ....
>> >         (maybe_hot_count_p): ... here.
>> >         (set_hot_bb_threshold): New function.
>> >         * lto-section-in.c (lto_section_name): Add profile.
>> >         * profile.h (get_hot_bb_threshold, set_hot_bb_threshold):
>> > Declare.
>> >         * ipa.c: Include hash-table.h, tree-inline.h, profile.h,
>> > lto-streamer.h
>> >         and data-streamer.h
>> >         (histogram_entry): New structure.
>> >         (histogram, histogram_pool): New global vars.
>> >         (histogram_hash): New structure.
>> >         (histogram_hash::hash): New method.
>> >         (histogram_hash::equal): Likewise.
>> >         (account_time_size): New function.
>> >         (cmp_counts): New function.
>> >         (dump_histogram): New function.
>> >         (ipa_profile_generate_summary): New function.
>> >         (ipa_profile_write_summary): New function.
>> >         (ipa_profile_read_summary): New function.
>> >         (ipa_profile): Decide on threshold.
>> >         (pass_ipa_profile): Add ipa_profile_write_summary and
>> > ipa_profile_read_summary.
>> >         * Makefile.in (ipa.o): Update dependencies.
>> >         * lto-streamer.h (LTO_section_ipa_profile): New section.
>> >
>> > Index: lto-cgraph.c
>> > ===================================================================
>> > *** lto-cgraph.c        (revision 197218)
>> > --- lto-cgraph.c        (working copy)
>> > *************** output_profile_summary (struct lto_simpl
>> > *** 604,614 ****
>> >            units.  */
>> >         gcc_assert (profile_info->runs);
>> >         streamer_write_uhwi_stream (ob->main_stream,
>> > profile_info->runs);
>> > !       streamer_write_uhwi_stream (ob->main_stream,
>> > profile_info->sum_max);
>> >
>> >         /* sum_all is needed for computing the working set with the
>> >            histogram.  */
>> > !       streamer_write_uhwi_stream (ob->main_stream,
>> > profile_info->sum_all);
>> >
>> >         /* Create and output a bitpack of non-zero histogram entries
>> > indices.  */
>> >         bp = bitpack_create (ob->main_stream);
>> > --- 604,614 ----
>> >            units.  */
>> >         gcc_assert (profile_info->runs);
>> >         streamer_write_uhwi_stream (ob->main_stream,
>> > profile_info->runs);
>> > !       streamer_write_gcov_count_stream (ob->main_stream,
>> > profile_info->sum_max);
>> >
>> >         /* sum_all is needed for computing the working set with the
>> >            histogram.  */
>> > !       streamer_write_gcov_count_stream (ob->main_stream,
>> > profile_info->sum_all);
>> >
>> >         /* Create and output a bitpack of non-zero histogram entries
>> > indices.  */
>> >         bp = bitpack_create (ob->main_stream);
>> > *************** output_profile_summary (struct lto_simpl
>> > *** 620,632 ****
>> >           {
>> >             if (!profile_info->histogram[h_ix].num_counters)
>> >               continue;
>> > !           streamer_write_uhwi_stream (ob->main_stream,
>> >
>> > profile_info->histogram[h_ix].num_counters);
>> > !           streamer_write_uhwi_stream (ob->main_stream,
>> >
>> > profile_info->histogram[h_ix].min_value);
>> > !           streamer_write_uhwi_stream (ob->main_stream,
>> >
>> > profile_info->histogram[h_ix].cum_value);
>> > !         }
>> >       }
>> >     else
>> >       streamer_write_uhwi_stream (ob->main_stream, 0);
>> > --- 620,637 ----
>> >           {
>> >             if (!profile_info->histogram[h_ix].num_counters)
>> >               continue;
>> > !           streamer_write_gcov_count_stream (ob->main_stream,
>> >
>> > profile_info->histogram[h_ix].num_counters);
>> > !           streamer_write_gcov_count_stream (ob->main_stream,
>> >
>> > profile_info->histogram[h_ix].min_value);
>> > !           streamer_write_gcov_count_stream (ob->main_stream,
>> >
>> > profile_info->histogram[h_ix].cum_value);
>> > !          }
>> > !       /* IPA-profile computes hot bb threshold based on cumulated
>> > !        whole program profile.  We need to stream it down to ltrans.
>> > */
>> > !        if (flag_wpa)
>> > !          streamer_write_gcov_count_stream (ob->main_stream,
>> > !                                          get_hot_bb_threshold ());
>> >       }
>> >     else
>> >       streamer_write_uhwi_stream (ob->main_stream, 0);
>> > *************** input_profile_summary (struct lto_input_
>> > *** 1259,1266 ****
>> >     if (runs)
>> >       {
>> >         file_data->profile_info.runs = runs;
>> > !       file_data->profile_info.sum_max = streamer_read_uhwi (ib);
>> > !       file_data->profile_info.sum_all = streamer_read_uhwi (ib);
>> >
>> >         memset (file_data->profile_info.histogram, 0,
>> >                 sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE);
>> > --- 1264,1271 ----
>> >     if (runs)
>> >       {
>> >         file_data->profile_info.runs = runs;
>> > !       file_data->profile_info.sum_max = streamer_read_gcov_count (ib);
>> > !       file_data->profile_info.sum_all = streamer_read_gcov_count (ib);
>> >
>> >         memset (file_data->profile_info.histogram, 0,
>> >                 sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE);
>> > *************** input_profile_summary (struct lto_input_
>> > *** 1279,1290 ****
>> >               continue;
>> >
>> >             file_data->profile_info.histogram[h_ix].num_counters
>> > !               = streamer_read_uhwi (ib);
>> >             file_data->profile_info.histogram[h_ix].min_value
>> > !               = streamer_read_uhwi (ib);
>> >             file_data->profile_info.histogram[h_ix].cum_value
>> > !               = streamer_read_uhwi (ib);
>> >           }
>> >       }
>> >
>> >   }
>> > --- 1284,1299 ----
>> >               continue;
>> >
>> >             file_data->profile_info.histogram[h_ix].num_counters
>> > !               = streamer_read_gcov_count (ib);
>> >             file_data->profile_info.histogram[h_ix].min_value
>> > !               = streamer_read_gcov_count (ib);
>> >             file_data->profile_info.histogram[h_ix].cum_value
>> > !               = streamer_read_gcov_count (ib);
>> >           }
>> > +       /* IPA-profile computes hot bb threshold based on cumulated
>> > +        whole program profile.  We need to stream it down to ltrans.
>> > */
>> > +       if (flag_ltrans)
>> > +       set_hot_bb_threshold (streamer_read_gcov_count (ib));
>> >       }
>> >
>> >   }
>> > Index: predict.c
>> > ===================================================================
>> > *** predict.c   (revision 197203)
>> > --- predict.c   (working copy)
>> > *************** maybe_hot_frequency_p (struct function *
>> > *** 128,152 ****
>> >     return true;
>> >   }
>> >
>> >   /* Return TRUE if frequency FREQ is considered to be hot.  */
>> >
>> >   static inline bool
>> >   maybe_hot_count_p (struct function *fun, gcov_type count)
>> >   {
>> > -   gcov_working_set_t *ws;
>> > -   static gcov_type min_count = -1;
>> >     if (fun && profile_status_for_function (fun) != PROFILE_READ)
>> >       return true;
>> >     /* Code executed at most once is not hot.  */
>> >     if (profile_info->runs >= count)
>> >       return false;
>> > !   if (min_count == -1)
>> > !     {
>> > !       ws = find_working_set (PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE));
>> > !       gcc_assert (ws);
>> > !       min_count = ws->min_counter;
>> > !     }
>> > !   return (count >= min_count);
>> >   }
>> >
>> >   /* Return true in case BB can be CPU intensive and should be optimized
>> > --- 128,169 ----
>> >     return true;
>> >   }
>> >
>> > + static gcov_type min_count = -1;
>> > +
>> > + /* Determine the threshold for hot BB counts.  */
>> > +
>> > + gcov_type
>> > + get_hot_bb_threshold ()
>> > + {
>> > +   gcov_working_set_t *ws;
>> > +   if (min_count == -1)
>> > +     {
>> > +       ws = find_working_set (PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE));
>> > +       gcc_assert (ws);
>> > +       min_count = ws->min_counter;
>> > +     }
>> > +   return min_count;
>> > + }
>> > +
>> > + /* Set the threshold for hot BB counts.  */
>> > +
>> > + void
>> > + set_hot_bb_threshold (gcov_type min)
>> > + {
>> > +   min_count = min;
>> > + }
>> > +
>> >   /* Return TRUE if frequency FREQ is considered to be hot.  */
>> >
>> >   static inline bool
>> >   maybe_hot_count_p (struct function *fun, gcov_type count)
>> >   {
>> >     if (fun && profile_status_for_function (fun) != PROFILE_READ)
>> >       return true;
>> >     /* Code executed at most once is not hot.  */
>> >     if (profile_info->runs >= count)
>> >       return false;
>> > !   return (count >= get_hot_bb_threshold ());
>> >   }
>> >
>> >   /* Return true in case BB can be CPU intensive and should be optimized
>> > Index: lto-section-in.c
>> > ===================================================================
>> > *** lto-section-in.c    (revision 197203)
>> > --- lto-section-in.c    (working copy)
>> > *************** const char *lto_section_name[LTO_N_SECTI
>> > *** 55,60 ****
>> > --- 55,61 ----
>> >     "jmpfuncs",
>> >     "pureconst",
>> >     "reference",
>> > +   "profile",
>> >     "symbol_nodes",
>> >     "opts",
>> >     "cgraphopt",
>> > Index: profile.h
>> > ===================================================================
>> > *** profile.h   (revision 197203)
>> > --- profile.h   (working copy)
>> > *************** extern void del_node_map (void);
>> > *** 48,51 ****
>> > --- 48,55 ----
>> >
>> >   extern void compute_working_sets (void);
>> >
>> > + /* In predict.c.  */
>> > + extern gcov_type get_hot_bb_threshold (void);
>> > + extern void set_hot_bb_threshold (gcov_type);
>> > +
>> >   #endif /* PROFILE_H */
>> > Index: ipa.c
>> > ===================================================================
>> > *** ipa.c       (revision 197203)
>> > --- ipa.c       (working copy)
>> > *************** along with GCC; see the file COPYING3.
>> > *** 32,37 ****
>> > --- 32,43 ----
>> >   #include "ipa-utils.h"
>> >   #include "pointer-set.h"
>> >   #include "ipa-inline.h"
>> > + #include "hash-table.h"
>> > + #include "tree-inline.h"
>> > + #include "profile.h"
>> > + #include "params.h"
>> > + #include "lto-streamer.h"
>> > + #include "data-streamer.h"
>> >
>> >   /* Look for all functions inlined to NODE and update their inlined_to
>> > pointers
>> >      to INLINED_TO.  */
>> > *************** struct ipa_opt_pass_d pass_ipa_whole_pro
>> > *** 1040,1045 ****
>> > --- 1046,1246 ----
>> >    NULL,                                        /* variable_transform */
>> >   };
>> >
>> > + /* Entry in the histogram.  */
>> > +
>> > + struct histogram_entry
>> > + {
>> > +   gcov_type count;
>> > +   int time;
>> > +   int size;
>> > + };
>> > +
>> > + /* Histogram of profile values.
>> > +    The histogram is represented as an ordered vector of entries
>> > allocated via
>> > +    histogram_pool. During construction a separate hashtable is kept to
>> > lookup
>> > +    duplicate entries.  */
>> > +
>> > + vec<histogram_entry *> histogram;
>> > + static alloc_pool histogram_pool;
>> > +
>> > + /* Hashtable support for storing SSA names hashed by their
>> > SSA_NAME_VAR.  */
>> > +
>> > + struct histogram_hash : typed_noop_remove <histogram_entry>
>> > + {
>> > +   typedef histogram_entry value_type;
>> > +   typedef histogram_entry compare_type;
>> > +   static inline hashval_t hash (const value_type *);
>> > +   static inline int equal (const value_type *, const compare_type *);
>> > + };
>> > +
>> > + inline hashval_t
>> > + histogram_hash::hash (const histogram_entry *val)
>> > + {
>> > +   return val->count;
>> > + }
>> > +
>> > + inline int
>> > + histogram_hash::equal (const histogram_entry *val, const
>> > histogram_entry *val2)
>> > + {
>> > +   return val->count == val2->count;
>> > + }
>> > +
>> > + /* Account TIME and SIZE executed COUNT times into HISTOGRAM.
>> > +    HASHTABLE is the on-side hash kept to avoid duplicates.  */
>> > +
>> > + static void
>> > + account_time_size (hash_table <histogram_hash> hashtable,
>> > +                  vec<histogram_entry *> &histogram,
>> > +                  gcov_type count, int time, int size)
>> > + {
>> > +   histogram_entry key = {count, 0, 0};
>> > +   histogram_entry **val = hashtable.find_slot (&key, INSERT);
>> > +
>> > +   if (!*val)
>> > +     {
>> > +       *val = (histogram_entry *) pool_alloc (histogram_pool);
>> > +       **val = key;
>> > +       histogram.safe_push (*val);
>> > +     }
>> > +   (*val)->time += time;
>> > +   (*val)->size += size;
>> > + }
>> > +
>> > + int
>> > + cmp_counts (const void *v1, const void *v2)
>> > + {
>> > +   const histogram_entry *h1 = *(const histogram_entry * const *)v1;
>> > +   const histogram_entry *h2 = *(const histogram_entry * const *)v2;
>> > +   if (h1->count < h2->count)
>> > +     return 1;
>> > +   if (h1->count > h2->count)
>> > +     return -1;
>> > +   return 0;
>> > + }
>> > +
>> > + /* Dump HISTOGRAM to FILE.  */
>> > +
>> > + static void
>> > + dump_histogram (FILE *file, vec<histogram_entry *> histogram)
>> > + {
>> > +   unsigned int i;
>> > +   gcov_type overall_time = 0, cumulated_time = 0, cumulated_size = 0,
>> > overall_size = 0;
>> > +
>> > +   fprintf (dump_file, "Histogram:\n");
>> > +   for (i = 0; i < histogram.length (); i++)
>> > +     {
>> > +       overall_time += histogram[i]->count * histogram[i]->time;
>> > +       overall_size += histogram[i]->size;
>> > +     }
>> > +   if (!overall_time)
>> > +     overall_time = 1;
>> > +   if (!overall_size)
>> > +     overall_size = 1;
>> > +   for (i = 0; i < histogram.length (); i++)
>> > +     {
>> > +       cumulated_time += histogram[i]->count * histogram[i]->time;
>> > +       cumulated_size += histogram[i]->size;
>> > +       fprintf (file, "  "HOST_WIDEST_INT_PRINT_DEC": time:%i (%2.2f)
>> > size:%i (%2.2f)\n",
>> > +              (HOST_WIDEST_INT) histogram[i]->count,
>> > +              histogram[i]->time,
>> > +              cumulated_time * 100.0 / overall_time,
>> > +              histogram[i]->size,
>> > +              cumulated_size * 100.0 / overall_size);
>> > +    }
>> > + }
>> > +
>> > + /* Collect histogram from CFG profiles.  */
>> > +
>> > + static void
>> > + ipa_profile_generate_summary (void)
>> > + {
>> > +   struct cgraph_node *node;
>> > +   gimple_stmt_iterator gsi;
>> > +   hash_table <histogram_hash> hashtable;
>> > +   basic_block bb;
>> > +
>> > +   hashtable.create (10);
>> > +   histogram_pool = create_alloc_pool ("IPA histogram", sizeof (struct
>> > histogram_entry),
>> > +                                     10);
>> > +
>> > +   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
>> > +     FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (node->symbol.decl))
>> > +       {
>> > +       int time = 0;
>> > +       int size = 0;
>> > +         for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next
>> > (&gsi))
>> > +         {
>> > +           time += estimate_num_insns (gsi_stmt (gsi),
>> > &eni_time_weights);
>> > +           size += estimate_num_insns (gsi_stmt (gsi),
>> > &eni_size_weights);
>> > +         }
>> > +       account_time_size (hashtable, histogram, bb->count, time, size);
>> > +       }
>> > +   hashtable.dispose ();
>> > +   histogram.qsort (cmp_counts);
>> > + }
>> > +
>> > + /* Serialize the ipa info for lto.  */
>> > +
>> > + static void
>> > + ipa_profile_write_summary (void)
>> > + {
>> > +   struct lto_simple_output_block *ob
>> > +     = lto_create_simple_output_block (LTO_section_ipa_profile);
>> > +   unsigned int i;
>> > +
>> > +   streamer_write_uhwi_stream (ob->main_stream, histogram.length());
>> > +   for (i = 0; i < histogram.length (); i++)
>> > +     {
>> > +       streamer_write_gcov_count_stream (ob->main_stream,
>> > histogram[i]->count);
>> > +       streamer_write_uhwi_stream (ob->main_stream,
>> > histogram[i]->time);
>> > +       streamer_write_uhwi_stream (ob->main_stream,
>> > histogram[i]->size);
>> > +     }
>> > +   lto_destroy_simple_output_block (ob);
>> > + }
>> > +
>> > + /* Deserialize the ipa info for lto.  */
>> > +
>> > + static void
>> > + ipa_profile_read_summary (void)
>> > + {
>> > +   struct lto_file_decl_data ** file_data_vec
>> > +     = lto_get_file_decl_data ();
>> > +   struct lto_file_decl_data * file_data;
>> > +   hash_table <histogram_hash> hashtable;
>> > +   int j = 0;
>> > +
>> > +   hashtable.create (10);
>> > +   histogram_pool = create_alloc_pool ("IPA histogram", sizeof (struct
>> > histogram_entry),
>> > +                                     10);
>> > +
>> > +   while ((file_data = file_data_vec[j++]))
>> > +     {
>> > +       const char *data;
>> > +       size_t len;
>> > +       struct lto_input_block *ib
>> > +       = lto_create_simple_input_block (file_data,
>> > +                                        LTO_section_ipa_profile,
>> > +                                        &data, &len);
>> > +       if (ib)
>> > +       {
>> > +           unsigned int num = streamer_read_uhwi (ib);
>> > +         unsigned int n;
>> > +         for (n = 0; n < num; n++)
>> > +           {
>> > +             gcov_type count = streamer_read_gcov_count (ib);
>> > +             int time = streamer_read_uhwi (ib);
>> > +             int size = streamer_read_uhwi (ib);
>> > +             account_time_size (hashtable, histogram,
>> > +                                count, time, size);
>> > +           }
>> > +         lto_destroy_simple_input_block (file_data,
>> > +                                         LTO_section_ipa_profile,
>> > +                                         ib, data, len);
>> > +       }
>> > +     }
>> > +   hashtable.dispose ();
>> > +   histogram.qsort (cmp_counts);
>> > + }
>> >
>> >   /* Simple ipa profile pass propagating frequencies across the
>> > callgraph.  */
>> >
>> > *************** ipa_profile (void)
>> > *** 1051,1056 ****
>> > --- 1252,1326 ----
>> >     int order_pos;
>> >     bool something_changed = false;
>> >     int i;
>> > +   gcov_type overall_time = 0, cutoff = 0, cumulated = 0, overall_size
>> > = 0;
>> > +
>> > +   if (dump_file)
>> > +     dump_histogram (dump_file, histogram);
>> > +   for (i = 0; i < (int)histogram.length (); i++)
>> > +     {
>> > +       overall_time += histogram[i]->count * histogram[i]->time;
>> > +       overall_size += histogram[i]->size;
>> > +     }
>> > +   if (overall_time)
>> > +     {
>> > +       gcov_type threshold;
>> > +
>> > +       gcc_assert (overall_size);
>> > +       if (dump_file)
>> > +       {
>> > +         gcov_type min, cumulated_time = 0, cumulated_size = 0;
>> > +
>> > +         fprintf (dump_file, "Overall time:
>> > "HOST_WIDEST_INT_PRINT_DEC"\n",
>> > +                  (HOST_WIDEST_INT)overall_time);
>> > +         min = get_hot_bb_threshold ();
>> > +           for (i = 0; i < (int)histogram.length () &&
>> > histogram[i]->count >= min;
>> > +              i++)
>> > +           {
>> > +             cumulated_time += histogram[i]->count *
>> > histogram[i]->time;
>> > +             cumulated_size += histogram[i]->size;
>> > +           }
>> > +         fprintf (dump_file, "GCOV min count:
>> > "HOST_WIDEST_INT_PRINT_DEC
>> > +                  " Time:%3.2f%% Size:%3.2f%%\n",
>> > +                  (HOST_WIDEST_INT)min,
>> > +                  cumulated_time * 100.0 / overall_time,
>> > +                  cumulated_size * 100.0 / overall_size);
>> > +       }
>> > +       cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE)
>> > + 500) / 1000;
>> > +       threshold = 0;
>> > +       for (i = 0; cumulated < cutoff; i++)
>> > +       {
>> > +         cumulated += histogram[i]->count * histogram[i]->time;
>> > +           threshold = histogram[i]->count;
>> > +       }
>> > +       if (!threshold)
>> > +       threshold = 1;
>> > +       if (dump_file)
>> > +       {
>> > +         gcov_type cumulated_time = 0, cumulated_size = 0;
>> > +
>> > +           for (i = 0;
>> > +              i < (int)histogram.length () && histogram[i]->count >=
>> > threshold;
>> > +              i++)
>> > +           {
>> > +             cumulated_time += histogram[i]->count *
>> > histogram[i]->time;
>> > +             cumulated_size += histogram[i]->size;
>> > +           }
>> > +         fprintf (dump_file, "Determined min count:
>> > "HOST_WIDEST_INT_PRINT_DEC
>> > +                  " Time:%3.2f%% Size:%3.2f%%\n",
>> > +                  (HOST_WIDEST_INT)threshold,
>> > +                  cumulated_time * 100.0 / overall_time,
>> > +                  cumulated_size * 100.0 / overall_size);
>> > +       }
>> > +       if (threshold > get_hot_bb_threshold ()
>> > +         || in_lto_p)
>> > +       {
>> > +         if (dump_file)
>> > +           fprintf (dump_file, "Threshold updated.\n");
>> > +           set_hot_bb_threshold (threshold);
>> > +       }
>> > +     }
>> > +   histogram.release();
>> > +   free_alloc_pool (histogram_pool);
>> >
>> >     order_pos = ipa_reverse_postorder (order);
>> >     for (i = order_pos - 1; i >= 0; i--)
>> > *************** struct ipa_opt_pass_d pass_ipa_profile =
>> > *** 1112,1120 ****
>> >     0,                                  /* todo_flags_start */
>> >     0                                     /* todo_flags_finish */
>> >    },
>> > !  NULL,                                        /* generate_summary */
>> > !  NULL,                                        /* write_summary */
>> > !  NULL,                                        /* read_summary */
>> >    NULL,                                        /*
>> > write_optimization_summary */
>> >    NULL,                                        /*
>> > read_optimization_summary */
>> >    NULL,                                        /* stmt_fixup */
>> > --- 1382,1390 ----
>> >     0,                                  /* todo_flags_start */
>> >     0                                     /* todo_flags_finish */
>> >    },
>> > !  ipa_profile_generate_summary,                /* generate_summary */
>> > !  ipa_profile_write_summary,           /* write_summary */
>> > !  ipa_profile_read_summary,            /* read_summary */
>> >    NULL,                                        /*
>> > write_optimization_summary */
>> >    NULL,                                        /*
>> > read_optimization_summary */
>> >    NULL,                                        /* stmt_fixup */
>> > Index: Makefile.in
>> > ===================================================================
>> > *** Makefile.in (revision 197205)
>> > --- Makefile.in (working copy)
>> > *************** varpool.o : varpool.c $(CONFIG_H) $(SYST
>> > *** 2893,2899 ****
>> >      $(TREE_FLOW_H)
>> >   ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H)
>> > \
>> >      $(TREE_PASS_H) $(GIMPLE_H) $(TARGET_H) $(GGC_H) pointer-set.h \
>> > !    $(IPA_UTILS_H)
>> >   ipa-prop.o : ipa-prop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
>> >      langhooks.h $(GGC_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H)
>> > $(DIAGNOSTIC_H) \
>> >      $(TREE_FLOW_H) $(TM_H) $(TREE_PASS_H) $(FLAGS_H) $(TREE_H) \
>> > --- 2893,2900 ----
>> >      $(TREE_FLOW_H)
>> >   ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H)
>> > \
>> >      $(TREE_PASS_H) $(GIMPLE_H) $(TARGET_H) $(GGC_H) pointer-set.h \
>> > !    $(IPA_UTILS_H) tree-inline.h $(HASH_TABLE_H) profile.h $(PARAMS_H)
>> > \
>> > !    $(LTO_STREAMER_H) $(DATA_STREAMER_H)
>> >   ipa-prop.o : ipa-prop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
>> >      langhooks.h $(GGC_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H)
>> > $(DIAGNOSTIC_H) \
>> >      $(TREE_FLOW_H) $(TM_H) $(TREE_PASS_H) $(FLAGS_H) $(TREE_H) \
>> > Index: lto-streamer.h
>> > ===================================================================
>> > *** lto-streamer.h      (revision 197203)
>> > --- lto-streamer.h      (working copy)
>> > *************** enum lto_section_type
>> > *** 243,248 ****
>> > --- 243,249 ----
>> >     LTO_section_jump_functions,
>> >     LTO_section_ipa_pure_const,
>> >     LTO_section_ipa_reference,
>> > +   LTO_section_ipa_profile,
>> >     LTO_section_symtab_nodes,
>> >     LTO_section_opts,
>> >     LTO_section_cgraph_opt_sum,
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>



--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



More information about the Gcc-patches mailing list