This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [google] New fdo summary-based icache sensitive unrolling (issue 6282045)


On Wed, Jun 6, 2012 at 2:02 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Tue, Jun 5, 2012 at 11:46 AM, ?<davidxl@google.com> wrote:
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/gcov-io.h
>> File gcc/gcov-io.h (right):
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/gcov-io.h#newcode544
>> gcc/gcov-io.h:544: gcov_unsigned_t sum_cutoff_percent;/* sum_all cutoff
>> percentage computed
>> Is there a need to record this?
>
> It isn't used by by the profile-use compile, I thought it might be
> useful to have in knowing how to interpret the count. But I can remove
> it.
>
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/gcov-io.h#newcode546
>> gcc/gcov-io.h:546: gcov_unsigned_t num_to_cutoff;/* number of counters
>>
>> to reach above cutoff. ?*/
>> This should have a better name -- e.g., hot_arc_count.
>
> Ok. I have changed it to "num_hot_counters" to be less specific to the
> type of counter the the summary represents.
>
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/loop-unroll.c
>> File gcc/loop-unroll.c (right):
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/loop-unroll.c#newcode196
>> gcc/loop-unroll.c:196: or peeled loop. ?*/
>> Add more documentation here: 1) for program size < threshold, do not
>> limit 2) for ? ?threshold < psize < 2* threshold, tame the max allows
>> peels/unrolls according to hotness; 3) for huge footprint programs,
>> disable it (by ...).
>
> done.
>
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/loop-unroll.c#newcode197
>> gcc/loop-unroll.c:197: static limit_type
>> Blank line.
>
> fixed
>
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/loop-unroll.c#newcode233
>> gcc/loop-unroll.c:233: gcc_assert(profile_info->sum_all > 0);
>> Do not assert on profile data -- either bail or emit info.
>
> done.
>
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/loop-unroll.c#newcode237
>> gcc/loop-unroll.c:237: if (profile_info->num_to_cutoff <
>> size_threshold*2) {
>> space
>
> where? i added space around the "*" and also fixed up the curly brace
> formatting in this block.
>
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/loop-unroll.c#newcode238
>> gcc/loop-unroll.c:238: /* For appliations that are less than twice the
>> codesize limit, allow
>> applications
>
> fixed
>
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/loop-unroll.c#newcode532
>> gcc/loop-unroll.c:532: limit = limit_code_size(loop, &codesize_divisor);
>> space.
>
> fixed (at new callsites, see below)
>
>>
>> http://codereview.appspot.com/6282045/diff/1/gcc/loop-unroll.c#newcode1095
>> gcc/loop-unroll.c:1095: int codesize_divisor)
>> This parameter is not documented. The name of the parameter is also not
>> ideal -. Is it possible to not change the interfaces? -- i.e., split
>> limit_code_size into two helper functions one to get the the suppress
>> flag as before, and the other gets the limit_factor which is called
>> inside each function 'decide_unroll_runtime...' -- it is cleaner and
>> easier to understand that way.
>
> Yes, in fact I have restructured it so that it is only called within
> the decide_unroll and decide_peel
> routines.
>
>>
>> http://codereview.appspot.com/6282045/diff/1/libgcc/libgcov.c
>> File libgcc/libgcov.c (right):
>>
>> http://codereview.appspot.com/6282045/diff/1/libgcc/libgcov.c#newcode832
>> libgcc/libgcov.c:832: #define CUM_CUTOFF_PERCENT_TIMES_10 999
>> Ok now but it should be controllable at compile time with a --param --
>> recorded in the binary;
>
> ok
>
>>
>> http://codereview.appspot.com/6282045/diff/1/libgcc/libgcov.c#newcode839
>> libgcc/libgcov.c:839: for (t_ix = 0; t_ix < GCOV_COUNTERS_SUMMABLE;
>> t_ix++)
>> There does not seem a need for a loop -- only t_ix == GCOV_COUNTER_ARCS
>> is summable.
>
> i wanted to write the code generically, so it would be forward
> compatible and match the rest of the code.


Here it is different -- if there were multiple summable counters,
adding/sorting them together may not make sense. It is the arc counter
that needs to be collected.

>
>>
>> http://codereview.appspot.com/6282045/diff/1/libgcc/libgcov.c#newcode848
>> libgcc/libgcov.c:848: cum_cutoff = (cs_ptr->sum_all * cutoff_perc)/1000;
>> Overflow possibility?
>
> Good point. I changed this to do the divide first and then the
> multiply. Won't have any noticeable effect given the large threshold
> used by the consumer of this info.
>
>>
>> http://codereview.appspot.com/6282045/diff/1/libgcc/libgcov.c#newcode854
>> libgcc/libgcov.c:854: value_array = (gcov_type *) malloc
>> (sizeof(gcov_type)*cs_ptr->num);
>> space
>
> done
>
>>
>> http://codereview.appspot.com/6282045/diff/1/libgcc/libgcov.c#newcode860
>> libgcc/libgcov.c:860: for (i = 0, ctr_info_ix = 0; i < t_ix; i++)
>> No need for this -- the index for ARC counter is always 0 (add assert)
>
> I thought about using that to simplify this code, but was trying to be
> as generic as possible for forward compatibility in case anything
> changed. What do you think?

It is fine to keep this with t_ix == arc_counter -- there is one minor
bug -- before the loop, the following should be added to skip all the
functions:

if (!gi_ptr->merge[t_ix])
  continue;


David

>
>> -- so either skip (use merge function for 47 and mask for 46) or use 0.
>>
>> http://codereview.appspot.com/6282045/diff/1/libgcc/libgcov.c#newcode864
>> libgcc/libgcov.c:864: }
>> in gcc_46 and before, the counters may not be allocated, and it should
>> not
>> directly accessed using t_ix. It needs to be guarded with
>>
>> if ((1 << t_ix) & gi_ptr->ctr_mask)
>
> The code for 4_6 is simpler and looks different because the counters
> are structured differently.
>
>>
>> http://codereview.appspot.com/6282045/diff/1/libgcc/libgcov.c#newcode875
>> libgcc/libgcov.c:875: gcc_assert (index + ci_ptr->num <= cs_ptr->num);
>> Need to relax this ?a little by skipping -- profiling dumping is known
>> to be 'flaky'
>
> fixed, now handle this case gracefully.
>
>>
>> http://codereview.appspot.com/6282045/diff/1/libgcc/libgcov.c#newcode1103
>> libgcc/libgcov.c:1103: cs_prg->sum_max += cs_tprg->run_max;
>> For multiple runs, how should num_to_cutoff merged? Pick the larger
>> value?
>
> that's a good idea - fixed.
>
> New patch coming shortly.
>
> Thanks,
> Teresa
>
>>
>> http://codereview.appspot.com/6282045/
>
>
>
> --
> Teresa Johnson?|?Software Engineer?|?tejohnson@google.com?|?408-460-2413


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]