This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch PR 44576]: Reduce the computation cost in compute_miss_rate for prefetching loop arrays


On Wed, Jun 30, 2010 at 2:06 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Hi,
>
> Attached is the patch that partially fixes bug 44576: ?testsuite/gfortran.dg/zero_sized_1.f90 with huge compile
> time on prefetching + peeling.
>
> The cost of compute_miss_rate in tree-ssa-loop-prefetch.c is found to be very high, and it results in an exploding
> compilation time when there are thousands of memory references in the loop.
>
> compute_miss_rate is to compute the possibility (miss_rate) that two memory references fall on difference cache
> lines. We will insert prefetches for both references if the computed miss rate is greater than the experimentally
> determined ACCEPTABLE_MISS_RATE.
>
> In this patch, compute_miss_rate is renamed as is_miss_rate_acceptable, and does an early return if the computed
> miss rate is already greater than the ACCEPTABLE_MISS_RATE, and thus significantly reduces the cost for such
> case. Note that the worst case cost is not affected by this patch.
>
> This patch reduces the compile time of the test case from 1m20'' to 1m03'' on an amd-linux64 system.
> Note that without -fprefetching-loop-arrays, the compile time on the same system is 0m30''. The extra 33'' is mostly
> spent in "compute_all_dependences (datarefs, &dependences, vloops, true)". ?I am not sure whether the cost in
> ?dependence analysis for loop should be that high (with several thousands memory references in loop) or there
> is a bug in dependence computation.

Well, it seems that you compute the same information over-and-over
by doing

unsigned int
tree_ssa_prefetch_arrays (void)
{
...
  FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "Processing loop %d:\n", loop->num);

      unrolled |= loop_prefetch_arrays (loop);

static bool
loop_prefetch_arrays (struct loop *loop)
{
...
 determine_loop_nest_reuse (loop, refs, no_other_refs);

static void
determine_loop_nest_reuse (struct loop *loop, struct mem_ref_group *refs,
                           bool no_other_refs)
{
...
  if (loop->inner)
    return;
...
  /* Find the outermost loop of the loop nest of loop (we require that
     there are no sibling loops inside the nest).  */
  nest = loop;
  while (1)
    {
      aloop = loop_outer (nest);

      if (aloop == current_loops->tree_root
          || aloop->inner->next)
        break;

      nest = aloop;
    }
...
  compute_all_dependences (datarefs, &dependences, vloops, true);


In fact you should iterate _only_ over innermost loops like

  FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)

does that make a difference?

>
> The patch passed Bootstrapping and regression tests.
>
> Is this patch OK to commit?

Ok.

Thanks,
Richard.

> Thanks,
>
> Changpeng


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]