This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Patch PR44576]: Don't do further analysis if we could know prefetching is not benefitial


Hi,

Attached is the patch to fix bug 44576.

Some prefetch analysis (e.g.  miss rate and dependence computations for reuse analysis) has
been known to be expansive. We should avoid un-necessary analysis whenever possible.

In this patch, we split the cost model implementing function (is_loop_prefetching_profitable) into
three small functions (trip_count_to_ahead_ratio_too_small_p, mem_ref_count_reasonable_p and
insn_to_prefetch_ratio_too_small_p), with each one called at the earliest possible time.  In
this way, we can reduce the compilation time.

In this patch, we also introduce a new parameter, PREFETCH_MAX_MEM_REFS_PER_LOOP.
We give up prefetching if the number of memory references in a loop is above this threshold.
This is based on both compilation time and memory consumption considerations. We found
that the memory allocation and freeing of data dependence relation (for reuse analysis) will become
the bottleneck if there is too many memory references in the loop.

This patch passed bootstrapping and completely fixed bug 44576.

Is it OK for the trunk?

Thanks,

Changpeng
 
From 8d878775b85c961ab02076c7deffebcc38cd35b5 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@pathscale.(none)>
Date: Thu, 8 Jul 2010 13:32:11 -0700
Subject: [PATCH] pr44576 Avoid un-necessary prefetch analysis by distributing the cost models

	* tree-ssa-loop-prefetch.c (trip_count_to_ahead_ratio_too_small_p): New.
	Pull out from is_loop_prefetching_profitable to implement the trip count
	to ahead ratio heuristic.
        (mem_ref_count_reasonable_p): New.  Pull out from is_loop_prefetching_profitable
	to implement the instruction to memory reference ratio heuristic.  Also consider
	not reasonable if the memory reference count is above a threshold (to avoid
	explosive compilation time.
	(insn_to_prefetch_ratio_too_small_p): New.  Pull out from is_loop_prefetching_profitable
	to implement the instruction to prefetch ratio heuristic.
	(is_loop_prefetching_profitable): Removed.
	(loop_prefetch_arrays): Distribute the cost analysis across the function to
	allow early exit of the prefetch analysis.  is_loop_prefetching_profitable is
	splitted into three functions, with each one called as early as possible.
	(PREFETCH_MAX_MEM_REFS_PER_LOOP): New.  Threshold above which the number of
	memory references in a loop is considered too many.
---
 gcc/tree-ssa-loop-prefetch.c |  145 ++++++++++++++++++++++++++++-------------
 1 files changed, 99 insertions(+), 46 deletions(-)

diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index c3e90d2..9f2bbba 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -109,18 +109,23 @@ along with GCC; see the file COPYING3.  If not see
       prefetch instructions with guards in cases where 5) was not sufficient
       to satisfy the constraints?
 
-   The function is_loop_prefetching_profitable() implements a cost model
-   to determine if prefetching is profitable for a given loop. The cost
-   model has two heuristcs:
-   1. A heuristic that determines whether the given loop has enough CPU
-      ops that can be overlapped with cache missing memory ops.
-      If not, the loop won't benefit from prefetching. This is implemented
-      by requirung the ratio between the instruction count and the mem ref
-      count to be above a certain minimum.
-   2. A heuristic that disables prefetching in a loop with an unknown trip
-      count if the prefetching cost is above a certain limit. The relative
-      prefetching cost is estimated by taking the ratio between the
-      prefetch count and the total intruction count (this models the I-cache
+   A cost model is implemented to determine whether or not prefetching is
+   profitable for a given loop.  The cost model has three heuristics:
+   1. Function trip_count_to_ahead_ratio_too_small_p implements a heuristic
+      that determines whether or not the loop has too few iterations
+      (compared to ahead).  Prefetching is not likely to be beneficial if the
+      trip count to ahead ratio is below a certain minimum.
+   
+   2. Function mem_ref_count_reasonable_p implements a heuristic that determines
+      whether the given loop has enough CPU ops that can be overlapped with cache
+      missing memory ops.  If not, the loop won't benefit from prefetching.  In the
+      implementation, prefetching is not considered beneficial if the ratio between
+      the instruction count and the mem ref count is below a certain minimum.
+
+   3. Function insn_to_prefetch_ratio_too_small_p implements a heuristic that
+      disables prefetching in a loop if the prefetching cost is above a certain
+      limit.  The relative prefetching cost is estimated by taking the ratio between
+      the prefetch count and the total intruction count (this models the I-cache
       cost).
    The limits used in these heuristics are defined as parameters with
    reasonable default values. Machine-specific default values will be
@@ -237,6 +242,14 @@ struct mem_ref_group
 #define PREFETCH_MOD_TO_UNROLL_FACTOR_RATIO 4
 #endif
 
+/* Some of the prefetch computations have quadratic complexity.  We want to
+   avoid huge compile times and, therefore, want to limit the amount of
+   memory references per loop where we consider prefetching.  */
+
+#ifndef PREFETCH_MAX_MEM_REFS_PER_LOOP
+#define PREFETCH_MAX_MEM_REFS_PER_LOOP 200
+#endif
+
 /* The memory reference.  */
 
 struct mem_ref
@@ -1640,24 +1653,51 @@ determine_loop_nest_reuse (struct loop *loop, struct mem_ref_group *refs,
     }
 }
 
-/* Do a cost-benefit analysis to determine if prefetching is profitable
-   for the current loop given the following parameters:
+/* Determine whether or not the trip count to ahead ratio is too small based
+   on prefitablility consideration.
    AHEAD: the iteration ahead distance,
-   EST_NITER: the estimated trip count,
+   EST_NITER: the estimated trip count.  */
+
+static bool
+trip_count_to_ahead_ratio_too_small_p (unsigned ahead, HOST_WIDE_INT est_niter)
+{
+  /* Assume trip count to ahead ratio is big enough if the trip count could not
+     be estimated at compile time.  */
+  if (est_niter < 0)
+    return false;
+
+  if (est_niter < (HOST_WIDE_INT) (TRIP_COUNT_TO_AHEAD_RATIO * ahead))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Not prefetching -- loop estimated to roll only %d times\n",
+		 (int) est_niter);
+      return true;
+    }
+
+  return false;
+}
+
+/* Determine whether or not the number of memory references in the loop is
+   reasonable based on the profitablity and compilation time considerations.
    NINSNS: estimated number of instructions in the loop,
-   PREFETCH_COUNT: an estimate of the number of prefetches
    MEM_REF_COUNT: total number of memory references in the loop.  */
 
 static bool
-is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter,
-				unsigned ninsns, unsigned prefetch_count,
-				unsigned mem_ref_count, unsigned unroll_factor)
+mem_ref_count_reasonable_p (unsigned ninsns, unsigned mem_ref_count)
 {
-  int insn_to_mem_ratio, insn_to_prefetch_ratio;
+  int insn_to_mem_ratio;
 
   if (mem_ref_count == 0)
     return false;
 
+  /* Miss rate computation (is_miss_rate_acceptable) and dependence analysis
+     (compute_all_dependences) have high costs based on quadratic complexity.
+     To avoid huge compilation time, we give up prefetching if mem_ref_count
+     is too large.  */
+  if (mem_ref_count > PREFETCH_MAX_MEM_REFS_PER_LOOP)
+    return false;
+
   /* Prefetching improves performance by overlapping cache missing
      memory accesses with CPU operations.  If the loop does not have
      enough CPU operations to overlap with memory operations, prefetching
@@ -1678,6 +1718,21 @@ is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter,
       return false;
     }
 
+  return true;
+}
+
+/* Determine whether or not the instruction to prefetch ratio in the loop is
+   too small based on the profitablity consideration.
+   NINSNS: estimated number of instructions in the loop,
+   PREFETCH_COUNT: an estimate of the number of prefetches,
+   UNROLL_FACTOR:  the factor to unroll the loop if prefetching.  */
+
+static bool
+insn_to_prefetch_ratio_too_small_p (unsigned ninsns, unsigned prefetch_count,
+                                     unsigned unroll_factor)
+{
+  int insn_to_prefetch_ratio;
+
   /* Prefetching most likely causes performance degradation when the instruction
      to prefetch ratio is too small.  Too many prefetch instructions in a loop
      may reduce the I-cache performance.
@@ -1697,23 +1752,10 @@ is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter,
         fprintf (dump_file,
 		 "Not prefetching -- instruction to prefetch ratio (%d) too small\n",
 		 insn_to_prefetch_ratio);
-      return false;
+      return true;
     }
 
-  /* Could not do further estimation if the trip count is unknown.  Just assume
-     prefetching is profitable. Too aggressive???  */
-  if (est_niter < 0)
-    return true;
-
-  if (est_niter < (HOST_WIDE_INT) (TRIP_COUNT_TO_AHEAD_RATIO * ahead))
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file,
-		 "Not prefetching -- loop estimated to roll only %d times\n",
-		 (int) est_niter);
-      return false;
-    }
-  return true;
+  return false;
 }
 
 
@@ -1738,9 +1780,27 @@ loop_prefetch_arrays (struct loop *loop)
       return false;
     }
 
+  /* FIXME: the time should be weighted by the probabilities of the blocks in
+     the loop body.  */
+  time = tree_num_loop_insns (loop, &eni_time_weights);
+  ahead = (PREFETCH_LATENCY + time - 1) / time;
+  est_niter = estimated_loop_iterations_int (loop, false);
+
+  /* Prefetching is not likely to be profitable if the trip count to ahead
+     ratio is too small.  */
+  if (trip_count_to_ahead_ratio_too_small_p (ahead, est_niter))
+    return false;
+
+  ninsns = tree_num_loop_insns (loop, &eni_size_weights);
+
   /* Step 1: gather the memory references.  */
   refs = gather_memory_references (loop, &no_other_refs, &mem_ref_count);
 
+  /* Give up prefetching if the number of memory references in the loop is
+     not reasonable based on profitablity and compilation time considerations.  */
+  if (!mem_ref_count_reasonable_p (ninsns, mem_ref_count))
+    goto fail;
+
   /* Step 2: estimate the reuse effects.  */
   prune_by_reuse (refs);
 
@@ -1749,15 +1809,7 @@ loop_prefetch_arrays (struct loop *loop)
 
   determine_loop_nest_reuse (loop, refs, no_other_refs);
 
-  /* Step 3: determine the ahead and unroll factor.  */
-
-  /* FIXME: the time should be weighted by the probabilities of the blocks in
-     the loop body.  */
-  time = tree_num_loop_insns (loop, &eni_time_weights);
-  ahead = (PREFETCH_LATENCY + time - 1) / time;
-  est_niter = estimated_loop_iterations_int (loop, false);
-
-  ninsns = tree_num_loop_insns (loop, &eni_size_weights);
+  /* Step 3: determine unroll factor.  */
   unroll_factor = determine_unroll_factor (loop, refs, ninsns, &desc,
 					   est_niter);
 
@@ -1773,8 +1825,9 @@ loop_prefetch_arrays (struct loop *loop)
 	     ahead, unroll_factor, est_niter,
 	     ninsns, mem_ref_count, prefetch_count);
 
-  if (!is_loop_prefetching_profitable (ahead, est_niter, ninsns, prefetch_count,
-				       mem_ref_count, unroll_factor))
+  /* Prefetching is not likely to be profitable if the instruction to prefetch
+     ratio is too small.  */
+  if (insn_to_prefetch_ratio_too_small_p (ninsns, prefetch_count, unroll_factor))
     goto fail;
 
   mark_nontemporal_stores (loop, refs);
-- 
1.6.3.3


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]