This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch] Fix for PR34976: fix interchange profitability


On Jan 27, 2008 2:00 PM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
> > +  l1_cache_size.low = L1_CACHE_SIZE * 1000;
> > +  l1_cache_size.high = 0;
> > +  l2_cache_size.low = L2_CACHE_SIZE * 1000;
> > +  l2_cache_size.high = 0;
>
> l1_cache_size = uhwi_to_double_int (L1_CACHE_SIZE * 1024);
> l2_cache_size = uhwi_to_double_int (L2_CACHE_SIZE * 1024);
>
> >    /* LOOP_I is always the outer loop.  */
> >    for (loop_j = first_loop->inner;
> >         loop_j;
> > @@ -216,18 +224,31 @@ try_interchange_loops (lambda_trans_matr
> >
> >       /* Heuristics for loop interchange profitability:
> >
> > +        0. Don't transform if the smallest stride is larger than
> > +           the L2 cache, or if the largest stride is smaller than
> > +           the L1 cache.
>
> Even if the largest stride is smaller than L1, it may make sense
> to do the interchange (what we would like to test here is whether
> the whole array fits in L1 -- probably large * number of iterations
> would be a good approximation).
>
> Otherwise the patch looks OK to me (but it does not seem to fit the
> "regression and documentation fixes only", so we need to wait till 4.4
> with it).
>

Okay, the patch can indeed wait for 4.4.
I've attached the updated patch.

Sebastian
Index: tree-loop-linear.c
===================================================================
--- tree-loop-linear.c	(revision 131878)
+++ tree-loop-linear.c	(working copy)
@@ -179,10 +179,14 @@ try_interchange_loops (lambda_trans_matr
 		       VEC (data_reference_p, heap) *datarefs,
 		       struct loop *first_loop)
 {
+  bool res;
   struct loop *loop_i;
   struct loop *loop_j;
   unsigned int dependence_steps_i, dependence_steps_j;
   double_int access_strides_i, access_strides_j;
+  double_int small, large, nb_iter;
+  double_int l1_cache_size, l2_cache_size;
+  int cmp;
   unsigned int nb_deps_not_carried_by_i, nb_deps_not_carried_by_j;
   struct data_dependence_relation *ddr;
 
@@ -194,7 +198,10 @@ try_interchange_loops (lambda_trans_matr
   ddr = VEC_index (ddr_p, dependence_relations, 0);
   if (ddr == NULL || DDR_ARE_DEPENDENT (ddr) == chrec_dont_know)
     return trans;
-  
+
+  l1_cache_size = uhwi_to_double_int (L1_CACHE_SIZE * 1024);
+  l2_cache_size = uhwi_to_double_int (L2_CACHE_SIZE * 1024);
+
   /* LOOP_I is always the outer loop.  */
   for (loop_j = first_loop->inner; 
        loop_j; 
@@ -216,18 +223,38 @@ try_interchange_loops (lambda_trans_matr
 	
 	/* Heuristics for loop interchange profitability:
 
+	   0. Don't transform if the smallest stride is larger than
+	      the L2 cache, or if the largest stride multiplied by the
+	      number of iterations is smaller than the L1 cache.
+
 	   1. (spatial locality) Inner loops should have smallest
               dependence steps.
 
 	   2. (spatial locality) Inner loops should contain more
 	   dependence relations not carried by the loop.
 
-	   3. (temporal locality) Inner loops should have smallest 
+	   3. (temporal locality) Inner loops should have smallest
 	      array access strides.
 	*/
+
+	cmp = double_int_ucmp (access_strides_i, access_strides_j);
+	small = cmp < 0 ? access_strides_i : access_strides_j;
+	large = cmp < 0 ? access_strides_j : access_strides_i;
+
+	if (double_int_ucmp (small, l2_cache_size) > 0)
+	  continue;
+
+	res = cmp < 0 ?
+	  estimated_loop_iterations (loop_j, false, &nb_iter):
+	  estimated_loop_iterations (loop_i, false, &nb_iter);
+	large = double_int_mul (large, nb_iter);
+
+	if (res && double_int_ucmp (large, l1_cache_size) < 0)
+	  continue;
+
 	if (dependence_steps_i < dependence_steps_j 
 	    || nb_deps_not_carried_by_i > nb_deps_not_carried_by_j
-	    || double_int_ucmp (access_strides_i, access_strides_j) < 0)
+	    || cmp < 0)
 	  {
 	    lambda_matrix_row_exchange (LTM_MATRIX (trans),
 					loop_depth (loop_i) - loop_depth (first_loop),

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]