[PATCH][RFC] Preserve loop info from tree loop opts to after RTL loop opts (PR44688)


The attached patch blob makes us preserve loop information (the loop
tree) from the start of the tree loop optimizations until the end of
the RTL loop optimizations.  The motivation for this is to fix the
excessive prefetching and loop unrolling we perform on (for example)
prologue loops created by the vectorizer.  The reason we do so is that
we are not able to analyze/bound their number of iterations.  But
of course the vectorizer knows a bound for its prologue loops perfectly
well, so why not record that information?  That is what the inlined
patch does, in addition to adjusting passes to actually _use_ an upper
bound if one is available.
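
To make the idea concrete, here is a minimal sketch (not part of the
patch itself; the two helper names are made up for illustration) of the
producer/consumer split: the vectorizer records the bound it knows for
the prologue loop, and a later pass clamps its own estimate against
that bound if one was recorded.

  /* Illustrative sketch only -- the real changes are in the hunks below.
     Producer side: record a known upper bound on the number of iterations
     of a loop, here the prologue loop the vectorizer created.  */
  static void
  record_loop_upper_bound (struct loop *loop, unsigned HOST_WIDE_INT bound)
  {
    loop->any_upper_bound = true;
    loop->nb_iterations_upper_bound = uhwi_to_double_int (bound);
  }

  /* Consumer side: a pass with its own (possibly far too large) estimate
     clamps it against the recorded upper bound, if one is available.  */
  static unsigned HOST_WIDE_INT
  clamp_niter_estimate (struct loop *loop, unsigned HOST_WIDE_INT estimate)
  {
    if (loop->any_upper_bound
        && double_int_fits_in_uhwi_p (loop->nb_iterations_upper_bound)
        && loop->nb_iterations_upper_bound.low < estimate)
      estimate = loop->nb_iterations_upper_bound.low;
    return estimate;
  }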

The whole patch does not yet pass bootstrap, but the C/C++ testsuites
are fine (and the target libs build).

Thus, inlined below is the "meat" of the patch that makes us perform
less unrolling/prefetching.  For example, on 437.leslie3d it reduces
code size from

   text    data     bss     dec     hex filename
 438423       0    4184  442607   6c0ef tml.o

to

   text    data     bss     dec     hex filename
 368903       0    4184  373087   5b15f tml.o

at -Ofast -funroll-loops and from

   text    data     bss     dec     hex filename
 741167       0    4184  745351   b5f87 tml.o

to

   text    data     bss     dec     hex filename
 561479       0    4184  565663   8a19f tml.o

at -Ofast -funroll-loops -march=barcelona.

Attached you will find the collection of changes I had to make to
preserve loops.  The main idea is to make loop_optimizer_finalize a
no-op if PROP_loops is set on the current function.  I added plenty of
checking to make sure loop info stays correct, as well as dominators
(loop verification needs dominators).  I plan to split out the
verification bits (or at least their fixes), then the generic CFG bits
that preserve loops on the RTL side (and the few tree cases I caught).
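
Roughly, the loop_optimizer_finalize part looks like this (a simplified
sketch of the idea, not the exact hunk from the attachment):

  /* Simplified sketch: keep the loop tree alive when the function says
     it preserves loops, instead of tearing it down after each pass.  */
  void
  loop_optimizer_finalize (void)
  {
    if (cfun->curr_properties & PROP_loops)
      {
        /* Loops (and the dominators they rely on) stay around; just
           verify them when checking is enabled.  */
  #ifdef ENABLE_CHECKING
        verify_loop_structure ();
  #endif
        return;
      }

    /* ... the existing teardown of loop structures follows here ...  */
  }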

Any comments on that plan?

Thanks,
Richard.


Index: gcc/loop-iv.c
===================================================================
--- gcc/loop-iv.c.orig	2011-07-11 17:02:51.000000000 +0200
+++ gcc/loop-iv.c	2012-02-23 15:22:14.000000000 +0100
@@ -2764,6 +2764,10 @@ iv_number_of_iterations (struct loop *lo
     {
       if (!desc->niter_max)
 	desc->niter_max = determine_max_iter (loop, desc, old_niter);
+      if (loop->any_upper_bound
+	  && double_int_fits_in_uhwi_p (loop->nb_iterations_upper_bound)
+	  && loop->nb_iterations_upper_bound.low < desc->niter_max)
+	desc->niter_max = loop->nb_iterations_upper_bound.low;
 
       /* simplify_using_initial_values does a copy propagation on the registers
 	 in the expression for the number of iterations.  This prolongs life
Index: gcc/loop-unroll.c
===================================================================
--- gcc/loop-unroll.c.orig	2011-12-02 10:14:44.000000000 +0100
+++ gcc/loop-unroll.c	2012-02-23 15:26:46.000000000 +0100
@@ -859,7 +859,8 @@ decide_unroll_runtime_iterations (struct
     }
 
   /* If we have profile feedback, check whether the loop rolls.  */
-  if (loop->header->count && expected_loop_iterations (loop) < 2 * nunroll)
+  if ((loop->header->count && expected_loop_iterations (loop) < 2 * nunroll)
+      || desc->niter_max < 2 * nunroll)
     {
       if (dump_file)
 	fprintf (dump_file, ";; Not unrolling loop, doesn't roll\n");
Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c.orig	2011-09-01 12:08:51.000000000 +0200
+++ gcc/tree-ssa-loop-niter.c	2012-02-23 14:56:11.000000000 +0100
@@ -1383,6 +1383,10 @@ number_of_iterations_cond (struct loop *
       gcc_unreachable ();
     }
 
+  if (loop->any_upper_bound
+      && double_int_ucmp (loop->nb_iterations_upper_bound, niter->max) < 0)
+    niter->max = loop->nb_iterations_upper_bound;
+
   mpz_clear (bnds.up);
   mpz_clear (bnds.below);
 
@@ -3030,7 +3034,7 @@ estimate_numbers_of_iterations_loop (str
   if (loop->estimate_state != EST_NOT_COMPUTED)
     return;
   loop->estimate_state = EST_AVAILABLE;
-  loop->any_upper_bound = false;
+  /* loop->any_upper_bound = false; */
   loop->any_estimate = false;
 
   exits = get_loop_exit_edges (loop);
Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
--- gcc/tree-ssa-loop-prefetch.c.orig	2011-10-12 13:14:10.000000000 +0200
+++ gcc/tree-ssa-loop-prefetch.c	2012-02-23 15:05:45.000000000 +0100
@@ -1801,6 +1801,8 @@ loop_prefetch_arrays (struct loop *loop)
 
   ahead = (PREFETCH_LATENCY + time - 1) / time;
   est_niter = max_stmt_executions_int (loop, false);
+  if (est_niter == -1)
+    est_niter = max_stmt_executions_int (loop, true);
 
   /* Prefetching is not likely to be profitable if the trip count to ahead
      ratio is too small.  */
Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c.orig	2012-02-23 14:45:11.000000000 +0100
+++ gcc/tree-vect-loop-manip.c	2012-02-23 14:45:18.000000000 +0100
@@ -2206,6 +2206,12 @@ vect_do_peeling_for_alignment (loop_vec_
 #ifdef ENABLE_CHECKING
   slpeel_verify_cfg_after_peeling (new_loop, loop);
 #endif
+  new_loop->any_upper_bound = true;
+  new_loop->nb_iterations_upper_bound = uhwi_to_double_int (MAX (LOOP_VINFO_VECT_FACTOR (loop_vinfo), min_profitable_iters));
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Setting upper bound of nb iterations for prologue "
+	     "loop to %d\n", MAX (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+				  min_profitable_iters));
 
   /* Update number of times loop executes.  */
   n_iters = LOOP_VINFO_NITERS (loop_vinfo);

Attachment: preserve-loops
Description: Text document

