This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[patch][RFA] Remove gcse.c ability to iterate PRE


Hello,

This patch removes the ability from gcse.c to run multiple passes. The
eventual goal is to split GCSE up into multiple independent passes in
passes.c, and to make gcse.c use DF more.

I suppose iterating PRE made sense once.  But in the current framework
of optimizations, I don't see any benefit from iterating PRE on e.g.
ia64 Polyhedron.  The obvious reasons for this are:

  1) There is less need for iterating PRE since we started handling
     REG_EQUAL notes as candidate expressions for PRE.  Expressions
     with N insns and a REG_EQUAL note previously needed N passes to
     fully eliminate the redundancy.  With REG_EQUAL notes as PRE
     candidates, things are moved in just one pass.

  2) The GIMPLE PRE implementation (tree-ssa-pre.c) is *really* good
     especially in comparison with the LCM PRE in gcse.c.  There are
     just not so many partial redundancies left to eliminate.  The
     bunch of the little amount of work that remains is address
     calculations, but (a) partially redundant ones are rare; (b) the
     ones that are there are simple enough to move in just one pass;
     and (c) the most important calculations are the loop-invariant
     ones, and loop-invariant.c handles those cases just fine (and in
     fact much better than gcse.c does).

All things considered, I don't think there is a need anymore for PRE
to have the ability to iterate.  Removing this ability makes a number
of good cleanups possible.

Bootstrapped&tested on ia64-unknown-linux-gnu.
OK for the trunk?

Ciao!
Steven


	* doc/invoke.texi (max_gcse_passes): Remove documentation.
	* params.def (PARAM_MAX_GCSE_PASSES): Remove.
	* params.h (MAX_GCSE_PASSES): Remove.
	* gcse.c (gcse_main): Run CPROP1, PRE or HOIST, and CPROP2
	in sequence.  Remove ability to run multiple passes.
	(bypass_jumps): Report run as third CPROP pass.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 145701)
+++ doc/invoke.texi	(working copy)
@@ -7319,9 +7319,6 @@ order to perform the global common subexpression e
 optimization.  If more memory than specified is required, the
 optimization will not be done.

-@item max-gcse-passes
-The maximum number of passes of GCSE to run.  The default is 1.
-
 @item max-pending-list-length
 The maximum number of pending dependencies scheduling will allow
 before flushing the current state and starting over.  Large functions
Index: params.def
===================================================================
--- params.def	(revision 145701)
+++ params.def	(working copy)
@@ -223,11 +223,7 @@ DEFPARAM(PARAM_MAX_GCSE_MEMORY,
 	 "max-gcse-memory",
 	 "The maximum amount of memory to be allocated by GCSE",
 	 50 * 1024 * 1024, 0, 0)
-/* The number of repetitions of copy/const prop and PRE to run.  */
-DEFPARAM(PARAM_MAX_GCSE_PASSES,
-	"max-gcse-passes",
-	"The maximum number of passes to make when doing GCSE",
-	1, 1, 0)
+
 /* This is the threshold ratio when to perform partial redundancy
    elimination after reload. We perform partial redundancy elimination
    when the following holds:
Index: params.h
===================================================================
--- params.h	(revision 145701)
+++ params.h	(working copy)
@@ -124,8 +124,6 @@ typedef enum compiler_param
   PARAM_VALUE (PARAM_MAX_PENDING_LIST_LENGTH)
 #define MAX_GCSE_MEMORY \
   ((size_t) PARAM_VALUE (PARAM_MAX_GCSE_MEMORY))
-#define MAX_GCSE_PASSES \
-  PARAM_VALUE (PARAM_MAX_GCSE_PASSES)
 #define GCSE_AFTER_RELOAD_PARTIAL_FRACTION \
   PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_PARTIAL_FRACTION)
 #define GCSE_AFTER_RELOAD_CRITICAL_FRACTION \
Index: gcse.c
===================================================================
--- gcse.c	(revision 145701)
+++ gcse.c	(working copy)
@@ -190,17 +190,19 @@ along with GCC; see the file COPYING3.  If not see

    We perform the following steps:

-   1) Compute basic block information.
+   1) Compute table of places where registers are set.

-   2) Compute table of places where registers are set.
+   2) Perform copy/constant propagation.

-   3) Perform copy/constant propagation.
-
-   4) Perform global cse using lazy code motion if not optimizing
+   3) Perform global cse using lazy code motion if not optimizing
       for size, or code hoisting if we are.

-   5) Perform another pass of copy/constant propagation.
+   4) Perform another pass of copy/constant propagation.  Try to bypass
+      conditional jumps if the condition can be computed from a value of
+      an incoming edge.

+   5) Perform store motion.
+
    Two passes of copy/constant propagation are done because the first one
    enables more GCSE and the second one helps to clean up the copies that
    GCSE creates.  This is needed more for PRE than for Classic because Classic
@@ -212,6 +214,11 @@ along with GCC; see the file COPYING3.  If not see
    (set (pseudo-reg) (expression)).
    Function want_to_gcse_p says what these are.

+   In addition, expressions in REG_EQUAL notes are candidates for GXSE-ing.
+   This allows PRE to hoist expressions that are expressed in multiple insns,
+   such as comprex address calculations (e.g. for PIC code, or loads with a
+   high part and as lowe part).
+
    PRE handles moving invariant expressions out of loops (by treating them as
    partially redundant).

@@ -235,9 +242,13 @@ along with GCC; see the file COPYING3.  If not see
    It was found doing copy propagation between each pass enables further
    substitutions.

+   This study was done before expressions in REG_EQUAL notes were added as
+   candidate expressions for optimization, and before the GIMPLE optimizers
+   were added.  Probably, multiple passes is even less efficient now than
+   at the time when the study was conducted.
+
    PRE is quite expensive in complicated functions because the DFA can take
-   a while to converge.  Hence we only perform one pass.  The parameter
-   max-gcse-passes can be modified if one wants to experiment.
+   a while to converge.  Hence we only perform one pass.

    **********************

@@ -661,11 +672,7 @@ static bool is_too_expensive (const char *);
 static int
 gcse_main (rtx f ATTRIBUTE_UNUSED)
 {
-  int changed, pass;
-  /* Bytes used at start of pass.  */
-  int initial_bytes_used;
-  /* Maximum number of bytes used by a pass.  */
-  int max_pass_bytes;
+  int changed;
   /* Point to release obstack data from for each pass.  */
   char *gcse_obstack_bottom;

@@ -697,6 +704,7 @@ gcse_main (rtx f ATTRIBUTE_UNUSED)

   /* We need alias.  */
   init_alias_analysis ();
+
   /* Record where pseudo-registers are set.  This data is kept accurate
      during each pass.  ??? We could also record hard-reg information here
      [since it's unchanging], however it is currently done during hash table
@@ -704,99 +712,86 @@ gcse_main (rtx f ATTRIBUTE_UNUSED)

      It may be tempting to compute MEM set information here too, but MEM sets
      will be subject to code motion one day and thus we need to compute
-     information about memory sets when we build the hash tables.  */
+     information about memory sets when we build the hash tables.
+
+     ??? Actually, we already know the information that compute_sets computes
+     because it is available from DF.  FIXME.  */

   alloc_reg_set_mem (max_gcse_regno);
   compute_sets ();

-  pass = 0;
-  initial_bytes_used = bytes_used;
-  max_pass_bytes = 0;
   gcse_obstack_bottom = GOBNEWVAR (char, 1);
-  changed = 1;
-  while (changed && pass < MAX_GCSE_PASSES)
-    {
-      changed = 0;
-      if (dump_file)
-	fprintf (dump_file, "GCSE pass %d\n\n", pass + 1);
+  changed = 0;
+
+  if (dump_file)
+    fprintf (dump_file, "GCSE pass\n\n");

-      /* Initialize bytes_used to the space for the pred/succ lists,
-	 and the reg_set_table data.  */
-      bytes_used = initial_bytes_used;
+  max_gcse_regno = max_reg_num ();

-      /* Each pass may create new registers, so recalculate each time.  */
-      max_gcse_regno = max_reg_num ();
+  alloc_gcse_mem ();

-      alloc_gcse_mem ();
+  /* Don't allow constant propagation to modify jumps
+     during this pass.  */
+  if (dbg_cnt (cprop1))
+    {
+      timevar_push (TV_CPROP1);
+      changed = one_cprop_pass (1, false, false);
+      timevar_pop (TV_CPROP1);
+    }

-      /* Don't allow constant propagation to modify jumps
-	 during this pass.  */
-      if (dbg_cnt (cprop1))
+  if (optimize_function_for_speed_p (cfun))
+    {
+      timevar_push (TV_PRE);
+      changed |= one_pre_gcse_pass (1);
+      /* We may have just created new basic blocks.  Release and
+	 recompute various things which are sized on the number of
+	 basic blocks.
+	 ??? There would be no need for this if we used a block
+	 based Lazy Code Motion variant, with all (or selected)
+	 edges split before running the pass.  That would also
+	 help find_implicit_sets for cprop.  FIXME.  */
+      if (changed)
 	{
-	  timevar_push (TV_CPROP1);
-	  changed = one_cprop_pass (pass + 1, false, false);
-	  timevar_pop (TV_CPROP1);
+	  free_modify_mem_tables ();
+	  modify_mem_list = GCNEWVEC (rtx, last_basic_block);
+	  canon_modify_mem_list = GCNEWVEC (rtx, last_basic_block);
 	}

-      if (optimize_function_for_speed_p (cfun))
-	{
-	  timevar_push (TV_PRE);
-	  changed |= one_pre_gcse_pass (pass + 1);
-	  /* We may have just created new basic blocks.  Release and
-	     recompute various things which are sized on the number of
-	     basic blocks.  */
-	  if (changed)
-	    {
-	      free_modify_mem_tables ();
-	      modify_mem_list = GCNEWVEC (rtx, last_basic_block);
-	      canon_modify_mem_list = GCNEWVEC (rtx, last_basic_block);
-	    }
-	  free_reg_set_mem ();
-	  alloc_reg_set_mem (max_reg_num ());
-	  compute_sets ();
-	  run_jump_opt_after_gcse = 1;
-	  timevar_pop (TV_PRE);
-	}
-
-      if (max_pass_bytes < bytes_used)
-	max_pass_bytes = bytes_used;
-
-      /* Free up memory, then reallocate for code hoisting.  We can
-	 not re-use the existing allocated memory because the tables
-	 will not have info for the insns or registers created by
-	 partial redundancy elimination.  */
-      free_gcse_mem ();
-
-      /* It does not make sense to run code hoisting unless we are optimizing
+      /* ??? When we allocate this at the start of the function,
+	 the comment says that "this data is kept accurate during
+	 each pass".  Apparently this is not so?  FIXME.  */
+      free_reg_set_mem ();
+      alloc_reg_set_mem (max_reg_num ());
+      compute_sets ();
+      run_jump_opt_after_gcse = 1;
+      timevar_pop (TV_PRE);
+    }
+  else
+    {
+      /* This function is being optimized for code size.
+	 It does not make sense to run code hoisting unless we are optimizing
 	 for code size -- it rarely makes programs faster, and can make
 	 them bigger if we did partial redundancy elimination (when optimizing
 	 for space, we don't run the partial redundancy algorithms).  */
-      if (optimize_function_for_size_p (cfun))
-	{
-	  timevar_push (TV_HOIST);
-	  max_gcse_regno = max_reg_num ();
-	  alloc_gcse_mem ();
-	  changed |= one_code_hoisting_pass ();
-	  free_gcse_mem ();
+      timevar_push (TV_HOIST);
+      max_gcse_regno = max_reg_num ();
+      alloc_gcse_mem ();
+      one_code_hoisting_pass ();
+      timevar_pop (TV_HOIST);
+    }

-	  if (max_pass_bytes < bytes_used)
-	    max_pass_bytes = bytes_used;
-	  timevar_pop (TV_HOIST);
-	}
+  free_gcse_mem ();

-      if (dump_file)
-	{
-	  fprintf (dump_file, "\n");
-	  fflush (dump_file);
-	}
-
-      obstack_free (&gcse_obstack, gcse_obstack_bottom);
-      pass++;
+  if (dump_file)
+    {
+      fprintf (dump_file, "\n");
+      fflush (dump_file);
     }

-  /* Do one last pass of copy propagation, including cprop into
-     conditional jumps.  */
+  obstack_free (&gcse_obstack, gcse_obstack_bottom);

+  /* Do the second const/copy propagation pass, including cprop into
+     conditional jumps.  */
   if (dbg_cnt (cprop2))
     {
       max_gcse_regno = max_reg_num ();
@@ -804,7 +799,7 @@ gcse_main (rtx f ATTRIBUTE_UNUSED)

       /* This time, go ahead and allow cprop to alter jumps.  */
       timevar_push (TV_CPROP2);
-      one_cprop_pass (pass + 1, true, true);
+      one_cprop_pass (2, true, true);
       timevar_pop (TV_CPROP2);
       free_gcse_mem ();
     }
@@ -813,16 +808,17 @@ gcse_main (rtx f ATTRIBUTE_UNUSED)
     {
       fprintf (dump_file, "GCSE of %s: %d basic blocks, ",
 	       current_function_name (), n_basic_blocks);
-      fprintf (dump_file, "%d pass%s, %d bytes\n\n",
-	       pass, pass > 1 ? "es" : "", max_pass_bytes);
+      fprintf (dump_file, "pass 1, %d bytes\n\n", bytes_used);
     }

   obstack_free (&gcse_obstack, NULL);
   free_reg_set_mem ();

-  /* We are finished with alias.  */
+  /* We are finished with alias.
+     ??? Actually we recompute alias in store_motion.  */
   end_alias_analysis ();

+  /* Run store motion.  */
   if (optimize_function_for_speed_p (cfun) && flag_gcse_sm)
     {
       timevar_push (TV_LSM);
@@ -6530,7 +6526,7 @@ bypass_jumps (void)

   max_gcse_regno = max_reg_num ();
   alloc_gcse_mem ();
-  changed = one_cprop_pass (MAX_GCSE_PASSES + 2, true, true);
+  changed = one_cprop_pass (3, true, true);
   free_gcse_mem ();

   if (dump_file)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]