[PATCH] Fix PRE of TARGET_MEM_REF

Sun May 24 12:53:00 GMT 2009

On Sun, May 24, 2009 at 2:39 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
> Richard Guenther wrote:
>> This patch fixes PRE/SCCVN handling of TARGET_MEM_REF.  This
>> would allow scheduling PRE after loop optimizations (apart from
>> some Fortran fallout that in principle looks unrelated).
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, once with
>> the current pass ordering and once with PRE moved after
>> loop optimizations.
>
> Thanks, indeed I had experienced some missed optimizations in the
> vectorizer when PRE managed to mess up the pattern it recognizes, so
> it's a good idea.
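
The quoted patch is about how PRE and SCCVN value-number TARGET_MEM_REF, the memory reference IVOPTs produces for an address of the form base + index * step + offset. The C sketch below is only a source-level approximation. It uses a hypothetical `TMR` macro (not a GCC API) that mimics that address shape, and shows the partially redundant load that PRE can remove once it treats two such references with equal operands as the same value. The real transformation happens on GIMPLE, not on C source.

```c
#include <stddef.h>

/* Hypothetical stand-in for a TARGET_MEM_REF: loads the int at
   base + index * step + offset (in bytes). Not a GCC interface. */
#define TMR(base, index, step, offset)                 \
  (*(const int *)((const char *)(base)                 \
                  + (ptrdiff_t)(index) * (ptrdiff_t)(step) \
                  + (ptrdiff_t)(offset)))

/* Before PRE: the second load is already available when c != 0 but
   not when c == 0, i.e. it is partially redundant. */
long before_pre(const int *base, ptrdiff_t i, int c)
{
  long r = 0;
  if (c)
    r += TMR(base, i, sizeof(int), 2 * sizeof(int));
  r += TMR(base, i, sizeof(int), 2 * sizeof(int));
  return r;
}

/* After PRE (written by hand): the load is inserted on the path that
   lacked it, and the later use reads the merged temporary, which
   plays the role of a PHI of the two loads. */
long after_pre(const int *base, ptrdiff_t i, int c)
{
  long r = 0;
  int t;
  if (c) {
    t = TMR(base, i, sizeof(int), 2 * sizeof(int));
    r += t;
  } else {
    t = TMR(base, i, sizeof(int), 2 * sizeof(int));
  }
  r += t;
  return r;
}
```

Both functions compute the same result; the second performs exactly one load on every path.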

FYI, SPEC 2006 doesn't seem to be affected if I do

Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 147833)
+++ gcc/passes.c	(working copy)
@@ -635,7 +635,6 @@ init_optimization_passes (void)
       NEXT_PASS (pass_fold_builtins);
       NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_split_crit_edges);
-      NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_tree_loop);
 	{
@@ -668,6 +667,8 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_iv_optimize);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_split_crit_edges);
+      NEXT_PASS (pass_pre);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_convert_to_rsqrt);
       NEXT_PASS (pass_reassoc);

though Andrew mentioned that he'd rather do PRE where FRE currently
runs ...

Note that predcom also confuses the vectorizer in some cases.  I was mainly
playing with the above to get an extra FRE after loop optimizations
(which include unrolling!); DOM doesn't do a very good job here.  Moving
PRE gets that without extra overhead.  Probably sink_code should be moved
as well.  LIM does a good job for all loop-related stuff, but it needs a
copyprop pass afterwards to clean up so that it does not confuse the
vectorizer.
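
To illustrate why an FRE after the loop optimizations pays off, here is a minimal C sketch, assuming a loop with a small constant trip count that complete unrolling removes. The function names are hypothetical. Once unrolled, `a[1]` and `a[2]` are each loaded twice with no intervening store, so the second loads are fully redundant; the third function is a hand-written version of what a late FRE/PRE would leave.

```c
/* Original loop: constant trip count of 3, a candidate for
   complete unrolling. */
long dot_adjacent4(const int a[4])
{
  long s = 0;
  for (int i = 0; i < 3; i++)
    s += (long) a[i] * a[i + 1];
  return s;
}

/* Roughly what complete unrolling leaves in the IL: a[1] and a[2]
   each appear as two separate loads. */
long dot_adjacent4_unrolled(const int a[4])
{
  long s = 0;
  s += (long) a[0] * a[1];
  s += (long) a[1] * a[2];   /* a[1] loaded again */
  s += (long) a[2] * a[3];   /* a[2] loaded again */
  return s;
}

/* After a late FRE/PRE (written by hand): every element is loaded
   once and the redundant loads reuse the earlier values. */
long dot_adjacent4_fre(const int a[4])
{
  int a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
  long s = 0;
  s += (long) a0 * a1;
  s += (long) a1 * a2;
  s += (long) a2 * a3;
  return s;
}
```

The rewrite is only valid because nothing stores to `a` between the loads; that is the availability fact FRE has to prove.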

So I would like to do

Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 147833)
+++ gcc/passes.c	(working copy)
@@ -634,16 +634,13 @@ init_optimization_passes (void)
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_fold_builtins);
       NEXT_PASS (pass_cse_sincos);
-      NEXT_PASS (pass_split_crit_edges);
-      NEXT_PASS (pass_pre);
-      NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_tree_loop);
 	{
 	  struct opt_pass **p = &pass_tree_loop.pass.sub;
 	  NEXT_PASS (pass_tree_loop_init);
+	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_copy_prop);
 	  NEXT_PASS (pass_dce_loop);
-	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_predcom);
 	  NEXT_PASS (pass_tree_unswitch);
 	  NEXT_PASS (pass_scev_cprop);
@@ -668,10 +665,13 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_iv_optimize);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_vrp);
+      NEXT_PASS (pass_split_crit_edges);
+      NEXT_PASS (pass_pre);
+      NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_convert_to_rsqrt);
       NEXT_PASS (pass_reassoc);
-      NEXT_PASS (pass_vrp);
       NEXT_PASS (pass_dominator);
       /* The only const/copy propagation opportunities left after
 	 DOM should be due to degenerate PHI nodes.  So rather than

with the theory that VRP should do as much CCP as CCP does
(it doesn't, but for loop-related stuff it probably is ok; otherwise we should
fix it).  Doing VRP before PRE also exposes new FRE/PRE opportunities through
the jump threading it performs, so we'd also likely get better cascading
with the jump threading performed by DOM.  The last forwprop also
does not make much sense in its position, but I'd rather remove it completely
(likewise the following phiopt).
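
A minimal C sketch of how jump threading can expose a redundancy, assuming a function that tests the same condition twice (the names are hypothetical, and the "after" version is written by hand). Before threading, the second `x + y` is reached through a join where the first computation is not available on every incoming path, so FRE cannot reuse it. After threading resolves the second test of `c` on each path, the `c != 0` path no longer goes through the join, and `x + y` becomes fully redundant there.

```c
/* Before jump threading: the join after the first if hides the fact
   that the second if is only entered when x + y was computed. */
int before_threading(int c, int x, int y)
{
  int t = 0;
  if (c)
    t = x + y;
  /* join point: t is either 0 or x + y */
  if (c)
    return t + (x + y);
  return t;
}

/* After threading: each path knows the outcome of the second test,
   so the second x + y can simply reuse t. */
int after_threading(int c, int x, int y)
{
  if (c) {
    int t = x + y;
    return t + t;   /* second x + y replaced by t */
  }
  return 0;
}
```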

Of course the above is completely untested / unbenchmarked.

Richard.