This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[dataflow] Move final RTL DSE pass


Kenny noticed that we were failing to kill a dead store in
gcc.c-torture/execute/restrict-1.c (compiled at -O2).
The powerpc code looked like:

main:
        [...]
        stw 11,8(1)
        stw 12,12(1)
        stw 0,8(1)
        [...]

x86 had a similar dead store.  The problem is that:

        stw 11,8(1)
        stw 12,12(1)

is still a single insn when the final DSE pass is run, and its DImode
store to 8(1) is not completely overshadowed by the later SImode store.
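
To make the coverage condition concrete, here is a minimal sketch in C.
It is not GCC code: store_fully_overwritten is a hypothetical helper,
and it assumes both stores use the same base register and that nothing
reads the location between them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not GCC code: return true if a later store of
   LATER_SIZE bytes at LATER_OFF overwrites every byte written by an
   earlier store of EARLIER_SIZE bytes at EARLIER_OFF.  Offsets are
   relative to the same base register, and we assume nothing reads the
   location between the two stores.  */
static bool
store_fully_overwritten (int earlier_off, int earlier_size,
                         int later_off, int later_size)
{
  assert (earlier_size > 0 && later_size > 0);
  return (later_off <= earlier_off
          && later_off + later_size >= earlier_off + earlier_size);
}
```

With the unsplit 8-byte DImode store at offset 8 and the later 4-byte
SImode store at offset 8, the check fails, so the store must be kept.
Once the insn is split into two 4-byte stores, the one at offset 8
passes the check and the one at offset 12 correctly does not.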

My original DSE patch had rtl_dse before flow2, but IIRC that was mostly
for sanity checking (I had some asserts in flow to check that the new
code wasn't missing things that the old code would).  I think it makes
sense to move it after the remains of the old flow2 pass, i.e. after:

  NEXT_PASS (pass_split_after_reload);
  NEXT_PASS (pass_branch_target_load_optimize1);
  NEXT_PASS (pass_thread_prologue_and_epilogue);

since at least the first and third passes can introduce new instructions.
(thread_prologue_and_epilogue generally shouldn't introduce dead stores,
but in principle, the backend can legitimately do so if it isn't sure whether
the store is dead or not.  It can then add a REG_MAYBE_DEAD note to the
insn to indicate the store might be dead.)

DSE includes a standard DCE, so moving the pass has the nice side-effect
of giving us a standard DCE between thread_prologue_and_epilogue_insns
and combine_stack_adjustments.  There are cases on x86 where the prologue
contains:

    sp <- sp - X
    reg <- sp
    sp <- sp - Y

and where REG turns out to be dead.  With the existing sequence, we kill
the register copy after combine_stack_adjustments, so the two stack
adjustments stay separate.  With the new sequence, we kill the copy before
combine_stack_adjustments, which will in turn simplify things to:

    sp <- sp - (X + Y)
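
The ordering effect can be illustrated with a toy model in C.  This is
only a sketch under loud assumptions: the insn representation, the
"dead" flag (which a real DCE would compute from dataflow) and both
functions are hypothetical, and the toy combiner only merges directly
adjacent adjustments, which is a simplification of what
combine_stack_adjustments really does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn model (hypothetical, not GCC's RTL).  */
enum insn_kind { SP_ADJUST, SP_COPY };

struct toy_insn
{
  enum insn_kind kind;
  int amount;   /* Bytes subtracted from sp; unused for SP_COPY.  */
  int dead;     /* Nonzero if the result is never used (assumed given).  */
};

/* Stand-in for DCE: drop insns flagged dead, return the new length.  */
static size_t
remove_dead (struct toy_insn *insns, size_t n)
{
  size_t out = 0;
  assert (insns != NULL);
  for (size_t i = 0; i < n; i++)
    if (!insns[i].dead)
      insns[out++] = insns[i];
  return out;
}

/* Stand-in for stack-adjustment combination: merge an SP_ADJUST into
   the immediately preceding SP_ADJUST, return the new length.  */
static size_t
combine_adjustments (struct toy_insn *insns, size_t n)
{
  size_t out = 0;
  assert (insns != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (out > 0 && insns[i].kind == SP_ADJUST
          && insns[out - 1].kind == SP_ADJUST)
        insns[out - 1].amount += insns[i].amount;
      else
        insns[out++] = insns[i];
    }
  return out;
}
```

Running combine_adjustments before remove_dead on the three-insn
prologue leaves two separate adjustments, because the copy sits between
them when the combiner runs.  Running remove_dead first lets the
combiner see the two adjustments side by side and merge them into one.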

I've compiled the C testsuite at -O2 both before and after this change,
and the new code is consistently equal-or-better in terms of number of
total instructions, number of stores, and number of stack adjustments.

Bootstrapped & regression-tested on i686-pc-linux-gnu.  Approved
off-list by Kenny and applied.

Richard


	* passes.c (init_optimization_passes): Move the final RTL DSE pass
	after thread_prologue_and_epilogue.

Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 115172)
+++ gcc/passes.c	(working copy)
@@ -676,10 +676,10 @@ #define NEXT_PASS(PASS)  (p = next_pass_
   p = &pass_postreload.sub;
   NEXT_PASS (pass_postreload_cse);
   NEXT_PASS (pass_gcse2);
-  NEXT_PASS (pass_rtl_dse);
   NEXT_PASS (pass_split_after_reload);
   NEXT_PASS (pass_branch_target_load_optimize1);
   NEXT_PASS (pass_thread_prologue_and_epilogue);
+  NEXT_PASS (pass_rtl_dse);
   NEXT_PASS (pass_rtl_seqabstr);
   NEXT_PASS (pass_stack_adjustments);
   NEXT_PASS (pass_peephole2);

