This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH, PR 10474] Schedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue


Hi,

I have discovered that scheduling pass_cprop_hardreg before
pass_thread_prologue_and_epilogue significantly increases the number
of shrink-wrappings performed.  For one, it fixes PR 10474 (at least
on x86_64-linux), but it also boosts the number of shrink-wrappings
performed during gcc bootstrap by nearly 80% (3165->5692 functions).
It is also necessary (although not sufficient) for shrink-wrapping
at least one function in the povray benchmark.

The reason why it helps so much is that before register allocation
there are instructions at the beginning of each function that move
the values of actual arguments from their "originally hard" registers
(e.g. SI, DI, etc.) to pseudos.  When an argument is live across a
function call, its pseudo is likely to be assigned to a callee-saved
register and is then accessed from that register even in the first
BB, which makes the first BB require a prologue even though the value
could be fetched from the original register.  When we convert all
uses (at least in the first BB) to the original register, the
preparatory stage of shrink-wrapping is often capable of moving the
register moves to a later BB, thus creating fast paths which do not
require a prologue and epilogue.
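
To make the pattern concrete, here is a minimal sketch of the kind of
function this affects.  It is an illustration only: the function name
bump_if_present is hypothetical, and the register assignments described
in the comments are what one would typically expect on x86_64-linux,
not something guaranteed for any particular compiler version.

```c
#include <stdio.h>

/* Hypothetical example, not part of the patch.  On x86_64-linux the
   pointer argument arrives in a hard argument register (DI).  It is
   live across the call to puts, so after register allocation it is
   likely to be kept in a callee-saved register.  */
int
bump_if_present (int *counter)
{
  /* Fast path: this test only needs the incoming argument.  If it
     reads the original argument register, no callee-saved register
     is touched here and no prologue is required on this path.  */
  if (!counter)
    return 0;

  /* Slow path: the call may clobber caller-saved registers, so the
     argument has to survive somewhere else.  Saving and restoring a
     callee-saved register for that purpose is what needs the
     prologue and epilogue.  */
  puts ("bumping");
  *counter += 1;
  return 1;
}
```

With shrink-wrapping, the prologue and epilogue would ideally be
emitted only around the slow path, so a call with a null pointer
returns without executing either.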

We believe this change in the pipeline should not bring about any
negative effects.  During gcc bootstrap, the number of instructions
changed by pass_cprop_hardreg dropped, but only by 1.2%.  We have also
run the SPEC 2006 CPU benchmarks on recent Intel and AMD hardware and
all run time differences could be attributed to noise.  The changes in
binary sizes were also small:

    |                | Trunk produced | New         |        |
    | Benchmark      |    binary size | binary size | % diff |
    |----------------+----------------+-------------+--------|
    | 400.perlbench  |        6219603 |     6136803 |  -1.33 |
    | 401.bzip2      |         359291 |      351659 |  -2.12 |
    | 403.gcc        |       16249718 |    15915774 |  -2.06 |
    | 410.bwaves     |         145249 |      145769 |   0.36 |
    | 416.gamess     |       40269686 |    40270270 |   0.00 |
    | 429.mcf        |          97142 |       97126 |  -0.02 |
    | 433.milc       |         715444 |      713236 |  -0.31 |
    | 434.zeusmp     |        1444596 |     1444676 |   0.01 |
    | 435.gromacs    |        6609207 |     6470039 |  -2.11 |
    | 436.cactusADM  |        4571319 |     4532607 |  -0.85 |
    | 437.leslie3d   |         492197 |      492357 |   0.03 |
    | 444.namd       |        1001921 |     1007001 |   0.51 |
    | 445.gobmk      |        8193495 |     8163839 |  -0.36 |
    | 450.soplex     |        5565070 |     5530734 |  -0.62 |
    | 453.povray     |        7468446 |     7340142 |  -1.72 |
    | 454.calculix   |        8474754 |     8464954 |  -0.12 |
    | 456.hmmer      |        1662315 |     1650147 |  -0.73 |
    | 458.sjeng      |         623065 |      620817 |  -0.36 |
    | 459.GemsFDTD   |        1456669 |     1461573 |   0.34 |
    | 462.libquantum |         249809 |      248401 |  -0.56 |
    | 464.h264ref    |        2784806 |     2772806 |  -0.43 |
    | 465.tonto      |       15511395 |    15480899 |  -0.20 |
    | 470.lbm        |          64327 |       64215 |  -0.17 |
    | 471.omnetpp    |        5325418 |     5293874 |  -0.59 |
    | 473.astar      |         365853 |      363261 |  -0.71 |
    | 481.wrf        |       22002287 |    21950783 |  -0.23 |
    | 482.sphinx3    |        1153616 |     1145248 |  -0.73 |
    | 483.xalancbmk  |       62458676 |    62001540 |  -0.73 |
    |----------------+----------------+-------------+--------|
    | TOTAL          |      221535374 |   220130550 |  -0.63 |
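
The "% diff" figures are consistent with the relative change of the new
size against the trunk size, rounded to two decimals.  A minimal sketch
of that computation follows; the helper name percent_diff is an
assumption made for illustration and does not come from any script used
for the measurements.

```c
/* Relative size change in percent.  A negative result means the new
   binary is smaller than the one produced by trunk.  The function
   name is hypothetical.  */
double
percent_diff (long old_size, long new_size)
{
  return 100.0 * (double) (new_size - old_size) / (double) old_size;
}
```

For the TOTAL row, percent_diff (221535374, 220130550) evaluates to
roughly -0.634, which rounds to the -0.63 shown in the table.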

I have successfully bootstrapped and tested the patch on
x86_64-linux.  Is it OK for trunk?  Or should I also examine some
other aspect?

Thanks,

Martin


2013-03-28  Martin Jambor  <mjambor@suse.cz>

	PR middle-end/10474
	* passes.c (init_optimization_passes): Move pass_cprop_hardreg before
	pass_thread_prologue_and_epilogue.

testsuite/
	* gcc.dg/pr10474.c: New test.


Index: src/gcc/passes.c
===================================================================
--- src.orig/gcc/passes.c
+++ src/gcc/passes.c
@@ -1630,6 +1630,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_ree);
 	  NEXT_PASS (pass_compare_elim_after_reload);
 	  NEXT_PASS (pass_branch_target_load_optimize1);
+	  NEXT_PASS (pass_cprop_hardreg);
 	  NEXT_PASS (pass_thread_prologue_and_epilogue);
 	  NEXT_PASS (pass_rtl_dse2);
 	  NEXT_PASS (pass_stack_adjustments);
@@ -1637,7 +1638,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_peephole2);
 	  NEXT_PASS (pass_if_after_reload);
 	  NEXT_PASS (pass_regrename);
-	  NEXT_PASS (pass_cprop_hardreg);
 	  NEXT_PASS (pass_fast_rtl_dce);
 	  NEXT_PASS (pass_reorder_blocks);
 	  NEXT_PASS (pass_branch_target_load_optimize2);
Index: src/gcc/testsuite/gcc.dg/pr10474.c
===================================================================
--- /dev/null
+++ src/gcc/testsuite/gcc.dg/pr10474.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-pro_and_epilogue"  } */
+
+void f(int *i)
+{
+	if (!i)
+		return;
+	else
+	{
+		__builtin_printf("Hi");
+		*i=0;
+	}
+}
+
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue"  } } */
+/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */

