This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

[PATCH v4] Repeat jump threading after combine

From: Ilya Leoshkevich <iii at linux dot ibm dot com>
To: gcc-patches at gcc dot gnu dot org
Cc: krebbel at linux dot ibm dot com, rdapp at linux dot ibm dot com, segher at kernel dot crashing dot org, Ilya Leoshkevich <iii at linux dot ibm dot com>
Date: Mon, 26 Nov 2018 13:11:40 +0100
Subject: [PATCH v4] Repeat jump threading after combine

Bootstrapped and regtested on x86_64-redhat-linux, s390x-redhat-linux
and ppc64le-redhat-linux.

Previous iteration:
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00495.html

In the end, the main question was: does this make the code better on
architectures other than s390?
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00993.html

Not sure whether it's already too late for this one, but I'd like to at
least post the updated code, my observations and SPEC CPU results.

- Code size decreases in most cases.  In general, the main side effect
of this patch is that, after jump threading, the bbro pass builds
different traces and reorders and merges basic blocks differently:

# x86_64-redhat-linux:
436.cactusADM 274479  insns -528 smaller    # maximum decrease
526.blender_r 2773303 insns -203 smaller
502.gcc_r     2262388 insns -142 smaller
403.gcc       815367  insns -106 smaller
...
525.x264_r    174450  insns +10 bigger      # maximum increase

# ppc64le-redhat-linux:
526.blender_r   3422613 insns -276 smaller  # maximum decrease
521.wrf_r       6008722 insns -228 smaller
520.omnetpp_r   612626  insns -52 smaller
...
435.gromacs     338597  insns +16 bigger    # maximum increase

- Compilation performance does not appear to be measurably affected.
According to -ftime-report, the total user time of the SPEC CPU build
used to be 26018s, and now it is 25985s, the difference being -0.12%.

- Run time differences are all over the place (positive numbers are
slowdowns):

# x86_64-redhat-linux:
548.exchange2_r -1.82%
541.leela_r     -1.59%
538.imagick_r   -0.95%
520.omnetpp_r   -0.94%
403.gcc         -0.76%
447.dealII      -0.58%
526.blender_r   -0.56%
450.soplex      -0.51%
# skip |dt| < 0.5%
523.xalancbmk_r +0.52%
416.gamess      +0.61%
503.bwaves_r    +0.62%
445.gobmk       +0.66%
456.hmmer       +0.70%
549.fotonik3d_r +0.74%
471.omnetpp     +0.99%
459.GemsFDTD    +1.09%
554.roms_r      +1.30%
500.perlbench_r +1.56%
483.xalancbmk   +1.60%

# ppc64le-redhat-linux:
511.povray_r       -1.29%
482.sphinx3        -0.65%
456.hmmer          -0.53%
519.lbm_r          -0.51%
# skip |dt| < 0.5%
549.fotonik3d_r    +1.13%
403.gcc            +1.76%
500.perlbench_r    +2.35%

I've investigated the 483.xalancbmk and 500.perlbench_r regressions on
x86_64.

Even though the total 483.xalancbmk size slightly decreases, we get 4%
more icache misses and 25% more stalls because of that.  I couldn't
pin this down to a particular function or line of code - could it be
due to generally worse locality?

500.perlbench_r has 25% more indirect branch mispredicts, particularly
when perl_run ends up calling Perl_pp_rv2av, Perl_pp_gvsv and
Perl_pp_nextstate.  I have to admit I don't know what could have caused
that.



Consider the following RTL:

(insn (set (reg 65) (if_then_else (eq %cc 0) 1 0)))
(insn (parallel [(set %cc (compare (reg 65) 0)) (clobber %scratch)]))
(jump_insn (set %pc (if_then_else (ne %cc 0) (label_ref 23) %pc)))

Combine simplifies this into:

(note NOTE_INSN_DELETED)
(note NOTE_INSN_DELETED)
(jump_insn (set %pc (if_then_else (eq %cc 0) (label_ref 23) %pc)))

opening up the possibility to perform jump threading.
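
To illustrate why the two forms above select the same branch target,
here is a minimal C sketch.  It is not part of the patch and only models
the RTL under stated assumptions: the condition code is a plain int in
the range 0..3, and "compare x 0" is modeled as producing a nonzero
value exactly when x != 0.  The function names are hypothetical.

```c
#include <stdbool.h>

/* Assumption: the condition code is modeled as an int in 0..3, and
   (compare x 0) is modeled as "nonzero iff x != 0".  */

/* Before combine: materialize the flag in a register, compare it
   against zero, then branch on "ne".  */
static bool
branch_taken_before (int cc)
{
  /* (set (reg 65) (if_then_else (eq %cc 0) 1 0)) */
  int reg65 = (cc == 0) ? 1 : 0;
  /* (set %cc (compare (reg 65) 0)) */
  int cc2 = (reg65 != 0);
  /* (if_then_else (ne %cc 0) (label_ref 23) %pc) */
  return cc2 != 0;
}

/* After combine: branch directly on the original condition code.  */
static bool
branch_taken_after (int cc)
{
  /* (if_then_else (eq %cc 0) (label_ref 23) %pc) */
  return cc == 0;
}

/* Count the condition code values for which the two forms disagree.  */
int
count_mismatches (void)
{
  int mismatches = 0;
  for (int cc = 0; cc <= 3; cc++)
    if (branch_taken_before (cc) != branch_taken_after (cc))
      mismatches++;
  return mismatches;
}
```

Under this model, both forms agree for every condition code value.  Once
the intermediate register and the second compare are gone, the jump
depends only on the original %cc, and that is what lets a later
threading pass see through it.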

gcc/ChangeLog:

2018-09-19  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* cfgcleanup.c (class pass_postreload_jump): New pass.
	(pass_postreload_jump::execute): Likewise.
	(make_pass_postreload_jump): Likewise.
	* passes.def: Add pass_postreload_jump before
	pass_postreload_cse.
	* tree-pass.h (make_pass_postreload_jump): New pass.

gcc/testsuite/ChangeLog:

2018-09-05  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* gcc.target/s390/pr80080-4.c: New test.
---
 gcc/cfgcleanup.c                          | 42 +++++++++++++++++++++++
 gcc/passes.def                            |  1 +
 gcc/testsuite/gcc.target/s390/pr80080-4.c | 16 +++++++++
 gcc/tree-pass.h                           |  1 +
 4 files changed, 60 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr80080-4.c

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 4a5dc29d14f..bc4a78889db 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -3259,6 +3259,48 @@ make_pass_jump (gcc::context *ctxt)
 
 namespace {
 
+const pass_data pass_data_postreload_jump =
+{
+  RTL_PASS, /* type */
+  "postreload_jump", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_JUMP, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_postreload_jump : public rtl_opt_pass
+{
+public:
+  pass_postreload_jump (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_postreload_jump, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual unsigned int execute (function *);
+
+}; // class pass_postreload_jump
+
+unsigned int
+pass_postreload_jump::execute (function *)
+{
+  cleanup_cfg (flag_thread_jumps ? CLEANUP_THREADING : 0);
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_postreload_jump (gcc::context *ctxt)
+{
+  return new pass_postreload_jump (ctxt);
+}
+
+namespace {
+
 const pass_data pass_data_jump2 =
 {
   RTL_PASS, /* type */
diff --git a/gcc/passes.def b/gcc/passes.def
index 82ad9404b9e..0079fecef32 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -458,6 +458,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_reload);
       NEXT_PASS (pass_postreload);
       PUSH_INSERT_PASSES_WITHIN (pass_postreload)
+	  NEXT_PASS (pass_postreload_jump);
 	  NEXT_PASS (pass_postreload_cse);
 	  NEXT_PASS (pass_gcse2);
 	  NEXT_PASS (pass_split_after_reload);
diff --git a/gcc/testsuite/gcc.target/s390/pr80080-4.c b/gcc/testsuite/gcc.target/s390/pr80080-4.c
new file mode 100644
index 00000000000..5fc6a558008
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr80080-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-march=z196 -O2" } */
+
+extern void bar(int *mem);
+
+void foo4(int *mem)
+{
+  int oldval = 0;
+  if (!__atomic_compare_exchange_n (mem, (void *) &oldval, 1,
+				    1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    {
+      bar (mem);
+    }
+}
+
+/* { dg-final { scan-assembler {(?n)\n\tlt\t.*\n\tjne\t(\.L\d+)\n(.*\n)*\tcs\t.*\n\tber\t%r14\n\1:\n\tjg\tbar\n} } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 2f8779ee4b8..b20d34c15e9 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -579,6 +579,7 @@ extern rtl_opt_pass *make_pass_clean_state (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_branch_prob (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_value_profile_transformations (gcc::context
 							      *ctxt);
+extern rtl_opt_pass *make_pass_postreload_jump (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_postreload_cse (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
-- 
2.19.1

Follow-Ups:
- Re: [PATCH v4] Repeat jump threading after combine
  - From: Segher Boessenkool
