This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH v4] Repeat jump threading after combine
- From: Ilya Leoshkevich <iii at linux dot ibm dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: krebbel at linux dot ibm dot com, rdapp at linux dot ibm dot com, segher at kernel dot crashing dot org, Ilya Leoshkevich <iii at linux dot ibm dot com>
- Date: Mon, 26 Nov 2018 13:11:40 +0100
- Subject: [PATCH v4] Repeat jump threading after combine
Bootstrapped and regtested on x86_64-redhat-linux, s390x-redhat-linux
and ppc64le-redhat-linux.
Previous iteration:
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00495.html
In the end, the main question was: does this make the code better on
architectures other than s390?
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00993.html
Not sure whether it's already too late for this one, but I'd like to at
least post the updated code, my observations and SPEC CPU results.
- Code size decreases in most cases. In general, the main side-effect of
this patch is that after jump threading bbro pass builds different
traces and reorders and merges basic blocks differently:
# x86_64-redhat-linux:
436.cactusADM 274479 insns -528 smaller # maximum decrease
526.blender_r 2773303 insns -203 smaller
502.gcc_r 2262388 insns -142 smaller
403.gcc 815367 insns -106 smaller
...
525.x264_r 174450 insns +10 bigger # maximum increase
# ppc64le-redhat-linux:
526.blender_r 3422613 insns -276 smaller # maximum decrease
521.wrf_r 6008722 insns -228 smaller
520.omnetpp_r 612626 insns -52 smaller
...
435.gromacs 338597 insns +16 bigger # maximum increase
- Compilation performance did not seem to have been affected in a
measurable way. According to -ftime-report, the total user time of
SPEC CPU build used to be 26018s, and now it is 25985s, the difference
being -0.12%.
- Run time differences are all over the place:
# x86_64-redhat-linux:
548.exchange2_r -1.82%
541.leela_r -1.59%
538.imagick_r -0.95%
520.omnetpp_r -0.94%
403.gcc -0.76%
447.dealII -0.58%
526.blender_r -0.56%
450.soplex -0.51%
# skip |dt| < 0.5%
523.xalancbmk_r +0.52%
416.gamess +0.61%
503.bwaves_r +0.62%
445.gobmk +0.66%
456.hmmer +0.70%
549.fotonik3d_r +0.74%
471.omnetpp +0.99%
459.GemsFDTD +1.09%
554.roms_r +1.30%
500.perlbench_r +1.56%
483.xalancbmk +1.60%
# ppc64le-redhat-linux:
511.povray_r -1.29%
482.sphinx3 -0.65%
456.hmmer -0.53%
519.lbm_r -0.51%
# skip |dt| < 0.5%
549.fotonik3d_r +1.13%
403.gcc +1.76%
500.perlbench_r +2.35%
I've investigated 483.xalancbmk and 500.perlbench_r regressions on
x86_64.
Even though the total 483.xalancbmk size slightly decreases, we get 4%
more icache misses and 25% more stalls because of that. I couldn't
pinpoint that to a certain function or line of code - can this be due to
somehow generally worsened locality?
500.perlbench_r has 25% more indirect branch mispedicts, particularly,
when perl_run ends up calling Perl_pp_rv2av, Perl_pp_gvsv and
Perl_pp_nextstate. I have to admit I don't know what could have caused
that.
Consider the following RTL:
(insn (set (reg 65) (if_then_else (eq %cc 0) 1 0)))
(insn (parallel [(set %cc (compare (reg 65) 0)) (clobber %scratch)]))
(jump_insn (set %pc (if_then_else (ne %cc 0) (label_ref 23) %pc)))
Combine simplifies this into:
(note NOTE_INSN_DELETED)
(note NOTE_INSN_DELETED)
(jump_insn (set %pc (if_then_else (eq %cc 0) (label_ref 23) %pc)))
opening up the possibility to perform jump threading.
gcc/ChangeLog:
2018-09-19 Ilya Leoshkevich <iii@linux.ibm.com>
PR target/80080
* cfgcleanup.c (class pass_postreload_jump): New pass.
(pass_postreload_jump::execute): Likewise.
(make_pass_postreload_jump): Likewise.
* passes.def: Add pass_postreload_jump before
pass_postreload_cse.
* tree-pass.h (make_pass_postreload_jump): New pass.
gcc/testsuite/ChangeLog:
2018-09-05 Ilya Leoshkevich <iii@linux.ibm.com>
PR target/80080
* gcc.target/s390/pr80080-4.c: New test.
---
gcc/cfgcleanup.c | 42 +++++++++++++++++++++++
gcc/passes.def | 1 +
gcc/testsuite/gcc.target/s390/pr80080-4.c | 16 +++++++++
gcc/tree-pass.h | 1 +
4 files changed, 60 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/s390/pr80080-4.c
diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 4a5dc29d14f..bc4a78889db 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -3259,6 +3259,48 @@ make_pass_jump (gcc::context *ctxt)
namespace {
+const pass_data pass_data_postreload_jump =
+{
+ RTL_PASS, /* type */
+ "postreload_jump", /* name */
+ OPTGROUP_NONE, /* optinfo_flags */
+ TV_JUMP, /* tv_id */
+ 0, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ 0, /* todo_flags_finish */
+};
+
+class pass_postreload_jump : public rtl_opt_pass
+{
+public:
+ pass_postreload_jump (gcc::context *ctxt)
+ : rtl_opt_pass (pass_data_postreload_jump, ctxt)
+ {}
+
+ /* opt_pass methods: */
+ virtual unsigned int execute (function *);
+
+}; // class pass_postreload_jump
+
+unsigned int
+pass_postreload_jump::execute (function *)
+{
+ cleanup_cfg (flag_thread_jumps ? CLEANUP_THREADING : 0);
+ return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_postreload_jump (gcc::context *ctxt)
+{
+ return new pass_postreload_jump (ctxt);
+}
+
+namespace {
+
const pass_data pass_data_jump2 =
{
RTL_PASS, /* type */
diff --git a/gcc/passes.def b/gcc/passes.def
index 82ad9404b9e..0079fecef32 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -458,6 +458,7 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_reload);
NEXT_PASS (pass_postreload);
PUSH_INSERT_PASSES_WITHIN (pass_postreload)
+ NEXT_PASS (pass_postreload_jump);
NEXT_PASS (pass_postreload_cse);
NEXT_PASS (pass_gcse2);
NEXT_PASS (pass_split_after_reload);
diff --git a/gcc/testsuite/gcc.target/s390/pr80080-4.c b/gcc/testsuite/gcc.target/s390/pr80080-4.c
new file mode 100644
index 00000000000..5fc6a558008
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr80080-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-march=z196 -O2" } */
+
+extern void bar(int *mem);
+
+void foo4(int *mem)
+{
+ int oldval = 0;
+ if (!__atomic_compare_exchange_n (mem, (void *) &oldval, 1,
+ 1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+ {
+ bar (mem);
+ }
+}
+
+/* { dg-final { scan-assembler {(?n)\n\tlt\t.*\n\tjne\t(\.L\d+)\n(.*\n)*\tcs\t.*\n\tber\t%r14\n\1:\n\tjg\tbar\n} } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 2f8779ee4b8..b20d34c15e9 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -579,6 +579,7 @@ extern rtl_opt_pass *make_pass_clean_state (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_branch_prob (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_value_profile_transformations (gcc::context
*ctxt);
+extern rtl_opt_pass *make_pass_postreload_jump (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_postreload_cse (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
--
2.19.1