For s390, I am now using the C11 atomic builtins in glibc. There are some issues with the emitted cs instructions.

If __atomic_compare_exchange_n is used within the condition of an if/while, the generated code sometimes does not use the condition code directly to jump away. Instead it extracts the condition code into a general register via ipm, compares that register with further instructions, and jumps according to this comparison.

int foo1 (int *mem)
{
  int val, newval;

  val = __atomic_load_n (mem, __ATOMIC_RELAXED);
  do
    {
      newval = val | 123;
    }
  while (!__atomic_compare_exchange_n (mem, &val, newval, 1,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));

  /*
  0000000000000000 <foo>:
     0:   58 10 20 00         l      %r1,0(%r2)
     4:   18 31               lr     %r3,%r1
     6:   a5 3b 00 7b         oill   %r3,123
     a:   ba 13 20 00         cs     %r1,%r3,0(%r2)
     e:   b2 22 00 30         ipm    %r3
    12:   8a 30 00 1c         sra    %r3,28
    16:   ec 36 ff f7 00 7e   cijne  %r3,0,4 <foo+0x4>
    1c:   a7 29 00 00         lghi   %r2,0
    20:   07 fe               br     %r14
    22:   07 07               nopr   %r7
    24:   07 07               nopr   %r7
    26:   07 07               nopr   %r7
  */
  return 0;
}

For __atomic_exchange_n, s390 has no special instruction, so a cs loop is used. An extra register is needed to save the old value instead of using the "old" register of the cs instruction, which is updated when the new value is not stored to memory. This extra register is reloaded in every loop iteration.

extern int bar (int *mem);

int foo2 (int *mem)
{
  int old = __atomic_exchange_n (mem, 0, __ATOMIC_ACQUIRE);
  if (old >= 2)
    return bar (mem);

  /*
  0000000000000028 <foo2>:
    28:   a7 48 00 00         lhi    %r4,0
    2c:   58 10 20 00         l      %r1,0(%r2)
    30:   18 31               lr     %r3,%r1
    32:   ba 14 20 00         cs     %r1,%r4,0(%r2)
    36:   a7 74 ff fd         jne    30 <foo2+0x8>
    3a:   ec 3c 00 06 01 7e   cijnh  %r3,1,46 <foo2+0x1e>
    40:   c0 f4 00 00 00 00   jg     40 <foo2+0x18>
                        42: R_390_PC32DBL  bar+0x2
    46:   a7 29 00 00         lghi   %r2,0
    4a:   07 fe               br     %r14
    4c:   07 07               nopr   %r7
    4e:   07 07               nopr   %r7
  */
  return 0;
}

When exchanging in a zero value, as above, the load-and-and instruction can be used. Then only one register is needed for the new and old values, and there is no loop.
If the exchanged memory is a global variable, the address of it is loaded within the loop instead of before the loop.

extern int foo3_mem;

int foo3 (void)
{
  return __atomic_exchange_n (&foo3_mem, 5, __ATOMIC_ACQUIRE);

  /*
  0000000000000050 <foo3>:
    50:   c4 1d 00 00 00 00   lrl    %r1,50 <foo3>
                        52: R_390_PC32DBL  foo3_mem+0x2
    56:   a7 38 00 05         lhi    %r3,5
    5a:   18 21               lr     %r2,%r1
    5c:   c0 40 00 00 00 00   larl   %r4,5c <foo3+0xc>
                        5e: R_390_PC32DBL  foo3_mem+0x2
    62:   ba 13 40 00         cs     %r1,%r3,0(%r4)
    66:   a7 74 ff fa         jne    5a <foo3+0xa>
    6a:   b9 14 00 22         lgfr   %r2,%r2
    6e:   07 fe               br     %r14
  */
}
Is this a duplicate of PR79981 (and thus fixed in trunk)?
(In reply to Richard Biener from comment #1)
> Is this a duplicate of PR79981 (and thus fixed in trunk)?

No, it is a different problem. We are working on a fix.
Most of this can be fixed in the backend, but all attempts to get rid of the additional load ("lr %r3,%r1") in "bar" within the backend have failed. This is caused by overly complicated code being generated in expand_compare_and_swap_loop (). I'm currently testing the patch below, which eliminates the superfluous variable cmp_reg (and the register allocated for it). Is there any chance to get something like this into GCC 7? The patch could possibly also help other targets. (Will post on gcc-patches when the tests are complete.)

--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5798,17 +5798,15 @@ find_cc_set (rtx x, const_rtx pat, void *data)
 static bool
 expand_compare_and_swap_loop (rtx mem, rtx old_reg, rtx new_reg, rtx seq)
 {
-  machine_mode mode = GET_MODE (mem);
   rtx_code_label *label;
-  rtx cmp_reg, success, oldval;
+  rtx success, oldval;
 
   /* The loop we want to generate looks like
 
-        cmp_reg = mem;
+        old_reg = mem;
       label:
-        old_reg = cmp_reg;
         seq;
-        (success, cmp_reg) = compare-and-swap(mem, old_reg, new_reg)
+        (success, old_reg) = compare-and-swap(mem, old_reg, new_reg)
         if (success)
           goto label;
 
@@ -5816,23 +5814,21 @@ expand_compare_and_swap_loop (rtx mem, rtx old_reg, rtx new_reg, rtx seq)
      iterations use the value loaded by the compare-and-swap pattern.  */
 
   label = gen_label_rtx ();
-  cmp_reg = gen_reg_rtx (mode);
 
-  emit_move_insn (cmp_reg, mem);
+  emit_move_insn (old_reg, mem);
   emit_label (label);
-  emit_move_insn (old_reg, cmp_reg);
   if (seq)
     emit_insn (seq);
 
   success = NULL_RTX;
-  oldval = cmp_reg;
+  oldval = old_reg;
   if (!expand_atomic_compare_and_swap (&success, &oldval, mem, old_reg,
                                        new_reg, false, MEMMODEL_SYNC_SEQ_CST,
                                        MEMMODEL_RELAXED))
     return false;
 
-  if (oldval != cmp_reg)
-    emit_move_insn (cmp_reg, oldval);
+  if (oldval != old_reg)
+    emit_move_insn (old_reg, oldval);
 
   /* Mark this jump predicted not taken.  */
   emit_cmp_and_jump_insns (success, const0_rtx, EQ, const0_rtx,
(In reply to Dominik Vogt from comment #3)
> Most of this can be fixed in the backend, but all attempts to get rid of
> the additional load ("lr %r3,%r1") in "bar" within the backend have
> failed.  This is caused by overly complicated code being generated in
> expand_compare_and_swap_loop ().  I'm currently testing the patch below,
> which eliminates the superfluous variable cmp_reg (and the register
> allocated for it).  Is there any chance to get something like this into
> GCC 7?  The patch could possibly also help other targets.
> [...]
> -  if (oldval != cmp_reg)
> -    emit_move_insn (cmp_reg, oldval);
> +  if (oldval != old_reg)
> +    emit_move_insn (old_reg, oldval);

That doesn't look correct. expand_compare_and_swap_loop is called from two places, and at least in the first case it wants old_reg to contain, on success, something that is the target that will be returned, while with your patch it might be something different.
What case do you mean? The

  +  if (oldval != old_reg)
  +    emit_move_insn (old_reg, oldval);

at the end should make sure that the oldval rtx is either not changed by the call, or its value is copied into old_reg at runtime. In both cases the value of old_reg should be the old value to be returned (as it was, hopefully, before the patch). Stefan and I have spent quite some time proofreading this patch; do you see something we might have overlooked?
I think it depends on what

  (success, old_reg) = compare-and-swap(mem, old_reg, new_reg)

sets if success is true. Is there a guarantee that old_reg will be assigned whatever has been passed as the second argument in that case on all targets? If yes, then it can work. If not, then it will return a different value. So I think you need to analyze all targets that have HW compare-and-swap and verify that.
(In reply to Jakub Jelinek from comment #6)
> I think it depends on what
>   (success, old_reg) = compare-and-swap(mem, old_reg, new_reg)
> sets if success is true. Is there a guarantee that old_reg will be
> assigned whatever has been passed as the second argument in that case on
> all targets?

Isn't that part of the interface of the atomic_compare_and_swap pattern? The GCC internals manual says:

> Operand 1 is an output operand which is set to the contents of the memory
> before the operation was attempted.  Operand 3 is the value that is
> expected to be in memory.

If the CS instruction on a target does not overwrite the register old_reg with the memory value before swapping, the target's backend must do that manually, I think.
The patch causes a performance regression on s390x:

  .L1:
        lr     %r3,%r1
        cs     %r1,%r4,0(%r2)
        jne    .L1

becomes

  .L1:
        cs     %r1,%r3,0(%r2)
        ipm    %r4
        sra    %r4,28
        cijne  %r4,0,.L1

Although this can be fixed on s390/s390x in a follow-up patch, I cannot really prove that something similar won't happen on other targets without testing them.
Author: krebbel
Date: Tue Apr 25 07:37:50 2017
New Revision: 247132

URL: https://gcc.gnu.org/viewcvs?rev=247132&root=gcc&view=rev
Log:
S/390: PR80080: Optimize atomic patterns.

The attached patch optimizes the atomic_exchange and atomic_compare
patterns on s390 and s390x (mostly limited to SImode and DImode).

Among general optimization, the changes fix most of the problems
reported in PR 80080:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

gcc/ChangeLog:

2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	PR target/80080
	* s390-protos.h (s390_expand_cs_hqi): Removed.
	(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
	* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
	modes as well as CCZ1mode and CCZmode.
	(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
	signature of s390_emit_compare_and_swap.
	(s390_expand_cs_hqi): Likewise, make static.
	(s390_expand_cs_tdsi): Generate an explicit compare before trying
	compare-and-swap, in some cases.
	(s390_expand_cs): Wrapper function.
	(s390_expand_atomic_exchange_tdsi): New backend specific expander for
	atomic_exchange.
	(s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode.
	* config/s390/s390.md ("atomic_compare_and_swap<mode>"): Merge the
	patterns for small and large integers.  Forbid symref memory
	operands.  Move expander to s390.c.  Require cc register.
	("atomic_compare_and_swap<DGPR:mode><CCZZ1:mode>_internal")
	("*atomic_compare_and_swap<TDI:mode><CCZZ1:mode>_1")
	("*atomic_compare_and_swapdi<CCZZ1:mode>_2")
	("*atomic_compare_and_swapsi<CCZZ1:mode>_3"): Use s_operand to forbid
	symref memory operands.  Remove CC mode and call s390_match_ccmode
	instead.
	("atomic_exchange<mode>"): Allow and implement all integer modes.

gcc/testsuite/ChangeLog:

2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	PR target/80080
	* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
	* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
	* gcc.target/s390/md/atomic_exchange-1.inc: New test.

Added:
    trunk/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
    trunk/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
    trunk/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/s390/s390-protos.h
    trunk/gcc/config/s390/s390.c
    trunk/gcc/config/s390/s390.md
    trunk/gcc/testsuite/ChangeLog
Author: krebbel
Date: Tue Apr 25 11:11:48 2017
New Revision: 247189

URL: https://gcc.gnu.org/viewcvs?rev=247189&root=gcc&view=rev
Log:
S/390: PR80080: Optimize atomic patterns.

The attached patch optimizes the atomic_exchange and atomic_compare
patterns on s390 and s390x (mostly limited to SImode and DImode).

Among general optimization, the changes fix most of the problems
reported in PR 80080:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080

gcc/ChangeLog:

2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	Backport from mainline
	2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	PR target/80080
	* s390-protos.h (s390_expand_cs_hqi): Removed.
	(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
	* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
	modes as well as CCZ1mode and CCZmode.
	(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
	signature of s390_emit_compare_and_swap.
	(s390_expand_cs_hqi): Likewise, make static.
	(s390_expand_cs_tdsi): Generate an explicit compare before trying
	compare-and-swap, in some cases.
	(s390_expand_cs): Wrapper function.
	(s390_expand_atomic_exchange_tdsi): New backend specific expander for
	atomic_exchange.
	(s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode.
	* config/s390/s390.md ("atomic_compare_and_swap<mode>"): Merge the
	patterns for small and large integers.  Forbid symref memory
	operands.  Move expander to s390.c.  Require cc register.
	("atomic_compare_and_swap<DGPR:mode><CCZZ1:mode>_internal")
	("*atomic_compare_and_swap<TDI:mode><CCZZ1:mode>_1")
	("*atomic_compare_and_swapdi<CCZZ1:mode>_2")
	("*atomic_compare_and_swapsi<CCZZ1:mode>_3"): Use s_operand to forbid
	symref memory operands.  Remove CC mode and call s390_match_ccmode
	instead.
	("atomic_exchange<mode>"): Allow and implement all integer modes.

gcc/testsuite/ChangeLog:

2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	Backport from mainline
	2017-04-25  Dominik Vogt  <vogt@linux.vnet.ibm.com>

	PR target/80080
	* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
	* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
	* gcc.target/s390/md/atomic_exchange-1.inc: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
    branches/gcc-7-branch/gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/s390/s390-protos.h
    branches/gcc-7-branch/gcc/config/s390/s390.c
    branches/gcc-7-branch/gcc/config/s390/s390.md
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
Hi, I've retested the samples with GCC 7, GCC 8, and head from 2018-07-20, but there are still issues.

The examples foo1 and foo2 are okay.

The issue in example foo3 is still present (see the description of the bug report):

00000000000000a0 <foo3>:
  a0:   a7 18 00 05         lhi    %r1,5
  a4:   c4 2d 00 00 00 00   lrl    %r2,a4 <foo3+0x4>
                      a6: R_390_PC32DBL  foo3_mem+0x2
  aa:   c0 30 00 00 00 00   larl   %r3,aa <foo3+0xa>
                      ac: R_390_PC32DBL  foo3_mem+0x2
  b0:   ba 21 30 00         cs     %r2,%r1,0(%r3)
  b4:   a7 74 ff fb         jne    aa <foo3+0xa>
  b8:   b9 14 00 22         lgfr   %r2,%r2
  bc:   07 fe               br     %r14
  be:   07 07               nopr   %r7

The address of the global variable is still reloaded within the loop. If the value was not swapped with cs, the jne could jump directly to the cs instruction instead of the larl instruction.

I've found a further issue which is observable with the following two examples. See the questions in the disassembly:

void foo4 (int *mem)
{
  int oldval = 0;
  if (!__atomic_compare_exchange_n (mem, (void *) &oldval, 1, 1,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    {
      bar (mem);
    }

  /*
  0000000000000000 <foo4>:
     0:   e3 10 20 00 00 12   lt     %r1,0(%r2)
     6:   a7 74 00 06         jne    12 <foo4+0x12>

  Why do we need to jump to 0x12 first instead of directly jumping to 0x18?

     a:   a7 38 00 01         lhi    %r3,1
     e:   ba 13 20 00         cs     %r1,%r3,0(%r2)
    12:   a7 74 00 03         jne    18 <foo4+0x18>
    16:   07 fe               br     %r14
    18:   c0 f4 00 00 00 00   jg     18 <foo4+0x18>
                        1a: R_390_PC32DBL  bar+0x2
    1e:   07 07               nopr   %r7
  */
}

void foo5 (int *mem)
{
  int oldval = 0;
  __atomic_compare_exchange_n (mem, (void *) &oldval, 1, 1,
                               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  if (oldval != 0)
    bar (mem);

  /*
  0000000000000040 <foo5>:
    40:   e3 10 20 00 00 12   lt     %r1,0(%r2)
    46:   a7 74 00 06         jne    52 <foo5+0x12>

  This is similar to foo4, but the variable oldval is compared against zero
  instead of using the return value of __atomic_compare_exchange_n.  Can't
  we jump directly to 0x5a instead of 0x52?

    4a:   a7 38 00 01         lhi    %r3,1
    4e:   ba 13 20 00         cs     %r1,%r3,0(%r2)
    52:   12 11               ltr    %r1,%r1
    54:   a7 74 00 03         jne    5a <foo5+0x1a>
    58:   07 fe               br     %r14
    5a:   c0 f4 00 00 00 00   jg     5a <foo5+0x1a>
                        5c: R_390_PC32DBL  bar+0x2
  */
}
I've investigated foo3, foo4 and foo5, and came to the following conclusions:

When foo3 is compiled with -march=z10 or later, the cprop1 pass propagates the global's SYMBOL_REF value into UNSPECV_CAS. On earlier machines this does not happen, because the result is rejected by insn_invalid_p (). Then reload realizes that a SYMBOL_REF cannot be a legitimate UNSPECV_CAS argument and loads it into a pseudo right before. The net result is that the load of the SYMBOL_REF is moved from outside the loop into the loop. So we need to somehow inhibit constant propagation for this case.

Jump threading in foo4 does not work, because it's done only during the `jump' pass, at which point there are insns with side effects in the basic block of the second jump. They are later deleted by the `combine' pass, but we don't request CLEANUP_THREADING after that. I wonder if we could introduce it? In addition, when foo4 is compiled with -O2 or -O3, we don't use a conditional return, because our return sequence contains a PARALLEL, which is rejected by bb_is_just_return (). This can also be improved.

Finally, in foo5 `cs' is generated by s390_expand_cs_tdsi (), and the comparison is generated by the common expansion logic, so it doesn't look possible to improve the situation solely in the back end. We need to somehow make GCC aware that (oldval == 0) and (retval != 0) are equivalent after `cs', but I'm not sure at which point we could and should do this; in theory, doing it on the tree rather than the RTL level could help other architectures.
Author: krebbel
Date: Thu Sep 6 07:35:35 2018
New Revision: 264142

URL: https://gcc.gnu.org/viewcvs?rev=264142&root=gcc&view=rev
Log:
S/390: Register pass_s390_early_mach statically

The dump file used to come at the end of the sorted dump file list,
because the pass was registered dynamically.  This did not reflect the
order in which passes are executed.  Static registration fixes this:

* foo4.c.277r.split2
* foo4.c.281r.early_mach
* foo4.c.282r.pro_and_epilogue

gcc/ChangeLog:

2018-09-06  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* config/s390/s390-passes.def: New file.
	* config/s390/s390-protos.h (class rtl_opt_pass): Add forward
	declaration.
	(make_pass_s390_early_mach): Add declaration.
	* config/s390/s390.c (make_pass_s390_early_mach):
	(s390_option_override): Remove dynamic registration.
	* config/s390/t-s390: Add s390-passes.def.

Added:
    trunk/gcc/config/s390/s390-passes.def
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/s390/s390-protos.h
    trunk/gcc/config/s390/s390.c
    trunk/gcc/config/s390/t-s390
Author: krebbel
Date: Thu Sep 6 07:38:42 2018
New Revision: 264143

URL: https://gcc.gnu.org/viewcvs?rev=264143&root=gcc&view=rev
Log:
S/390: Prohibit SYMBOL_REF in UNSPECV_CAS

Inhibit constant propagation inlining SYMBOL_REF loads into
UNSPECV_CAS.  Even though reload can later undo it, the resulting code
will be less efficient.

gcc/ChangeLog:

2018-09-06  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* config/s390/predicates.md: Add nonsym_memory_operand.
	* config/s390/s390.c (s390_legitimize_cs_operand): If operand
	contains a SYMBOL_REF, load it into an intermediate pseudo.
	(s390_emit_compare_and_swap): Legitimize operand.
	* config/s390/s390.md: Use the new nonsym_memory_operand
	with UNSPECV_CAS patterns.

gcc/testsuite/ChangeLog:

2018-09-06  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* gcc.target/s390/pr80080-3.c: New test.
	* gcc.target/s390/s390.exp: Make sure the new test passes
	on all optimization levels.

Added:
    trunk/gcc/testsuite/gcc.target/s390/pr80080-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/s390/predicates.md
    trunk/gcc/config/s390/s390.c
    trunk/gcc/config/s390/s390.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/s390/s390.exp
Author: iii
Date: Mon Sep 24 14:21:03 2018
New Revision: 264535

URL: https://gcc.gnu.org/viewcvs?rev=264535&root=gcc&view=rev
Log:
S/390: Fix conditional returns on z196+

The S/390 epilogue ends with

  (parallel [(return) (use %r14)])

instead of the more usual (return) or (simple_return).  This sequence
is not recognized by the conditional return logic in try_optimize_cfg ().

It was introduced for processors older than z196, where it is sometimes
profitable to return via a call-clobbered register instead of %r14.  On
newer processors we always return via %r14, and the fact that %r14 is
used is already reflected by EPILOGUE_USES, so in this case a simple
(return) suffices.

This patch changes return_use () to emit simple (return)s when returning
via %r14.  The resulting sequences are recognized by the conditional
return logic in try_optimize_cfg ().

gcc/ChangeLog:

2018-09-24  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* config/s390/s390.c (s390_emit_epilogue): Do not use
	PARALLEL RETURN+USE when returning via %r14.

gcc/testsuite/ChangeLog:

2018-09-24  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* gcc.target/s390/risbg-ll-3.c: Expect conditional returns.
	* gcc.target/s390/zvector/vec-cmp-2.c: Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/s390/s390.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/s390/risbg-ll-3.c
    trunk/gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c
Author: iii
Date: Mon Dec 3 09:49:02 2018
New Revision: 266734

URL: https://gcc.gnu.org/viewcvs?rev=266734&root=gcc&view=rev
Log:
Repeat jump threading after combine

Consider the following RTL:

  (insn (set (reg 65) (if_then_else (eq %cc 0) 1 0)))
  (insn (parallel [(set %cc (compare (reg 65) 0))
                   (clobber %scratch)]))
  (jump_insn (set %pc (if_then_else (ne %cc 0) (label_ref 23) %pc)))

Combine simplifies this into:

  (note NOTE_INSN_DELETED)
  (note NOTE_INSN_DELETED)
  (jump_insn (set %pc (if_then_else (eq %cc 0) (label_ref 23) %pc)))

opening up the possibility to perform jump threading.

gcc/ChangeLog:

2018-12-03  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* cfgcleanup.c (class pass_postreload_jump): New pass.
	(pass_postreload_jump::execute): Likewise.
	(make_pass_postreload_jump): Likewise.
	* passes.def: Add pass_postreload_jump before
	pass_postreload_cse.
	* tree-pass.h (make_pass_postreload_jump): New pass.

gcc/testsuite/ChangeLog:

2018-12-03  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* gcc.target/s390/pr80080-4.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/s390/pr80080-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgcleanup.c
    trunk/gcc/passes.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-pass.h