Reduced from yarpgen: $ cat func.cpp short var = 9; int test_var_1 = 0, test_var_5 = 0, test_var_8 = 0, test_var_10 = 0; void test(unsigned var_6, unsigned long long var_9) { for (; test_var_10;) if (test_var_5) for (;; test_var_1 += test_var_8) ; for (int i_10 = 0; i_10 < 23; i_10 += 4) for (unsigned int i_11 = 0; i_11 < var_6 + 471511258; i_11 ++) if ((var_9 == 0) % var_6) var = 0; } int main() { test(3823456048, 10675217251973); __builtin_printf("%u\n", var); if (var != 9) __builtin_abort (); return 0; } $ g++ func.cpp && ./a.out 9 $ g++ func.cpp -O3 && ./a.out 0 Aborted (core dumped)
-fdisable-tree-cunroll fixes it but not disabling the lim pass after it but disabling lim2 which then no longer makes us unroll the loop. Disabling DOM3 _and_ VRP2 also fixes the issue, it looks like some bogus VRP gets triggered.
Interestingly, it only reproduces on AMD CPUs and not Intel. The bug is in: xorl %edx, %edx divl %edi movl $1, %eax cmove %edx, %eax divl leaves ZF undefined as documented (and as seen in RTL), but we use that in the cmove instruction.
Before combine it looks fine: (insn 23 22 105 6 (parallel [ (set (reg:SI 108) (udiv:SI (reg:SI 104) (reg/v:SI 102 [ var_6 ]))) (set (reg:SI 107) (umod:SI (reg:SI 104) (reg/v:SI 102 [ var_6 ]))) (clobber (reg:CC 17 flags)) ]) "pr99927.c":13:24 449 {*udivmodsi4} (expr_list:REG_DEAD (reg:SI 104) (expr_list:REG_UNUSED (reg:SI 108) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))))) (insn 105 23 106 6 (set (reg:QI 135) (const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal} (nil)) (insn 106 105 107 6 (parallel [ (set (reg:QI 134) (and:QI (subreg:QI (reg:SI 107) 0) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) "pr99927.c":13:24 491 {*andqi_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn 107 106 108 6 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 107) (const_int 0 [0]))) "pr99927.c":13:24 7 {*cmpsi_ccno_1} (expr_list:REG_DEAD (reg:SI 107) (nil))) (insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ]) (if_then_else:QI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (reg:QI 134) (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc} (expr_list:REG_DEAD (reg:QI 134) (nil))) but in combine dump there is: (insn 23 22 105 6 (parallel [ (set (reg:SI 108) (udiv:SI (reg:SI 104) (reg/v:SI 102 [ var_6 ]))) (set (reg:SI 107) (umod:SI (reg:SI 104) (reg/v:SI 102 [ var_6 ]))) (clobber (reg:CC 17 flags)) ]) "pr99927.c":13:24 449 {*udivmodsi4} (expr_list:REG_DEAD (reg:SI 104) (expr_list:REG_UNUSED (reg:SI 108) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))))) (note 105 23 106 6 NOTE_INSN_DELETED) (note 106 105 107 6 NOTE_INSN_DELETED) (insn 107 106 108 6 (set (reg:QI 135) (const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal} (nil)) (note 108 107 111 6 NOTE_INSN_DELETED) (insn 111 108 85 6 (set (reg:QI 96 [ var_lsm_flag.12 ]) (if_then_else:QI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (subreg:QI (reg:SI 107) 0) (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc} (expr_list:REG_DEAD (reg:SI 107) (expr_list:REG_DEAD (reg:QI 135) (nil))))
Trying 105, 107 -> 108: 105: r135:QI=0x1 107: flags:CCZ=cmp(r107:SI,0) 108: r96:QI={(flags:CCZ==0)?r107:SI#0:r135:QI} REG_DEAD r107:SI REG_DEAD flags:CC Failed to match this instruction: (parallel [ (set (reg:QI 96 [ var_lsm_flag.12 ]) (subreg:QI (reg:SI 107) 0)) (set (reg:QI 135) (const_int 1 [0x1])) ]) Failed to match this instruction: (parallel [ (set (reg:QI 96 [ var_lsm_flag.12 ]) (subreg:QI (reg:SI 107) 0)) (set (reg:QI 135) (const_int 1 [0x1])) ]) Successfully matched this instruction: (set (reg:QI 135) (const_int 1 [0x1])) Successfully matched this instruction: (set (reg:QI 96 [ var_lsm_flag.12 ]) (subreg:QI (reg:SI 107) 0)) allowing combination of insns 105, 107 and 108 original costs 4 + 4 + 8 = 16 replacement costs 4 + 4 = 8 deferring deletion of insn with uid = 105. modifying insn i2 107: r135:QI=0x1 deferring rescan insn with uid = 107. modifying insn i3 108: r96:QI=r107:SI#0 REG_DEAD r107:SI deferring rescan insn with uid = 108. note that insn 107 was the CC setter for the if-then-else but we now have a plain move there.
It indeed goes wrong in the 105, 107 -> 108 try_combine, but at the start of that we have: (insn 105 23 106 6 (set (reg:QI 135) (const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal} (nil)) (note 106 105 107 6 NOTE_INSN_DELETED) (insn 107 106 108 6 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 107) (const_int 0 [0]))) "pr99927.c":13:24 7 {*cmpsi_ccno_1} (nil)) (insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ]) (if_then_else:QI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (subreg:QI (reg:SI 107) 0) (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc} (expr_list:REG_DEAD (reg:SI 107) (expr_list:REG_DEAD (reg:CC 17 flags) (nil)))) (insn 111 108 85 6 (set (reg:QI 96 [ var_lsm_flag.12 ]) (if_then_else:QI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (reg:QI 96 [ var_lsm_flag.12 ]) (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc} (expr_list:REG_DEAD (reg:QI 135) (nil))) (jump_insn 85 111 35 6 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref 45) (pc))) 806 {*jcc} (expr_list:REG_DEAD (reg:CCZ 17 flags) (int_list:REG_BR_PROB 536870916 (nil))) -> 45) The substitutions of 105 and 107 into 108 properly simplify 108 into (set (reg:QI 96 [ var_lsm_flag.12 ]) (subreg:QI (reg:SI 107) 0)) because it is: (set (reg:QI 96 [ var_lsm_flag.12 ]) (if_then_else:QI (ne (reg:SI 107) (const_int 0 [0])) (const_int 1 [0x1]) (const_int 0 [0]))) But what is wrong is that try_combine has been called at all, because (reg:CCZ 17 flags) is used in 3 instructions rather than just one.
So, we have at the start of first try_combine called on bb 6: ... (insn 105 23 106 6 (set (reg:QI 135) (const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal} (nil)) (insn 106 105 107 6 (parallel [ (set (reg:QI 134) (and:QI (subreg:QI (reg:SI 107) 0) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) "pr99927.c":13:24 491 {*andqi_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn 107 106 108 6 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 107) (const_int 0 [0]))) "pr99927.c":13:24 7 {*cmpsi_ccno_1} (expr_list:REG_DEAD (reg:SI 107) (nil))) (insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ]) (if_then_else:QI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (reg:QI 134) (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc} (expr_list:REG_DEAD (reg:QI 134) (nil))) (insn 111 108 85 6 (set (reg:QI 96 [ var_lsm_flag.12 ]) (if_then_else:QI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (reg:QI 96 [ var_lsm_flag.12 ]) (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc} (expr_list:REG_DEAD (reg:QI 135) (nil))) (jump_insn 85 111 35 6 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref 45) (pc))) 806 {*jcc} (expr_list:REG_DEAD (reg:CCZ 17 flags) (int_list:REG_BR_PROB 536870916 (nil))) -> 45) where LOG_LINKS of 108 are i105/r135, i106/r134 and i107/r17, of 111 are i108/r96 and 85 has NULL LOG_LINKS. But, r17 is used in all of i108, i111 and i85, so isn't single use, so isn't it incorrect that it has the i107/r17 link?
Ah, create_log_links wants to work like that. So, the bug seems to be that insn 108 has REG_DEAD (reg:CC 17 flags) note. It doesn't initially, but it is added during 106 -> 108 combination (gdb) p debug_rtx (i3) (insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ]) (if_then_else:QI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (reg:QI 134) (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc} (expr_list:REG_DEAD (reg:QI 134) (nil))) $151 = void (gdb) p debug_rtx (i2) (insn 106 105 107 6 (parallel [ (set (reg:QI 134) (and:QI (subreg:QI (reg:SI 107) 0) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) "pr99927.c":13:24 491 {*andqi_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) The combination of those 2 insns is successful - into: (insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ]) (if_then_else:QI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (subreg:QI (reg:SI 107) 0) (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc} (expr_list:REG_DEAD (reg:SI 107) (expr_list:REG_DEAD (reg:CC 17 flags) (nil)))) but the distribute_notes that turned REG_UNUSED (reg:CC 17 flags) note from insn 106 into REG_DEAD (reg:CC 17 flags) note on insn 108 is what looks broken to me, the flags register is set by insn 107 in between those two and is used by some insns after insn 108 (111 and 85) and the new combined pattern certainly doesn't kill the register in any way. Segher, could you please have a look? Thanks.
distribute_notes has: /* If this register is set or clobbered in I3, put the note there unless there is one already. */ if (reg_set_p (XEXP (note, 0), PATTERN (i3))) { if (from_insn != i3) break; if (! (REG_P (XEXP (note, 0)) ? find_regno_note (i3, REG_UNUSED, REGNO (XEXP (note, 0))) : find_reg_note (i3, REG_UNUSED, XEXP (note, 0)))) place = i3; } /* Otherwise, if this register is used by I3, then this register now dies here, so we must put a REG_DEAD note here unless there is one already. */ else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3)) && ! (REG_P (XEXP (note, 0)) ? find_regno_note (i3, REG_DEAD, REGNO (XEXP (note, 0))) : find_reg_note (i3, REG_DEAD, XEXP (note, 0)))) { PUT_REG_NOTE_KIND (note, REG_DEAD); place = i3; } the if (reg_set_p (...)) is false, as flags is not set by the insn, it is used. But the else if is clearly not true, at least when XEXP (note, 0) is set in instructions in between i3 and i2 (or from whatever insn the notes come from).
(In reply to Jakub Jelinek from comment #5) > But what is wrong is that try_combine has been called at all, because > (reg:CCZ 17 flags) is used in 3 instructions rather than just one. That is not a problem; If that were true it just would mean that added_sets_2 should be set: added_sets_2 = !dead_or_set_p (i3, i2dest); But, the flags reg actually *is* dead in i3 (insn 108), it dies in i2 (insn 107): (expr_list:REG_DEAD (reg:SI 107) So something earlier is bad already.
(In reply to Segher Boessenkool from comment #9) > So something earlier is bad already. Yes, see #c7 and #c8.
(In reply to Jakub Jelinek from comment #7) > Ah, create_log_links wants to work like that. > So, the bug seems to be that insn 108 has REG_DEAD (reg:CC 17 flags) note. > It doesn't initially, but it is added during 106 -> 108 combination But that combination should never have been made: flags is set in insn 107!
(In reply to Segher Boessenkool from comment #11) > (In reply to Jakub Jelinek from comment #7) > > Ah, create_log_links wants to work like that. > > So, the bug seems to be that insn 108 has REG_DEAD (reg:CC 17 flags) note. > > It doesn't initially, but it is added during 106 -> 108 combination > > But that combination should never have been made: flags is set in insn 107! Why? It is not across a LOG_LINK for the flags register, but for the r134 pseudo. Yes, the first insn has a clobber on flags, but don't most of x86 insns have that? The combiner doesn't move over the clobber, it just substs r134 to its SET_SRC.
Yes, combine just drops that clobber of flags, that was a thinko :-)
distribute_notes says Any clobbers from i2 or i1 can only exist if they were added by recog_for_combine. which is not true apparently. But all of this code *does* depend on that, it just doesn't make sense otherwise.
So ... the conclusion is?
(In reply to Richard Biener from comment #15) > So ... the conclusion is? The conclusion is I have a patch and I will commit it after testing it successfully on enough targets. This takes time. I see I forgot to self-assign the bug. Fixed.
The master branch has been updated by Segher Boessenkool <segher@gcc.gnu.org>: https://gcc.gnu.org/g:b412ce8e961052e6becea3bc783a53e1d5feaa0f commit r11-8237-gb412ce8e961052e6becea3bc783a53e1d5feaa0f Author: Segher Boessenkool <segher@kernel.crashing.org> Date: Sat Apr 17 18:06:17 2021 +0000 combine: Don't create REG_UNUSED notes if the reg already died (PR99927) If the register named in an existing REG_UNUSED note dies somewhere between where the note used to be and I3, we should just drop it. 2021-04-21 Segher Boessenkool <segher@kernel.crashing.org> PR rtl-optimization/99927 * combine.c (distribute_notes) [REG_UNUSED]: If the register already is dead, just drop it.
Fixed for 11. This still needs backports for 10 and everything before, please don't close the bug.
GCC 11.1 has been released, retargeting bugs to GCC 11.2.
The releases/gcc-10 branch has been updated by Segher Boessenkool <segher@gcc.gnu.org>: https://gcc.gnu.org/g:7ef9f0272258862439348dbaa57a46e9eabdc9ca commit r10-9989-g7ef9f0272258862439348dbaa57a46e9eabdc9ca Author: Segher Boessenkool <segher@kernel.crashing.org> Date: Sat Apr 17 18:06:17 2021 +0000 combine: Don't create REG_UNUSED notes if the reg already died (PR99927) If the register named in an existing REG_UNUSED note dies somewhere between where the note used to be and I3, we should just drop it. 2021-04-21 Segher Boessenkool <segher@kernel.crashing.org> PR rtl-optimization/99927 * combine.c (distribute_notes) [REG_UNUSED]: If the register already is dead, just drop it.
The releases/gcc-9 branch has been updated by Segher Boessenkool <segher@gcc.gnu.org>: https://gcc.gnu.org/g:183a4022a2acfae91cfd861df98697eacfb5c2e5 commit r9-9631-g183a4022a2acfae91cfd861df98697eacfb5c2e5 Author: Segher Boessenkool <segher@kernel.crashing.org> Date: Sat Apr 17 18:06:17 2021 +0000 combine: Don't create REG_UNUSED notes if the reg already died (PR99927) If the register named in an existing REG_UNUSED note dies somewhere between where the note used to be and I3, we should just drop it. 2021-04-21 Segher Boessenkool <segher@kernel.crashing.org> PR rtl-optimization/99927 * combine.c (distribute_notes) [REG_UNUSED]: If the register already is dead, just drop it. (cherry picked from commit b412ce8e961052e6becea3bc783a53e1d5feaa0f)
Fixed everywhere.