Bug 99927 - [9/10 only] Wrong code since r11-39-gf9e1ea10e657af9f
Summary: [9/10 only] Wrong code since r11-39-gf9e1ea10e657af9f
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 11.0
: P3 normal
Target Milestone: 9.5
Assignee: Segher Boessenkool
URL:
Keywords: wrong-code
Depends on:
Blocks: yarpgen
  Show dependency treegraph
 
Reported: 2021-04-06 08:22 UTC by Martin Liška
Modified: 2021-11-03 05:20 UTC (History)
4 users (show)

See Also:
Host: x86_64-linux-gnu
Target:
Build:
Known to work: 11.0
Known to fail: 10.0
Last reconfirmed: 2021-04-06 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Liška 2021-04-06 08:22:33 UTC
Reduced from yarpgen:

$ cat func.cpp
short var = 9;

int test_var_1 = 0, test_var_5 = 0, test_var_8 = 0, test_var_10 = 0;

void test(unsigned var_6, unsigned long long var_9) {

  for (; test_var_10;)
    if (test_var_5)
      for (;; test_var_1 += test_var_8)
        ;
  for (int i_10 = 0; i_10 < 23; i_10 += 4)
    for (unsigned int i_11 = 0; i_11 < var_6 + 471511258; i_11 ++)
      if ((var_9 == 0) % var_6)
        var = 0;
}

int main() {
  test(3823456048, 10675217251973);
  __builtin_printf("%u\n", var);
  if (var != 9)
    __builtin_abort ();

  return 0;
}

$ g++ func.cpp && ./a.out
9
$ g++ func.cpp -O3 && ./a.out
0
Aborted (core dumped)
Comment 1 Richard Biener 2021-04-06 08:54:51 UTC
-fdisable-tree-cunroll fixes it but not disabling the lim pass after it but
disabling lim2 which then no longer makes us unroll the loop.

Disabling DOM3 _and_ VRP2 also fixes the issue, it looks like some bogus VRP
gets triggered.
Comment 2 Jakub Jelinek 2021-04-06 13:57:09 UTC
Interestingly, it only reproduces on AMD CPUs and not Intel.
The bug is in:
        xorl    %edx, %edx
        divl    %edi
        movl    $1, %eax
        cmove   %edx, %eax
divl leaves ZF undefined as documented (and as seen in RTL), but we use that in the cmove instruction.
Comment 3 Jakub Jelinek 2021-04-06 14:02:05 UTC
Before combine it looks fine:
(insn 23 22 105 6 (parallel [
            (set (reg:SI 108)
                (udiv:SI (reg:SI 104)
                    (reg/v:SI 102 [ var_6 ])))
            (set (reg:SI 107)
                (umod:SI (reg:SI 104)
                    (reg/v:SI 102 [ var_6 ])))
            (clobber (reg:CC 17 flags))
        ]) "pr99927.c":13:24 449 {*udivmodsi4}
     (expr_list:REG_DEAD (reg:SI 104)
        (expr_list:REG_UNUSED (reg:SI 108)
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
(insn 105 23 106 6 (set (reg:QI 135)
        (const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal}
     (nil))
(insn 106 105 107 6 (parallel [
            (set (reg:QI 134)
                (and:QI (subreg:QI (reg:SI 107) 0)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) "pr99927.c":13:24 491 {*andqi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 107 106 108 6 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 107)
            (const_int 0 [0]))) "pr99927.c":13:24 7 {*cmpsi_ccno_1}
     (expr_list:REG_DEAD (reg:SI 107)
        (nil)))
(insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
        (if_then_else:QI (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg:QI 134)
            (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
     (expr_list:REG_DEAD (reg:QI 134)
        (nil)))
but in combine dump there is:
(insn 23 22 105 6 (parallel [
            (set (reg:SI 108)
                (udiv:SI (reg:SI 104)
                    (reg/v:SI 102 [ var_6 ])))
            (set (reg:SI 107)
                (umod:SI (reg:SI 104)
                    (reg/v:SI 102 [ var_6 ])))
            (clobber (reg:CC 17 flags))
        ]) "pr99927.c":13:24 449 {*udivmodsi4}
     (expr_list:REG_DEAD (reg:SI 104)
        (expr_list:REG_UNUSED (reg:SI 108)
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
(note 105 23 106 6 NOTE_INSN_DELETED)
(note 106 105 107 6 NOTE_INSN_DELETED)
(insn 107 106 108 6 (set (reg:QI 135)
        (const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal}
     (nil))
(note 108 107 111 6 NOTE_INSN_DELETED)
(insn 111 108 85 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
        (if_then_else:QI (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (subreg:QI (reg:SI 107) 0)
            (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:QI 135)
            (nil))))
Comment 4 Richard Biener 2021-04-06 14:17:48 UTC
Trying 105, 107 -> 108:
  105: r135:QI=0x1
  107: flags:CCZ=cmp(r107:SI,0)
  108: r96:QI={(flags:CCZ==0)?r107:SI#0:r135:QI}
      REG_DEAD r107:SI
      REG_DEAD flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:QI 96 [ var_lsm_flag.12 ])
            (subreg:QI (reg:SI 107) 0))
        (set (reg:QI 135)
            (const_int 1 [0x1]))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:QI 96 [ var_lsm_flag.12 ])
            (subreg:QI (reg:SI 107) 0))
        (set (reg:QI 135)
            (const_int 1 [0x1]))
    ])
Successfully matched this instruction:
(set (reg:QI 135)
    (const_int 1 [0x1]))
Successfully matched this instruction:
(set (reg:QI 96 [ var_lsm_flag.12 ])
    (subreg:QI (reg:SI 107) 0))
allowing combination of insns 105, 107 and 108
original costs 4 + 4 + 8 = 16
replacement costs 4 + 4 = 8
deferring deletion of insn with uid = 105.
modifying insn i2   107: r135:QI=0x1
deferring rescan insn with uid = 107.
modifying insn i3   108: r96:QI=r107:SI#0
      REG_DEAD r107:SI
deferring rescan insn with uid = 108.

note that insn 107 was the CC setter for the if-then-else but we now have
a plain move there.
Comment 5 Jakub Jelinek 2021-04-06 14:44:17 UTC
It indeed goes wrong in the 105, 107 -> 108 try_combine, but at the start of that we have:
(insn 105 23 106 6 (set (reg:QI 135)
        (const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal}
     (nil))
(note 106 105 107 6 NOTE_INSN_DELETED)
(insn 107 106 108 6 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 107)
            (const_int 0 [0]))) "pr99927.c":13:24 7 {*cmpsi_ccno_1}
     (nil))
(insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
        (if_then_else:QI (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (subreg:QI (reg:SI 107) 0)
            (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:CC 17 flags)
            (nil))))
(insn 111 108 85 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
        (if_then_else:QI (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg:QI 96 [ var_lsm_flag.12 ])
            (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
     (expr_list:REG_DEAD (reg:QI 135)
        (nil)))
(jump_insn 85 111 35 6 (set (pc)
        (if_then_else (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref 45)
            (pc))) 806 {*jcc}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (int_list:REG_BR_PROB 536870916 (nil)))
 -> 45)

The substitutions of 105 and 107 into 108 properly simplify 108 into
(set (reg:QI 96 [ var_lsm_flag.12 ])
    (subreg:QI (reg:SI 107) 0))
because it is:
(set (reg:QI 96 [ var_lsm_flag.12 ])
    (if_then_else:QI (ne (reg:SI 107)
            (const_int 0 [0]))
        (const_int 1 [0x1])
        (const_int 0 [0])))
But what is wrong is that try_combine has been called at all, because
(reg:CCZ 17 flags) is used in 3 instructions rather than just one.
Comment 6 Jakub Jelinek 2021-04-06 15:03:44 UTC
So, we have at the start of first try_combine called on bb 6:
...
(insn 105 23 106 6 (set (reg:QI 135)
        (const_int 1 [0x1])) "pr99927.c":13:24 77 {*movqi_internal}
     (nil))
(insn 106 105 107 6 (parallel [
            (set (reg:QI 134)
                (and:QI (subreg:QI (reg:SI 107) 0)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) "pr99927.c":13:24 491 {*andqi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 107 106 108 6 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 107)
            (const_int 0 [0]))) "pr99927.c":13:24 7 {*cmpsi_ccno_1}
     (expr_list:REG_DEAD (reg:SI 107)
        (nil)))
(insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
        (if_then_else:QI (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg:QI 134)
            (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
     (expr_list:REG_DEAD (reg:QI 134)
        (nil)))
(insn 111 108 85 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
        (if_then_else:QI (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg:QI 96 [ var_lsm_flag.12 ])
            (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
     (expr_list:REG_DEAD (reg:QI 135)
        (nil)))
(jump_insn 85 111 35 6 (set (pc)
        (if_then_else (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref 45)
            (pc))) 806 {*jcc}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (int_list:REG_BR_PROB 536870916 (nil)))
 -> 45)

where LOG_LINKS of 108 are i105/r135, i106/r134 and i107/r17,
of 111 are i108/r96 and 85 has NULL LOG_LINKS.
But, r17 is used in all of i108, i111 and i85, so isn't single use, so isn't it incorrect that it has the i107/r17 link?
Comment 7 Jakub Jelinek 2021-04-06 15:37:05 UTC
Ah, create_log_links wants to work like that.
So, the bug seems to be that insn 108 has REG_DEAD (reg:CC 17 flags) note.
It doesn't initially, but it is added during 106 -> 108 combination
(gdb) p debug_rtx (i3)
(insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
        (if_then_else:QI (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg:QI 134)
            (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
     (expr_list:REG_DEAD (reg:QI 134)
        (nil)))
$151 = void
(gdb) p debug_rtx (i2)
(insn 106 105 107 6 (parallel [
            (set (reg:QI 134)
                (and:QI (subreg:QI (reg:SI 107) 0)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) "pr99927.c":13:24 491 {*andqi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

The combination of those 2 insns is successful - into:
(insn 108 107 111 6 (set (reg:QI 96 [ var_lsm_flag.12 ])
        (if_then_else:QI (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (subreg:QI (reg:SI 107) 0)
            (reg:QI 135))) "pr99927.c":13:24 1104 {*movqicc_noc}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:CC 17 flags)
            (nil))))
but the distribute_notes that turned REG_UNUSED (reg:CC 17 flags) note from insn 106 into REG_DEAD (reg:CC 17 flags) note on insn 108 is what looks broken to me,
the flags register is set by insn 107 in between those two and is used by some insns after insn 108 (111 and 85) and the new combined pattern certainly doesn't
kill the register in any way.

Segher, could you please have a look?
Thanks.
Comment 8 Jakub Jelinek 2021-04-06 15:42:48 UTC
distribute_notes has:
          /* If this register is set or clobbered in I3, put the note there
             unless there is one already.  */
          if (reg_set_p (XEXP (note, 0), PATTERN (i3)))
            {
              if (from_insn != i3)
                break;
                  
              if (! (REG_P (XEXP (note, 0))
                     ? find_regno_note (i3, REG_UNUSED, REGNO (XEXP (note, 0)))
                     : find_reg_note (i3, REG_UNUSED, XEXP (note, 0))))
                place = i3;
            }
          /* Otherwise, if this register is used by I3, then this register
             now dies here, so we must put a REG_DEAD note here unless there
             is one already.  */
          else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3))
                   && ! (REG_P (XEXP (note, 0))
                         ? find_regno_note (i3, REG_DEAD,
                                            REGNO (XEXP (note, 0)))
                         : find_reg_note (i3, REG_DEAD, XEXP (note, 0))))
            {
              PUT_REG_NOTE_KIND (note, REG_DEAD);
              place = i3;
            }
the if (reg_set_p (...)) is false, as flags is not set by the insn, it is used.
But the else if is clearly not true, at least when XEXP (note, 0) is set in instructions in between i3 and i2 (or from whatever insn the notes come from).
Comment 9 Segher Boessenkool 2021-04-06 19:59:03 UTC
(In reply to Jakub Jelinek from comment #5)
> But what is wrong is that try_combine has been called at all, because
> (reg:CCZ 17 flags) is used in 3 instructions rather than just one.

That is not a problem; If that were true it just would mean that
added_sets_2 should be set:

  added_sets_2 = !dead_or_set_p (i3, i2dest);

But, the flags reg actually *is* dead in i3 (insn 108), it dies in i2
(insn 107):

     (expr_list:REG_DEAD (reg:SI 107)

So something earlier is bad already.
Comment 10 Jakub Jelinek 2021-04-06 20:05:43 UTC
(In reply to Segher Boessenkool from comment #9)
> So something earlier is bad already.


Yes, see #c7 and #c8.
Comment 11 Segher Boessenkool 2021-04-07 17:38:13 UTC
(In reply to Jakub Jelinek from comment #7)
> Ah, create_log_links wants to work like that.
> So, the bug seems to be that insn 108 has REG_DEAD (reg:CC 17 flags) note.
> It doesn't initially, but it is added during 106 -> 108 combination

But that combination should never have been made: flags is set in insn 107!
Comment 12 Jakub Jelinek 2021-04-07 17:46:20 UTC
(In reply to Segher Boessenkool from comment #11)
> (In reply to Jakub Jelinek from comment #7)
> > Ah, create_log_links wants to work like that.
> > So, the bug seems to be that insn 108 has REG_DEAD (reg:CC 17 flags) note.
> > It doesn't initially, but it is added during 106 -> 108 combination
> 
> But that combination should never have been made: flags is set in insn 107!

Why?  It is not across a LOG_LINK for the flags register, but for the r134 pseudo.  Yes, the first insn has a clobber on flags, but don't most of x86 insns have that?  The combiner doesn't move over the clobber, it just substs r134 to its SET_SRC.
Comment 13 Segher Boessenkool 2021-04-07 18:09:35 UTC
Yes, combine just drops that clobber of flags, that was a thinko :-)
Comment 14 Segher Boessenkool 2021-04-08 02:39:14 UTC
distribute_notes says
  Any clobbers from i2 or i1 can only exist if they were added by
  recog_for_combine.
which is not true apparently.  But all of this code *does* depend
on that, it just doesn't make sense otherwise.
Comment 15 Richard Biener 2021-04-12 14:21:24 UTC
So ... the conclusion is?
Comment 16 Segher Boessenkool 2021-04-12 18:00:49 UTC
(In reply to Richard Biener from comment #15)
> So ... the conclusion is?

The conclusion is I have a patch and I will commit it after testing it
successfully on enough targets.  This takes time.

I see I forgot to self-assign the bug.  Fixed.
Comment 17 GCC Commits 2021-04-18 15:01:29 UTC
The master branch has been updated by Segher Boessenkool <segher@gcc.gnu.org>:

https://gcc.gnu.org/g:b412ce8e961052e6becea3bc783a53e1d5feaa0f

commit r11-8237-gb412ce8e961052e6becea3bc783a53e1d5feaa0f
Author: Segher Boessenkool <segher@kernel.crashing.org>
Date:   Sat Apr 17 18:06:17 2021 +0000

    combine: Don't create REG_UNUSED notes if the reg already died (PR99927)
    
    If the register named in an existing REG_UNUSED note dies somewhere
    between where the note used to be and I3, we should just drop it.
    
    2021-04-21  Segher Boessenkool  <segher@kernel.crashing.org>
    
            PR rtl-optimization/99927
            * combine.c (distribute_notes) [REG_UNUSED]: If the register already
            is dead, just drop it.
Comment 18 Segher Boessenkool 2021-04-18 15:08:22 UTC
Fixed for 11.  This still needs backports for 10 and everything before,
please don't close the bug.
Comment 19 Jakub Jelinek 2021-04-27 11:40:45 UTC
GCC 11.1 has been released, retargeting bugs to GCC 11.2.
Comment 20 GCC Commits 2021-07-19 17:13:09 UTC
The releases/gcc-10 branch has been updated by Segher Boessenkool <segher@gcc.gnu.org>:

https://gcc.gnu.org/g:7ef9f0272258862439348dbaa57a46e9eabdc9ca

commit r10-9989-g7ef9f0272258862439348dbaa57a46e9eabdc9ca
Author: Segher Boessenkool <segher@kernel.crashing.org>
Date:   Sat Apr 17 18:06:17 2021 +0000

    combine: Don't create REG_UNUSED notes if the reg already died (PR99927)
    
    If the register named in an existing REG_UNUSED note dies somewhere
    between where the note used to be and I3, we should just drop it.
    
    2021-04-21  Segher Boessenkool  <segher@kernel.crashing.org>
    
            PR rtl-optimization/99927
            * combine.c (distribute_notes) [REG_UNUSED]: If the register already
            is dead, just drop it.
Comment 21 GCC Commits 2021-07-19 17:15:57 UTC
The releases/gcc-9 branch has been updated by Segher Boessenkool <segher@gcc.gnu.org>:

https://gcc.gnu.org/g:183a4022a2acfae91cfd861df98697eacfb5c2e5

commit r9-9631-g183a4022a2acfae91cfd861df98697eacfb5c2e5
Author: Segher Boessenkool <segher@kernel.crashing.org>
Date:   Sat Apr 17 18:06:17 2021 +0000

    combine: Don't create REG_UNUSED notes if the reg already died (PR99927)
    
    If the register named in an existing REG_UNUSED note dies somewhere
    between where the note used to be and I3, we should just drop it.
    
    2021-04-21  Segher Boessenkool  <segher@kernel.crashing.org>
    
            PR rtl-optimization/99927
            * combine.c (distribute_notes) [REG_UNUSED]: If the register already
            is dead, just drop it.
    
    (cherry picked from commit b412ce8e961052e6becea3bc783a53e1d5feaa0f)
Comment 22 Segher Boessenkool 2021-07-19 17:16:38 UTC
Fixed everywhere.