Bug 107404 - [12/13 Regression] Wrong code with -O3 since r12-6416-g037cc0b4a6646cc8
Summary: [12/13 Regression] Wrong code with -O3 since r12-6416-g037cc0b4a6646cc8
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 13.0
: P2 normal
Target Milestone: 12.3
Assignee: Uroš Bizjak
URL:
Keywords: wrong-code
Depends on:
Blocks: yarpgen
  Show dependency treegraph
 
Reported: 2022-10-26 00:40 UTC by Vsevolod Livinskii
Modified: 2022-11-03 16:18 UTC (History)
7 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work: 11.3.0
Known to fail: 12.0
Last reconfirmed: 2022-10-26 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vsevolod Livinskii 2022-10-26 00:40:07 UTC
Link to the Compiler Explorer: https://godbolt.org/z/jP6chErEY

The reproducer is a bit too big, but it's the best I was able to achieve with Creduce, Cvise, and manual reduction. UBSan and ASan say that it is compliant with the standard.

Reproducer:
#include <stdio.h>
unsigned long long a;
void b(unsigned long long *f, int p2) { *f ^= p2; }
long c;
char e, i;
short g, m;
long long ab[1][25][21][22];
unsigned long long aa[1][21][22];
unsigned long long ae[1][25][21][21];
long long ac[129360];
char ad[25][1][21];
char ah[1][25][1][21];
short af[100];
long max(long f, unsigned long p2) { return f < p2 ? p2 : f; }
const int &max2(const int &f, const int &p2) { return f < p2 ? p2 : f; }
void foo(unsigned f, unsigned p2, char l, char p4, long long n[][25][21][22],
        unsigned long long p6[][21][22], unsigned long long u[][25][21][21]) {
  long an;
  for (int j = 0; j < 4; j = p2)
    for (short k = 0; k < 7; k += 2)
      for (short o = 0; o < (short)p2 + 21742; o = l) {
        for (signed char p = 2; p < 9; p += p4)
          if (p6[j][o][p])
            for (long q(3); 4 ? n[0][k][o][0] : 0;
                 q += p6[0][o][0] ? p6[j][0][p] : 0)
              ac[j + q] = 5066799590;
        for (long r(p4 - 16); r < 21; r += 4) {
          ad[k][o][r] = max(u[j][k][o][r], f + u[j][k][o][r]);
          long d = u[j][k][o][r];
          an = d < p2 ? p2 : d;
          e = ah[j][k][o][r] = an;
          af[o * r] = i;
        }
        for (short s(c); s < (short)p2; s = 2)
          for (short am(m); am; am = max2(3, p2))
            for (long y = 0; y; y = 3)
              for (short t(0); t < max2(g, 0);)
                ;
      }
}
int main() {
  foo(7, 1558227751, 104, 16, ab, aa, ae);
  for (size_t v = 0; v < 5; ++v)
    for (size_t w = 0; w < 1; ++w)
      for (size_t x = 0; x < 21; ++x)
        b(&a, ad[v][w][x]);
  printf("%llu\n", a);
}

Error:
>$ g++ -O3 small.cpp && ./a.out 
28
>$ g++ -O0 small.cpp && ./a.out 
0

gcc version 13.0.0 20221025 (321f89e58510dd5df1b3dbe323344b987a7b11c6)
Comment 1 Andrew Pinski 2022-10-26 01:07:54 UTC
Note "-O3 -g0 -fno-tree-vrp -fno-tree-ccp -fno-ivopts -fno-thread-jumps  -fno-tree-dominator-opts -fno-tree-vectorize" still fails but adding -fwrapv allows it to pass.

Confirmed.
Comment 2 Martin Liška 2022-10-26 06:36:18 UTC
Started with r12-6416-g037cc0b4a6646cc8.
Comment 3 Richard Sandiford 2022-10-26 11:13:55 UTC
This is due to the peephole2 added in r12-2640-gf7bf03cf69cc:

;; Eliminate a reg-reg mov by inverting the condition of a cmov (#2).
;; mov r2,r3; mov r0,r1; dec r0; cmov r0,r2 -> dec r1; mov r0,r3; cmov r0, r1
(define_peephole2
 [(set (match_operand:SWI248 2 "general_reg_operand")
       (match_operand:SWI248 3 "general_gr_operand"))
  (set (match_operand:SWI248 0 "general_reg_operand")
       (match_operand:SWI248 1 "general_reg_operand"))
  (parallel [(set (reg FLAGS_REG) (match_operand 5))
             (set (match_dup 0) (match_operand:SWI248 6))])
  (set (match_dup 0)
       (if_then_else:SWI248 (match_operator 4 "ix86_comparison_operator"
                             [(reg FLAGS_REG) (const_int 0)])
        (match_dup 0)
        (match_dup 2)))]
 "TARGET_CMOVE
  && REGNO (operands[2]) != REGNO (operands[0])
  && REGNO (operands[2]) != REGNO (operands[1])
  && peep2_reg_dead_p (2, operands[1])
  && peep2_reg_dead_p (4, operands[2])
  && !reg_overlap_mentioned_p (operands[0], operands[3])"
 [(parallel [(set (match_dup 7) (match_dup 8))
             (set (match_dup 1) (match_dup 9))])
  (set (match_dup 0) (match_dup 3))
  (set (match_dup 0) (if_then_else:SWI248 (match_dup 4)
                                          (match_dup 1)
                                          (match_dup 0)))]
{
  operands[7] = SET_DEST (XVECEXP (PATTERN (peep2_next_insn (2)), 0, 0));
  operands[8] = replace_rtx (operands[5], operands[0], operands[1], true);
  operands[9] = replace_rtx (operands[6], operands[0], operands[1], true);
})

The conditions make sure that the 2<-3 and 0<-1 moves are independent,
but they don't check what effect the 2<-3 move has on the third
instruction.  I don't know if the intention was to exclude cases
where operands 5 and 6 reference operand 2, or whether the intention
was to cope with those cases where possible.
Comment 4 Uroš Bizjak 2022-11-03 09:29:35 UTC
(In reply to rsandifo@gcc.gnu.org from comment #3)
> This is due to the peephole2 added in r12-2640-gf7bf03cf69cc:
> 
> ;; Eliminate a reg-reg mov by inverting the condition of a cmov (#2).
> ;; mov r2,r3; mov r0,r1; dec r0; cmov r0,r2 -> dec r1; mov r0,r3; cmov r0, r1

The "dec" above in fact matches many other instructions, including:

(insn 485 960 737 19 (parallel [
            (set (reg:CCC 17 flags)
                (compare:CCC (plus:DI (reg:DI 43 r15 [orig:157 _138 ] [157])
                        (reg:DI 3 bx [orig:99 _23 ] [99]))
                    (reg:DI 3 bx [orig:99 _23 ] [99])))
            (set (reg:DI 43 r15 [orig:157 _138 ] [157])
                (plus:DI (reg:DI 43 r15 [orig:157 _138 ] [157])
                    (reg:DI 3 bx [orig:99 _23 ] [99])))
        ]) "pr107404.cpp":28:46 413 {*adddi3_cc_overflow_2}
     (nil))


So, the processed sequence is:

bx <- [mem]
r15 <- r8
r15 <- r15 + bx
r15 <- (r15, bx)

and the sequence is converted to:

r8 <- r8 + bx
r15 <- [mem]
r15 <- (r8, r15)

However, the bx register in the first insn is uninitialized.

We can still allow unrelated registers in the "dec" instruction, under the condition this register is not the one in the removed assignment. So, the answer to:

> The conditions make sure that the 2<-3 and 0<-1 moves are independent,
> but they don't check what effect the 2<-3 move has on the third
> instruction.  I don't know if the intention was to exclude cases
> where operands 5 and 6 reference operand 2, or whether the intention
> was to cope with those cases where possible.

is that we want to exclude cases where operand 6 references operand 2 (please note that operand 5 is a duplicate of operand 6).
Comment 5 Uroš Bizjak 2022-11-03 10:00:34 UTC
Patch in testing:

--cut here--
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 85567980aa3..436eabb691a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21800,7 +21800,8 @@ (define_peephole2
   && REGNO (operands[2]) != REGNO (operands[1])
   && peep2_reg_dead_p (2, operands[1])
   && peep2_reg_dead_p (4, operands[2])
-  && !reg_overlap_mentioned_p (operands[0], operands[3])"
+  && !reg_overlap_mentioned_p (operands[0], operands[3])
+  && !reg_mentioned_p (operands[2], operands[6])"
  [(parallel [(set (match_dup 7) (match_dup 8))
             (set (match_dup 1) (match_dup 9))])
   (set (match_dup 0) (match_dup 3))
--cut here--
Comment 6 GCC Commits 2022-11-03 13:18:52 UTC
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:553b1d3dd5b9253ebdf66ee3260c717d5b807dd1

commit r13-3624-g553b1d3dd5b9253ebdf66ee3260c717d5b807dd1
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Nov 3 14:17:42 2022 +0100

    i386: Fix uninitialized register after peephole2 conversion [PR107404]
    
    The eliminate reg-reg move by inverting the condition of
    a cmove #2 peephole2 converts the following sequence:
    
      473: bx:DI=[r14:DI*0x8+r12:DI]
      960: r15:DI=r8:DI
      485: {flags:CCC=cmp(r15:DI+bx:DI,bx:DI);r15:DI=r15:DI+bx:DI;}
      737: r15:DI={(geu(flags:CCC,0))?r15:DI:bx:DI}
    
    to:
    
     1110: {flags:CCC=cmp(r8:DI+bx:DI,bx:DI);r8:DI=r8:DI+bx:DI;}
     1111: r15:DI=[r14:DI*0x8+r12:DI]
     1112: r15:DI={(geu(flags:CCC,0))?r8:DI:r15:DI}
    
    Please note that(insn 1110) uses register BX, but its
    initialization was eliminated.
    
    Avoid conversion if eliminated move intialized a register, used
    in the moved instruction.
    
    2022-11-03  Uroš Bizjak  <ubizjak@gmail.com>
    
    gcc/ChangeLog:
    
            PR target/107404
            * config/i386/i386.md (eliminate reg-reg move by inverting the
            condition of a cmove #2 peephole2): Check if eliminated move
            initialized a register, used in the moved instruction.
    
    gcc/testsuite/ChangeLog:
    
            PR target/107404
            * g++.target/i386/pr107404.C: New test.
Comment 7 GCC Commits 2022-11-03 16:13:15 UTC
The releases/gcc-12 branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:35e8b889af144faad074c0fceed0390143ec37dd

commit r12-8890-g35e8b889af144faad074c0fceed0390143ec37dd
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Nov 3 14:17:42 2022 +0100

    i386: Fix uninitialized register after peephole2 conversion [PR107404]
    
    The eliminate reg-reg move by inverting the condition of
    a cmove #2 peephole2 converts the following sequence:
    
      473: bx:DI=[r14:DI*0x8+r12:DI]
      960: r15:DI=r8:DI
      485: {flags:CCC=cmp(r15:DI+bx:DI,bx:DI);r15:DI=r15:DI+bx:DI;}
      737: r15:DI={(geu(flags:CCC,0))?r15:DI:bx:DI}
    
    to:
    
     1110: {flags:CCC=cmp(r8:DI+bx:DI,bx:DI);r8:DI=r8:DI+bx:DI;}
     1111: r15:DI=[r14:DI*0x8+r12:DI]
     1112: r15:DI={(geu(flags:CCC,0))?r8:DI:r15:DI}
    
    Please note that(insn 1110) uses register BX, but its
    initialization was eliminated.
    
    Avoid conversion if eliminated move intialized a register, used
    in the moved instruction.
    
    2022-11-03  Uroš Bizjak  <ubizjak@gmail.com>
    
    gcc/ChangeLog:
    
            PR target/107404
            * config/i386/i386.md (eliminate reg-reg move by inverting the
            condition of a cmove #2 peephole2): Check if eliminated move
            initialized a register, used in the moved instruction.
    
    gcc/testsuite/ChangeLog:
    
            PR target/107404
            * g++.target/i386/pr107404.C: New test.
    
    (cherry picked from commit 553b1d3dd5b9253ebdf66ee3260c717d5b807dd1)
Comment 8 Uroš Bizjak 2022-11-03 16:18:44 UTC
Fixed.