Bug 102788

Summary: [11 Regression] Wrong code with -O3
Product: gcc Reporter: Vsevolod Livinskii <vsevolod.livinskiy>
Component: tree-optimizationAssignee: Richard Biener <rguenth>
Status: RESOLVED FIXED    
Severity: normal CC: babokin, jakub, marxin, regehr, rguenth, rsandifo, uros, vsevolod.livinskiy
Priority: P2 Keywords: wrong-code
Version: 12.0   
Target Milestone: 11.3   
Host: Target:
Build: Known to work: 11.0, 12.0
Known to fail: 11.2.1 Last reconfirmed: 2021-10-15 00:00:00
Bug Depends on:    
Bug Blocks: 103035    
Attachments: patch I am testing

Description Vsevolod Livinskii 2021-10-15 19:54:21 UTC
Link to the Compiler Explorer: https://godbolt.org/z/78ab77Env

Reproducer:
#include <stdio.h>

unsigned long long int var_4 = 235;
unsigned long long int var_5 = 74;
signed char var_12 = -99;
unsigned long long int var_349;
unsigned char var_645;
void test();

const unsigned long long &min(const unsigned long long &a, const unsigned long long &b) {
  return b < a ? b : a;
}

void test() __attribute__((noipa));
void test() {
    for (short c = var_12; c; c += 5)
      ; 
    for (int e = 0; e < 12; e += 1) {
      var_349 = var_4 ? 235 : 74;
      var_645 = min((unsigned long long)true, var_5 ? var_12 : var_4);
    }
}

int main() {
    test();
    printf("%d\n", (int)var_645);
    //if (var_645 != 1)
    //  __builtin_abort();
}

Error:
>$g++ -O2 small.cpp && ./a.out 
1
>$g++ -O3 small.cpp && ./a.out 
255

GCC version:
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/testing/gcc/bin_master/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /testing/gcc/gcc_src_master/configure --enable-multilib --prefix=/testing/gcc/bin_master --disable-bootstrap
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20211011 (30cce6f65a77b8eaa22f3efff7f1ba54858106f9) (GCC)
Comment 1 Jakub Jelinek 2021-10-15 20:07:36 UTC
Started with r12-1075-g28484d00c45b7bf094a22a4fddf9ffdc7482c7e1
Comment 2 Andrew Pinski 2021-10-15 20:15:12 UTC
Confirmed.

...

  mask__43.21_62 = vect_cst__60 != vect_cst__61;
  _43 = var_12.0_1 != 0;
  _3 = (long long unsigned int) _43;
  vect_patt_34.22_63 = VIEW_CONVERT_EXPR<vector(4) unsigned char>(mask__43.21_62);
  _26 = (unsigned char) _3;

Somehow the vectorizer missed the conversion but has the conversion there.

/app/example.cpp:19:23: note:  add new stmt: mask__43.21_62 = vect_cst__60 != vect_cst__61;
/app/example.cpp:19:23: note:  ------>vectorizing statement: _3 = (long long unsigned int) _43;
/app/example.cpp:19:23: note:  ------>vectorizing statement: patt_34 = (unsigned char) _43;
/app/example.cpp:19:23: note:  transform statement.
/app/example.cpp:19:23: note:  vect_is_simple_use: operand var_12.0_1 != 0, type of def: internal
/app/example.cpp:19:23: note:  vect_is_simple_use: vectype vector(4) <signed-boolean:8>
/app/example.cpp:19:23: note:  transform assignment.
/app/example.cpp:19:23: note:  vect_get_vec_defs_for_operand: _43
/app/example.cpp:19:23: note:  vect_is_simple_use: operand var_12.0_1 != 0, type of def: internal
/app/example.cpp:19:23: note:    def_stmt =  _43 = var_12.0_1 != 0;
/app/example.cpp:19:23: note:  add new stmt: vect_patt_34.22_63 = VIEW_CONVERT_EXPR<vector(4) unsigned char>(mask__43.21_62);
/app/example.cpp:19:23: note:  extracting lane for live stmt _26 = (unsigned char) _3;
Comment 3 Andrew Pinski 2021-10-15 20:16:07 UTC
(In reply to Jakub Jelinek from comment #1)
> Started with r12-1075-g28484d00c45b7bf094a22a4fddf9ffdc7482c7e1

I just think that exposed the latent bug in the vectorizer as far as I can tell.
Comment 4 Richard Biener 2021-10-18 06:29:36 UTC
I will have a look.
Comment 5 Richard Biener 2021-10-18 08:38:18 UTC
The interesting thing is that the difference starts with pattern recog (for the epilogue loop) where the main loop recognizes a bool pattern but the epilogue one
does not.  That's because 'long long unsigned int' doesn't have a vector type with V4QImode.

But then vect_recog_cast_forwprop_pattern makes this cast irrelevant but we
do not re-visit the bool pattern as we only (re-)process the pattern def sequence stmts but not the stmt itself:

  /* If this statement has already been replaced with pattern statements,
     leave the original statement alone, since the first match wins.
     Instead try to match against the definition statements that feed
     the main pattern statement.  */
  if (STMT_VINFO_IN_PATTERN_P (stmt_info))
    {
      gimple_stmt_iterator gsi;
      for (gsi = gsi_start (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
           !gsi_end_p (gsi); gsi_next (&gsi))
        vect_pattern_recog_1 (vinfo, recog_func,
                              vinfo->lookup_stmt (gsi_stmt (gsi)));
      return;
    }

and we're also not expecting to do this:

void
vect_mark_pattern_stmts (vec_info *vinfo,
                         stmt_vec_info orig_stmt_info, gimple *pattern_stmt,
                         tree pattern_vectype)
{
...
      /* We shouldn't be replacing the main pattern statement.  */
      gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info)->stmt
                  != orig_pattern_stmt);

but bool pattern recog is required for correctness, we are not going to
fail vectorization later.

Richard - you added most of the re-processing of patterns (and also the
forwprop pattern that triggers the failure situation).  Can you share
insights and maybe fix this?

It's also really a latent issue.  One could use __int128_t to trigger it
for V8QI vectorization, but that's likely it - the trigger is the FAIL
of the bool pattern recog (which should probably a fatal vectorization
FAIL in case the stmt is ending up relevant).

What "works" (for the testcase) is to not terminate vect_recog_bool_pattern
when we have no vector type for the LHS but allow NULL to be propagated
through the pattern machinery (nothing will in the end use that type
for this testcase).  As said, since bool patterns are required for
correctness we cannot really early-out here but need to communicate
failure upthread.  I'm going to test a patch with this route.
Comment 6 Richard Biener 2021-10-18 08:40:26 UTC
Created attachment 51621 [details]
patch I am testing
Comment 7 GCC Commits 2021-10-18 10:57:52 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:eb032893675afea4b01cc6ad06a3e0dcfe9b51cd

commit r12-4472-geb032893675afea4b01cc6ad06a3e0dcfe9b51cd
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Oct 18 10:31:19 2021 +0200

    tree-optimization/102788 - avoid spurious bool pattern fails
    
    Bool pattern recog is required for correctness since vectorized
    compares otherwise produce -1 for true so any context where bool
    is used as value and not as condition or mask needs to be replaced
    with CMP ? 1 : 0.  When we fail to find a vector type for the
    result of such use we may not simply elide such transform since
    a new bool result can emerge when for example the cast_forwprop
    pattern is applied.  So the following avoids failing of the
    bool pattern recog process and instead not assign a vector type
    for the stmt.
    
    2021-10-18  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/102788
            * tree-vect-patterns.c (vect_init_pattern_stmt): Allow
            a NULL vectype.
            (vect_pattern_recog_1): Likewise.
            (vect_recog_bool_pattern): Continue matching the pattern
            even if we do not have a vector type for a conversion
            result.
    
            * g++.dg/vect/pr102788.cc: New testcase.
Comment 8 Richard Biener 2021-10-18 11:01:31 UTC
Fixed on trunk but the issue is latent since the introduction of the forwprop pattern.  I'm considering backporting to GCC 11.
Comment 9 Richard Biener 2021-11-08 12:35:46 UTC
Fixed.
Comment 10 GCC Commits 2021-11-08 12:35:54 UTC
The releases/gcc-11 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:34ed721929700b85f19f14fc56fb598a658b2bbc

commit r11-9227-g34ed721929700b85f19f14fc56fb598a658b2bbc
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Oct 18 10:31:19 2021 +0200

    tree-optimization/102788 - avoid spurious bool pattern fails
    
    Bool pattern recog is required for correctness since vectorized
    compares otherwise produce -1 for true so any context where bool
    is used as value and not as condition or mask needs to be replaced
    with CMP ? 1 : 0.  When we fail to find a vector type for the
    result of such use we may not simply elide such transform since
    a new bool result can emerge when for example the cast_forwprop
    pattern is applied.  So the following avoids failing of the
    bool pattern recog process and instead not assign a vector type
    for the stmt.
    
    2021-10-18  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/102788
            * tree-vect-patterns.c (vect_init_pattern_stmt): Allow
            a NULL vectype.
            (vect_pattern_recog_1): Likewise.
            (vect_recog_bool_pattern): Continue matching the pattern
            even if we do not have a vector type for a conversion
            result.
    
            * g++.dg/vect/pr102788.cc: New testcase.
    
    (cherry picked from commit eb032893675afea4b01cc6ad06a3e0dcfe9b51cd)