Bug 69820

Summary: [6 Regression] Test miscompiled with -O3 option
Product: gcc Reporter: Vsevolod Livinskii <vsevolod.livinskiy>
Component: tree-optimizationAssignee: Jakub Jelinek <jakub>
Status: RESOLVED FIXED    
Severity: normal CC: jakub, kyukhin, uros
Priority: P1 Keywords: wrong-code
Version: 6.0   
Target Milestone: 6.0   
Host: Target: x86_64
Build: Known to work:
Known to fail: Last reconfirmed: 2016-02-15 00:00:00
Bug Depends on:    
Bug Blocks: 103035    
Attachments: Reproducer.
Preprocessed testcase.
gcc6-pr69820.patch
gcc6-pr69820-2.patch

Description Vsevolod Livinskii 2016-02-15 06:33:15 UTC
Created attachment 37689 [details]
Reproducer.

Test case produces incorrect result with -O3 option. Everything works fine with gcc version 5.3.1 20160212 (Revision=233387) and gcc version 4.9.4 20160212 (prerelease) (Revision=233387).

Output: 
> g++ -march=core-avx2 -O3 repr.cpp ; ./a.out
7540928331148314342
> g++ -march=core-avx2 -O2 repr.cpp ; ./a.out
18084857435860378664

GCC version:
> g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/export/users/vlivinsk/gcc-trunk/bin/../libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /export/users/gnutester/stability/svn/trunk/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --enable-checking=release --with-pkgversion=Revision=233012 --prefix=/export/users/gnutester/stability/work/trunk/64/install --enable-languages=c,c++,fortran,java,lto
Thread model: posix
gcc version 6.0.0 20160130 (experimental) (Revision=233012)
Comment 1 Vsevolod Livinskii 2016-02-15 06:37:26 UTC
Created attachment 37690 [details]
Preprocessed testcase.
Comment 2 Markus Trippelsdorf 2016-02-15 10:39:43 UTC
-fno-tree-loop-vectorize "fixes" the issue.
Comment 3 Markus Trippelsdorf 2016-02-15 14:05:56 UTC
Started with r230098.
Comment 4 Vsevolod Livinskii 2016-02-15 14:54:08 UTC
Test case also cause internal compiler error when it is compiled for knl.

> g++ -march=knl -O3 repr.cpp
repr.cpp: In function ‘void foo()’:
repr.cpp:23:45: internal compiler error: in ix86_expand_sse_cmp, at config/i386/i386.c:22702
         out_1 [i] = in_1 [i] * (out_4 [i] * (bool)out_4 [i]);
                                             ^~~~~~~~~~~~~~~
0xd0461e ix86_expand_sse_cmp
        /export/users/gnutester/stability/svn/trunk/gcc/config/i386/i386.c:22702
0xd37344 ix86_expand_int_vcond(rtx_def**)
        /export/users/gnutester/stability/svn/trunk/gcc/config/i386/i386.c:23398
0xee89d9 gen_vconduv32hiv32hi(rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*)
        /export/users/gnutester/stability/svn/trunk/gcc/config/i386/sse.md:11065
0x9fdf58 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
        /export/users/gnutester/stability/svn/trunk/gcc/optabs.c:7015
0x9fdf58 expand_insn(insn_code, unsigned int, expand_operand*)
        /export/users/gnutester/stability/svn/trunk/gcc/optabs.c:7046
0x9fe700 expand_vec_cond_expr(tree_node*, tree_node*, tree_node*, tree_node*, rtx_def*)
        /export/users/gnutester/stability/svn/trunk/gcc/optabs.c:5622
0x863bc9 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /export/users/gnutester/stability/svn/trunk/gcc/expr.c:9343
0x78035e expand_gimple_stmt_1
        /export/users/gnutester/stability/svn/trunk/gcc/cfgexpand.c:3641
0x78035e expand_gimple_stmt
        /export/users/gnutester/stability/svn/trunk/gcc/cfgexpand.c:3702
0x7816ff expand_gimple_basic_block
        /export/users/gnutester/stability/svn/trunk/gcc/cfgexpand.c:5708
0x786606 execute
        /export/users/gnutester/stability/svn/trunk/gcc/cfgexpand.c:6323
Comment 5 Markus Trippelsdorf 2016-02-15 15:16:06 UTC
knl ICE:

int a[100], b[100];
short c[100];
void fn1() {
  for (int i = 0; i < 100; ++i)
    b[i] = a[i] * (bool)c[i];
}
Comment 6 Jakub Jelinek 2016-02-15 15:58:06 UTC
C runtime testcase:
unsigned int a[100];
long long int b[100];
unsigned short c[100];

__attribute__((noinline, noclone)) void
foo (void)
{
  int i;
  for (i = 0; i < 100; ++i)
    b[i] = a[i] * (c[i] * (_Bool) c[i]);
}

int
main ()
{
  int i;
  if (__SIZEOF_INT__ * __CHAR_BIT__ != 32)
    return 0;
  for (i = 0; i < 100; ++i)
    {
      a[i] = 3489456818U;
      b[i] = 0x1eadbeefbeefdeadLL;
      c[i] = 38364;
    }
  foo ();
  for (i = 0; i < 100; ++i)
    if (b[i] != 0xed446af8U)
      __builtin_abort ();
  return 0;
}
Comment 7 Jakub Jelinek 2016-02-15 16:19:55 UTC
Created attachment 37694 [details]
gcc6-pr69820.patch

As for the ICE, the following patch works for me.  VI_512 iterator is only used in the vcond* patterns, therefore it is desirable to change them all this way.
Comment 8 Jakub Jelinek 2016-02-15 16:41:34 UTC
As for the wrong-code, we used to emit:
...
  vect_cst_.23_105 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  vect_cst_.24_106 = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
  vect_cst_.25_107 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
...
  vect_patt_37.22_108 = VEC_COND_EXPR <vect__5.20_102 != vect_cst_.23_105, vect_cst_.24_106, vect_cst_.25_107>;
  vect_patt_37.22_109 = VEC_COND_EXPR <vect__5.21_104 != vect_cst_.23_105, vect_cst_.24_106, vect_cst_.25_107>;
...
  vect__35.26_110 = VIEW_CONVERT_EXPR<vector(16) short unsigned int>(vect_patt_37.22_108);
  vect__35.26_111 = VIEW_CONVERT_EXPR<vector(16) short unsigned int>(vect_patt_37.22_109);
but now we emit incorrect:
  vect_cst__103 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
...
  mask__7.22_104 = vect__5.20_100 != vect_cst__103;
  mask__7.22_105 = vect__5.21_102 != vect_cst__103;
  vect__35.23_106 = VIEW_CONVERT_EXPR<vector(16) short unsigned int>(mask__7.22_104);
  vect__35.23_107 = VIEW_CONVERT_EXPR<vector(16) short unsigned int>(mask__7.22_105);
The reason why the latter is wrong is that for non-zero short values it yields 0xffff instead of 1.
Comment 9 Jakub Jelinek 2016-02-15 18:10:32 UTC
Created attachment 37695 [details]
gcc6-pr69820-2.patch

Untested fix for the wrong-code issue.  For conversion from bool/_Bool/unsigned :1 we have pattern recognizer for it, which is essential for correct behavior (otherwise we get all ones instead of 1 for true), but the various widening pattern recognizers aren't prepared to deal with those.
Comment 10 Jakub Jelinek 2016-02-16 15:17:20 UTC
Author: jakub
Date: Tue Feb 16 15:16:48 2016
New Revision: 233457

URL: https://gcc.gnu.org/viewcvs?rev=233457&root=gcc&view=rev
Log:
	PR tree-optimization/69820
	* tree-vect-patterns.c (type_conversion_p): Return false if
	*orig_type is unsigned single precision or boolean.
	(vect_recog_dot_prod_pattern, vect_recog_widen_mult_pattern):
	Formatting fix.

	* gcc.dg/vect/pr69820.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr69820.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-patterns.c
Comment 11 Jakub Jelinek 2016-02-16 15:18:34 UTC
Wrong-code issue fixed, the ICE fix is still pending review.
Comment 12 Jakub Jelinek 2016-02-19 13:43:12 UTC
Author: jakub
Date: Fri Feb 19 13:42:38 2016
New Revision: 233558

URL: https://gcc.gnu.org/viewcvs?rev=233558&root=gcc&view=rev
Log:
	PR target/69820
	* config/i386/sse.md (VI_512): Only include V64QImode and V32HImode
	if TARGET_AVX512BW.

	* gcc.target/i386/pr69820.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr69820.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Comment 13 Jakub Jelinek 2016-02-19 13:44:15 UTC
Fixed.