Created attachment 37689 [details] Reproducer. Test case produces incorrect result with -O3 option. Everything works fine with gcc version 5.3.1 20160212 (Revision=233387) and gcc version 4.9.4 20160212 (prerelease) (Revision=233387). Output: > g++ -march=core-avx2 -O3 repr.cpp ; ./a.out 7540928331148314342 > g++ -march=core-avx2 -O2 repr.cpp ; ./a.out 18084857435860378664 GCC version: > g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/export/users/vlivinsk/gcc-trunk/bin/../libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /export/users/gnutester/stability/svn/trunk/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --enable-checking=release --with-pkgversion=Revision=233012 --prefix=/export/users/gnutester/stability/work/trunk/64/install --enable-languages=c,c++,fortran,java,lto Thread model: posix gcc version 6.0.0 20160130 (experimental) (Revision=233012)
Created attachment 37690 [details] Preprocessed testcase.
-fno-tree-loop-vectorize "fixes" the issue.
Started with r230098.
Test case also cause internal compiler error when it is compiled for knl. > g++ -march=knl -O3 repr.cpp repr.cpp: In function ‘void foo()’: repr.cpp:23:45: internal compiler error: in ix86_expand_sse_cmp, at config/i386/i386.c:22702 out_1 [i] = in_1 [i] * (out_4 [i] * (bool)out_4 [i]); ^~~~~~~~~~~~~~~ 0xd0461e ix86_expand_sse_cmp /export/users/gnutester/stability/svn/trunk/gcc/config/i386/i386.c:22702 0xd37344 ix86_expand_int_vcond(rtx_def**) /export/users/gnutester/stability/svn/trunk/gcc/config/i386/i386.c:23398 0xee89d9 gen_vconduv32hiv32hi(rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*) /export/users/gnutester/stability/svn/trunk/gcc/config/i386/sse.md:11065 0x9fdf58 maybe_expand_insn(insn_code, unsigned int, expand_operand*) /export/users/gnutester/stability/svn/trunk/gcc/optabs.c:7015 0x9fdf58 expand_insn(insn_code, unsigned int, expand_operand*) /export/users/gnutester/stability/svn/trunk/gcc/optabs.c:7046 0x9fe700 expand_vec_cond_expr(tree_node*, tree_node*, tree_node*, tree_node*, rtx_def*) /export/users/gnutester/stability/svn/trunk/gcc/optabs.c:5622 0x863bc9 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier) /export/users/gnutester/stability/svn/trunk/gcc/expr.c:9343 0x78035e expand_gimple_stmt_1 /export/users/gnutester/stability/svn/trunk/gcc/cfgexpand.c:3641 0x78035e expand_gimple_stmt /export/users/gnutester/stability/svn/trunk/gcc/cfgexpand.c:3702 0x7816ff expand_gimple_basic_block /export/users/gnutester/stability/svn/trunk/gcc/cfgexpand.c:5708 0x786606 execute /export/users/gnutester/stability/svn/trunk/gcc/cfgexpand.c:6323
knl ICE: int a[100], b[100]; short c[100]; void fn1() { for (int i = 0; i < 100; ++i) b[i] = a[i] * (bool)c[i]; }
C runtime testcase: unsigned int a[100]; long long int b[100]; unsigned short c[100]; __attribute__((noinline, noclone)) void foo (void) { int i; for (i = 0; i < 100; ++i) b[i] = a[i] * (c[i] * (_Bool) c[i]); } int main () { int i; if (__SIZEOF_INT__ * __CHAR_BIT__ != 32) return 0; for (i = 0; i < 100; ++i) { a[i] = 3489456818U; b[i] = 0x1eadbeefbeefdeadLL; c[i] = 38364; } foo (); for (i = 0; i < 100; ++i) if (b[i] != 0xed446af8U) __builtin_abort (); return 0; }
Created attachment 37694 [details] gcc6-pr69820.patch As for the ICE, the following patch works for me. VI_512 iterator is only used in the vcond* patterns, therefore it is desirable to change them all this way.
As for the wrong-code, we used to emit: ... vect_cst_.23_105 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; vect_cst_.24_106 = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }; vect_cst_.25_107 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; ... vect_patt_37.22_108 = VEC_COND_EXPR <vect__5.20_102 != vect_cst_.23_105, vect_cst_.24_106, vect_cst_.25_107>; vect_patt_37.22_109 = VEC_COND_EXPR <vect__5.21_104 != vect_cst_.23_105, vect_cst_.24_106, vect_cst_.25_107>; ... vect__35.26_110 = VIEW_CONVERT_EXPR<vector(16) short unsigned int>(vect_patt_37.22_108); vect__35.26_111 = VIEW_CONVERT_EXPR<vector(16) short unsigned int>(vect_patt_37.22_109); but now we emit incorrect: vect_cst__103 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; ... mask__7.22_104 = vect__5.20_100 != vect_cst__103; mask__7.22_105 = vect__5.21_102 != vect_cst__103; vect__35.23_106 = VIEW_CONVERT_EXPR<vector(16) short unsigned int>(mask__7.22_104); vect__35.23_107 = VIEW_CONVERT_EXPR<vector(16) short unsigned int>(mask__7.22_105); The reason why the latter is wrong is that for non-zero short values it yields 0xffff instead of 1.
Created attachment 37695 [details] gcc6-pr69820-2.patch Untested fix for the wrong-code issue. For conversion from bool/_Bool/unsigned :1 we have pattern recognizer for it, which is essential for correct behavior (otherwise we get all ones instead of 1 for true), but the various widening pattern recognizers aren't prepared to deal with those.
Author: jakub Date: Tue Feb 16 15:16:48 2016 New Revision: 233457 URL: https://gcc.gnu.org/viewcvs?rev=233457&root=gcc&view=rev Log: PR tree-optimization/69820 * tree-vect-patterns.c (type_conversion_p): Return false if *orig_type is unsigned single precision or boolean. (vect_recog_dot_prod_pattern, vect_recog_widen_mult_pattern): Formatting fix. * gcc.dg/vect/pr69820.c: New test. Added: trunk/gcc/testsuite/gcc.dg/vect/pr69820.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-patterns.c
Wrong-code issue fixed, the ICE fix is still pending review.
Author: jakub Date: Fri Feb 19 13:42:38 2016 New Revision: 233558 URL: https://gcc.gnu.org/viewcvs?rev=233558&root=gcc&view=rev Log: PR target/69820 * config/i386/sse.md (VI_512): Only include V64QImode and V32HImode if TARGET_AVX512BW. * gcc.target/i386/pr69820.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr69820.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
Fixed.