Current trunk (r208813) and gcc-4.8 are affected, 4.7 does not appear to be. gcc -O0 t.c && ./a.out 500450210036 gcc -O3 t.c && ./a.out 500450200033 gcc -O2 t.c && ./a.out 500450210036 gcc -O2 t.c -ftree-vectorize && ./a.out 500450200033 // t.c #include <stdio.h> int main () { int v[] = {5000, 5001, 5002, 5003}; long s = 0; int i; for(i = 0; i < 4; ++i) { long P = v[i]; s += P*P*P; } printf("%ld\n", s); }
Started with r189006. Looking into it.
This bug is caused by an optimization in GCC vectorizer that is not implemented properly. When a reduction operation is vectorized, the order of elements in vectors directly used in reduction does not matter. In some cases the vectorizer may generate less code based on this fact. GCC assigns a property named "vect_used_by_reduction" to all vectors participating in reductions. However, vectors that are indirectly used in reduction also have this property. For example, consider the following three statements (all operands are vectors): a = b op1 c; d = a op2 e; s1 = s0 op3 d; Here assume the last statement is a reduction one, then a,b,c,d,e all have the property "vect_used_by_reduction". However, if op2 is different from op3, then a's element order can affect the final result. GCC does not check this.
You are right, before r189408 this was using ordered_p = false, now it is slightly different. In any case, perhaps we could just check if the imm use of the stmt is a reduction PHI and only assume unordered in that case?
Yes, there is a quick fix: we can check if the def with vect_used_by_reduction is immediately used by a reduction stmt. After all, it seems that supportable_widening_operation() is the only place that takes advantage of this "the element order doesn't matter" feature. diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 70fb411..7442d0c 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -7827,7 +7827,16 @@ supportable_widening_operation (enum tree_code code, gimple stmt, stmt, vectype_out, vectype_in, code1, code2, multi_step_cvt, interm_types)) - return true; + { + tree lhs = gimple_assign_lhs (stmt); + use_operand_p dummy; + gimple use_stmt; + stmt_vec_info use_stmt_info = NULL; + if (single_imm_use (lhs, &dummy, &use_stmt) + && (use_stmt_info = vinfo_for_stmt (use_stmt)) + && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def) + return true; + } c1 = VEC_WIDEN_MULT_LO_EXPR; c2 = VEC_WIDEN_MULT_HI_EXPR; break;
(In reply to Cong Hou from comment #4) > Yes, there is a quick fix: we can check if the def with > vect_used_by_reduction is immediately used by a reduction stmt. After all, > it seems that supportable_widening_operation() is the only place that takes > advantage of this "the element order doesn't matter" feature. > > > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index 70fb411..7442d0c 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -7827,7 +7827,16 @@ supportable_widening_operation (enum tree_code code, > gimple stmt, > stmt, vectype_out, vectype_in, > code1, code2, multi_step_cvt, > interm_types)) > - return true; > + { > + tree lhs = gimple_assign_lhs (stmt); > + use_operand_p dummy; > + gimple use_stmt; > + stmt_vec_info use_stmt_info = NULL; > + if (single_imm_use (lhs, &dummy, &use_stmt) > + && (use_stmt_info = vinfo_for_stmt (use_stmt)) > + && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def) > + return true; > + } > c1 = VEC_WIDEN_MULT_LO_EXPR; > c2 = VEC_WIDEN_MULT_HI_EXPR; > break; Looks good to me, perhaps just no need to initialize use_stmt_info to NULL. Are you going to bootstrap/regtest this and post to gcc-patches?
Created attachment 32475 [details] gcc49-pr60656.patch I've bootstrapped/regtested in the mean time this patch on x86_64-linux and i686-linux, no regression. As it is your patch, can you please post it?
Yes, will do it. Thank you a lot!
Author: congh Date: Sat Apr 5 01:27:21 2014 New Revision: 209138 URL: http://gcc.gnu.org/viewcvs?rev=209138&root=gcc&view=rev Log: 2014-04-04 Cong Hou <congh@google.com> PR tree-optimization/60656 * tree-vect-stmts.c (supportable_widening_operation): Fix a bug that elements in a vector with vect_used_by_reduction property are incorrectly reordered when the operation on it is not consistant with the one in reduction operation. 2014-04-04 Cong Hou <congh@google.com> PR tree-optimization/60656 * gcc.dg/vect/pr60656.c: New test. Added: trunk/gcc/testsuite/gcc.dg/vect/pr60656.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-stmts.c
Fixed.
Actually, still not fixed on the 4.8 branch, only on the trunk.
Hmm, with gcc-4.9.0-RC-20140411 on arm-linux-gnueabihf I see the following: ERROR: gcc.dg/vect/pr60656.c: error executing dg-final: can't read "et_vect_widen_mult_si_to_di_pattern_saved": no such variable UNRESOLVED: gcc.dg/vect/pr60656.c: error executing dg-final: can't read "et_vect_widen_mult_si_to_di_pattern_saved": no such variable what does that mean?
Should be fixed already with r209363.
GCC 4.8.3 is being released, adjusting target milestone.
GCC 4.8.4 has been released.
Confirmed: * Fixed in: gcc-4.9.2 (release) * NOT Fixed in: gcc-4.8.4 (release) It seems indeed only i386, x32 and x86_64 (-m32, -mx32 and -m64) are affected. -fopt-info-vec-note show that e.g. powerpc & powerpc64 don't vectorize the loop. The mainline fix from Comment #8 apples without fuzz and fixes this wrong code issue in gcc-4.8.4. Apply it for 4.8.5?
I'll see if it is backportable easily.
*** Bug 61108 has been marked as a duplicate of this bug. ***
Author: rguenth Date: Wed Jun 10 12:53:09 2015 New Revision: 224327 URL: https://gcc.gnu.org/viewcvs?rev=224327&root=gcc&view=rev Log: 2015-06-10 Richard Biener <rguenther@suse.de> Backport from mainline 2014-04-04 Cong Hou <congh@google.com> PR tree-optimization/60656 * tree-vect-stmts.c (supportable_widening_operation): Fix a bug that elements in a vector with vect_used_by_reduction property are incorrectly reordered when the operation on it is not consistant with the one in reduction operation. * gcc.dg/vect/pr60656.c: New test. 2014-01-31 Richard Biener <rguenther@suse.de> PR middle-end/59990 * builtins.c (fold_builtin_memory_op): Make sure to not use a floating-point mode or a boolean or enumeral type for the copy operation. * gcc.dg/torture/pr59990.c: New testcase. * gcc.target/i386/pr49168-1.c: Adjust. Added: branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/torture/pr59990.c branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/vect/pr60656.c Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/builtins.c branches/gcc-4_8-branch/gcc/testsuite/ChangeLog branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr49168-1.c branches/gcc-4_8-branch/gcc/tree-vect-stmts.c