The current gcc trunk (as well as 4.6, 4.7, and 4.8) miscompiles the following code on x86_64-linux-gnu at -O3 in both 32-bit and 64-bit modes. It also affects gcc 4.6 and 4.7 at -O2, older versions of gcc, and on other platforms. For trunk and 4.8 (but not for 4.6 and 4.7 though), -fno-tree-vectorize makes the bug go away. Interestingly also, both the current clang and icc fail on this testcase too. $ gcc-trunk -v Using built-in specs. COLLECT_GCC=gcc-trunk COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../gcc-trunk/configure --enable-languages=c,c++,objc,obj-c++,fortran,lto --disable-werror --enable-checking=release --with-gmp=/usr/local/gcc-trunk --with-mpfr=/usr/local/gcc-trunk --with-mpc=/usr/local/gcc-trunk --with-cloog=/usr/local/gcc-trunk --prefix=/usr/local/gcc-trunk Thread model: posix gcc version 4.9.0 20131108 (experimental) [trunk revision 204593] (GCC) $ $ gcc-trunk -O2 small.c; a.out -1 $ gcc-trunk -O3 small.c; a.out 7 $ gcc-4.8.2 -O3 small.c; a.out 7 $ gcc-4.7.3 -O3 small.c; a.out 131071 $ gcc-4.7.3 -O2 small.c; a.out 131071 $ gcc-4.6.4 -O3 small.c; a.out 131071 $ gcc-4.6.4 -O2 small.c; a.out 131071 $ $ gcc-trunk -O3 -fno-tree-vectorize small.c; a.out -1 $ gcc-4.8.2 -O3 -fno-tree-vectorize small.c; a.out -1 $ gcc-4.7.3 -O3 -fno-tree-vectorize small.c; a.out 131071 $ gcc-4.6.4 -O3 -fno-tree-vectorize small.c; a.out 131071 $ $ clang-trunk -O3 small.c; a.out 0 $ icc -O3 small.c; a.out 1 $ -------------------------------------- int printf (const char *, ...); int a, c, d; short b = -1; void foo () { b++; } void bar () { l1: foo (); d = -3; for (a = 0; a > -3; a = d) c |= b; if (b) goto l1; } int main () { b++; l2: if (b) goto l2; bar (); printf ("%d\n", c); return 0; }
For the vectorizer issue on trunk, this is enough: int printf (const char *, ...); short b = 0; int main () { int c = 0; l1: b++; c |= b; if (b) goto l1; printf ("%d\n", c); return 0; } This code relies on the gcc extension that casting to a shorter integer is done modulo 2. Note that if we move the declaration of b inside the function, the compiler (wrongly?) warns: s.c:11:4: warning: iteration 32767u invokes undefined behavior [-Waggressive-loop-optimizations] b++; ^ s.c:13:6: note: containing loop if (b) ^ (and the return value changes from 7 to 131071) 2 missed optimizations while looking at the dumps: stmp_var_.12_46 = _4 + 1; stmp_var_.12_47 = _4 + 2; stmp_var_.12_48 = _4 + 3; stmp_var_.12_49 = _4 + 4; stmp_var_.12_50 = _4 + 5; stmp_var_.12_51 = _4 + 6; stmp_var_.12_52 = _4 + 7; vect_cst_.13_53 = {_4, stmp_var_.12_46, stmp_var_.12_47, stmp_var_.12_48, stmp_var_.12_49, stmp_var_.12_50, stmp_var_.12_51, stmp_var_.12_52}; This should really be: v = { _4, _4, ... }; w = v + { 0, 1, 2, ... }; Also: vector(8) unsigned short vect_b.17; vector(8) short int vect_vec_iv_.16; vector(8) unsigned short vect_vec_iv_.15; [...] vect_vec_iv_.16_57 = VIEW_CONVERT_EXPR<vector(8) short int>(vect_vec_iv_.15_55); vect_vec_iv_.15_56 = vect_vec_iv_.15_55 + vect_cst_.14_54; vect_b.17_58 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(vect_vec_iv_.16_57); vect_b.17_58 is vect_vec_iv_.15_55, not sure why we don't see through that.
Confirmed, works with 3.3.
I will have a look.
I think the issue is (set_nb_iterations_in_loop = ~(unsigned short) pretmp_22)) t.c:12:6: note: ==> get_loop_niters:-(unsigned short) pretmp_22 that is, number_of_exit_cond_executions which returns number_of_latch_executions + 1 but does not check for overflow. It checks ret = chrec_fold_plus (type, ret, build_int_cst (type, 1)); if (TREE_CODE (ret) == INTEGER_CST && TREE_OVERFLOW (ret)) return chrec_dont_know; but obviously that only works for constants. It might be best to eliminate uses of number_of_exit_cond_executions in favor of number_of_latch_executions (while adjusting users of course, vectorization and loop distribution). We can also use a wider type here (within certain limits, of course).
Unfortunately @@ -2930,11 +2931,31 @@ number_of_exit_cond_executions (struct l if (chrec_contains_undetermined (ret)) return ret; - ret = chrec_fold_plus (type, ret, build_int_cst (type, 1)); - if (TREE_CODE (ret) == INTEGER_CST - && TREE_OVERFLOW (ret)) + /* Handle constants without widening. */ + if (TREE_CODE (ret) == INTEGER_CST) + { + double_int adj = tree_to_double_int (ret) + double_int_one; + if (double_int_fits_to_tree_p (type, adj)) + return double_int_to_tree (type, adj); + } + + /* For the remaining case widen to an unsigned type from a + signed one or to one with at least one bit more precision + but not larger than using word_mode. */ + if (!TYPE_UNSIGNED (type)) + type = unsigned_type_for (type); + else if (GET_MODE_CLASS (TYPE_MODE (type)) == MODE_INT + && TYPE_PRECISION (type) < GET_MODE_PRECISION (word_mode)) + { + enum machine_mode mode; + mode = smallest_mode_for_size (TYPE_PRECISION (type) + 1, MODE_INT); + type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode), 1); + } + else return chrec_dont_know; + ret = chrec_convert (type, ret, NULL); + ret = chrec_fold_plus (type, ret, build_int_cst (type, 1)); return ret; } breaks loops which use size_t as induction variable type. But they are possibly miscompiled anyway. Not sure if we can rely on the availability of arithmetics on modes > word_mode, can we?
I improve this with also using max_stmt_executions I at least get no vect.exp fail but the testcase in this PR is not vectorized when using a size_t b.
Still FAILs to vectorize gcc.dg/vect/pr18425.c with -m32. But we have here => get_loop_niters:(unsigned long) (__n_7(D) + 4294967295) + 1 that could have been simplified. __n is unsigned int. So I can special case that (but really only that ... still common enough).
Created attachment 31259 [details] candidate patch Candidate patch. But I think it's better to remove this functions users.
Author: rguenth Date: Thu Nov 21 09:15:05 2013 New Revision: 205197 URL: http://gcc.gnu.org/viewcvs?rev=205197&root=gcc&view=rev Log: 2013-11-21 Richard Biener <rguenther@suse.de> PR tree-optimization/59058 * tree-loop-distribution.c (struct partition_s): Add plus_one member. (build_size_arg_loc): Apply niter adjustment here. (generate_memset_builtin): Adjust. (generate_memcpy_builtin): Likewise. (classify_partition): Do not use number_of_exit_cond_executions but record whether niter needs to be adjusted. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-loop-distribution.c
Author: rguenth Date: Thu Nov 21 14:09:15 2013 New Revision: 205217 URL: http://gcc.gnu.org/viewcvs?rev=205217&root=gcc&view=rev Log: 2013-11-21 Richard Biener <rguenther@suse.de> PR tree-optimization/59058 * tree-scalar-evolution.h (number_of_exit_cond_executions): Remove. * tree-scalar-evolution.c (number_of_exit_cond_executions): Likewise. * tree-vectorizer.h (LOOP_PEELING_FOR_ALIGNMENT): Rename to ... (LOOP_VINFO_PEELING_FOR_ALIGNMENT): ... this. (NITERS_KNOWN_P): Fold into ... (LOOP_VINFO_NITERS_KNOWN_P): ... this. (LOOP_VINFO_PEELING_FOR_NITER): Add. * tree-vect-loop-manip.c (vect_gen_niters_for_prolog_loop): Use LOOP_VINFO_PEELING_FOR_ALIGNMENT. (vect_do_peeling_for_alignment): Re-use precomputed niter instead of re-emitting it. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use LOOP_VINFO_PEELING_FOR_ALIGNMENT. * tree-vect-loop.c (vect_get_loop_niters): Use number_of_latch_executions. (new_loop_vec_info): Initialize LOOP_VINFO_PEELING_FOR_NITER. (vect_analyze_loop_form): Simplify. (vect_analyze_loop_operations): Move epilogue peeling code ... (vect_analyze_loop_2): ... here and adjust it to compute LOOP_VINFO_PEELING_FOR_NITER. (vect_estimate_min_profitable_iters): Use LOOP_VINFO_PEELING_FOR_ALIGNMENT. (vect_build_loop_niters): Emit on the preheader. (vect_generate_tmps_on_preheader): Likewise. (vect_transform_loop): Use LOOP_VINFO_PEELING_FOR_NITER instead of recomputing it. Adjust. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-scalar-evolution.c trunk/gcc/tree-scalar-evolution.h trunk/gcc/tree-vect-data-refs.c trunk/gcc/tree-vect-loop-manip.c trunk/gcc/tree-vect-loop.c trunk/gcc/tree-vectorizer.h
So, is this fully fixed now on the trunk?
"jakub at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote: >http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59058 > >Jakub Jelinek <jakub at gcc dot gnu.org> changed: > > What |Removed |Added >---------------------------------------------------------------------------- > CC| |jakub at gcc dot gnu.org > >--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> --- >So, is this fully fixed now on the trunk? No, it's not fixed. Richard.
Author: rguenth Date: Fri Dec 6 09:23:07 2013 New Revision: 205730 URL: http://gcc.gnu.org/viewcvs?rev=205730&root=gcc&view=rev Log: 2013-12-06 Richard Biener <rguenther@suse.de> PR tree-optimization/59058 * tree-vectorizer.h (struct _loop_vec_info): Add num_itersm1 member. (LOOP_VINFO_NITERSM1): New macro. * tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Express the vector loop entry test in terms of scalar latch executions. (vect_do_peeling_for_alignment): Update LOOP_VINFO_NITERSM1. * tree-vect-loop.c (vect_get_loop_niters): Also return the number of latch executions. (new_loop_vec_info): Initialize LOOP_VINFO_NITERSM1. (vect_analyze_loop_form): Likewise. (vect_generate_tmps_on_preheader): Compute the number of vectorized iterations differently. * gcc.dg/torture/pr59058.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/torture/pr59058.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-loop-manip.c trunk/gcc/tree-vect-loop.c trunk/gcc/tree-vectorizer.h
Fixed on trunk sofar.
The 4.7 branch is being closed, moving target milestone to 4.8.4.
GCC 4.8.4 has been released.
Fixed for 4.9.0.