Bug 59058 - [4.8 Regression] wrong code at -O3 on x86_64-linux-gnu (affecting gcc 4.6 to trunk)
Summary: [4.8 Regression] wrong code at -O3 on x86_64-linux-gnu (affecting gcc 4.6 to ...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.9.0
: P2 normal
Target Milestone: 4.9.0
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2013-11-09 04:52 UTC by Zhendong Su
Modified: 2015-06-23 08:43 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 3.3.6, 4.9.0
Known to fail: 4.1.2, 4.8.5
Last reconfirmed: 2013-11-11 00:00:00


Attachments
candidate patch (1.35 KB, patch)
2013-11-20 13:50 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zhendong Su 2013-11-09 04:52:15 UTC
The current gcc trunk (as well as 4.6, 4.7, and 4.8) miscompiles the following code on x86_64-linux-gnu at -O3 in both 32-bit and 64-bit modes. 

It also affects gcc 4.6 and 4.7 at -O2, older versions of gcc, and on other platforms. 

For trunk and 4.8 (but not for 4.6 and 4.7 though), -fno-tree-vectorize makes the bug go away. 

Interestingly also, both the current clang and icc fail on this testcase too. 

$ gcc-trunk -v
Using built-in specs.
COLLECT_GCC=gcc-trunk
COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-trunk/configure --enable-languages=c,c++,objc,obj-c++,fortran,lto --disable-werror --enable-checking=release --with-gmp=/usr/local/gcc-trunk --with-mpfr=/usr/local/gcc-trunk --with-mpc=/usr/local/gcc-trunk --with-cloog=/usr/local/gcc-trunk --prefix=/usr/local/gcc-trunk
Thread model: posix
gcc version 4.9.0 20131108 (experimental) [trunk revision 204593] (GCC) 
$ 
$ gcc-trunk -O2 small.c; a.out
-1
$ gcc-trunk -O3 small.c; a.out
7
$ gcc-4.8.2 -O3 small.c; a.out
7
$ gcc-4.7.3 -O3 small.c; a.out
131071
$ gcc-4.7.3 -O2 small.c; a.out
131071
$ gcc-4.6.4 -O3 small.c; a.out
131071
$ gcc-4.6.4 -O2 small.c; a.out
131071
$ 
$ gcc-trunk -O3 -fno-tree-vectorize small.c; a.out
-1
$ gcc-4.8.2 -O3 -fno-tree-vectorize small.c; a.out
-1
$ gcc-4.7.3 -O3 -fno-tree-vectorize small.c; a.out
131071
$ gcc-4.6.4 -O3 -fno-tree-vectorize small.c; a.out
131071
$
$ clang-trunk -O3 small.c; a.out
0
$ icc -O3 small.c; a.out
1
$ 


--------------------------------------


int printf (const char *, ...);

int a, c, d;
short b = -1; 

void 
foo ()
{
  b++;
}

void 
bar ()
{
l1:
  foo ();
  d = -3;
  for (a = 0; a > -3; a = d)
    c |= b;
  if (b)
    goto l1;
}

int 
main ()
{
  b++;
l2:
  if (b)
    goto l2;
  bar ();
  printf ("%d\n", c);
  return 0;
}
Comment 1 Marc Glisse 2013-11-09 08:48:27 UTC
For the vectorizer issue on trunk, this is enough:

int printf (const char *, ...);

short b = 0; 

int 
main ()
{
  int c = 0;
l1:
  b++;
  c |= b;
  if (b)
    goto l1;
  printf ("%d\n", c);
  return 0;
}

This code relies on the gcc extension that casting to a shorter integer is done modulo 2. Note that if we move the declaration of b inside the function, the compiler (wrongly?) warns:
s.c:11:4: warning: iteration 32767u invokes undefined behavior [-Waggressive-loop-optimizations]
   b++;
    ^
s.c:13:6: note: containing loop
   if (b)
      ^
(and the return value changes from 7 to 131071)


2 missed optimizations while looking at the dumps:

  stmp_var_.12_46 = _4 + 1;
  stmp_var_.12_47 = _4 + 2;
  stmp_var_.12_48 = _4 + 3;
  stmp_var_.12_49 = _4 + 4;
  stmp_var_.12_50 = _4 + 5;
  stmp_var_.12_51 = _4 + 6;
  stmp_var_.12_52 = _4 + 7;
  vect_cst_.13_53 = {_4, stmp_var_.12_46, stmp_var_.12_47, stmp_var_.12_48, stmp_var_.12_49, stmp_var_.12_50, stmp_var_.12_51, stmp_var_.12_52};

This should really be:

  v = { _4, _4, ... };
  w = v + { 0, 1, 2, ... };

Also:

  vector(8) unsigned short vect_b.17;
  vector(8) short int vect_vec_iv_.16;
  vector(8) unsigned short vect_vec_iv_.15;
[...]
  vect_vec_iv_.16_57 = VIEW_CONVERT_EXPR<vector(8) short int>(vect_vec_iv_.15_55);
  vect_vec_iv_.15_56 = vect_vec_iv_.15_55 + vect_cst_.14_54;
  vect_b.17_58 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(vect_vec_iv_.16_57);

vect_b.17_58 is vect_vec_iv_.15_55, not sure why we don't see through that.
Comment 2 Richard Biener 2013-11-11 15:48:26 UTC
Confirmed, works with 3.3.
Comment 3 Richard Biener 2013-11-19 15:04:57 UTC
I will have a look.
Comment 4 Richard Biener 2013-11-20 11:45:03 UTC
I think the issue is

 (set_nb_iterations_in_loop = ~(unsigned short) pretmp_22))
t.c:12:6: note: ==> get_loop_niters:-(unsigned short) pretmp_22

that is, number_of_exit_cond_executions which returns
number_of_latch_executions + 1 but does not check for overflow.
It checks

  ret = chrec_fold_plus (type, ret, build_int_cst (type, 1));
  if (TREE_CODE (ret) == INTEGER_CST
      && TREE_OVERFLOW (ret))
    return chrec_dont_know;

but obviously that only works for constants.

It might be best to eliminate uses of number_of_exit_cond_executions
in favor of number_of_latch_executions (while adjusting users of course,
vectorization and loop distribution).  We can also use a wider
type here (within certain limits, of course).
Comment 5 Richard Biener 2013-11-20 12:08:22 UTC
Unfortunately

@@ -2930,11 +2931,31 @@ number_of_exit_cond_executions (struct l
   if (chrec_contains_undetermined (ret))
     return ret;
 
-  ret = chrec_fold_plus (type, ret, build_int_cst (type, 1));
-  if (TREE_CODE (ret) == INTEGER_CST
-      && TREE_OVERFLOW (ret))
+  /* Handle constants without widening.  */
+  if (TREE_CODE (ret) == INTEGER_CST)
+    {
+      double_int adj = tree_to_double_int (ret) + double_int_one;
+      if (double_int_fits_to_tree_p (type, adj))
+       return double_int_to_tree (type, adj);
+    }
+
+  /* For the remaining case widen to an unsigned type from a
+     signed one or to one with at least one bit more precision
+     but not larger than using word_mode.  */
+  if (!TYPE_UNSIGNED (type))
+    type = unsigned_type_for (type);
+  else if (GET_MODE_CLASS (TYPE_MODE (type)) == MODE_INT
+          && TYPE_PRECISION (type) < GET_MODE_PRECISION (word_mode))
+    {
+      enum machine_mode mode;
+      mode = smallest_mode_for_size (TYPE_PRECISION (type) + 1, MODE_INT);
+      type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode), 1);
+    }
+  else
     return chrec_dont_know;
 
+  ret = chrec_convert (type, ret, NULL);
+  ret = chrec_fold_plus (type, ret, build_int_cst (type, 1));
   return ret;
 }
 
breaks loops which use size_t as induction variable type.  But they
are possibly miscompiled anyway.  Not sure if we can rely on
the availability of arithmetics on modes > word_mode, can we?
Comment 6 Richard Biener 2013-11-20 12:28:53 UTC
I improve this with also using max_stmt_executions I at least get no
vect.exp fail but the testcase in this PR is not vectorized when using
a size_t b.
Comment 7 Richard Biener 2013-11-20 13:07:31 UTC
Still FAILs to vectorize gcc.dg/vect/pr18425.c with -m32.  But we have here

=> get_loop_niters:(unsigned long) (__n_7(D) + 4294967295) + 1

that could have been simplified.  __n is unsigned int.  So I can special
case that (but really only that ... still common enough).
Comment 8 Richard Biener 2013-11-20 13:50:52 UTC
Created attachment 31259 [details]
candidate patch

Candidate patch.

But I think it's better to remove this functions users.
Comment 9 Richard Biener 2013-11-21 09:15:08 UTC
Author: rguenth
Date: Thu Nov 21 09:15:05 2013
New Revision: 205197

URL: http://gcc.gnu.org/viewcvs?rev=205197&root=gcc&view=rev
Log:
2013-11-21  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/59058
	* tree-loop-distribution.c (struct partition_s): Add plus_one
	member.
	(build_size_arg_loc): Apply niter adjustment here.
	(generate_memset_builtin): Adjust.
	(generate_memcpy_builtin): Likewise.
	(classify_partition): Do not use number_of_exit_cond_executions
	but record whether niter needs to be adjusted.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-loop-distribution.c
Comment 10 Richard Biener 2013-11-21 14:09:18 UTC
Author: rguenth
Date: Thu Nov 21 14:09:15 2013
New Revision: 205217

URL: http://gcc.gnu.org/viewcvs?rev=205217&root=gcc&view=rev
Log:
2013-11-21  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/59058
	* tree-scalar-evolution.h (number_of_exit_cond_executions): Remove.
	* tree-scalar-evolution.c (number_of_exit_cond_executions): Likewise.
	* tree-vectorizer.h (LOOP_PEELING_FOR_ALIGNMENT): Rename to ...
	(LOOP_VINFO_PEELING_FOR_ALIGNMENT): ... this.
	(NITERS_KNOWN_P): Fold into ...
	(LOOP_VINFO_NITERS_KNOWN_P): ... this.
	(LOOP_VINFO_PEELING_FOR_NITER): Add.
	* tree-vect-loop-manip.c (vect_gen_niters_for_prolog_loop):
	Use LOOP_VINFO_PEELING_FOR_ALIGNMENT.
	(vect_do_peeling_for_alignment): Re-use precomputed niter
	instead of re-emitting it.
	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
	Use LOOP_VINFO_PEELING_FOR_ALIGNMENT.
	* tree-vect-loop.c (vect_get_loop_niters): Use
	number_of_latch_executions.
	(new_loop_vec_info): Initialize LOOP_VINFO_PEELING_FOR_NITER.
	(vect_analyze_loop_form): Simplify.
	(vect_analyze_loop_operations): Move epilogue peeling code ...
	(vect_analyze_loop_2): ... here and adjust it to compute
	LOOP_VINFO_PEELING_FOR_NITER.
	(vect_estimate_min_profitable_iters): Use
	LOOP_VINFO_PEELING_FOR_ALIGNMENT.
	(vect_build_loop_niters): Emit on the preheader.
	(vect_generate_tmps_on_preheader): Likewise.
	(vect_transform_loop): Use LOOP_VINFO_PEELING_FOR_NITER instead
	of recomputing it.  Adjust.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-scalar-evolution.c
    trunk/gcc/tree-scalar-evolution.h
    trunk/gcc/tree-vect-data-refs.c
    trunk/gcc/tree-vect-loop-manip.c
    trunk/gcc/tree-vect-loop.c
    trunk/gcc/tree-vectorizer.h
Comment 11 Jakub Jelinek 2013-12-03 16:31:50 UTC
So, is this fully fixed now on the trunk?
Comment 12 rguenther@suse.de 2013-12-03 18:50:03 UTC
"jakub at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote:
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59058
>
>Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>               CC|                            |jakub at gcc dot gnu.org
>
>--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
>So, is this fully fixed now on the trunk?

No, it's not fixed.

Richard.
Comment 13 Richard Biener 2013-12-06 09:23:10 UTC
Author: rguenth
Date: Fri Dec  6 09:23:07 2013
New Revision: 205730

URL: http://gcc.gnu.org/viewcvs?rev=205730&root=gcc&view=rev
Log:
2013-12-06  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/59058
	* tree-vectorizer.h (struct _loop_vec_info): Add num_itersm1
	member.
	(LOOP_VINFO_NITERSM1): New macro.
	* tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Express
	the vector loop entry test in terms of scalar latch executions.
	(vect_do_peeling_for_alignment): Update LOOP_VINFO_NITERSM1.
	* tree-vect-loop.c (vect_get_loop_niters): Also return the
	number of latch executions.
	(new_loop_vec_info): Initialize LOOP_VINFO_NITERSM1.
	(vect_analyze_loop_form): Likewise.
	(vect_generate_tmps_on_preheader): Compute the number of
	vectorized iterations differently.

	* gcc.dg/torture/pr59058.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr59058.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-loop-manip.c
    trunk/gcc/tree-vect-loop.c
    trunk/gcc/tree-vectorizer.h
Comment 14 Richard Biener 2013-12-06 12:42:00 UTC
Fixed on trunk sofar.
Comment 15 Richard Biener 2014-06-12 13:45:49 UTC
The 4.7 branch is being closed, moving target milestone to 4.8.4.
Comment 16 Jakub Jelinek 2014-12-19 13:35:36 UTC
GCC 4.8.4 has been released.
Comment 17 Richard Biener 2015-06-23 08:43:17 UTC
Fixed for 4.9.0.