60656 – [4.8 regression] x86 vectorization produces wrong code

Bug 60656 - [4.8 regression] x86 vectorization produces wrong code

Summary: [4.8 regression] x86 vectorization produces wrong code

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	unknown

Importance:	P2 normal
Target Milestone:	4.8.5
Assignee:	Richard Biener

URL:
Keywords:	wrong-code

Duplicates (1):	61108 (view as bug list)
Depends on:
Blocks:

Reported:	2014-03-25 16:02 UTC by Paul Pluzhnikov
Modified:	2015-06-10 13:51 UTC (History)
CC List:	3 users (show)

See Also:
Host:	x86_64-unknown-linux-gnu
Target:	x86_64-unknown-linux-gnu
Build:	x86_64-unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:	2014-03-25 00:00:00

Attachments
gcc49-pr60656.patch (816 bytes, patch) 2014-03-28 15:21 UTC, Jakub Jelinek	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Paul Pluzhnikov 2014-03-25 16:02:50 UTC

Current trunk (r208813) and gcc-4.8 are affected, 4.7 does not appear to be.

gcc -O0 t.c && ./a.out
500450210036

gcc -O3 t.c && ./a.out
500450200033

gcc -O2 t.c && ./a.out
500450210036

gcc -O2 t.c -ftree-vectorize && ./a.out
500450200033

// t.c
#include <stdio.h>

int
main ()
{
  int v[] = {5000, 5001, 5002, 5003};
  long s = 0;
  int i;

  for(i = 0; i < 4; ++i)
    {
      long P = v[i];
      s += P*P*P;
    }
  printf("%ld\n", s);
}

Comment 1 Jakub Jelinek 2014-03-25 18:09:36 UTC

Started with r189006.  Looking into it.

Comment 2 Cong Hou 2014-03-25 19:41:27 UTC

This bug is caused by an optimization in GCC vectorizer that is not
implemented properly. When a reduction operation is vectorized, the
order of elements in vectors directly used in reduction does not
matter. In some cases the vectorizer may generate less code based on
this fact. GCC assigns a property named "vect_used_by_reduction" to
all vectors participating in reductions. However, vectors that are
indirectly used in reduction also have this property. For example,
consider the following three statements (all operands are vectors):

a = b op1 c;
d = a op2 e;
s1 = s0 op3 d;

Here assume the last statement is a reduction one, then a,b,c,d,e all
have the property "vect_used_by_reduction". However, if op2 is
different from op3, then a's element order can affect the final
result. GCC does not check this.

Comment 3 Jakub Jelinek 2014-03-25 20:03:06 UTC

You are right, before r189408 this was using ordered_p = false, now it is slightly different.
In any case, perhaps we could just check if the imm use of the stmt is a reduction PHI and only assume unordered in that case?

Comment 4 Cong Hou 2014-03-25 20:12:54 UTC

Yes, there is a quick fix: we can check if the def with vect_used_by_reduction is immediately used by a reduction stmt. After all, it seems that supportable_widening_operation() is the only place that takes advantage of this "the element order doesn't matter" feature.


diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 70fb411..7442d0c 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7827,7 +7827,16 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
                                             stmt, vectype_out, vectype_in,
                                             code1, code2, multi_step_cvt,
                                             interm_types))
-       return true;
+        {
+          tree lhs = gimple_assign_lhs (stmt);
+          use_operand_p dummy;
+          gimple use_stmt;
+          stmt_vec_info use_stmt_info = NULL;
+          if (single_imm_use (lhs, &dummy, &use_stmt)
+              && (use_stmt_info = vinfo_for_stmt (use_stmt))
+              && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def)
+            return true;
+        }
       c1 = VEC_WIDEN_MULT_LO_EXPR;
       c2 = VEC_WIDEN_MULT_HI_EXPR;
       break;

Comment 5 Jakub Jelinek 2014-03-28 11:19:37 UTC

(In reply to Cong Hou from comment #4)
> Yes, there is a quick fix: we can check if the def with
> vect_used_by_reduction is immediately used by a reduction stmt. After all,
> it seems that supportable_widening_operation() is the only place that takes
> advantage of this "the element order doesn't matter" feature.
> 
> 
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 70fb411..7442d0c 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -7827,7 +7827,16 @@ supportable_widening_operation (enum tree_code code,
> gimple stmt,
>                                              stmt, vectype_out, vectype_in,
>                                              code1, code2, multi_step_cvt,
>                                              interm_types))
> -       return true;
> +        {
> +          tree lhs = gimple_assign_lhs (stmt);
> +          use_operand_p dummy;
> +          gimple use_stmt;
> +          stmt_vec_info use_stmt_info = NULL;
> +          if (single_imm_use (lhs, &dummy, &use_stmt)
> +              && (use_stmt_info = vinfo_for_stmt (use_stmt))
> +              && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def)
> +            return true;
> +        }
>        c1 = VEC_WIDEN_MULT_LO_EXPR;
>        c2 = VEC_WIDEN_MULT_HI_EXPR;
>        break;

Looks good to me, perhaps just no need to initialize use_stmt_info to NULL.
Are you going to bootstrap/regtest this and post to gcc-patches?

Comment 6 Jakub Jelinek 2014-03-28 15:21:20 UTC

Created attachment 32475 [details]
gcc49-pr60656.patch

I've bootstrapped/regtested in the mean time this patch on x86_64-linux and i686-linux, no regression.  As it is your patch, can you please post it?

Comment 7 Cong Hou 2014-03-28 18:48:26 UTC

Yes, will do it. Thank you a lot!

Comment 8 Cong Hou 2014-04-05 01:27:53 UTC

Author: congh
Date: Sat Apr  5 01:27:21 2014
New Revision: 209138

URL: http://gcc.gnu.org/viewcvs?rev=209138&root=gcc&view=rev
Log:
2014-04-04  Cong Hou  <congh@google.com>

    PR tree-optimization/60656
    * tree-vect-stmts.c (supportable_widening_operation):
      Fix a bug that elements in a vector with vect_used_by_reduction
      property are incorrectly reordered when the operation on it is not
      consistant with the one in reduction operation.

2014-04-04  Cong Hou  <congh@google.com>

    PR tree-optimization/60656
    * gcc.dg/vect/pr60656.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr60656.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-stmts.c

Comment 9 Jakub Jelinek 2014-04-09 12:25:53 UTC

Fixed.

Comment 10 Jakub Jelinek 2014-04-09 12:28:18 UTC

Actually, still not fixed on the 4.8 branch, only on the trunk.

Comment 11 Bernd Edlinger 2014-04-16 08:52:38 UTC

Hmm,

with gcc-4.9.0-RC-20140411 on arm-linux-gnueabihf I see the following:

ERROR: gcc.dg/vect/pr60656.c: error executing dg-final: can't read "et_vect_widen_mult_si_to_di_pattern_saved": no such variable
UNRESOLVED: gcc.dg/vect/pr60656.c: error executing dg-final: can't read "et_vect_widen_mult_si_to_di_pattern_saved": no such variable

what does that mean?

Comment 12 Jakub Jelinek 2014-04-16 08:56:24 UTC

Should be fixed already with r209363.

Comment 13 Richard Biener 2014-05-22 09:05:42 UTC

GCC 4.8.3 is being released, adjusting target milestone.

Comment 14 Jakub Jelinek 2014-12-19 13:34:12 UTC

GCC 4.8.4 has been released.

Comment 15 Bernhard Kaindl 2015-06-09 23:37:08 UTC

Confirmed:

*     Fixed in: gcc-4.9.2 (release)
* NOT Fixed in: gcc-4.8.4 (release)

It seems indeed only i386, x32 and x86_64 (-m32, -mx32 and -m64) are affected.

-fopt-info-vec-note show that e.g. powerpc & powerpc64 don't vectorize the loop.

The mainline fix from Comment #8 apples without fuzz and fixes this wrong code issue in gcc-4.8.4.

Apply it for 4.8.5?

Comment 16 Richard Biener 2015-06-10 09:01:36 UTC

I'll see if it is backportable easily.

Comment 17 Richard Biener 2015-06-10 11:04:25 UTC

*** Bug 61108 has been marked as a duplicate of this bug. ***

Comment 18 Richard Biener 2015-06-10 12:53:41 UTC

Author: rguenth
Date: Wed Jun 10 12:53:09 2015
New Revision: 224327

URL: https://gcc.gnu.org/viewcvs?rev=224327&root=gcc&view=rev
Log:
2015-06-10  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2014-04-04  Cong Hou  <congh@google.com>

	PR tree-optimization/60656
	* tree-vect-stmts.c (supportable_widening_operation):
	Fix a bug that elements in a vector with vect_used_by_reduction
	property are incorrectly reordered when the operation on it is not
	consistant with the one in reduction operation.

	* gcc.dg/vect/pr60656.c: New test.

	2014-01-31  Richard Biener  <rguenther@suse.de>

	PR middle-end/59990
	* builtins.c (fold_builtin_memory_op): Make sure to not
	use a floating-point mode or a boolean or enumeral type for
	the copy operation.

	* gcc.dg/torture/pr59990.c: New testcase.
	* gcc.target/i386/pr49168-1.c: Adjust.

Added:
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/torture/pr59990.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/vect/pr60656.c
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/builtins.c
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr49168-1.c
    branches/gcc-4_8-branch/gcc/tree-vect-stmts.c

Comment 19 Richard Biener 2015-06-10 13:51:09 UTC

Fixed.