80631 – [6 Regression] Compiling with -O3 -mavx2 gives wrong code

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	6.3.1

Importance:	P2 normal
Target Milestone:	7.3
Assignee:	Jakub Jelinek

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2017-05-04 19:07 UTC by Elias Rudberg
Modified:	2018-10-26 11:48 UTC
CC List:	2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2017-05-04 00:00:00

Attachments
Preprocessed source generated by gcc -v -save-temps -O3 -mavx2 thecode.c (3.32 KB, text/plain) 2017-05-04 19:07 UTC, Elias Rudberg
gcc8-pr80631.patch (3.62 KB, patch) 2017-12-11 16:09 UTC, Jakub Jelinek

Description Elias Rudberg 2017-05-04 19:07:27 UTC

Created attachment 41319
Preprocessed source generated by gcc -v -save-temps -O3 -mavx2 thecode.c

I ran into a problem with strange results when compiling with -O3 -mavx2 and have been able to reduce it to the following small test code:
========================================
#include <stdio.h>
int main() {
  const int N = 8;
  int v[N];
  for(int k = 0; k < N; k++)
    v[k] = k;
  v[0] = 77;
  int found_index = -1;
  for(int k = 0; k < N; k++) {
    if(v[k] == 77)
      found_index = k;
  }
  printf("found_index = %d\n", found_index);
}
========================================

If compiled correctly, running this code should give "found_index = 0".

When compiling it like this:
gcc -O3 -mavx2 thecode.c

then running the resulting a.out executable gives:
$ ./a.out
found_index = -1

which is wrong.

The output of "gcc -v -save-temps -O3 -mavx2 thecode.c" looks as follows:
========================================
$ gcc -v -save-temps -O3 -mavx2 thecode.c
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/6.3.1/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 6.3.1 20161221 (Red Hat 6.3.1-1) (GCC) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx2' '-mtune=generic' '-march=x86-64'
 /usr/libexec/gcc/x86_64-redhat-linux/6.3.1/cc1 -E -quiet -v thecode.c -mavx2 -mtune=generic -march=x86-64 -O3 -fpch-preprocess -o thecode.i
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/6.3.1/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../../../x86_64-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include
 /usr/local/include
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx2' '-mtune=generic' '-march=x86-64'
 /usr/libexec/gcc/x86_64-redhat-linux/6.3.1/cc1 -fpreprocessed thecode.i -quiet -dumpbase thecode.c -mavx2 -mtune=generic -march=x86-64 -auxbase thecode -O3 -version -o thecode.s
GNU C11 (GCC) version 6.3.1 20161221 (Red Hat 6.3.1-1) (x86_64-redhat-linux)
	compiled by GNU C version 6.3.1 20161221 (Red Hat 6.3.1-1), GMP version 6.1.1, MPFR version 3.1.5, MPC version 1.0.2, isl version 0.14 or 0.13
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C11 (GCC) version 6.3.1 20161221 (Red Hat 6.3.1-1) (x86_64-redhat-linux)
	compiled by GNU C version 6.3.1 20161221 (Red Hat 6.3.1-1), GMP version 6.1.1, MPFR version 3.1.5, MPC version 1.0.2, isl version 0.14 or 0.13
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 67626b9d441eed376539391e660a9413
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx2' '-mtune=generic' '-march=x86-64'
 as -v --64 -o thecode.o thecode.s
GNU assembler version 2.26.1 (x86_64-redhat-linux) using BFD version version 2.26.1-1.fc25
COMPILER_PATH=/usr/libexec/gcc/x86_64-redhat-linux/6.3.1/:/usr/libexec/gcc/x86_64-redhat-linux/6.3.1/:/usr/libexec/gcc/x86_64-redhat-linux/:/usr/lib/gcc/x86_64-redhat-linux/6.3.1/:/usr/lib/gcc/x86_64-redhat-linux/
LIBRARY_PATH=/usr/lib/gcc/x86_64-redhat-linux/6.3.1/:/usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx2' '-mtune=generic' '-march=x86-64'
 /usr/libexec/gcc/x86_64-redhat-linux/6.3.1/collect2 -plugin /usr/libexec/gcc/x86_64-redhat-linux/6.3.1/liblto_plugin.so -plugin-opt=/usr/libexec/gcc/x86_64-redhat-linux/6.3.1/lto-wrapper -plugin-opt=-fresolution=thecode.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --no-add-needed --eh-frame-hdr --hash-style=gnu -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../../../lib64/crt1.o /usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../../../lib64/crti.o /usr/lib/gcc/x86_64-redhat-linux/6.3.1/crtbegin.o -L/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../.. thecode.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-redhat-linux/6.3.1/crtend.o /usr/lib/gcc/x86_64-redhat-linux/6.3.1/../../../../lib64/crtn.o
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-mavx2' '-mtune=generic' '-march=x86-64'
========================================

I have tested this with a few different gcc versions:
gcc 4.8.3  --> OK
gcc 4.9.4  --> OK
gcc 5.3.0  --> OK
gcc 5.4.0  --> OK
gcc 6.1.0  --> WRONG
gcc 6.2.0  --> WRONG
gcc 6.3.1  --> WRONG
gcc 7.1.0  --> WRONG

I don't know what goes wrong, but it seems to be related to the first element of the array v in the code: if I change v[0]=77 to e.g. v[3]=77, the result is found_index=3 as it should be. Only a match at v[0] is missed.

Comment 1 Jakub Jelinek 2017-05-04 20:09:00 UTC

Started with r230297.
Note that in C
  const int N = 8;
  int v[N];
declares a variable length array, which unnecessarily pessimizes the code.  To get a non-VLA, use
#define N 8
or
  enum { N = 8 };
or something similar instead.  In C++ it is not a VLA.
But fixing that doesn't help here.
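
As an illustration of the non-VLA form, here is a minimal sketch of the reporter's test rewritten with an enumeration constant. The function name find_last_77 is made up for this example, and the static_assert (C11, from <assert.h>) is only there to show that the array size is now a compile-time constant; with "const int N = 8;" the same sizeof would not be an integer constant expression in standard C.

```c
#include <assert.h>

/* N is an enumeration constant, so "int v[N]" is an ordinary array,
   not a variable length array.  */
enum { N = 8 };

/* Hypothetical helper: same loops as the reporter's main (), but
   returning found_index instead of printing it.  */
int
find_last_77 (void)
{
  int v[N];

  /* Only valid because v is not a VLA: sizeof v is a constant here.  */
  static_assert (sizeof v == N * sizeof (int), "v must not be a VLA");

  for (int k = 0; k < N; k++)
    v[k] = k;
  v[0] = 77;

  int found_index = -1;
  for (int k = 0; k < N; k++)
    if (v[k] == 77)
      found_index = k;
  return found_index;
}
```

A correctly compiled find_last_77 () returns 0, since v[0] is the only element equal to 77.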

Comment 2 Richard Biener 2017-07-04 08:47:23 UTC

GCC 6.4 is being released, adjusting target milestone.

Comment 3 Jakub Jelinek 2017-12-08 15:56:12 UTC

More complete testcase:

int v[8] = { 77, 1, 79, 3, 4, 5, 6, 7 };

__attribute__((noipa)) void
foo ()
{
  int k, r = -1;
  for (k = 0; k < 8; k++)
    if (v[k] == 77)
      r = k;
  if (r != 0)
    __builtin_abort ();
}

__attribute__((noipa)) void
bar ()
{
  int k, r = 4;
  for (k = 0; k < 8; k++)
    if (v[k] == 79)
      r = k;
  if (r != 2)
    __builtin_abort ();
}

int
main ()
{
  foo ();
  bar ();
  return 0;
}

The conditional reduction handling is buggy.
In foo we emit:
  vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
  vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
  vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 };

  <bb 3> [local count: 119292720]:
...
  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 0, 1, 2, 3, 4, 5, 6, 7 }(2)>
  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
...
  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
...
  <bb 18> [local count: 119292720]:
  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? -1 : stmp_r_3.7_32;

vect_cst__30 which seems to be the initial value of the reduction var r as a vector is unused.
The problem is that by starting with zero vector for vect_r_3.1_24 there is no difference between a condition match on the first iteration and
no match at all, both result in REDUC_MAX of 0 and the emitted code assumes REDUC_MAX of 0 means no match.
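
The ambiguity can be reproduced with a plain C emulation of the eight lanes. This is only a sketch of the scheme described above, not the code GCC emits: the names LANES, reduc_max, cond_reduction and demo_cond_reduction are invented here, and the "sentinel" parameter stands for the value the accumulator vector starts with (0 in the emitted GIMPLE).

```c
#include <assert.h>

enum { LANES = 8 };

/* Horizontal maximum over the lanes, standing in for REDUC_MAX.  */
static int
reduc_max (const int *x)
{
  int m = x[0];
  for (int i = 1; i < LANES; i++)
    if (x[i] > m)
      m = x[i];
  return m;
}

/* One vector iteration over v[0..7].  Each lane keeps its IV value on a
   match and the sentinel otherwise (the VEC_COND_EXPR); the epilogue
   maps "maximum == sentinel" back to the value r had before the loop.  */
static int
cond_reduction (const int *v, int needle, int r_before_loop, int sentinel)
{
  int r[LANES];
  for (int i = 0; i < LANES; i++)
    {
      int iv = i;                       /* { 0, 1, ..., 7 } */
      r[i] = (v[i] == needle) ? iv : sentinel;
    }
  int m = reduc_max (r);
  return m == sentinel ? r_before_loop : m;
}

void
demo_cond_reduction (void)
{
  static const int v[LANES] = { 77, 1, 79, 3, 4, 5, 6, 7 };

  /* Sentinel 0: a match in lane 0 looks like "no match".  */
  assert (cond_reduction (v, 77, -1, 0) == -1);   /* wrong, should be 0 */
  /* Sentinel below the first IV value: the match is seen.  */
  assert (cond_reduction (v, 77, -1, -1) == 0);
  /* A match in any other lane works with either sentinel.  */
  assert (cond_reduction (v, 79, 4, 0) == 2);
  /* No match at all: the value from before the loop survives.  */
  assert (cond_reduction (v, 42, 4, -1) == 4);
}
```

With the sentinel equal to the value r has before the loop (both -1 in foo), the final comparison is redundant, because the maximum already is the right answer in the no-match case.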

In this case (the first iteration IV value is constant and bigger than the minimum value of the type), it should be enough to initialize with a vector containing any value smaller than the first iteration IV and to adjust:
  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? -1 : stmp_r_3.7_32;
to
  stmp_r_3.7_33 = stmp_r_3.7_32 == the_chosen_value ? -1 : stmp_r_3.7_32;
Or, in the special case where the reduction var is initialized before the loop to a value smaller than the minimum IV value, we could build a vector of those values and avoid the COND_EXPR on the REDUC_MAX value.

Now, in case the first iteration iterator is constant, but is the minimum value, we can't use this trick.  Perhaps we could in that case just
bias it by one, say if the reduction is with unsigned type emit e.g.:
  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
...
  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
...
  <bb 18> [local count: 119292720]:
  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
  stmp_r_3.7_34 = stmp_r_3.7_32 - 1;
  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? <original_r_value> : stmp_r_3.7_34;
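
The bias-by-one idea can be sketched in scalar C as follows. Again this is an emulation under assumptions, not vectorizer output: the lanes are modelled as an unsigned array, and the names LANES, reduc_max_u, cond_reduction_biased and demo_cond_reduction_biased are hypothetical.

```c
#include <assert.h>

enum { LANES = 8 };

/* Horizontal unsigned maximum, standing in for REDUC_MAX.  */
static unsigned
reduc_max_u (const unsigned *x)
{
  unsigned m = x[0];
  for (int i = 1; i < LANES; i++)
    if (x[i] > m)
      m = x[i];
  return m;
}

/* The IV vector starts at { 1, 2, ..., 8 } instead of { 0, ..., 7 }, so
   0 is free to mean "condition never true".  The epilogue removes the
   bias again.  */
static int
cond_reduction_biased (const int *v, int needle, int r_before_loop)
{
  unsigned r[LANES];
  for (int i = 0; i < LANES; i++)
    {
      unsigned iv = (unsigned) i + 1;   /* biased IV */
      r[i] = (v[i] == needle) ? iv : 0u;
    }
  unsigned m = reduc_max_u (r);
  return m == 0 ? r_before_loop : (int) (m - 1);
}

void
demo_cond_reduction_biased (void)
{
  static const int v[LANES] = { 77, 1, 79, 3, 4, 5, 6, 7 };

  assert (cond_reduction_biased (v, 77, -1) == 0);   /* match in lane 0 */
  assert (cond_reduction_biased (v, 79, 4) == 2);
  assert (cond_reduction_biased (v, 42, -1) == -1);  /* no match */
}
```

The cost compared with the sentinel variant is one extra subtraction after the loop.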

For the non-constant IV first value we actually emit really weird code:
int v[8] = { 77, 1, 79, 3, 4, 5, 6, 7 };

__attribute__((noipa)) void
foo (int *v, int f)
{
  int k, r = -1;
  for (k = f; k < f + 8; k++)
    if (v[k] == 77)
      r = k;
  if (r != 0)
    __builtin_abort ();
}

__attribute__((noipa)) void
bar (int *v, int f)
{
  int k, r = 4;
  for (k = f; k < f + 8; k++)
    if (v[k] == 79)
      r = k;
  if (r != 2)
    __builtin_abort ();
}

int
main ()
{
  foo (v, 0);
  bar (v, 0);
  return 0;
}

where we emit 2 VEC_COND_EXPRs and 2 REDUC_MAX.  While that testcase passes, I'm not really sure it is correct in general, and furthermore
it seems unnecessarily complicated to me.  Can't we just emit what we'd emit for an unsigned conditional reduction with first iteration 1, and only adjust the result after the vectorized loop?
So, say for the foo in the second case, emit:

  vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
  vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };

  <bb 3> [local count: 119292720]:
...
  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
...
  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
...
  <bb 18> [local count: 119292720]:
  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
  stmp_r_3.7_34 = f_9(D) + (stmp_r_3.7_32 - 1) * step;
  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? <r_value_before_loop> : stmp_r_3.7_34;
where _22, _24, _29 would be all in vectors of unsigned_type_for (r)?
Or for signed start with { min, min, ... } as condition never seen value, and { min+1, min+2, min+3, ... } vector as the initial _22 value?
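
A scalar sketch of this proposal for a runtime start value f, assuming step 1 and an unsigned position counter that starts at 1. The names LANES, reduc_max_u, cond_reduction_from and demo_cond_reduction_from are invented for the illustration, and the 12-element array is test data made up here so that f can be nonzero.

```c
#include <assert.h>

enum { LANES = 8 };

/* Horizontal unsigned maximum, standing in for REDUC_MAX.  */
static unsigned
reduc_max_u (const unsigned *x)
{
  unsigned m = x[0];
  for (int i = 1; i < LANES; i++)
    if (x[i] > m)
      m = x[i];
  return m;
}

/* Emulates "for (k = f; k < f + 8; k++) if (v[k] == needle) r = k;".
   The lanes record the 1-based position of a match (0 = no match); the
   real index f + (pos - 1) * step is only computed after the loop.  */
static int
cond_reduction_from (const int *v, int f, int needle, int r_before_loop)
{
  const int step = 1;
  unsigned r[LANES];
  for (int i = 0; i < LANES; i++)
    {
      unsigned pos = (unsigned) i + 1;  /* { 1, 2, ..., 8 } */
      r[i] = (v[f + i * step] == needle) ? pos : 0u;
    }
  unsigned m = reduc_max_u (r);
  return m == 0 ? r_before_loop : f + (int) (m - 1) * step;
}

void
demo_cond_reduction_from (void)
{
  static const int v[LANES] = { 77, 1, 79, 3, 4, 5, 6, 7 };
  static const int w[12] = { 9, 9, 77, 1, 79, 3, 4, 5, 6, 7, 9, 9 };

  assert (cond_reduction_from (v, 0, 77, -1) == 0);
  assert (cond_reduction_from (v, 0, 79, 4) == 2);
  assert (cond_reduction_from (w, 2, 77, -1) == 2);  /* first lane, f = 2 */
  assert (cond_reduction_from (w, 2, 79, -1) == 4);
  assert (cond_reduction_from (w, 2, 42, -1) == -1); /* no match */
}
```

Because the lanes hold positions rather than IV values, the same loop body works for any f, and only the scalar epilogue depends on the start value.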

Comment 4 rguenther@suse.de 2017-12-08 17:22:45 UTC

There's a dup for this (the existing vect.exp execute fail) and there is an approved patch for it.

Comment 5 Jakub Jelinek 2017-12-08 18:20:39 UTC

Related to PR81179 and http://gcc.gnu.org/ml/gcc-patches/2017-11/msg02054.html
As the patch doesn't apply cleanly, can't easily verify it.

Comment 6 Jakub Jelinek 2017-12-11 16:09:09 UTC

Created attachment 42840
gcc8-pr80631.patch

Untested fix.

Comment 7 Jakub Jelinek 2017-12-12 08:55:34 UTC

Author: jakub
Date: Tue Dec 12 08:55:02 2017
New Revision: 255574

URL: https://gcc.gnu.org/viewcvs?rev=255574&root=gcc&view=rev
Log:
	PR tree-optimization/80631
	* tree-vect-loop.c (get_initial_def_for_reduction): Fix comment typo.
	(vect_create_epilog_for_reduction): Add INDUC_VAL and INDUC_CODE
	arguments, for INTEGER_INDUC_COND_REDUCTION use INDUC_VAL instead of
	hardcoding zero as the value if COND_EXPR is never true.  For
	INTEGER_INDUC_COND_REDUCTION don't emit the final COND_EXPR if
	INDUC_VAL is equal to INITIAL_DEF, and use INDUC_CODE instead of
	hardcoding MAX_EXPR as the reduction operation.
	(is_nonwrapping_integer_induction): Allow negative step.
	(vectorizable_reduction): Compute INDUC_VAL and INDUC_CODE for
	vect_create_epilog_for_reduction, if no value is suitable, don't
	use INTEGER_INDUC_COND_REDUCTION for now.  Formatting fixes.

	* gcc.dg/vect/pr80631-1.c: New test.
	* gcc.dg/vect/pr80631-2.c: New test.
	* gcc.dg/vect/pr65947-13.c: Expect integer induc cond reduction
	vectorization.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr80631-1.c
    trunk/gcc/testsuite/gcc.dg/vect/pr80631-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/pr65947-13.c
    trunk/gcc/tree-vect-loop.c
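
The ChangeLog above does not spell out how INDUC_VAL and INDUC_CODE are chosen, so the following is only a guess at the idea behind allowing a negative step: when the IV counts down, the last match in iteration order has the smallest IV value, so the reduction has to be a minimum and the "never matched" value has to be larger than the first IV value. All names in the sketch (LANES, reduc_min, cond_reduction_down, demo_cond_reduction_down) are hypothetical, and the array contents are made up for the example.

```c
#include <assert.h>

enum { LANES = 8 };

/* Horizontal minimum, standing in for a MIN reduction.  */
static int
reduc_min (const int *x)
{
  int m = x[0];
  for (int i = 1; i < LANES; i++)
    if (x[i] < m)
      m = x[i];
  return m;
}

/* Emulates "for (k = 7; k >= 0; k--) if (v[k] == needle) r = k;".
   The sentinel 8 is larger than the first IV value 7, so it can never
   be confused with a real match.  */
static int
cond_reduction_down (const int *v, int needle, int r_before_loop)
{
  const int sentinel = LANES;
  int r[LANES];
  for (int i = 0; i < LANES; i++)
    {
      int iv = LANES - 1 - i;           /* { 7, 6, ..., 0 } */
      r[i] = (v[iv] == needle) ? iv : sentinel;
    }
  int m = reduc_min (r);
  return m == sentinel ? r_before_loop : m;
}

void
demo_cond_reduction_down (void)
{
  static const int v[LANES] = { 77, 1, 79, 3, 77, 5, 6, 7 };

  /* Matches at k = 4 and k = 0; the scalar loop ends with r = 0.  */
  assert (cond_reduction_down (v, 77, -1) == 0);
  assert (cond_reduction_down (v, 79, -1) == 2);
  assert (cond_reduction_down (v, 7, -1) == 7);      /* first iteration */
  assert (cond_reduction_down (v, 42, -1) == -1);    /* no match */
}
```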

Comment 8 Jakub Jelinek 2017-12-12 09:04:45 UTC

Fixed on the trunk so far.

Comment 9 Jakub Jelinek 2017-12-15 17:52:08 UTC

Author: jakub
Date: Fri Dec 15 17:51:36 2017
New Revision: 255701

URL: https://gcc.gnu.org/viewcvs?rev=255701&root=gcc&view=rev
Log:
	PR tree-optimization/80631
	* gcc.target/i386/avx2-pr80631.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/avx2-pr80631.c
Modified:
    trunk/gcc/testsuite/ChangeLog

Comment 10 Jakub Jelinek 2017-12-15 22:13:17 UTC

Author: jakub
Date: Fri Dec 15 22:12:46 2017
New Revision: 255726

URL: https://gcc.gnu.org/viewcvs?rev=255726&root=gcc&view=rev
Log:
	Backported from mainline
	2017-12-12  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/80631
	* tree-vect-loop.c (get_initial_def_for_reduction): Fix comment typo.
	(vect_create_epilog_for_reduction): Add INDUC_VAL argument, for
	INTEGER_INDUC_COND_REDUCTION use INDUC_VAL instead of
	hardcoding zero as the value if COND_EXPR is never true.  For
	INTEGER_INDUC_COND_REDUCTION don't emit the final COND_EXPR if
	INDUC_VAL is equal to INITIAL_DEF.
	(vectorizable_reduction): Compute INDUC_VAL for
	vect_create_epilog_for_reduction, if no value is suitable, don't
	use INTEGER_INDUC_COND_REDUCTION for now.  Formatting fixes.

	* gcc.dg/vect/pr80631-1.c: New test.
	* gcc.dg/vect/pr80631-2.c: New test.

	PR tree-optimization/80631
	* gcc.target/i386/avx2-pr80631.c: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.dg/vect/pr80631-1.c
    branches/gcc-7-branch/gcc/testsuite/gcc.dg/vect/pr80631-2.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/avx2-pr80631.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
    branches/gcc-7-branch/gcc/tree-vect-loop.c

Comment 11 Jakub Jelinek 2017-12-16 08:57:10 UTC

Fixed for 7.3+ too.

Comment 12 Jakub Jelinek 2017-12-19 07:39:56 UTC

Author: jakub
Date: Tue Dec 19 07:39:24 2017
New Revision: 255804

URL: https://gcc.gnu.org/viewcvs?rev=255804&root=gcc&view=rev
Log:
	PR tree-optimization/80631
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Compare
	induc_code against MAX_EXPR or MIN_EXPR instead of reduc_fn against
	IFN_REDUC_MAX or IFN_REDUC_MIN.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-vect-loop.c

Comment 13 Jakub Jelinek 2018-10-26 11:48:32 UTC

GCC 6 branch is being closed, fixed in 7.x.