Bug 56787 - [4.8 Regression] Vectorization fails because of CLOBBER statements
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.8.0
Importance: P3 normal
Target Milestone: 4.9.3
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2013-03-30 14:53 UTC by Freddie Witherden
Modified: 2015-06-23 09:09 UTC

See Also:
Host:
Target:
Build:
Known to work: 4.9.0
Known to fail: 4.8.0, 4.8.3, 4.8.5
Last reconfirmed: 2013-04-02 00:00:00


Attachments
Test case (1.73 KB, text/x-csrc)
2013-03-30 14:53 UTC, Freddie Witherden

Description Freddie Witherden 2013-03-30 14:53:58 UTC
Created attachment 29754
Test case

When compiling the attached file with GCC 4.8.0 on an AVX-capable system, the main loop isn't vectorized. This is a regression compared with 4.7.2 on the same system (where the loop is fully vectorized).

I apologize for the length of the test case -- smaller examples do not reproduce the behaviour in question. 

Output of -save-temps:
gcc-mp-4.8 -v -save-temps -Ofast -march=native -std=c99 -S test.c 
Using built-in specs.
COLLECT_GCC=gcc-mp-4.8
Target: x86_64-apple-darwin12
Configured with: ../gcc-4.8-20130321/configure --prefix=/opt/local --build=x86_64-apple-darwin12 --enable-languages=c,c++,objc,obj-c++,fortran,java --libdir=/opt/local/lib/gcc48 --includedir=/opt/local/include/gcc48 --infodir=/opt/local/share/info --mandir=/opt/local/share/man --datarootdir=/opt/local/share/gcc-4.8 --with-local-prefix=/opt/local --with-system-zlib --disable-nls --program-suffix=-mp-4.8 --with-gxx-include-dir=/opt/local/include/gcc48/c++/ --with-gmp=/opt/local --with-mpfr=/opt/local --with-mpc=/opt/local --with-ppl=/opt/local --with-cloog=/opt/local --enable-cloog-backend=isl --disable-cloog-version-check --enable-stage1-checking --disable-multilib --enable-lto --enable-libstdcxx-time --with-as=/opt/local/bin/as --with-ld=/opt/local/bin/ld --with-ar=/opt/local/bin/ar --with-bugurl=https://trac.macports.org/newticket --with-pkgversion='MacPorts gcc48 4.8-20130321_0'
Thread model: posix
gcc version 4.8.0 20130321 (prerelease) (MacPorts gcc48 4.8-20130321_0) 
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.8.3' '-v' '-save-temps' '-Ofast' '-march=native' '-std=c99' '-S'
 /opt/local/libexec/gcc/x86_64-apple-darwin12/4.8.0/cc1 -E -quiet -v -D__DYNAMIC__ test.c -march=corei7-avx -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072 -mtune=corei7-avx -fPIC -mmacosx-version-min=10.8.3 -std=c99 -Ofast -fpch-preprocess -o test.i
ignoring nonexistent directory "/opt/local/lib/gcc48/gcc/x86_64-apple-darwin12/4.8.0/../../../../../x86_64-apple-darwin12/include"
#include "..." search starts here:
#include <...> search starts here:
 /opt/local/lib/gcc48/gcc/x86_64-apple-darwin12/4.8.0/include
 /opt/local/include
 /opt/local/lib/gcc48/gcc/x86_64-apple-darwin12/4.8.0/include-fixed
 /usr/include
 /System/Library/Frameworks
 /Library/Frameworks
End of search list.
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.8.3' '-v' '-save-temps' '-Ofast' '-march=native' '-std=c99' '-S'
 /opt/local/libexec/gcc/x86_64-apple-darwin12/4.8.0/cc1 -fpreprocessed test.i -march=corei7-avx -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072 -mtune=corei7-avx -fPIC -quiet -dumpbase test.c -mmacosx-version-min=10.8.3 -auxbase test -Ofast -std=c99 -version -o test.s
GNU C (MacPorts gcc48 4.8-20130321_0) version 4.8.0 20130321 (prerelease) (x86_64-apple-darwin12)
        compiled by GNU C version 4.8.0 20130321 (prerelease), GMP version 5.0.5, MPFR version 3.1.1-p2, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C (MacPorts gcc48 4.8-20130321_0) version 4.8.0 20130321 (prerelease) (x86_64-apple-darwin12)
        compiled by GNU C version 4.8.0 20130321 (prerelease), GMP version 5.0.5, MPFR version 3.1.1-p2, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 6291d2010395c7dee8043d72914d31cb
COMPILER_PATH=/opt/local/libexec/gcc/x86_64-apple-darwin12/4.8.0/:/opt/local/libexec/gcc/x86_64-apple-darwin12/4.8.0/:/opt/local/libexec/gcc/x86_64-apple-darwin12/:/opt/local/lib/gcc48/gcc/x86_64-apple-darwin12/4.8.0/:/opt/local/lib/gcc48/gcc/x86_64-apple-darwin12/
LIBRARY_PATH=/opt/local/lib/gcc48/gcc/x86_64-apple-darwin12/4.8.0/:/opt/local/lib/gcc48/gcc/x86_64-apple-darwin12/4.8.0/../../../:/usr/lib/
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.8.3' '-v' '-save-temps' '-Ofast' '-march=native' '-std=c99' '-S'
Comment 1 Richard Biener 2013-04-02 12:02:39 UTC
t.c:137: note: not vectorized: no vectype for stmt: u ={v} {CLOBBER};
 scalar_type: float[5]
t.c:137: note: bad data references.
t.c:137: note: ***** Re-trying analysis with vector size 16
Comment 2 Jakub Jelinek 2013-04-02 14:50:18 UTC
Somewhat reduced testcase:
inline void
bar (const float s[5], float z[3][5])
{
  float a = s[0], b = s[1], c = s[2], d = s[3], e = s[4];
  float f = 1.0f / a;
  float u = f * b, v = f * c, w = f * d;
  float p = 0.4f * (e - 0.5f * (b * u + c * v + d * w));
  z[0][3] = b * w;
  z[1][3] = c * w;
  z[2][3] = d * w + p;
}

void
foo (unsigned long n, const float *__restrict u0,
     const float *__restrict u1, const float *__restrict u2,
     const float *__restrict u3, const float *__restrict u4,
     const float *__restrict s0, const float *__restrict s1,
     const float *__restrict s2, float *__restrict t3,
     float *__restrict t4)
{
  unsigned long i;
  for (i = 0; i < n; i++)
    {
      float u[5], f[3][5];
      u[0] = u0[i]; u[1] = u1[i]; u[2] = u2[i]; u[3] = u3[i]; u[4] = u4[i];
      bar (u, f);
      t3[i] = s0[i] * f[0][3] + s1[i] * f[1][3] + s2[i] * f[2][3];
    }
}
Comment 3 Richard Biener 2013-05-28 11:52:13 UTC
The clobbers are dead and useless btw, but we only remove clobbers from
within remove_unused_locals, which doesn't run in between IPA inlining
and right before RTL expansion (rightfully so).

Vectorizing without removing the clobbers requires us to honor them at least
for placement of aliasing vectorized stores / loads and also IV adjustments
in case the clobber is a MEM of an SSA name and that is loop variant
(now possible, but not on the 4.8 branch).

So the simplest solution is to discard all clobbers inside the vectorized
loop body.
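
The idea of discarding clobbers from a loop body can be illustrated with a toy model. This is only a sketch under stated assumptions: the `stmt` struct, the `stmt_kind` enum and `drop_clobbers` are hypothetical names invented for illustration and are not GCC's GIMPLE data structures or the code that was committed. It shows nothing more than filtering lifetime markers out of a statement list while keeping the remaining statements in order.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a statement in a loop body.  These types are
   hypothetical and are NOT GCC's GIMPLE representation.  */
enum stmt_kind { STMT_LOAD, STMT_ARITH, STMT_STORE, STMT_CLOBBER };

struct stmt
{
  enum stmt_kind kind;
  const char *text;	/* Printable form, e.g. "u ={v} {CLOBBER};".  */
};

/* Remove every clobber from BODY[0..N) in place, preserving the order
   of the remaining statements.  Return the new statement count.  */
size_t
drop_clobbers (struct stmt *body, size_t n)
{
  size_t kept = 0;

  assert (body != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (body[i].kind != STMT_CLOBBER)
      body[kept++] = body[i];
  return kept;
}
```

In this model a clobber carries no computation, only the end of a variable's lifetime, so dropping it leaves the loads, arithmetic and stores that the loop actually performs.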
Comment 4 Richard Biener 2013-05-28 13:37:15 UTC
Author: rguenth
Date: Tue May 28 13:36:25 2013
New Revision: 199380

URL: http://gcc.gnu.org/viewcvs?rev=199380&root=gcc&view=rev
Log:
2013-05-28  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/56787
	* tree-vect-data-refs.c (vect_analyze_data_refs): Drop clobbers
	from the list of data references.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Skip
	clobbers.
	(vect_analyze_loop_operations): Likewise.
	(vect_transform_loop): Remove clobbers.

	* gcc.dg/vect/pr56787.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr56787.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
    trunk/gcc/tree-vect-loop.c
Comment 5 Pat Haugen 2013-06-12 15:14:13 UTC
The new testcase fails on powerpc64-linux, as can be seen in http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg00904.html.
Comment 6 David Edelsohn 2013-07-23 17:39:13 UTC
The patch fixed x86_64 but the new testcase fails on PPC64.
Comment 7 Jakub Jelinek 2013-10-16 09:48:17 UTC
GCC 4.8.2 has been released.
Comment 8 ktkachov 2013-12-04 11:14:16 UTC
Also fails on arm-* btw.
Comment 9 Richard Biener 2013-12-05 09:20:52 UTC
Author: rguenth
Date: Thu Dec  5 09:20:51 2013
New Revision: 205696

URL: http://gcc.gnu.org/viewcvs?rev=205696&root=gcc&view=rev
Log:
2013-12-05  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/56787
	* gcc.dg/vect/pr56787.c: Adjust to not require vector float
	division.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/pr56787.c
Comment 10 Richard Biener 2013-12-05 09:21:47 UTC
Maybe it works now.
Comment 11 ktkachov 2013-12-05 09:30:45 UTC
(In reply to Richard Biener from comment #10)
> Maybe it works now.

PASSes on arm* now, thanks.
Comment 12 Pat Haugen 2013-12-05 18:53:50 UTC
Working on PowerPC also.
Comment 13 Richard Biener 2013-12-09 10:21:21 UTC
The patch cannot be backported easily, not going to fix it for 4.8.
Comment 14 Jakub Jelinek 2014-04-22 11:35:20 UTC
GCC 4.9.0 has been released.
Comment 15 Jakub Jelinek 2014-07-16 13:26:34 UTC
GCC 4.9.1 has been released.
Comment 16 Jakub Jelinek 2014-10-30 10:36:33 UTC
GCC 4.9.2 has been released.
Comment 17 Richard Biener 2015-06-23 09:09:14 UTC
Fixed for 4.9.3.