Bug 82108 - [7 Regression] Wrong vectorized code generated for x86_64
Summary: [7 Regression] Wrong vectorized code generated for x86_64
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 7.2.0
: P2 normal
Target Milestone: 7.3
Assignee: Richard Biener
URL:
Keywords: wrong-code
: 86632 (view as bug list)
Depends on:
Blocks: 81410
  Show dependency treegraph
 
Reported: 2017-09-05 16:32 UTC by Ell
Modified: 2018-07-23 09:12 UTC (History)
1 user (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work: 6.4.0, 7.1.0, 7.2.1, 8.0
Known to fail: 7.2.0
Last reconfirmed: 2017-09-05 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ell 2017-09-05 16:32:47 UTC
The following snippet generates wrong vectorized code with GCC 7.2 on x86_64 with -O3:

  void downscale_2 (const float* src, int src_n, float* dst) {
    int i;

    for (i = 0; i < src_n; i += 2) {
      const float* a = src;
      const float* b = src + 4;

      dst[0] = (a[0] + b[0]) / 2;
      dst[1] = (a[1] + b[1]) / 2;
      dst[2] = (a[2] + b[2]) / 2;
      dst[3] = (a[3] + b[3]) / 2;

      src += 2 * 4;
      dst +=     4;
    }

The assembly for the vectorized version of the loop is:

  .L5:
          addl    $1, %ecx
          movups  (%rdi,%rax), %xmm0
          movups  16(%rdi,%rax,2), %xmm2
          addps   %xmm2, %xmm0
          mulps   %xmm1, %xmm0
          movups  %xmm0, (%rdx,%rax)
          addq    $16, %rax
          cmpl    %r8d, %ecx
          jb      .L5

Notice the missing ,2 on the first movups.

It can be tested with:

  #include <stdio.h>

  int main () {
    const float in[4 * 4] = {
      1, 2, 3, 4,
      5, 6, 7, 8,

      1, 2, 3, 4,
      5, 6, 7, 8
    };
    float out[2 * 4];

    downscale_2 (in, 4, out);

    /* correct: 3, 4, 5, 6 */
    printf ("%g, %g, %g, %g\n", out[0], out[1], out[2], out[3]);

    /* incorrect: 5, 6, 7, 8; should also be 3, 4, 5, 6 */
    printf ("%g, %g, %g, %g\n", out[4], out[5], out[6], out[7]);
  }

This doesn't seem to happen with 7.1 or 6.3.

For a chuckle, this affects a recent build of GIMP, and has resulted in this beauty: https://bug787222.bugzilla-attachments.gnome.org/attachment.cgi?id=359042 :)

$gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --disable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp
Thread model: posix
gcc version 7.2.0 (GCC)
Comment 1 Martin Liška 2017-09-05 18:02:29 UTC
Confirmed, started with r250312.
Comment 2 Richard Biener 2017-09-06 07:26:10 UTC
Bah.  Mine.
Comment 3 Richard Biener 2017-09-06 12:32:23 UTC
Author: rguenth
Date: Wed Sep  6 12:31:52 2017
New Revision: 251790

URL: https://gcc.gnu.org/viewcvs?rev=251790&root=gcc&view=rev
Log:
2017-09-06  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/82108
	* tree-vect-stmts.c (vectorizable_load): Fix pointer adjustment
	for gap in the non-permutation SLP case.

	* gcc.dg/vect/pr82108.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr82108.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-stmts.c
Comment 4 Richard Biener 2017-09-06 14:05:53 UTC
Fixed on trunk sofar, thanks for the report.
Comment 5 Richard Biener 2017-09-18 10:14:24 UTC
Fixed.
Comment 6 Richard Biener 2017-09-18 10:14:26 UTC
Author: rguenth
Date: Mon Sep 18 10:13:54 2017
New Revision: 252918

URL: https://gcc.gnu.org/viewcvs?rev=252918&root=gcc&view=rev
Log:
2017-09-18  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2017-09-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/82084
	* fold-const.h (can_native_encode_string_p): Declare.
	* fold-const.c (can_native_encode_string_p): Factor out from ...
	(native_encode_string): ... here.
	* tree-vect-stmts.c (vectorizable_store): Call it to avoid
	vectorizing stores from constants we later cannot handle.

	* g++.dg/torture/pr82084.C: New testcase.

	2017-09-06  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/82108
	* tree-vect-stmts.c (vectorizable_load): Fix pointer adjustment
	for gap in the non-permutation SLP case.

	* gcc.dg/vect/pr82108.c: New testcase.

Added:
    branches/gcc-7-branch/gcc/testsuite/g++.dg/torture/pr82084.C
    branches/gcc-7-branch/gcc/testsuite/gcc.dg/vect/pr82108.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/fold-const.c
    branches/gcc-7-branch/gcc/fold-const.h
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
    branches/gcc-7-branch/gcc/tree-vect-stmts.c
Comment 7 Richard Biener 2018-06-21 11:19:22 UTC
Author: rguenth
Date: Thu Jun 21 11:18:50 2018
New Revision: 261842

URL: https://gcc.gnu.org/viewcvs?rev=261842&root=gcc&view=rev
Log:
2018-06-21  Richard Biener  <rguenther@suse.de>
 
	Backport from mainline
	2017-09-06  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/82108
	* tree-vect-stmts.c (vectorizable_load): Fix pointer adjustment
	for gap in the non-permutation SLP case.

	* gcc.dg/vect/pr82108.c: New testcase.

	2017-06-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/81410
	* tree-vect-stmts.c (vectorizable_load): Properly adjust for
	the gap in the ! slp_perm SLP case after each group.

	* gcc.dg/vect/pr81410.c: New testcase.

	2017-03-08  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/79920
	* tree-vect-slp.c (vect_create_mask_and_perm): Remove and inline
	with ncopies == 1 to ...
	(vect_transform_slp_perm_load): ... here.  Properly compute
	all element loads by iterating VF times over the group.  Do
	not handle ncopies (computed in a broken way) in
	vect_create_mask_and_perm.

	* gcc.dg/vect/pr79920.c: New testcase.

Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/vect/pr79920.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/vect/pr81410.c
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/vect/pr82108.c
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
    branches/gcc-6-branch/gcc/tree-vect-slp.c
    branches/gcc-6-branch/gcc/tree-vect-stmts.c
Comment 8 Richard Biener 2018-07-23 09:12:39 UTC
*** Bug 86632 has been marked as a duplicate of this bug. ***