The following snippet generates wrong vectorized code with GCC 7.2 on x86_64 with -O3: void downscale_2 (const float* src, int src_n, float* dst) { int i; for (i = 0; i < src_n; i += 2) { const float* a = src; const float* b = src + 4; dst[0] = (a[0] + b[0]) / 2; dst[1] = (a[1] + b[1]) / 2; dst[2] = (a[2] + b[2]) / 2; dst[3] = (a[3] + b[3]) / 2; src += 2 * 4; dst += 4; } The assembly for the vectorized version of the loop is: .L5: addl $1, %ecx movups (%rdi,%rax), %xmm0 movups 16(%rdi,%rax,2), %xmm2 addps %xmm2, %xmm0 mulps %xmm1, %xmm0 movups %xmm0, (%rdx,%rax) addq $16, %rax cmpl %r8d, %ecx jb .L5 Notice the missing ,2 on the first movups. It can be tested with: #include <stdio.h> int main () { const float in[4 * 4] = { 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8 }; float out[2 * 4]; downscale_2 (in, 4, out); /* correct: 3, 4, 5, 6 */ printf ("%g, %g, %g, %g\n", out[0], out[1], out[2], out[3]); /* incorrect: 5, 6, 7, 8; should also be 3, 4, 5, 6 */ printf ("%g, %g, %g, %g\n", out[4], out[5], out[6], out[7]); } This doesn't seem to happen with 7.1 or 6.3. For a chuckle, this affects a recent build of GIMP, and has resulted in this beauty: https://bug787222.bugzilla-attachments.gnome.org/attachment.cgi?id=359042 :) $gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --disable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp Thread model: posix gcc version 7.2.0 (GCC)
Confirmed, started with r250312.
Bah. Mine.
Author: rguenth Date: Wed Sep 6 12:31:52 2017 New Revision: 251790 URL: https://gcc.gnu.org/viewcvs?rev=251790&root=gcc&view=rev Log: 2017-09-06 Richard Biener <rguenther@suse.de> PR tree-optimization/82108 * tree-vect-stmts.c (vectorizable_load): Fix pointer adjustment for gap in the non-permutation SLP case. * gcc.dg/vect/pr82108.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/vect/pr82108.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-stmts.c
Fixed on trunk sofar, thanks for the report.
Fixed.
Author: rguenth Date: Mon Sep 18 10:13:54 2017 New Revision: 252918 URL: https://gcc.gnu.org/viewcvs?rev=252918&root=gcc&view=rev Log: 2017-09-18 Richard Biener <rguenther@suse.de> Backport from mainline 2017-09-04 Richard Biener <rguenther@suse.de> PR tree-optimization/82084 * fold-const.h (can_native_encode_string_p): Declare. * fold-const.c (can_native_encode_string_p): Factor out from ... (native_encode_string): ... here. * tree-vect-stmts.c (vectorizable_store): Call it to avoid vectorizing stores from constants we later cannot handle. * g++.dg/torture/pr82084.C: New testcase. 2017-09-06 Richard Biener <rguenther@suse.de> PR tree-optimization/82108 * tree-vect-stmts.c (vectorizable_load): Fix pointer adjustment for gap in the non-permutation SLP case. * gcc.dg/vect/pr82108.c: New testcase. Added: branches/gcc-7-branch/gcc/testsuite/g++.dg/torture/pr82084.C branches/gcc-7-branch/gcc/testsuite/gcc.dg/vect/pr82108.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/fold-const.c branches/gcc-7-branch/gcc/fold-const.h branches/gcc-7-branch/gcc/testsuite/ChangeLog branches/gcc-7-branch/gcc/tree-vect-stmts.c
Author: rguenth Date: Thu Jun 21 11:18:50 2018 New Revision: 261842 URL: https://gcc.gnu.org/viewcvs?rev=261842&root=gcc&view=rev Log: 2018-06-21 Richard Biener <rguenther@suse.de> Backport from mainline 2017-09-06 Richard Biener <rguenther@suse.de> PR tree-optimization/82108 * tree-vect-stmts.c (vectorizable_load): Fix pointer adjustment for gap in the non-permutation SLP case. * gcc.dg/vect/pr82108.c: New testcase. 2017-06-18 Richard Biener <rguenther@suse.de> PR tree-optimization/81410 * tree-vect-stmts.c (vectorizable_load): Properly adjust for the gap in the ! slp_perm SLP case after each group. * gcc.dg/vect/pr81410.c: New testcase. 2017-03-08 Richard Biener <rguenther@suse.de> PR tree-optimization/79920 * tree-vect-slp.c (vect_create_mask_and_perm): Remove and inline with ncopies == 1 to ... (vect_transform_slp_perm_load): ... here. Properly compute all element loads by iterating VF times over the group. Do not handle ncopies (computed in a broken way) in vect_create_mask_and_perm. * gcc.dg/vect/pr79920.c: New testcase. Added: branches/gcc-6-branch/gcc/testsuite/gcc.dg/vect/pr79920.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/vect/pr81410.c branches/gcc-6-branch/gcc/testsuite/gcc.dg/vect/pr82108.c Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/testsuite/ChangeLog branches/gcc-6-branch/gcc/tree-vect-slp.c branches/gcc-6-branch/gcc/tree-vect-stmts.c
*** Bug 86632 has been marked as a duplicate of this bug. ***