Bug 92420 - [8 Regression] Vectorization miscompilation with negative strides since r238039
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 10.0
Importance: P2 normal
Target Milestone: 8.4
Assignee: Richard Sandiford
URL:
Keywords: wrong-code
Depends on:
Blocks: 93734
 
Reported: 2019-11-08 09:36 UTC by Jakub Jelinek
Modified: 2020-02-25 10:06 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-11-08 00:00:00


Description Jakub Jelinek 2019-11-08 09:36:01 UTC
The following testcase is miscompiled on x86_64-linux, e.g. with -O3 -mavx2 or -O3 -mssse3 since r238039:

#define N 16
struct C { int r, i; };
struct C a[N], b[N], c[N], d[N], e[N];

__attribute__((noipa)) static void
foo (struct C *__restrict x, struct C *__restrict y, struct C *__restrict z, int w)
{
  int i;
  for (int i = 0; i < w; i++)
    {
      z[i].r = x[i].r * y[-1 - i].r - x[i].i * y[-1 - i].i;
      z[i].i = x[i].i * y[-1 - i].r + x[i].r * y[-1 - i].i;
    }
}

__attribute__((noipa)) static void
bar (struct C *__restrict x, struct C *__restrict y, struct C *__restrict z, int w)
{
  int i;
  for (int i = 0; i < w; i++)
    {
      z[i].r = x[i].r * y[i].r - x[i].i * y[i].i;
      z[i].i = x[i].i * y[i].r + x[i].r * y[i].i;
    }
}

int
main ()
{
  int i;
  for (i = 0; i < N; ++i)
    {
      a[i].r = N - i; a[i].i = i - N;
      b[i].r = i - N; b[i].i = i + N;
      c[i].r = -1 - i; c[i].i = 2 * N - 1 - i;
    }
  foo (a, b + N, d, N);
  bar (a, c, e, N);
  for (i = 0; i < N; ++i)
    if (d[i].r != e[i].r || d[i].i != e[i].i)
      __builtin_abort ();
  return 0;
}

In bar, which is compiled correctly, the vectorized loop body is:
  vect__6.87_69 = MEM[base: y_21(D), index: ivtmp.133_9, offset: 0B];
  vect__6.88_70 = VEC_PERM_EXPR <vect__6.87_69, vect__6.87_69, { 0, 0, 2, 2, 4, 4, 6, 6 }>;
  vect__4.84_66 = MEM[base: x_20(D), index: ivtmp.133_9, offset: 0B];
  vect__4.93_75 = VEC_PERM_EXPR <vect__4.84_66, vect__4.84_66, { 1, 0, 3, 2, 5, 4, 7, 6 }>;
  vect__6.97_79 = VEC_PERM_EXPR <vect__6.87_69, vect__6.87_69, { 1, 1, 3, 3, 5, 5, 7, 7 }>;
  vect__7.89_71 = vect__4.84_66 * vect__6.88_70;
  vect__10.98_80 = vect__4.93_75 * vect__6.97_79;
  vect__12.99_81 = vect__7.89_71 - vect__10.98_80;
  vect__12.100_82 = vect__7.89_71 + vect__10.98_80;
  _83 = VEC_PERM_EXPR <vect__12.99_81, vect__12.100_82, { 0, 9, 2, 11, 4, 13, 6, 15 }>;
  MEM[base: z_22(D), index: ivtmp.133_9, offset: 0B] = _83;
In foo the y pointer iterates with a step of -8 rather than 8, so I'd expect the x-related permutations
to stay, and the y loads to start at y_21(D) - 32B and use the { 7, 7, 5, 5, 3, 3, 1, 1 } and { 6, 6, 4, 4, 2, 2, 0, 0 }
permutations.  Instead we actually emit:
  _34 = (void *) ivtmp.64_69;
  vect__9.16_80 = MEM[base: _34, offset: 0B];
  vect__9.17_81 = VEC_PERM_EXPR <vect__9.16_80, vect__9.16_80, { 0, 0, 2, 2, 4, 4, 6, 6 }>;
  vect__4.13_76 = MEM[base: x_23(D), index: ivtmp.62_71, offset: 0B];
  vect__4.22_86 = VEC_PERM_EXPR <vect__4.13_76, vect__4.13_76, { 1, 0, 3, 2, 5, 4, 7, 6 }>;
  vect__12.25_90 = MEM[base: _34, offset: 4B];
  vect__12.26_91 = VEC_PERM_EXPR <vect__12.25_90, vect__12.25_90, { 0, 0, 2, 2, 4, 4, 6, 6 }>;
  vect__10.18_82 = vect__4.13_76 * vect__9.17_81;
  vect__13.27_92 = vect__4.22_86 * vect__12.26_91;
  vect__15.28_93 = vect__10.18_82 - vect__13.27_92;
  vect__15.29_94 = vect__10.18_82 + vect__13.27_92;
  _95 = VEC_PERM_EXPR <vect__15.28_93, vect__15.29_94, { 0, 9, 2, 11, 4, 13, 6, 15 }>;
  MEM[base: z_25(D), index: ivtmp.62_71, offset: 0B] = _95;
where ivtmp.64_69 starts at y_21(D) - 8 (!) and steps by -32.
Comment 1 Richard Sandiford 2019-11-08 11:15:38 UTC
Mine
Comment 2 Richard Sandiford 2019-11-11 19:44:24 UTC
Author: rsandifo
Date: Mon Nov 11 19:43:52 2019
New Revision: 278064

URL: https://gcc.gnu.org/viewcvs?rev=278064&root=gcc&view=rev
Log:
Fix SLP downward group access classification (PR92420)

This PR was caused by the SLP handling in get_group_load_store_type
returning VMAT_CONTIGUOUS rather than VMAT_CONTIGUOUS_REVERSE for
downward groups.

A more elaborate fix would be to try to combine the reverse permutation
into SLP_TREE_LOAD_PERMUTATION for loads, but that's really a follow-on
optimisation and not backport material.  It might also not necessarily
be a win, if the target supports (say) reversing and odd/even swaps
as independent permutes but doesn't recognise the combined form.

2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR tree-optimization/92420
	* tree-vect-stmts.c (get_negative_load_store_type): Move further
	up file.
	(get_group_load_store_type): Use it for reversed SLP accesses.

gcc/testsuite/
	* gcc.dg/vect/pr92420.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr92420.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-stmts.c
Comment 3 Richard Biener 2019-11-14 07:56:58 UTC
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
Comment 4 GCC Commits 2020-02-18 08:53:09 UTC
The releases/gcc-9 branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:2d8ea3a0a6095a56b7c59c50b1068d602cde934a

commit r9-8248-g2d8ea3a0a6095a56b7c59c50b1068d602cde934a
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Mon Nov 11 19:43:52 2019 +0000

    Fix SLP downward group access classification [PR92420]
    
    This PR was caused by the SLP handling in get_group_load_store_type
    returning VMAT_CONTIGUOUS rather than VMAT_CONTIGUOUS_REVERSE for
    downward groups.
    
    A more elaborate fix would be to try to combine the reverse permutation
    into SLP_TREE_LOAD_PERMUTATION for loads, but that's really a follow-on
    optimisation and not backport material.  It might also not necessarily
    be a win, if the target supports (say) reversing and odd/even swaps
    as independent permutes but doesn't recognise the combined form.
    
    2020-02-18  Richard Sandiford  <richard.sandiford@arm.com>
    
    gcc/
    	Backport from mainline
    	2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>
    
    	PR tree-optimization/92420
    	* tree-vect-stmts.c (get_negative_load_store_type): Move further
    	up file.
    	(get_group_load_store_type): Use it for reversed SLP accesses.
    
    gcc/testsuite/
    	PR tree-optimization/92420
    	* gcc.dg/vect/pr92420.c: New test.
Comment 5 GCC Commits 2020-02-25 09:51:12 UTC
The releases/gcc-8 branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:785eda9390473e42f0e0b7199c42032a0432de68

commit r8-10057-g785eda9390473e42f0e0b7199c42032a0432de68
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Mon Nov 11 19:43:52 2019 +0000

    Fix SLP downward group access classification [PR92420]
    
    This PR was caused by the SLP handling in get_group_load_store_type
    returning VMAT_CONTIGUOUS rather than VMAT_CONTIGUOUS_REVERSE for
    downward groups.
    
    A more elaborate fix would be to try to combine the reverse permutation
    into SLP_TREE_LOAD_PERMUTATION for loads, but that's really a follow-on
    optimisation and not backport material.  It might also not necessarily
    be a win, if the target supports (say) reversing and odd/even swaps
    as independent permutes but doesn't recognise the combined form.
    
    2020-02-25  Richard Sandiford  <richard.sandiford@arm.com>
    
    gcc/
    	Backport from mainline
    	2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>
    
    	PR tree-optimization/92420
    	* tree-vect-stmts.c (get_negative_load_store_type): Move further
    	up file.
    	(get_group_load_store_type): Use it for reversed SLP accesses.
    
    gcc/testsuite/
    	PR tree-optimization/92420
    	* gcc.dg/vect/pr92420.c: New test.
Comment 6 Richard Sandiford 2020-02-25 10:06:03 UTC
Fixed on trunk and branches.