Bug 116766 - Missed loop vectorization case due to even IV division by two
Summary: Missed loop vectorization case due to even IV division by two
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 15.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer VRP
  Show dependency treegraph
 
Reported: 2024-09-18 15:18 UTC by Dusan Stojkovic
Modified: 2024-12-28 23:35 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-09-19 00:00:00


Attachments
A dump file of vect pass with details enabled (3.26 KB, text/plain)
2024-09-18 15:18 UTC, Dusan Stojkovic
Details
hack (559 bytes, patch)
2024-09-19 07:31 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dusan Stojkovic 2024-09-18 15:18:47 UTC
Created attachment 59138 [details]
A dump file of vect pass with details enabled

Hello this report is about a loop which GCC doesn't vectorize. 
This is the loop in question: 

void
foo(int *in, int *out1, int *out2, int n)
{
    for (int i = 0; i < n; i+=2)
    {
        out1[i/2] = in[i];
        out2[i/2] = in[i+1];
    }
}

GCC isn't able to vectorize this example:
ARM: https://godbolt.org/z/Eq7vs7ssx
RISC-V: https://godbolt.org/z/Y1sr4cr8K

Clang vectorizes this loop for ARM as well as RISC-V:
ARM: https://godbolt.org/z/YvaMqTz1h
RISC-V: https://godbolt.org/z/x1MxGc7Gx

In the vect optimization pass there are two misses regarding array indexing:
analyze_innermost: t.c:6:19: missed:  failed: evolution of base is not affine.
analyze_innermost: t.c:7:19: missed:  failed: evolution of base is not affine.

There are also these misses:
t.c:6:23: missed:   possible alias involving gather/scatter between *_3 and *_7
t.c:4:23: missed:  bad data dependence.
t.c:6:23: missed:   possible alias involving gather/scatter between *_3 and *_7
t.c:4:23: missed:  bad data dependence.
t.c:6:23: missed:   possible alias involving gather/scatter between *_3 and *_7
t.c:4:23: missed:  bad data dependence.
t.c:6:23: missed:   possible alias involving gather/scatter between *_3 and *_7
t.c:4:23: missed:  bad data dependence.
t.c:6:23: missed:   possible alias involving gather/scatter between *_3 and *_7
t.c:4:23: missed:  bad data dependence.
t.c:6:23: missed:   possible alias involving gather/scatter between *_3 and *_7
t.c:4:23: missed:  bad data dependence.
t.c:6:23: missed:   possible alias involving gather/scatter between *_3 and *_7
t.c:4:23: missed:  bad data dependence.
t.c:6:23: missed:   possible alias involving gather/scatter between *_3 and *_7
t.c:4:23: missed:  bad data dependence.
t.c:6:23: missed:   possible alias involving gather/scatter between *_3 and *_7
t.c:4:23: missed:  bad data dependence.

t.c:4:23: missed: couldn't vectorize loop

void foo (int * in, int * out1, int * out2, int n)
{
  int i;
  long unsigned int _1;
  long unsigned int _2;
  int * _3;
  int _4;
  long unsigned int _5;
  long unsigned int _6;
  int * _7;
  int _8;
  sizetype _9;
  sizetype _10;
  int * _11;
  int * _12;
  int _13;

  <bb 2> [local count: 118111600]:
  if (n_17(D) > 0)
    goto <bb 5>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 5> [local count: 105119324]:

  <bb 3> [local count: 955630224]:
  # i_26 = PHI <i_23(6), 0(5)>
  _1 = (long unsigned int) i_26;
  _2 = _1 * 4;
  _3 = in_18(D) + _2;
  _4 = i_26 >> 1;
  _5 = (long unsigned int) _4;
  _6 = _5 * 4;
  _7 = out1_19(D) + _6;
  _8 = *_3;
  *_7 = _8;
  _9 = _1 + 1;
  _10 = _9 * 4;
  _11 = in_18(D) + _10;
  _12 = out2_21(D) + _6;
  _13 = *_11;
  *_12 = _13;
  i_23 = i_26 + 2;
  if (n_17(D) > i_23)
    goto <bb 6>; [89.00%]
  else
    goto <bb 7>; [11.00%]

  <bb 6> [local count: 850510900]:
  goto <bb 3>; [100.00%]

  <bb 7> [local count: 105119324]:

  <bb 4> [local count: 118111600]:
  return;

}
Comment 1 Andrew Pinski 2024-09-18 22:45:12 UTC
I thought I had saw this one before ...
Comment 2 Richard Biener 2024-09-19 07:12:22 UTC
This isn't about gather/scatter but SCEV analysis failing to handle i/2
(i >> 1 in the IL) as IV.  That's only affine because i is even so it's
an exact division.  We might want to "fold" i / 2 to i /[ex] 2 somewhere
in VRP to help, but possibly somehow SCEV could "optimistically" analyze
the shift and resolve it after it finds the initial value and increment.

Bottom line - rewrite your code to

    for (int i = 0; i < n/2; i++)
    {
        out1[i] = in[2*i];
        out2[i] = in[2*i+1];
    }
Comment 3 Richard Biener 2024-09-19 07:31:38 UTC
Created attachment 59144 [details]
hack

This works for the special case of >> 1;  something more general and nicer written could work.  CHREC_LEFT might not need to be a multiple I think and
thus not constant.  EXACT_DIV_EXPR could be allowed directly in a CHREC_RIGHT,
even if we can't constant-fold it, so I think working towards that would be
better and instead handle EXACT_DIV_EXPR in SCEV.