Bug 112742 - missed vectorization with src[a*b+i] where a*b is unsigned int rather than the same precision as size_type
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 14.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Andrew Pinski
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2023-11-28 07:43 UTC by Andrew Pinski
Modified: 2023-12-15 05:58 UTC

See Also:
Host:
Target: x86_64-*-* aarch64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-11-28 00:00:00


Description Andrew Pinski 2023-11-28 07:43:16 UTC
Take:
```
void f(unsigned char * __restrict src, 
      unsigned stride, int h, int row)
{
  unsigned char *src1 = src+row*stride;
  for(int col = 0; col < h; col++)
    {
        src1[col]=src1[col] + 1;
    }
}

void f1(unsigned char * __restrict src, unsigned rs,
      unsigned cs, unsigned stride, int h, int row)
{
  for(int col = 0; col < h; col++)
    {
        src[row*stride+col]=src[row*stride+col] + 1;
    }
}
```

Both of these functions should be vectorized, but f1 is not. With -m32 (or -mabi=ilp32 on aarch64), f1 does get vectorized.
Note LLVM is able to vectorize f1 on both aarch64 and x86_64.
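
A possible source-level workaround, sketched here purely as an illustration (f1w is a hypothetical name, not part of the testcase): doing the row*stride computation in size_t up front keeps the per-iteration addition in pointer-width arithmetic, where the index evolution is affine. This changes the wraparound behaviour for negative row, so it is only equivalent for in-bounds uses:

```
#include <stddef.h>

/* Hypothetical workaround, not from the reported testcase: widen the
   loop-invariant product to size_t so base + col is computed in
   pointer-width arithmetic and evolves affinely.  This changes the
   wraparound semantics for negative row, so it only matches the
   original for in-bounds accesses.  */
void f1w(unsigned char * __restrict src, unsigned stride, int h, int row)
{
  size_t base = (size_t)row * stride;
  for (int col = 0; col < h; col++)
    src[base + col] = src[base + col] + 1;
}
```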
Comment 1 Richard Biener 2023-11-28 11:56:28 UTC
The issue is that row*stride + col doesn't evolve as an affine function as far as SCEV is concerned.

t.c:6:32: missed:  failed: evolution of base is not affine.

The "issue" is that we are dealing with pointer arithmetic here and the
row*stride + col can overflow but SCEV doesn't have "assumptions" we
could verify at runtime.

Use a signed 'stride'; otherwise all of the arithmetic is promoted to unsigned.
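
For illustration, that suggestion applied to f1 (a sketch; f1s is a made-up name). With a signed stride the whole index is computed in int, where overflow is undefined behaviour, so SCEV is allowed to treat the evolution as affine:

```
/* Illustrative variant of f1 with a signed stride: row*stride + col is
   computed in int, where signed overflow is undefined, so the compiler
   does not have to prove anything about wraparound.  */
void f1s(unsigned char * __restrict src, int stride, int h, int row)
{
  for (int col = 0; col < h; col++)
    src[row * stride + col] = src[row * stride + col] + 1;
}
```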

Duplicate of other similar bugs.
Comment 2 Richard Biener 2023-11-28 12:02:22 UTC
Ah, row*stride is loop invariant and that saves us with -m32.  With -m64
we end up with:

```
  <bb 3> [local count: 105119324]:
  row.0_1 = (unsigned int) row_13(D);
  _2 = row.0_1 * stride_14(D);

  <bb 4> [local count: 955630224]:
  # col_20 = PHI <col_17(6), 0(3)>
  col.1_3 = (unsigned int) col_20;
  _4 = _2 + col.1_3;
  _5 = (sizetype) _4;
  _6 = src_15(D) + _5;
  _7 = *_6;
```

which is "unfortuante" association of the (sizetype) conversion.  But
as the col addtition is unsigned it might overflow and we can't
associate the (sizetype) conversion but it makes the result non-affine.
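
To make the obstruction concrete, a minimal standalone example (my illustration, not from the report) of why the conversion cannot be distributed over an unsigned addition that may wrap:

```
#include <stdio.h>
#include <stddef.h>

int main(void)
{
  unsigned a = 0xffffffffu;             /* stands in for _2 = row*stride */
  unsigned b = 2u;                      /* stands in for col.1_3 */
  size_t narrow = (size_t)(a + b);      /* 32-bit add wraps: 1 */
  size_t wide = (size_t)a + (size_t)b;  /* 0x100000001 on LP64 */
  printf("%zu != %zu\n", narrow, wide);
  return 0;
}
```

So (sizetype)(_2 + col.1_3) and (sizetype)_2 + (sizetype)col.1_3 only agree when the 32-bit addition does not wrap.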

Runtime versioning would be necessary, guaranteeing that _2 + col.1_3
never overflows.
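
A hand-written sketch of what such versioning could look like at the source level (hypothetical, not GCC's actual transform): test once, before the loop, that base + col cannot wrap in 32 bits for any col < h, and only then use the widened, affine index:

```
#include <limits.h>
#include <stddef.h>

/* Hypothetical sketch of runtime versioning: if row*stride + col provably
   never wraps in 32 bits for col in [0, h), compute the index in size_t
   (affine, vectorizable); otherwise fall back to the wrap-preserving
   loop.  */
void f1v(unsigned char * __restrict src, unsigned stride, int h, int row)
{
  unsigned base = (unsigned)row * stride;
  if (h > 0 && (size_t)base + (size_t)(h - 1) <= UINT_MAX)
    {
      for (int col = 0; col < h; col++)   /* fast, versioned path */
        src[(size_t)base + col] = src[(size_t)base + col] + 1;
    }
  else
    {
      for (int col = 0; col < h; col++)   /* original wrapping semantics */
        src[base + (unsigned)col] = src[base + (unsigned)col] + 1;
    }
}
```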
Comment 3 Andrew Pinski 2023-12-15 05:58:20 UTC
I am going to take a stab at implementing this ...