Bug 111032 - using small types inside loops sometimes confuses the vectorizer
Summary: using small types inside loops sometimes confuses the vectorizer
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 14.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2023-08-15 23:17 UTC by Andrew Pinski
Modified: 2023-10-27 03:37 UTC (History)

See Also:
Host:
Target: aarch64-linux-gnu x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:

Description Andrew Pinski 2023-08-15 23:17:05 UTC
Take:
```
void __attribute__ ((noipa))
f0 (int *__restrict r,
   int *__restrict a,
   int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
  {
    unsigned short p = pred[i]?1<<3:0;
    r[i] = p ;
  }
}

void __attribute__ ((noipa))
f1 (int *__restrict r,
   int *__restrict a,
   int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
  {
    int p = pred[i]?1<<3:0;
    r[i] = p ;
  }
}
```

These two functions should produce the same code, a select between 8 and 0, but in f0 we instead get a truncation to the narrow type followed by an extension back to int.
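A minimal harness for checking that the two loops really compute the same values, assuming the intended constant in both is `1<<3`. The names `select_narrow`, `select_wide`, `count_select_mismatches`, and `check_select` are hypothetical and not part of the report. This only confirms the results match; it says nothing about the code the vectorizer generates.

```c
#include <assert.h>

#define N 1024

/* Narrow variant: the selected value passes through unsigned short.
   noipa is GCC-specific and keeps the loop from being folded into the caller. */
void __attribute__ ((noipa))
select_narrow (int *__restrict r, int *__restrict pred)
{
  for (int i = 0; i < N; ++i)
    {
      unsigned short p = pred[i] ? 1 << 3 : 0;
      r[i] = p;
    }
}

/* Wide variant: the selected value stays in int throughout. */
void __attribute__ ((noipa))
select_wide (int *__restrict r, int *__restrict pred)
{
  for (int i = 0; i < N; ++i)
    {
      int p = pred[i] ? 1 << 3 : 0;
      r[i] = p;
    }
}

/* Run both variants on the same predicate array and count every
   element that differs from the expected 8-or-0 result. */
int
count_select_mismatches (void)
{
  static int pred[N], r_narrow[N], r_wide[N];
  int bad = 0;

  for (int i = 0; i < N; ++i)
    pred[i] = (i % 3) - 1;      /* repeating -1, 0, 1 */

  select_narrow (r_narrow, pred);
  select_wide (r_wide, pred);

  for (int i = 0; i < N; ++i)
    {
      int expected = pred[i] ? 8 : 0;
      bad += (r_narrow[i] != expected) + (r_wide[i] != expected);
    }
  return bad;
}

/* Convenience wrapper: aborts if the two variants ever disagree. */
void
check_select (void)
{
  assert (count_select_mismatches () == 0);
}
```

Calling `check_select ()` from a small `main` is enough to run it. Since the results are identical, any difference in the generated loops is purely a code-quality difference.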

This happens at -O3 on both x86_64 and aarch64.

On aarch64 with `-O3 -march=armv8.5-a+sve2`, however, this will be fixed by the patch for PR 111006 (which I will be submitting later today), because SVE uses conversions here rather than VEC_PACK_TRUNC_EXPR/vec_unpack_hi_expr/vec_unpack_lo_expr.
Comment 1 Andrew Pinski 2023-08-15 23:49:15 UTC
One way to fix this is to optimize the following on the scalar side:
```
  _15 = _4 != 0;
  _16 = (short unsigned int) _15;
  _17 = _16 << 3;
  _6 = (int) _17;
```
into:
```
  _t = (int) _15;
  _6 = _t << 3;
```

Note that this has the same issue:
```
void __attribute__ ((noipa))
f0_1 (int *__restrict r,
      int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
  {
    short p = pred[i]?-1:0;
    r[i] = p ;
  }
}
```
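For comparison, a sketch of the signed case next to an `int`-only variant. Sign extension from `short` to `int` preserves -1, so both loops store the same all-ones or zero value. The names `mask_narrow`, `mask_wide`, and `count_mask_mismatches` are hypothetical and not from the report.

```c
#include <assert.h>

#define LEN 1024

/* Narrow variant, as in f0_1: the mask passes through a signed short. */
void __attribute__ ((noipa))
mask_narrow (int *__restrict r, int *__restrict pred)
{
  for (int i = 0; i < LEN; ++i)
    {
      short p = pred[i] ? -1 : 0;
      r[i] = p;
    }
}

/* Wide variant: the mask is produced directly as int. */
void __attribute__ ((noipa))
mask_wide (int *__restrict r, int *__restrict pred)
{
  for (int i = 0; i < LEN; ++i)
    {
      int p = pred[i] ? -1 : 0;
      r[i] = p;
    }
}

/* Count elements where either variant differs from the expected mask. */
int
count_mask_mismatches (void)
{
  static int pred[LEN], r_narrow[LEN], r_wide[LEN];
  int bad = 0;

  for (int i = 0; i < LEN; ++i)
    pred[i] = i & 1;            /* alternating 0, 1 */

  mask_narrow (r_narrow, pred);
  mask_wide (r_wide, pred);

  for (int i = 0; i < LEN; ++i)
    {
      int expected = pred[i] ? -1 : 0;
      bad += (r_narrow[i] != expected) + (r_wide[i] != expected);
    }
  return bad;
}

/* Convenience wrapper: aborts if the two variants ever disagree. */
void
check_mask (void)
{
  assert (count_mask_mismatches () == 0);
}
```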