Bug 80015 - auto vectorization leaves scalar epilogue even if it is unreachable due to dominating n % 4 check
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 7.0.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2017-03-12 16:55 UTC by Ivan Sorokin
Modified: 2023-10-08 01:21 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-03-13 00:00:00

Description Ivan Sorokin 2017-03-12 16:55:07 UTC
Consider these two versions of dot_product:

#include <cstdlib>

float dot_product(float const* a,
                  float const* b,
                  size_t n)
{
    a = (float const*)__builtin_assume_aligned(a, 16);
    b = (float const*)__builtin_assume_aligned(b, 16);
  
    if ((n % 4) != 0)
       return 0.;                    // (1)
//       __builtin_unreachable();    // (2)

    float result = 0.f;
  
    for (size_t i = 0; i != n; ++i)
        result += a[i] * b[i];
  
    return result;
}

The code should be compiled with flags -O3 -ffast-math.

In (1), the function returns 0. when n is not a multiple of 4; in (2), that path invokes __builtin_unreachable() instead. Version (2) is optimized to the point where only packed operations are used. Version (1) still contains scalar operations, even though the loop is reached only when n is a multiple of 4.

The expected behavior is that gcc should not emit scalar operations in either version.
Comment 1 Richard Biener 2017-03-13 10:34:28 UTC
With __builtin_unreachable(), we compute early that n has its low-order bits unset, while with the return 0 this is obviously not true. Since we have no way to flow-sensitively attach that info to the other path, the vectorizer isn't told about this. There is also no CCP pass after vectorization that could eventually recover from this.
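
To make the missing information concrete: on the fall-through path of (1), n % 4 == 0 holds, but that fact lives only on the branch and is not attached to n itself. A minimal sketch of stating it directly on the loop bound follows. The function name dot_product_masked and the local m are hypothetical and not part of the report. Whether a given GCC version then drops the scalar epilogue is an assumption to check in the generated assembly; nothing in this thread confirms it.

```cpp
#include <cstddef>

// Hypothetical variant of the testcase: same early return as (1), but the
// loop bound is re-derived with its two low-order bits cleared.
float dot_product_masked(float const* a,
                         float const* b,
                         std::size_t n)
{
    a = (float const*)__builtin_assume_aligned(a, 16);
    b = (float const*)__builtin_assume_aligned(b, 16);

    if ((n % 4) != 0)
        return 0.f;

    // A no-op on this path (n % 4 == 0 already holds), but it puts the
    // "low bits are zero" fact on the value the loop actually uses.
    std::size_t const m = n & ~static_cast<std::size_t>(3);

    float result = 0.f;

    for (std::size_t i = 0; i != m; ++i)
        result += a[i] * b[i];

    return result;
}
```

The results are the same as for (1) on every input; only the way the trip count is expressed changes.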

The two variants are not semantically equivalent btw.
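
The difference can be seen by writing the two variants as separate functions. This is a sketch with hypothetical names (dot_product_return, dot_product_unreachable); the alignment hints from the testcase are left out for brevity because they do not affect the point.

```cpp
#include <cstddef>

// Variant (1): a length that is not a multiple of 4 is a valid input and
// the function is defined to return 0 for it.
float dot_product_return(float const* a, float const* b, std::size_t n)
{
    if ((n % 4) != 0)
        return 0.f;

    float result = 0.f;
    for (std::size_t i = 0; i != n; ++i)
        result += a[i] * b[i];
    return result;
}

// Variant (2): the same length is declared impossible. Calling this with
// n % 4 != 0 is undefined behavior, so the compiler may assume it never
// happens.
float dot_product_unreachable(float const* a, float const* b, std::size_t n)
{
    if ((n % 4) != 0)
        __builtin_unreachable();

    float result = 0.f;
    for (std::size_t i = 0; i != n; ++i)
        result += a[i] * b[i];
    return result;
}
```

For n = 3, dot_product_return yields 0, while dot_product_unreachable has no defined result. For any n that is a multiple of 4, both compute the same sum.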

Confirmed.  I think it should be viewed as a missing feature in niter analysis.