Bug 118217 - Dot-product for square on difference of two small type integers
Summary: Dot-product for square on difference of two small type integers
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 15.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2024-12-27 09:34 UTC by Feng Xue
Modified: 2024-12-31 03:23 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-12-30 00:00:00


Description Feng Xue 2024-12-27 09:34:35 UTC
Consider a case:

  int foo(const signed char *a, const signed char *b, int n)
  {
    int sum = 0;

    for (int i = 0; i < n; ++i) {
      int diff = a[i] - b[i];

      sum += diff * diff;
    }
    return sum;
  }

In this case, "diff" is only used in a square expression. For an architecture that has an absolute-difference instruction (IFN_ABD), such as aarch64, we can treat "diff" as if it were wrapped in a hidden abs(), which gives the same result as the original because abs(diff) * abs(diff) = diff * diff. One advantage of this transformation is that abs(diff) can be computed with the ABD instruction while keeping the result at the same width as the two operands: diff itself ranges over [-255, 255] and needs more than 8 bits, whereas abs(diff) lies in [0, 255] and fits in an unsigned char. This exposes an opportunity to generate a more compact dot-product that avoids type conversions. The generated code could then look like:

  int foo(const signed char *a, const signed char *b, int n)
  {
    vector(4) int v_sum = { 0, 0, 0, 0 };

    /* Scalar epilogue for n not a multiple of 16 is omitted. */
    for (int i = 0; i < n; i += 16) {
      vector(16) signed char v_a = *(vector(16) signed char *)(&a[i]);
      vector(16) signed char v_b = *(vector(16) signed char *)(&b[i]);
      vector(16) unsigned char v_diff = IFN_ABD(v_a, v_b);

      /* DOT_PROD_EXPR already adds its third operand (the accumulator). */
      v_sum = DOT_PROD_EXPR(v_diff, v_diff, v_sum);
    }

    return .REDUC_PLUS(v_sum);
  }
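
The equivalence relied on above can be checked in portable C. The following is only a sketch, not what the vectorizer would emit: abd_s8, foo_ref, foo_abd and abd_square_holds_for_all_pairs are hypothetical names introduced here, the 16-lane vector and its 4 accumulators are emulated with plain arrays, and unsigned accumulators are an assumption made to keep the emulation free of signed overflow.

```c
#include <stdint.h>

/* Scalar reference: the loop from the report. */
int foo_ref(const signed char *a, const signed char *b, int n)
{
  int sum = 0;

  for (int i = 0; i < n; ++i) {
    int diff = a[i] - b[i];

    sum += diff * diff;
  }
  return sum;
}

/* Hypothetical helper modelling one lane of IFN_ABD on signed char
   inputs: |x - y| lies in [0, 255], so it fits in an unsigned char. */
unsigned char abd_s8(signed char x, signed char y)
{
  int d = x - y;

  return (unsigned char)(d < 0 ? -d : d);
}

/* Exhaustive check over all 65536 operand pairs that squaring the
   8-bit absolute difference equals squaring the wide difference. */
int abd_square_holds_for_all_pairs(void)
{
  for (int x = -128; x <= 127; ++x)
    for (int y = -128; y <= 127; ++y) {
      int diff = x - y;
      unsigned int d = abd_s8((signed char)x, (signed char)y);

      if ((int)(d * d) != diff * diff)
        return 0;
    }
  return 1;
}

/* Lane-by-lane emulation of the proposed loop: 16 byte lanes feed
   4 accumulators, 4 products each, as a 4-way dot-product would. */
int foo_abd(const signed char *a, const signed char *b, int n)
{
  unsigned int v_sum[4] = { 0, 0, 0, 0 };
  int i = 0;

  for (; i + 16 <= n; i += 16) {
    unsigned char v_diff[16];

    for (int l = 0; l < 16; ++l)
      v_diff[l] = abd_s8(a[i + l], b[i + l]);
    for (int l = 0; l < 16; ++l)
      v_sum[l / 4] += (unsigned int)v_diff[l] * v_diff[l];
  }

  /* Stand-in for .REDUC_PLUS: add up the accumulator lanes. */
  unsigned int sum = v_sum[0] + v_sum[1] + v_sum[2] + v_sum[3];

  /* Scalar epilogue for the remaining n % 16 elements. */
  for (; i < n; ++i) {
    unsigned int d = abd_s8(a[i], b[i]);

    sum += d * d;
  }
  return (int)sum;
}
```

Calling abd_square_holds_for_all_pairs() and comparing foo_abd against foo_ref on a few inputs (including lengths that are not a multiple of 16) exercises both the identity and the lane grouping.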
Comment 1 Richard Biener 2024-12-30 13:41:56 UTC
Pattern matching can probably be extended to allow this form.