Bug 113261 - missing vectorization for dot_prod chain.
Summary: missing vectorization for dot_prod chain.
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 14.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2024-01-08 03:25 UTC by Hongtao Liu
Modified: 2024-10-01 01:18 UTC
CC List: 0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-10-01 00:00:00


Description Hongtao Liu 2024-01-08 03:25:16 UTC
int
foo (char* a, char* b, char* c, char* d)
{
    int sum = 0;
    for (int i = 0; i != 32; i++)
    {
        sum += (a[i] * b[i] + c[i] * d[i]);
    }
    return sum;
}


int
foo1 (char* a, char* b, char* c, char* d)
{
    int sum = 0;
    int sum1 = 0;
    for (int i = 0; i != 32; i++)
    {
        sum += a[i] * b[i];
        sum1 += c[i] * d[i];
    }
    return sum + sum1;
}

foo should be the same as foo1, but it fails to be optimized to dot_prod_expr because the current vect_recog_dot_prod_pattern only recognizes sum += a[i] * b[i];

I think it can be extended to recognize a dot_prod_expr chain, as long as the partial products are only used by the final sum reduction.
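To make the proposal concrete, here is a scalar C model of what the chained form could compute for foo. It is only a sketch under stated assumptions: `dot_prod_step`, `foo_chained` and `foo_ref` are hypothetical names, the vector shape (16 chars feeding 4 int lanes) is assumed, and the sketch does not try to match the target-specific lane mapping of the real dot_prod_expr. The point is that both products accumulate into the same vector, so only one final reduction is needed, and integer addition makes the regrouping exact.

```c
#include <assert.h>

/* Hypothetical scalar model of one dot_prod_expr step, assuming 16 chars
   per input vector and 4 int lanes in the accumulator.  Each lane adds the
   widened products of 4 adjacent elements.  */
static void dot_prod_step (const char *x, const char *y, int acc[4])
{
  for (int lane = 0; lane < 4; lane++)
    for (int k = 0; k < 4; k++)
      acc[lane] += x[lane * 4 + k] * y[lane * 4 + k];
}

/* Proposed chain for foo: both products feed one accumulator, so a single
   horizontal sum (.REDUC_PLUS) is needed at the end.  */
static int foo_chained (const char *a, const char *b,
                        const char *c, const char *d, int n)
{
  assert (n % 16 == 0);                    /* sketch handles whole vectors only */
  int acc[4] = { 0, 0, 0, 0 };
  for (int i = 0; i < n; i += 16)
    {
      dot_prod_step (a + i, b + i, acc);   /* acc = DOT_PROD (a, b, acc) */
      dot_prod_step (c + i, d + i, acc);   /* acc = DOT_PROD (c, d, acc) */
    }
  return acc[0] + acc[1] + acc[2] + acc[3];  /* final reduction */
}

/* Scalar reference: foo with the trip count as a parameter.  */
static int foo_ref (const char *a, const char *b,
                    const char *c, const char *d, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i] + c[i] * d[i];
  return sum;
}
```

With n = 32 both functions sum exactly the same 64 products, only grouped differently, so they return the same value for any input arrays.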
Comment 1 Hongtao Liu 2024-01-08 03:29:28 UTC
For foo1, 

  _99 = .REDUC_PLUS (vect_patt_79.51_97);
  _90 = .REDUC_PLUS (vect_patt_28.43_88);
  _19 = _90 + _99;

can be optimized to 

   _tmp = vect_patt_79.51_97 + vect_patt_28.43_88;
   _19 = .REDUC_PLUS (_tmp);
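The rewrite above replaces one horizontal reduction with a lane-wise vector add, which is usually cheaper than a full horizontal sum. A scalar sketch, assuming 4 int lanes; `reduc_plus`, `two_reductions` and `one_reduction` are hypothetical names that model .REDUC_PLUS and the two code shapes, not GCC internals.

```c
/* Hypothetical model of the foo1 rewrite, assuming 4 int lanes.  */
#define LANES 4

/* Horizontal sum of all lanes, standing in for .REDUC_PLUS.  */
static int reduc_plus (const int v[LANES])
{
  int s = 0;
  for (int i = 0; i < LANES; i++)
    s += v[i];
  return s;
}

/* Current code for foo1: two horizontal reductions, then a scalar add.  */
static int two_reductions (const int v1[LANES], const int v2[LANES])
{
  return reduc_plus (v1) + reduc_plus (v2);
}

/* Suggested code: one lane-wise vector add, then one horizontal reduction.  */
static int one_reduction (const int v1[LANES], const int v2[LANES])
{
  int tmp[LANES];
  for (int i = 0; i < LANES; i++)
    tmp[i] = v1[i] + v2[i];
  return reduc_plus (tmp);
}
```

For v1 = {1, -2, 3, 40} and v2 = {5, 6, -7, 8}, both functions return 54. The two forms agree because integer addition is associative and commutative; for floating-point sums the same rewrite changes rounding and would normally require -fassociative-math.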
Comment 2 Andrew Pinski 2024-10-01 01:18:31 UTC
Confirmed.