Bug 118175 - Unable to do auto vectorization for matrix multiply due to aliasing of matrices
Summary: Unable to do auto vectorization for matrix multiply due to aliasing of matrices
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 15.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, missed-optimization
Depends on: 102564
Blocks: vectorizer
Reported: 2024-12-23 03:10 UTC by Huaqi
Modified: 2024-12-29 19:06 UTC
CC List: 1 user

See Also:
Host:
Target: riscv
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-12-23 00:00:00


Description Huaqi 2024-12-23 03:10:40 UTC
Hi there,

I tried to compile the following code:

#include <stdio.h>

float *matA, *matB, *matC;
int nthreads;

typedef struct{
   int id;
   int rowsA;
   int colsA;
   int colsB;
} tArgs;

void * CalculaProdutoMatriz(void *arg) {
   int i, j, k;
   tArgs *args = (tArgs*) arg;
   int rowsA = args->rowsA;
   int colsA = args->colsA;
   int colsB = args->colsB;

   for(i = args->id; i < rowsA; i += 1) {
      for(j = 0; j < colsB; j++) {
         matC[i * colsB + j] = 0;
         for(k = 0; k < colsA; k++) {
               matC[i * colsB + j] += matA[i * colsA + k] * matB[k * colsB + j];
         }
      }
   }
   return 0;
}

Compiler options:

-march=rv32imafc_zve32f_zvl128b -mabi=ilp32f --param=vsetvl-strategy=optim -Ofast -ftree-vectorize -mrvv-max-lmul=m8 -funroll-all-loops

I expected the latest GCC 15 to auto-vectorize this loop, but it does not. Am I missing some compiler options?

I also tried clang 20 with the options -march=rv32imafc_zve32f_zvl128b -mabi=ilp32f -O3 -funroll-loops, and it does generate some vector instructions.

For an Arm Cortex-R52 with NEON enabled, GCC also vectorizes it with the options -mcpu=cortex-r52 -O3 -ftree-vectorize -funroll-all-loops.

You can also check this at https://godbolt.org/z/dY1r88d1c
Comment 1 Andrew Pinski 2024-12-23 03:29:28 UTC
Looks like an aliasing issue.
Changing the matrix definitions to:
extern float matA[], matB[], matC[];

allows GCC to vectorize the loop, even for x86_64.
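A minimal sketch of the change suggested in comment 1, using the loop from the description. The arrays are defined here with a hypothetical size MAX_ELEMS only so the snippet compiles and links on its own; in the comment they are declared extern float matA[], matB[], matC[]; and defined in another translation unit. Because matA, matB and matC are then three distinct objects rather than three pointers, the compiler can tell that a store to matC never overlaps a load from matA or matB, which should remove the aliasing obstacle to vectorization.

```c
#include <stddef.h>

/* Hypothetical capacity, chosen only so this sketch is self-contained.
   In comment 1 the arrays are "extern float matA[], matB[], matC[];"
   and are defined elsewhere. */
#define MAX_ELEMS 4096

/* Three distinct array objects: a store into matC can never touch
   the storage of matA or matB. */
float matA[MAX_ELEMS], matB[MAX_ELEMS], matC[MAX_ELEMS];

typedef struct {
   int id;
   int rowsA;
   int colsA;
   int colsB;
} tArgs;

void *CalculaProdutoMatriz(void *arg) {
   tArgs *args = (tArgs *) arg;
   int rowsA = args->rowsA;
   int colsA = args->colsA;
   int colsB = args->colsB;

   for (int i = args->id; i < rowsA; i++) {
      for (int j = 0; j < colsB; j++) {
         matC[i * colsB + j] = 0;
         for (int k = 0; k < colsA; k++) {
            matC[i * colsB + j] += matA[i * colsA + k] * matB[k * colsB + j];
         }
      }
   }
   return NULL;
}
```

For example, with matA = {1, 2, 3, 4} and matB = {5, 6, 7, 8} (both 2x2, row-major) and tArgs {0, 2, 2, 2}, matC ends up as {19, 22, 43, 50}.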
Comment 2 Huaqi 2024-12-23 08:07:41 UTC
(In reply to Andrew Pinski from comment #1)
> Looks like an aliasing issue.
> Changing the matrix definitions to:
> extern float matA[], matB[], matC[];
> 
> allows GCC to vectorize the loop, even for x86_64.

Yes, it works. But when I changed the function signature to void * CalculaProdutoMatriz(float *matA, float *matB, float *matC, void *arg), it still does not vectorize. I expected it to work, since I think this is similar to what you suggested.

See https://godbolt.org/z/fEE5cMG9n

And with the code above, clang for RISC-V generates better vector instructions than before.

Thanks
Comment 3 Andrew Pinski 2024-12-23 12:16:04 UTC
(In reply to Huaqi from comment #2)
> Yes, it works. But when I changed the function signature to void *
> CalculaProdutoMatriz(float *matA, float *matB, float *matC, void *arg), it
> still does not vectorize. I expected it to work, since I think this is
> similar to what you suggested.

If you add restrict to the arguments, GCC will also vectorize the loop. Again, this is an aliasing issue.
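A minimal sketch of comment 3's suggestion applied to the pointer-parameter signature from comment 2; only the restrict qualifiers are new, and the loop is unchanged from the description. Under C99 restrict rules, memory modified through matC may not also be accessed through matA or matB during the call, so the compiler is allowed to assume the stores to matC do not change anything it loads through the other two pointers.

```c
#include <stddef.h>

typedef struct {
   int id;
   int rowsA;
   int colsA;
   int colsB;
} tArgs;

/* restrict: the caller promises that storage written through matC is not
   accessed through matA or matB, so loads from matA/matB may be reordered
   past stores to matC. matA and matB are only read, so they may overlap
   each other. */
void *CalculaProdutoMatriz(float *restrict matA, float *restrict matB,
                           float *restrict matC, void *arg) {
   tArgs *args = (tArgs *) arg;
   int rowsA = args->rowsA;
   int colsA = args->colsA;
   int colsB = args->colsB;

   for (int i = args->id; i < rowsA; i++) {
      for (int j = 0; j < colsB; j++) {
         matC[i * colsB + j] = 0;
         for (int k = 0; k < colsA; k++) {
            matC[i * colsB + j] += matA[i * colsA + k] * matB[k * colsB + j];
         }
      }
   }
   return NULL;
}
```

The promise moves to the caller: passing a matC buffer that overlaps matA or matB to this version is undefined behavior.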
Comment 4 Huaqi 2024-12-24 01:57:24 UTC
(In reply to Andrew Pinski from comment #3)
> If you add restrict to the arguments, GCC will also vectorize the loop.
> Again, this is an aliasing issue.

Sorry, I am not an expert in GCC. Could you please point me to some documentation explaining GCC's aliasing rules? Thank you.
Comment 5 Andrew Pinski 2024-12-24 02:10:42 UTC
(In reply to Huaqi from comment #4)
> 
> Sorry, I am not an expert in GCC. Could you please point me to some
> documentation explaining GCC's aliasing rules? Thank you.


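Yes, it is called the C standard. Basically, the stores to matC may alias the loads from matB/matA: since all three are plain float pointers, they may point into the same memory, so the compiler must assume a store to matC can change values it later loads through matA or matB, and it cannot move those loads ahead of the stores the way vectorization requires.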
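To illustrate the point above, here is a small sketch with hypothetical helper names (not GCC code). add_one_scalar follows the order C requires, while add_one_blocked models what a 4-lane vectorized loop does: load and compute a whole block first, then store it. When dst overlaps src, the two orders produce different results, so unless the compiler can prove (or check at run time) that the buffers are disjoint, it has to keep the scalar order. Note that const on src does not rule out overlap; restrict, or distinct objects as in comment 1, does.

```c
#include <stddef.h>

/* Order required by C: iteration i sees every store made by earlier
   iterations, so if dst overlaps src, later reads see new values. */
void add_one_scalar(float *dst, const float *src, size_t n) {
   for (size_t i = 0; i < n; i++)
      dst[i] = src[i] + 1.0f;
}

/* Hypothetical model of a 4-lane vectorized loop: read and add a whole
   block of src first, then write the block to dst. */
void add_one_blocked(float *dst, const float *src, size_t n) {
   size_t i = 0;
   for (; i + 4 <= n; i += 4) {
      float lane[4];
      for (size_t l = 0; l < 4; l++)   /* "vector load + add" */
         lane[l] = src[i + l] + 1.0f;
      for (size_t l = 0; l < 4; l++)   /* "vector store" */
         dst[i + l] = lane[l];
   }
   for (; i < n; i++)                  /* scalar epilogue for the remainder */
      dst[i] = src[i] + 1.0f;
}
```

With a zeroed float buf[5], add_one_scalar(buf + 1, buf, 4) leaves buf as {0, 1, 2, 3, 4}, while add_one_blocked(buf + 1, buf, 4) on a fresh zeroed buffer leaves {0, 1, 1, 1, 1}.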