Bug 118175

Summary: Unable to do auto vectorization for matrix multiply due to aliasing of matrix's
Product: gcc
Reporter: Huaqi <fanghuaqi>
Component: tree-optimization
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW ---
Severity: enhancement
CC: rguenth
Priority: P3
Keywords: alias, missed-optimization
Version: 15.0
Target Milestone: ---
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118215
Host:
Target: riscv
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-12-23 00:00:00
Bug Depends on: 102564
Bug Blocks: 53947

Description Huaqi 2024-12-23 03:10:40 UTC
Hi there,

I tried to compile the following code:

#include <stdio.h>

float *matA, *matB, *matC;
int nthreads;

typedef struct{
   int id;
   int rowsA;
   int colsA;
   int colsB;
} tArgs;

void * CalculaProdutoMatriz(void *arg) {
   int i, j, k;
   tArgs *args = (tArgs*) arg;
   int rowsA = args->rowsA;
   int colsA = args->colsA;
   int colsB = args->colsB;

   for(i = args->id; i < rowsA; i += 1) {
      for(j = 0; j < colsB; j++) {
         matC[i * colsB + j] = 0;
         for(k = 0; k < colsA; k++) {
         matC[i * colsB + j] += matA[i * colsA + k] * matB[k * colsB + j];
         }
      }
   }
   return 0;
}

Compiler options:

-march=rv32imafc_zve32f_zvl128b -mabi=ilp32f --param=vsetvl-strategy=optim -Ofast -ftree-vectorize -mrvv-max-lmul=m8 -funroll-all-loops

I expected the latest GCC 15 to auto-vectorize this loop, but it does not. Am I missing some compiler option?

I also tried Clang 20 with the options -march=rv32imafc_zve32f_zvl128b -mabi=ilp32f -O3 -funroll-loops, and it does generate some vector instructions.

For Arm Cortex-R52 with NEON enabled, GCC also vectorizes it with the options: -mcpu=cortex-r52 -O3 -ftree-vectorize -funroll-all-loops

You can check all of this at https://godbolt.org/z/dY1r88d1c
Comment 1 Andrew Pinski 2024-12-23 03:29:28 UTC
Looks like an alias issue.
Changing the matrix definitions to:

extern float matA[], matB[], matC[];

allows GCC to vectorize the loop, even for x86_64.
Comment 2 Huaqi 2024-12-23 08:07:41 UTC
(In reply to Andrew Pinski from comment #1)
> Looks like an alias issue.
> Changing the matrix definitions to:
> 
> extern float matA[], matB[], matC[];
> 
> allows GCC to vectorize the loop, even for x86_64.

Yes, that works. But when I changed the function signature to void *CalculaProdutoMatriz(float *matA, float *matB, float *matC, void *arg), it still did not vectorize. I expected it to work, since I think it is similar to what you suggested.

See https://godbolt.org/z/fEE5cMG9n

With this change, Clang for RISC-V generates better vector code than before.

Thanks
Comment 3 Andrew Pinski 2024-12-23 12:16:04 UTC
(In reply to Huaqi from comment #2)
> Yes, that works. But when I changed the function signature to void
> *CalculaProdutoMatriz(float *matA, float *matB, float *matC, void *arg), it
> still did not vectorize. I expected it to work, since I think it is similar
> to what you suggested.

If you add restrict to the arguments, GCC will vectorize it as well. Again, this is an aliasing issue.
Comment 4 Huaqi 2024-12-24 01:57:24 UTC
(In reply to Andrew Pinski from comment #3)
> If you add restrict to the arguments, GCC will vectorize it as well. Again,
> this is an aliasing issue.

Sorry, I am not a GCC expert. Could you please point me to some documentation explaining GCC's aliasing rules? Thank you.
Comment 5 Andrew Pinski 2024-12-24 02:10:42 UTC
(In reply to Huaqi from comment #4)
> 
> Sorry, I am not a GCC expert. Could you please point me to some
> documentation explaining GCC's aliasing rules? Thank you.


Yes, it is called the C standard. Basically, the stores to matC may modify the memory that the loads from matB/matA read, so the compiler cannot reorder them.