Hi there,

I tried to compile the code below:

#include <stdio.h>

float *matA, *matB, *matC;
int nthreads;

typedef struct {
    int id;
    int rowsA;
    int colsA;
    int colsB;
} tArgs;

void *CalculaProdutoMatriz(void *arg)
{
    int i, j, k;
    tArgs *args = (tArgs *) arg;
    int rowsA = args->rowsA;
    int colsA = args->colsA;
    int colsB = args->colsB;

    for (i = args->id; i < rowsA; i += 1) {
        for (j = 0; j < colsB; j++) {
            matC[i * colsB + j] = 0;
            for (k = 0; k < colsA; k++) {
                matC[i * colsB + j] += matA[i * colsA + k] * matB[k * colsB + j];
            }
        }
    }
    return 0;
}

with the following compiler options:

-march=rv32imafc_zve32f_zvl128b -mabi=ilp32f --param=vsetvl-strategy=optim -Ofast -ftree-vectorize -mrvv-max-lmul=m8 -funroll-all-loops

I expected the latest gcc 15 to auto-vectorize this loop, but it does not. Are some compiler options missing?

I also tried clang 20 with the options

-march=rv32imafc_zve32f_zvl128b -mabi=ilp32f -O3 -funroll-loops

and it does generate some vector instructions.

For Arm Cortex-R52 with NEON enabled, gcc also vectorizes the loop with the options:

-mcpu=cortex-r52 -O3 -ftree-vectorize -funroll-all-loops

You can also check it at this link: https://godbolt.org/z/dY1r88d1c
Looks like an alias issue.
Changing the matrix definitions to:

extern float matA[], matB[], matC[];

allows GCC to vectorize the loop even for x86_64.
(In reply to Andrew Pinski from comment #1)
> Looks like an alias issue.
> Changing the matrix definitions to:
> extern float matA[], matB[], matC[];
>
>
> Allows GCC to vectorize the loop even for x86_64.

Yes, it works. But when I changed the function signature to

void *CalculaProdutoMatriz(float *matA, float *matB, float *matC, void *arg)

it still does not vectorize. I expected it to work, since I think it is similar to what you suggested. See https://godbolt.org/z/fEE5cMG9n

With that change, clang for RISC-V generates better vector instructions than before.

Thanks
(In reply to Huaqi from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > Looks like an alias issue.
> > Changing the matrix definitions to:
> > extern float matA[], matB[], matC[];
> >
> >
> > Allows GCC to vectorize the loop even for x86_64.
>
> Yes, it works. But when I changed the function name to void *
> CalculaProdutoMatriz(float *matA, float *matB, float *matC, void *arg), then
> it still not work, I expected it may work, since I think it just similiar to
> what you suggested.

If you add restrict to the arguments, GCC will also vectorize. Again, this is an aliasing issue.
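For reference, a minimal sketch of what the restrict-qualified variant could look like. The function name matmul_restrict and its flat parameter list are hypothetical and only used for illustration; the loop body is the one from the report. Whether a given GCC version and target actually vectorizes it should be checked with -fopt-info-vec or on godbolt.

```c
#include <assert.h>

/* Hypothetical variant of the loop from the report. The name
   matmul_restrict and this parameter list are illustrative only. */
void matmul_restrict(const float *restrict a, const float *restrict b,
                     float *restrict c, int rows_a, int cols_a, int cols_b)
{
    /* Cheap debug check only: it catches identical base pointers,
       not partially overlapping ranges. */
    assert(c != a && c != b);

    for (int i = 0; i < rows_a; i++) {
        for (int j = 0; j < cols_b; j++) {
            c[i * cols_b + j] = 0;
            for (int k = 0; k < cols_a; k++) {
                /* restrict promises that this store never modifies
                   anything read through a or b. */
                c[i * cols_b + j] += a[i * cols_a + k] * b[k * cols_b + j];
            }
        }
    }
}
```

Note that restrict is a promise made by the caller: passing an output buffer that overlaps either input is undefined behavior.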
(In reply to Andrew Pinski from comment #3)
> (In reply to Huaqi from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > Looks like an alias issue.
> > > Changing the matrix definitions to:
> > > extern float matA[], matB[], matC[];
> > >
> > >
> > > Allows GCC to vectorize the loop even for x86_64.
> >
> > Yes, it works. But when I changed the function name to void *
> > CalculaProdutoMatriz(float *matA, float *matB, float *matC, void *arg), then
> > it still not work, I expected it may work, since I think it just similiar to
> > what you suggested.
>
> If you add restrict the arguments, GCC will also vectorizer. Again this is
> an aliasing issue.

Sorry, I am not an expert in the gcc area. Could you please kindly point out some documentation explaining the gcc aliasing to me? Thank you.
(In reply to Huaqi from comment #4)
>
> Sorry, I am not an expert in the gcc area. Could you please kindly point out
> some documentation explaining the gcc aliasing to me? Thank you.

Yes, it is called the C standard. Basically, the stores to matC may alias the loads from matB/matA, so the compiler has to assume that each store can change the values those loads read.
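To make the aliasing point concrete, here is a small sketch (the function names sum_in_memory and sum_in_register are made up for this example). The first function accumulates through the output pointer on every iteration, as the loop in the report does. The second keeps the running sum in a local variable. When the output overlaps the input, the two give different results, which is why the compiler may not silently turn the first form into the second.

```c
/* Accumulates directly in memory, like matC[...] += ... in the report.
   Every store to *out may change an element of x. */
void sum_in_memory(const float *x, float *out, int n)
{
    *out = 0.0f;
    for (int k = 0; k < n; k++)
        *out += x[k];
}

/* Accumulates in a local variable and stores once at the end.
   Only equivalent to sum_in_memory when out does not point into x. */
void sum_in_register(const float *x, float *out, int n)
{
    float sum = 0.0f;
    for (int k = 0; k < n; k++)
        sum += x[k];
    *out = sum;
}
```

For example, with x = {1, 2, 3} and out pointing at x[1], sum_in_memory leaves 5 in x[1] while sum_in_register leaves 6. With non-overlapping buffers both give 6.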