This is the mail archive of the mailing list for the GCC project.

[gfortran,patch] New option to use BLAS routines for matrix multiplication

Hi all,

The attached patch allows gfortran-compiled code to use the sgemm/dgemm/cgemm/zgemm routines from BLAS to perform matrix multiplications. There is no big change from the last patch I posted. I'll say a few words about how it works and how I tested it, and then answer Janne's comments on my last patch.

OK for mainline?

*How does it work?*

The library functions matmul_{i,r,c}{4,8,10,16} take three extra arguments: int try_blas says whether we want to try using a BLAS; int blas_limit is the size threshold that decides between BLAS and libgfortran code; the last argument is a pointer to the BLAS function to be used.
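To make the dispatch concrete, here is a minimal C sketch of how such an entry point could choose between the two code paths. It is not the code from the patch: the names matmul_r8_sketch, internal_matmul, fake_gemm and the simplified gemm_fn pointer type are all made up for illustration, the arrays are assumed contiguous and column-major, and a real dgemm has a much longer Fortran-style argument list. The size test follows the geometric-mean criterion described in the answer to point 1b further down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the BLAS function pointer type.
   A real dgemm takes Fortran-style arguments (transposition flags,
   leading dimensions, alpha and beta), all omitted here.  */
typedef void (*gemm_fn) (int m, int n, int k,
                         const double *a, const double *b, double *c);

/* Stand-in for the libgfortran code path: a plain triple loop on
   contiguous column-major data, with a (m x k), b (k x n), c (m x n).  */
void
internal_matmul (int m, int n, int k,
                 const double *a, const double *b, double *c)
{
  for (int j = 0; j < n; j++)
    for (int i = 0; i < m; i++)
      {
        double sum = 0.0;
        for (int l = 0; l < k; l++)
          sum += a[i + l * m] * b[l + j * k];
        c[i + j * m] = sum;
      }
}

/* Counts calls so that the chosen path can be observed.  */
int fake_gemm_calls = 0;

/* Pretend BLAS routine: same result, but records that it was called.  */
void
fake_gemm (int m, int n, int k,
           const double *a, const double *b, double *c)
{
  fake_gemm_calls++;
  internal_matmul (m, n, k, a, b, c);
}

/* Sketch of a matmul entry point with the three extra trailing
   arguments: try_blas, blas_limit and the BLAS function pointer.  */
void
matmul_r8_sketch (int m, int n, int k,
                  const double *a, const double *b, double *c,
                  int try_blas, int blas_limit, gemm_fn gemm)
{
  /* Geometric mean of the extents against blas_limit, written as
     product > limit^3 and computed in float to avoid int overflow.  */
  float product = (float) m * (float) n * (float) k;
  float limit = (float) blas_limit;

  if (try_blas && product > limit * limit * limit)
    {
      assert (gemm != NULL);
      gemm (m, n, k, a, b, c);
    }
  else
    internal_matmul (m, n, k, a, b, c);
}
```

With try_blas set to 0 the pointer is never used. With try_blas set to 1 and blas_limit set to 0, every non-empty multiplication goes through the pointer, which matches the "size limit 0" setting used for the timings.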

The front-end function gfc_conv_function_call (in trans-expr.c) gains an extra tree argument, which holds arguments to be appended after all other function arguments. It's currently only used for the translation of MATMUL calls, but it could be used in the future, e.g. for bounds-checking information.

*How did I test it?*

It was bootstrapped and regtested on i686-linux. I also did some manual testing and timing on i686-linux, the results of which can be found here: These graphs report the execution time as a function of matrix size, for multiplication of square matrices of complex and real floating-point kinds 4 and 8. The black curve is for unpatched gfortran, the red one is for patched gfortran with ATLAS (with size limit 0, i.e. all matrix multiplications are performed by BLAS calls) and the green one is for patched gfortran with Intel MKL (again with size limit 0).

I also attach an ugly "regression-tester" I made. It builds matrices of all types, kinds, sizes and strides, performs matrix multiplication by different means, and compares the results. To compile and run:

$ gfortran -c matmul_blas.f90 -fexternal-blas -fblas-matmul-limit=0
$ gfortran matrix.F90 gemm.f90 matmul_blas.o -ffree-line-length-none
$ ./a.out

The integer parameters nmax and ncheck in matrix.F90 can be changed; they control the maximal size of the generated matrices and the number of check cycles to run (the more the merrier, since some parameters are randomly chosen).

*Answers to Janne's comments* (msg00380.html)

point 1) you're right

point 1b) the three sizes involved are xcount, ycount and count; the BLAS function is now called when the (geometric) mean of these is higher than the specified size limit; the numbers are cast to float for the calculation to avoid overflow
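As an illustration of that test, here is a small C sketch under my own assumptions: the function name use_blas_p is hypothetical, and instead of taking a cube root it compares the product of the three sizes with the limit cubed, which is equivalent for this comparison. Doing the arithmetic in float is what keeps large sizes from overflowing an int.

```c
/* Returns nonzero when the geometric mean of xcount, ycount and count
   is strictly higher than blas_limit.

   geometric mean > limit  <=>  xcount * ycount * count > limit^3,
   so no cube root is needed.  All factors are cast to float first,
   because the product of three ints can overflow an int.  */
int
use_blas_p (int xcount, int ycount, int count, int blas_limit)
{
  float product = (float) xcount * (float) ycount * (float) count;
  float limit = (float) blas_limit;

  return product > limit * limit * limit;
}
```

For example, with a limit of 30, sizes (30, 30, 30) stay on the libgfortran path because the mean is not strictly higher than the limit, while (31, 30, 30) would go to BLAS. With sizes of 100000 each, the integer product would not fit in a 32-bit int, but the float comparison still behaves sensibly.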

point 2) well, from what I have seen, the function call penalty is negligible, especially since BLAS routines are used for heavy computations

:ADDPATCH fortran:

Attachment: blas_matmul3.ChangeLog
Description: Binary data

Attachment: blas_matmul3.diff
Description: Binary data

Attachment: gemm.f90
Description: Binary data

Attachment: matmul_blas.f90
Description: Binary data

Attachment: matrix.F90
Description: Binary data
