This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[gfortran,patch] New option to use BLAS routines for matrix multiplication
- From: FX Coudert <fxcoudert at gmail dot com>
- To: "fortran at gcc dot gnu dot org List" <fortran at gcc dot gnu dot org>, patch patches <gcc-patches at gcc dot gnu dot org>
- Date: Sat, 7 Oct 2006 11:03:29 +0200
- Subject: [gfortran,patch] New option to use BLAS routines for matrix multiplication
Hi all,
The attached patch allows gfortran-compiled code to use the
sgemm/dgemm/cgemm/zgemm routines from BLAS to perform matrix
multiplications. There is no big change from the last patch I posted.
I'll say a few words about how it works, how I tested it, and then
answer Janne's comments on my last patch.
OK for mainline?
*How does it work?*
The library functions matmul_{i,r,c}{4,8,10,16} take three extra
arguments: int try_blas tells whether we want to try using a BLAS;
int blas_limit is the size criterion for using BLAS or libgfortran
code; the last argument is a pointer to the BLAS function to be used.
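As a rough illustration of the three extra arguments (this is not the
patch's actual code), the dispatch could look like the C sketch below.
The names sketch_matmul_r4, sgemm_fn, toy_sgemm and toy_gemm_calls are
hypothetical, the arrays are assumed contiguous and column-major, and
the hidden string-length arguments of a real Fortran SGEMM are left out.
The size test assumes the geometric-mean rule discussed in the answers
to Janne's comments.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape of a Fortran-style SGEMM entry point (hidden
   string-length arguments omitted for brevity). */
typedef void (*sgemm_fn)(const char *transa, const char *transb,
                         const int *m, const int *n, const int *k,
                         const float *alpha, const float *a, const int *lda,
                         const float *b, const int *ldb,
                         const float *beta, float *c, const int *ldc);

/* Hypothetical stand-in for matmul_r4: C(m,n) = A(m,k) * B(k,n).
   The last three parameters mirror the extra arguments in the mail. */
void sketch_matmul_r4(float *c, const float *a, const float *b,
                      int m, int n, int k,
                      int try_blas, int blas_limit, sgemm_fn gemm)
{
    assert(m > 0 && n > 0 && k > 0 && blas_limit >= 0);

    /* Geometric mean > limit, written as product > limit cubed. */
    float product = (float)m * (float)n * (float)k;
    float cube = (float)blas_limit * (float)blas_limit * (float)blas_limit;

    if (try_blas && gemm != NULL && product > cube) {
        const float one = 1.0f, zero = 0.0f;
        gemm("N", "N", &m, &n, &k, &one, a, &m, b, &k, &zero, c, &m);
        return;
    }

    /* Fallback: plain triple loop standing in for the libgfortran code. */
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            float sum = 0.0f;
            for (int l = 0; l < k; l++)
                sum += a[i + l * m] * b[l + j * k];
            c[i + j * m] = sum;
        }
}

/* Toy gemm used only to exercise the dispatch; counts its calls. */
int toy_gemm_calls = 0;

void toy_sgemm(const char *transa, const char *transb,
               const int *m, const int *n, const int *k,
               const float *alpha, const float *a, const int *lda,
               const float *b, const int *ldb,
               const float *beta, float *c, const int *ldc)
{
    (void)transa; (void)transb;   /* only the "N", "N" case is handled */
    toy_gemm_calls++;
    for (int j = 0; j < *n; j++)
        for (int i = 0; i < *m; i++) {
            float sum = 0.0f;
            for (int l = 0; l < *k; l++)
                sum += a[i + l * *lda] * b[l + j * *ldb];
            /* With beta == 0, C is not read on input. */
            c[i + j * *ldc] = (*beta == 0.0f)
                ? *alpha * sum
                : *alpha * sum + *beta * c[i + j * *ldc];
        }
}
```

With blas_limit set to 0, every product with positive extents takes the
gemm path, which matches the "size limit 0" setting used for the timings.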
The front-end function gfc_conv_function_call (in trans-expr.c) gains
an extra tree argument, which holds arguments to be appended after all
other function arguments. It's currently only used for the translation
of MATMUL calls, but it could be used in the future for e.g.
bounds-checking information.
*How did I test it?*
It was bootstrapped and regtested on i686-linux. I did some manual
testing and timing on i686-linux, the result of which can be found
here: http://www.eleves.ens.fr/home/coudert/timing.png These graphs
report the execution time as a function of matrix size, for
multiplication of square matrices of complex and real floating point
kinds 4 and 8. The black curve is for unpatched gfortran, the red one
for patched gfortran with ATLAS (with size limit 0, i.e. all matrix
multiplications performed by BLAS calls) and the green one for patched
gfortran with Intel MKL (again with size limit 0).
I also attach an ugly "regression-tester" I made. It builds matrices
of all types, kinds, sizes and strides, performs matrix multiplication
by different means and compares the results. To compile and run:
$ gfortran -c matmul_blas.f90 -fexternal-blas -fblas-matmul-limit=0
$ gfortran matrix.F90 gemm.f90 matmul_blas.o -ffree-line-length-none
$ ./a.out
The integer parameters nmax and ncheck in matrix.F90 can be changed;
they control the maximal size of the generated matrices and the
number of check cycles to run (the more the merrier, since some
parameters are randomly chosen).
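The idea behind such a tester can be sketched in C as follows. This is
not the attached matrix.F90, only a minimal illustration: the names
matmul_ref, matmul_strided and max_abs_diff are hypothetical, only one
real kind is covered, and only the A operand is allowed a stride.

```c
#include <assert.h>

/* Reference: C(m,n) = A(m,k) * B(k,n), all contiguous and column-major. */
void matmul_ref(float *c, const float *a, const float *b,
                int m, int n, int k)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            float sum = 0.0f;
            for (int l = 0; l < k; l++)
                sum += a[i + l * m] * b[l + j * k];
            c[i + j * m] = sum;
        }
}

/* Second method: A(i,l) lives at a[i * rs + l * cs], so A may be a
   strided section; the loops also run in a different order. */
void matmul_strided(float *c, const float *a, int rs, int cs,
                    const float *b, int m, int n, int k)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++)
            c[i + j * m] = 0.0f;
        for (int l = 0; l < k; l++)
            for (int i = 0; i < m; i++)
                c[i + j * m] += a[i * rs + l * cs] * b[l + j * k];
    }
}

/* Largest absolute element-wise difference between two results. */
float max_abs_diff(const float *x, const float *y, int count)
{
    assert(count >= 0);
    float worst = 0.0f;
    for (int i = 0; i < count; i++) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A check cycle would then fill the inputs, run both methods and require
max_abs_diff to stay below a tolerance suited to the kind being tested.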
*Answers to Janne's comments*
(http://gcc.gnu.org/ml/fortran/2006-09/msg00380.html)
point 1) you're right
point 1b) the three sizes involved are xcount, ycount and count; the
BLAS function is now called when their geometric mean is higher than
the specified size limit; the numbers are cast to float for the
calculation to avoid overflow
point 2) well, from what I have seen, the function call penalty is
negligible, especially since BLAS routines are used for heavy
computations
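The size test from point 1b can be illustrated with a small C helper.
This is a sketch under assumptions, not the code in the patch: the name
geometric_mean_exceeds is hypothetical, and the cube root is avoided by
comparing the product of the three sizes with the cube of the limit,
which is equivalent for non-negative limits.

```c
#include <assert.h>

/* Nonzero when the geometric mean of the three extents is higher than
   blas_limit.  Everything is cast to float first, so a product such as
   2000 * 2000 * 2000, which does not fit in a 32-bit int, is harmless. */
int geometric_mean_exceeds(int xcount, int ycount, int count,
                           int blas_limit)
{
    assert(blas_limit >= 0);
    float product = (float)xcount * (float)ycount * (float)count;
    float cube = (float)blas_limit * (float)blas_limit * (float)blas_limit;
    return product > cube;
}
```

For example, sizes 100, 1 and 1 with a limit of 4 pass the test because
100 is larger than 4 cubed, while 10, 10 and 10 with a limit of 10 do not,
since the mean equals the limit rather than exceeding it.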
:ADDPATCH fortran:
Attachments:
- blas_matmul3.ChangeLog
- blas_matmul3.diff
- gemm.f90
- matmul_blas.f90
- matrix.F90