This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [gfortran,patch] New option to use BLAS routines for matrix multiplication
- From: Steven Bosscher <stevenb dot gcc at gmail dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: FX Coudert <fxcoudert at gmail dot com>, "fortran at gcc dot gnu dot org List" <fortran at gcc dot gnu dot org>
- Date: Sat, 7 Oct 2006 11:16:06 +0200
- Subject: Re: [gfortran,patch] New option to use BLAS routines for matrix multiplication
- References: <BB94194A-C1ED-4F09-AC57-7A6E9FFF0BF8@gmail.com>
On Saturday 07 October 2006 11:03, FX Coudert wrote:
> Hi all,
> The attached patch allows gfortran-compiled code to use the sgemm/
> dgemm/cgemm/zgemm routines from BLAS to perform matrix
> multiplications. There is no big change from the last patch I posted.
> I'll say a few words about how it works, how I tested it, and then
> answer Janne's comments to my last patch.
> OK for mainline?
It looks like a cool improvement -- but must this be in GCC 4.2?
Don't take this as discouragement; I really like the idea of using
BLAS more when it's available (rationale: why invent the matmul
wheel twice? ;-)
Does the driver know how to link in the BLAS library itself, or is
it up to the user to provide the extra -lblas (or whatever) option?
Where is the documentation of the new options and this new feature?
> It was bootstrapped and regtested on i686-linux. I did some manual
> testing and timing on i686-linux, the result of which can be found
> here: http://www.eleves.ens.fr/home/coudert/timing.png These graphs
> report the execution time as a function of matrix size, for
> multiplication of square matrices of complex and real floating point
> kinds 4 and 8. Black curve is for unpatched gfortran, red is for
> patched gfortran with ATLAS (and size limit 0, i.e. all matrix
> multiplications performed by BLAS calls) and green is for patched
> gfortran with Intel MKL (again, size limit is 0).
It actually looks quite strange to me that there is such a large gap
between the libgfortran matmul for complex numbers on the one hand and
the optimized BLASes on the other. Perhaps there is room in our
own implementation for improvements to complex matmul...?