[libgfortran, configury] BLAS-based implementation of matmul
Sat Apr 1 15:48:00 GMT 2006
I've been playing this afternoon with a toy patch to have BLAS-based
matmul routines in gfortran. What I tested, and what currently works,
is the following: the BLAS routines (the ?GEMM routines, precisely) are
called from within the libgfortran real matmul routines, depending on
the floating-point type, the array sizes and the presence of strides
(if there are BLAS implementations that can operate on arrays with
general strides, I'm not aware of them).
This differs from the approach Janne wanted to pursue, i.e. having the
front-end directly generate the calls to BLAS routines. I think both
ease of implementation and maintainability favor the libgfortran
solution, while the performance impact shouldn't be large.
Anyway, my patch currently works and gives great speedups with an
optimized BLAS on i686-linux (the Intel MKL). There are a few questions
on which I'd like general feedback from the Fortran community, as well
as people skilled in autoconf and top-level configury:
- the BLAS library should be detected at compile-time, when the
front-end is configured, since the specs need to include its path;
unlike GMP/MPFR, which are linked with the front-end, it will be linked
to the created executable: are there examples of how to handle this in
the current gcc code? Any particular ideas on how we might achieve this?
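As a sketch of what the configure-time detection might look like: the
--with-blas option and HAVE_BLAS define below are hypothetical, not
existing GCC configury; only AC_ARG_WITH, AC_CHECK_FUNC, AC_DEFINE and
AC_SUBST are standard Autoconf macros.

```m4
# Hypothetical configure.ac fragment: let the builder name a BLAS
# library, then verify it actually provides the Fortran GEMM symbol.
AC_ARG_WITH([blas],
  [AS_HELP_STRING([--with-blas=LIBS], [linker flags for a BLAS library])],
  [BLAS_LIBS=$withval], [BLAS_LIBS="-lblas"])

saved_LIBS=$LIBS
LIBS="$BLAS_LIBS $LIBS"
AC_CHECK_FUNC([dgemm_],
  [AC_DEFINE([HAVE_BLAS], [1], [Define if a usable BLAS was found])],
  [BLAS_LIBS=])
LIBS=$saved_LIBS
AC_SUBST([BLAS_LIBS])
```

The substituted BLAS_LIBS would then end up in the link specs, so the
library is linked into the created executable rather than the compiler.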
- could that kind of idea raise any "political" objection?
And to the Fortran people specifically:
- we could have the BLAS selection overridable (is that a real word?)
at compile time, like -fforce-blas and -fforce-no-blas (or just -fblas
and -fno-blas)
- about the current matmul: why do we have special cases for
(axstride == 1 && bxstride == 1) and for (aystride == 1 && bxstride
== 1), but not for (axstride == 1 && bystride == 1) or (aystride == 1
&& bystride == 1)?