This is the mail archive of the
fortran@gcc.gnu.org
mailing list for the GNU Fortran project.
RFC: optimizing matmul-transpose combinations
- From: Victor Leikehman <LEI at il dot ibm dot com>
- To: fortran at gcc dot gnu dot org
- Date: Mon, 15 Nov 2004 14:26:49 +0200
- Subject: RFC: optimizing matmul-transpose combinations
Guys,
I looked at the galgel SPEC benchmark, because its performance is
disappointing: with gfortran it is three times slower than with the NAG/gcc
combination, and more than four times slower than with the IBM xlf compiler.
Profiling shows that galgel spends an unreasonable amount of time inside the
MATMUL intrinsic, so I rewrote it for better cache behavior. That improved
galgel scores by about 50%.
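To illustrate what "better cache behavior" can mean here, below is a minimal
sketch, not the actual rewrite. It assumes the usual column-major layout of
Fortran arrays and simply orders the loops so that the innermost one walks
down a column with stride 1. The names matmul_jki and matmul_loop_order are
made up for this example.

```fortran
! Sketch only: one common way to make a matrix product cache-friendly
! in Fortran. This is NOT the code from the rewrite described above.
program matmul_loop_order
  implicit none
  integer, parameter :: n = 4
  real :: a(n,n), b(n,n), c(n,n)
  integer :: i, j

  ! Fill A and B with small integer values (exactly representable).
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i + j)
      b(i,j) = real(i - j)
    end do
  end do

  call matmul_jki(a, b, c)

  ! Compare against the intrinsic.
  print *, 'max abs difference:', maxval(abs(c - matmul(a, b)))

contains

  ! c = a * b with j-k-i loop order (hypothetical helper).
  ! The innermost loop runs over i, so a(:,k) and c(:,j) are both
  ! accessed with stride 1 in column-major storage.
  subroutine matmul_jki(a, b, c)
    real, intent(in)  :: a(:,:), b(:,:)
    real, intent(out) :: c(:,:)
    integer :: i, j, k

    c = 0.0
    do j = 1, size(b, 2)
      do k = 1, size(a, 2)
        do i = 1, size(a, 1)
          c(i,j) = c(i,j) + a(i,k) * b(k,j)
        end do
      end do
    end do
  end subroutine matmul_jki

end program matmul_loop_order
```

A real implementation would probably also block the loops for large
matrices; the sketch only shows the access-order idea.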
Next, it turns out that the following idiom is frequently used inside
galgel: MATMUL(TRANSPOSE(A),B). So I implemented a function,
MATMUL_TRANSPOSE, which is the same as MATMUL but treats its first argument
as transposed, so that MATMUL_TRANSPOSE(A,B) gives the same result as
MATMUL(TRANSPOSE(A),B).
I then manually patched the benchmark, replacing the pattern
MATMUL(TRANSPOSE(A),B) with MATMUL_TRANSPOSE(A,B).
This change doubles galgel scores (on top of the previous improvement),
bringing its performance to the level of NAG/gcc.
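For concreteness, here is a rough sketch of the kind of routine meant by
MATMUL_TRANSPOSE. It is an illustration under assumptions, not the
implementation that was benchmarked: the name matmul_transpose_sketch and
its interface are invented for the example. It relies on the fact that in
TRANSPOSE(A)*B each result element is the dot product of a column of A with
a column of B, and columns are contiguous in Fortran, so no transposed copy
has to be formed.

```fortran
! Sketch only: computes transpose(a) * b without building transpose(a).
! The routine name and interface are hypothetical.
program matmul_transpose_demo
  implicit none
  integer, parameter :: m = 3, n = 4, p = 2
  real :: a(m,n), b(m,p), c(n,p)
  integer :: i, j

  ! Fill A (m x n) and B (m x p) with small integer values.
  do j = 1, n
    do i = 1, m
      a(i,j) = real(i + 2*j)
    end do
  end do
  do j = 1, p
    do i = 1, m
      b(i,j) = real(i - j)
    end do
  end do

  call matmul_transpose_sketch(a, b, c)

  ! Compare against the idiom being replaced.
  print *, 'max abs difference:', &
           maxval(abs(c - matmul(transpose(a), b)))

contains

  ! c(i,j) = sum over k of a(k,i) * b(k,j).
  ! Both a(:,i) and b(:,j) are columns, hence contiguous in memory.
  subroutine matmul_transpose_sketch(a, b, c)
    real, intent(in)  :: a(:,:), b(:,:)
    real, intent(out) :: c(:,:)
    integer :: i, j

    do j = 1, size(b, 2)
      do i = 1, size(a, 2)
        c(i,j) = dot_product(a(:,i), b(:,j))
      end do
    end do
  end subroutine matmul_transpose_sketch

end program matmul_transpose_demo
```

With this shape, a call such as matmul_transpose_sketch(A, B, C) stands in
for C = MATMUL(TRANSPOSE(A),B).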
There seem to be several possible places to put this kind of optimization,
both inside the Fortran front end and in the later stages. I would
appreciate any ideas on where and how to put it.
Regards,
Victor
--
Victor Leikehman
IBM Research Labs in Haifa, Israel