This is the mail archive of the
mailing list for the GNU Fortran project.
RFC: optimizing matmul-transpose combinations
- From: Victor Leikehman <LEI at il dot ibm dot com>
- To: fortran at gcc dot gnu dot org
- Date: Mon, 15 Nov 2004 14:26:49 +0200
- Subject: RFC: optimizing matmul-transpose combinations
I looked at the performance of the galgel SPEC benchmark, because it
is disappointing: with gfortran it is three times slower than with the NAG/gcc
combination, and more than four times slower than with the IBM xlf compiler.
Profiling shows that galgel spends an unreasonable amount of time inside the
MATMUL intrinsic, so I rewrote it for better cache behavior. That improved galgel
scores by about 50%.
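To illustrate what "better cache behavior" can mean for a matrix product, here is a minimal sketch in C. It is not the code from the rewrite described above. It only assumes that arrays are stored column-major, as Fortran does, and the names IDX, matmul_ijk and matmul_jki are made up for this example. The first routine reads A with a stride of m elements in its inner loop. The second reorders the loops so that the inner loop walks one column of A and one column of C with unit stride.

```c
#include <stddef.h>

/* Column-major indexing, as Fortran stores arrays: element (i,j) of an
   array with leading dimension ld lives at offset i + j*ld. */
#define IDX(i, j, ld) ((size_t)(i) + (size_t)(j) * (size_t)(ld))

/* c(m,n) = a(m,k) * b(k,n), textbook loop order.
   The inner loop reads a(i,l) for l = 0..k-1, i.e. with stride m. */
void matmul_ijk(size_t m, size_t n, size_t k,
                const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t l = 0; l < k; l++)
                s += a[IDX(i, l, m)] * b[IDX(l, j, k)];
            c[IDX(i, j, m)] = s;
        }
}

/* Same product with the loops reordered to j, l, i.
   The inner loop now touches column l of a and column j of c,
   both contiguous in memory. */
void matmul_jki(size_t m, size_t n, size_t k,
                const double *a, const double *b, double *c)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++)
            c[IDX(i, j, m)] = 0.0;          /* clear column j of c */
        for (size_t l = 0; l < k; l++) {
            const double blj = b[IDX(l, j, k)];
            for (size_t i = 0; i < m; i++)
                c[IDX(i, j, m)] += a[IDX(i, l, m)] * blj;
        }
    }
}
```

Both routines compute the same result; only the memory access pattern differs. A production version would probably also block the loops for larger matrices, which this sketch leaves out.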
Next, it turns out that the following idiom is frequently used inside
galgel: MATMUL(TRANSPOSE(A),B). So I implemented a function, MATMUL_TRANSPOSE,
which is the same as MATMUL, but expects the first argument already
transposed.
I then manually patched the benchmark, replacing the pattern
MATMUL(TRANSPOSE(A),B) with MATMUL_TRANSPOSE(A,B).
This change doubles galgel scores (on top of the previous improvement),
bringing its performance to the level of NAG/gcc.
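A rough sketch of why fusing the two operations can help, again in C with column-major storage assumed. The routine names and signatures below are hypothetical and are not the MATMUL_TRANSPOSE implementation mentioned above. The first routine does what MATMUL(TRANSPOSE(A),B) does literally: it builds a temporary copy of the transpose and then multiplies. The second computes the same result directly from A. Since c(i,j) is the sum over l of a(l,i)*b(l,j), each element is a dot product of column i of A with column j of B, and both columns are contiguous, so no temporary is needed.

```c
#include <stddef.h>

/* Column-major indexing: element (i,j) with leading dimension ld. */
#define IDX(i, j, ld) ((size_t)(i) + (size_t)(j) * (size_t)(ld))

/* Reference path: t = transpose(a), then c = t * b.
   a is k x m, t is a caller-supplied m x k scratch array,
   b is k x n, c is m x n. */
void matmul_via_temporary(size_t m, size_t n, size_t k,
                          const double *a, const double *b,
                          double *t, double *c)
{
    for (size_t i = 0; i < m; i++)
        for (size_t l = 0; l < k; l++)
            t[IDX(i, l, m)] = a[IDX(l, i, k)];   /* strided copy */

    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++) {
            double s = 0.0;
            for (size_t l = 0; l < k; l++)
                s += t[IDX(i, l, m)] * b[IDX(l, j, k)];
            c[IDX(i, j, m)] = s;
        }
}

/* Fused path: c = transpose(a) * b without forming the transpose.
   c(i,j) is the dot product of column i of a and column j of b. */
void matmul_transpose_fused(size_t m, size_t n, size_t k,
                            const double *a, const double *b, double *c)
{
    for (size_t j = 0; j < n; j++) {
        const double *bcol = b + IDX(0, j, k);
        for (size_t i = 0; i < m; i++) {
            const double *acol = a + IDX(0, i, k);
            double s = 0.0;
            for (size_t l = 0; l < k; l++)
                s += acol[l] * bcol[l];          /* unit stride in both */
            c[IDX(i, j, m)] = s;
        }
    }
}
```

The fused version saves the allocation and copy of the temporary and reads both operands sequentially. How much of the measured speedup comes from each effect is not something this sketch can say.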
There seem to be several possible places to put this kind of optimization,
both inside the Fortran front end and during the later stages. I would
appreciate any ideas on where/how to put it.
IBM Research Labs in Haifa, Israel