This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.



RFC: optimizing matmul-transpose combinations





Guys,

I looked at the performance of the galgel SPEC benchmark, because it is
disappointing: with gfortran it is three times slower than with the NAG/gcc
combination, and more than four times slower than with the IBM xlf compiler.

Profiling shows that galgel spends an unreasonable amount of time inside the
MATMUL intrinsic, so I rewrote it for better cache behavior.  That improved
galgel scores by about 50%.
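
The message does not include the rewritten routine.  For illustration only,
one common way to get better cache behavior from a matrix product is to
reorder the loops so that the innermost loop walks memory with unit stride.
The sketch below is an assumption of that kind, not the actual patch: it is
written in C, uses Fortran's column-major layout, and the name matmul_jki is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: C = A * B with column-major storage, as in Fortran.
 * A is m x k, B is k x n, C is m x n; element (i,j) of a matrix with
 * `rows` rows lives at index i + j*rows.
 *
 * The j-p-i loop order keeps the innermost loop unit-stride in both
 * A and C, which is friendlier to the cache than striding across rows.
 */
void matmul_jki(size_t m, size_t n, size_t k,
                const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t j = 0; j < n; j++) {
        /* Clear column j of C before accumulating into it. */
        for (size_t i = 0; i < m; i++)
            c[i + j * m] = 0.0;

        for (size_t p = 0; p < k; p++) {
            /* B(p,j) is constant over the innermost loop. */
            double bpj = b[p + j * k];

            /* Column p of A is added, scaled, to column j of C. */
            for (size_t i = 0; i < m; i++)
                c[i + j * m] += a[i + p * m] * bpj;
        }
    }
}
```

A real implementation would also have to handle array strides and other
element types, and might add blocking on top of the loop reordering.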

Next, it turns out that the following idiom is frequently used inside
galgel:  MATMUL(TRANSPOSE(A),B).  So I implemented a function,
MATMUL_TRANSPOSE, which is the same as MATMUL but treats its first
argument as already transposed, i.e. MATMUL_TRANSPOSE(A,B) computes
MATMUL(TRANSPOSE(A),B).
I then manually patched the benchmark, replacing the pattern
MATMUL(TRANSPOSE(A),B) with MATMUL_TRANSPOSE(A,B).
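
To make the idea concrete, here is a minimal sketch of what a fused
transpose-and-multiply routine could look like.  It is an illustration under
stated assumptions, not the code from the patch: it is written in C with
Fortran's column-major layout, and the name matmul_transpose_cm is
hypothetical.  With column-major storage, element (i,j) of TRANSPOSE(A)*B is
the dot product of column i of A with column j of B, so both operands are
read with unit stride and no transposed copy of A is built.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: C = transpose(A) * B with column-major storage.
 * A is k x m (stored untransposed), B is k x n, C is m x n; element (i,j)
 * of a matrix with `rows` rows lives at index i + j*rows.
 */
void matmul_transpose_cm(size_t m, size_t n, size_t k,
                         const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            /* Dot product of column i of A with column j of B. */
            double sum = 0.0;
            for (size_t p = 0; p < k; p++)
                sum += a[p + i * k] * b[p + j * k];
            c[i + j * m] = sum;
        }
    }
}
```

Called with the untransposed A, this plays the role of
MATMUL(TRANSPOSE(A),B) in a single pass over the data.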

This change doubles galgel scores (on top of the previous improvement),
bringing its performance to the level of NAG/gcc.

There seem to be several possible places to put this kind of optimization,
both inside the Fortran front end and during the later stages.  I would
appreciate any ideas on where and how to put it.

Regards,
      Victor
--
  Victor Leikehman
  IBM Research Labs in Haifa, Israel

