This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.



Re: RFC: optimizing matmul-transpose combinations


On Monday 15 November 2004 12:26, Victor Leikehman wrote:
> Next, it turns out that the following idiom is frequently used inside
> galgel:  MATMUL(TRANSPOSE(A),B).  So I implemented function
> MATMUL_TRANSPOSE, which is the same as MATMUL, but expects the first
> argument already transposed.
> I then manually patched the benchmark, replacing the pattern
> MATMUL(TRANSPOSE(A),B) with MATMUL_TRANSPOSE(A,B).
>
> This change doubles galgel scores (on top of the previous improvement),
> bringing its performance to the level of NAG/gcc.
>
> There seem to be several possible places to put this kind of optimization,
> both inside the Fortran front end and during the later stages.  I would
> appreciate any ideas where/how to put it.

I'd approach this from a different angle.

Our current implementation doesn't know that TRANSPOSE is special, so it
usually makes a copy of the array. It should be possible to eliminate the
transpose library call altogether by simply creating a new array descriptor
with the strides (and extents) of the two dimensions swapped.

The tricky bit is getting the dependency information correct so that the
scalarizer still creates copies when necessary. This involves tweaking the
expression walking routines, and probably other stuff, so that TRANSPOSE is
treated more like an elemental function than a transformational one.

This would help all cases where transpose is used, not just this specific one.

A few examples to show what I mean.
The important one is (3); the others are performance-neutral.
(2) is still sub-optimal, but no worse than what we already have.

1)
Original code:
  A = TRANSPOSE(B)
Current code:
  _gfortran_transpose(A, B);
New code:
  array_copy(transpose_descriptor(A), B)

2)
Original code:
  A = TRANSPOSE(A)
Current code:
  allocate(tmp)
  _gfortran_transpose(tmp, A)
  array_copy (A, tmp)
New code:
  allocate(tmp)
  array_copy(tmp, transpose_descriptor(A))
  array_copy (A, tmp)

3)
Original code:
  CALL FOO(TRANSPOSE(A))
Current code:
  allocate(tmp)
  _gfortran_transpose(tmp, A)
  foo(tmp)
New code:
  foo(transpose_descriptor(A))

4)
Original code:
  A = BAR(TRANSPOSE(A))
Current code:
  allocate (tmp)
  _gfortran_transpose (tmp, A)
  bar(A, tmp)
New code:
  allocate(tmp)
  bar(tmp, transpose_descriptor(A))
  array_copy(A, tmp)

Paul

