This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.


Re: RFC: optimizing matmul-transpose combinations

From: Paul Brook <paul at codesourcery dot com>
To: fortran at gcc dot gnu dot org
Cc: Victor Leikehman <LEI at il dot ibm dot com>
Date: Tue, 16 Nov 2004 12:38:08 +0000
Subject: Re: RFC: optimizing matmul-transpose combinations
Organization: CodeSourcery
References: <OFEDF19F43.2104E4C5-ON42256F4E.00284702-42256F4E.00295F70@il.ibm.com>

On Tuesday 16 November 2004 07:31, Victor Leikehman wrote:
> Paul Brook wrote:
> > I'd approach this from a different angle.
> >
> > Our current implementation doesn't know that TRANSPOSE is special, so
> > usually makes a copy of the array. It should be possible to eliminate the
> > transpose library call altogether by simply creating a new array
> > descriptor with the strides reversed.
>
> I totally agree that transpose should be implemented as you describe.
> Yet, it seems to me a separate issue, for three reasons:
>
> 1. MATMUL_TRANSPOSE is inherently faster than plain MATMUL, because it
>    can accumulate the result for each (x,y) and only then store it. So,
>    in MATMUL_TRANSPOSE we have x*y stores, as opposed to x*y*n stores in
>    MATMUL.

I don't understand.  AFAICS the generic matmul implementation only does x*y
stores.  Could you post your implementation of matmul_transpose (or point me
at the message if you already have)?
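To make the two store counts concrete, here is a minimal sketch in C. It is illustrative only and is not taken from the libgfortran source; the function names and the returned store counter are hypothetical. It assumes contiguous, column-major arrays. The first variant accumulates each result element in a local scalar and writes it once (x*y stores); the second accumulates directly in the destination (x*y*n stores, plus x*y to zero it first). Which of the two the generic library routine corresponds to is the question being asked above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only, not libgfortran's code.  Arrays are contiguous and
   column-major: a is xcount*n, b is n*ycount, c is xcount*ycount.
   Each function returns how many times it wrote to c. */

/* Accumulate each c(x,y) in a local scalar and store it once. */
size_t matmul_scalar_acc(const double *a, const double *b, double *c,
                         size_t xcount, size_t ycount, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t stores = 0;
    for (size_t y = 0; y < ycount; y++)
        for (size_t x = 0; x < xcount; x++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[x + k * xcount] * b[k + y * n];
            c[x + y * xcount] = sum;
            stores++;
        }
    return stores;
}

/* Accumulate directly into c, with x innermost so that a and c are
   walked with unit stride.  Every inner iteration writes c. */
size_t matmul_memory_acc(const double *a, const double *b, double *c,
                         size_t xcount, size_t ycount, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t stores = 0;
    for (size_t i = 0; i < xcount * ycount; i++) {
        c[i] = 0.0;
        stores++;
    }
    for (size_t y = 0; y < ycount; y++)
        for (size_t k = 0; k < n; k++) {
            double bky = b[k + y * n];
            for (size_t x = 0; x < xcount; x++) {
                c[x + y * xcount] += a[x + k * xcount] * bky;
                stores++;
            }
        }
    return stores;
}
```

Both variants compute the same product; they differ only in loop order and in where the running sum lives.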

> 2. It is natural to implement a specialized version of matmul that assumes
>    that the first stride of all arguments is 1, as I did in the proposed
>    implementation.  Passing the first argument with transposed strides will
>    invoke the slower, generic version.

Yes, but this is a separate optimisation, applicable in other cases.  In
general you cannot assume the arguments to an intrinsic have unit stride.
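A rough sketch of what such a specialisation could look like, written in C: a runtime check on the first stride selects either a unit-stride kernel or a generic strided one. The `desc2` type and all function names are hypothetical simplifications for illustration, not libgfortran's actual descriptor or entry points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified rank-2 descriptor; not libgfortran's type.
   Element (i,j) lives at data[i*stride[0] + j*stride[1]]. */
typedef struct {
    double *data;
    ptrdiff_t stride[2];
    ptrdiff_t extent[2];
} desc2;

/* Generic path: honours arbitrary strides on every access. */
static void matmul_generic(const desc2 *a, const desc2 *b, const desc2 *c)
{
    for (ptrdiff_t y = 0; y < c->extent[1]; y++)
        for (ptrdiff_t x = 0; x < c->extent[0]; x++) {
            double sum = 0.0;
            for (ptrdiff_t k = 0; k < a->extent[1]; k++)
                sum += a->data[x * a->stride[0] + k * a->stride[1]]
                     * b->data[k * b->stride[0] + y * b->stride[1]];
            c->data[x * c->stride[0] + y * c->stride[1]] = sum;
        }
}

/* Specialised path: assumes stride[0] == 1 for all three arrays, so the
   first index needs no multiply. */
static void matmul_unit(const desc2 *a, const desc2 *b, const desc2 *c)
{
    for (ptrdiff_t y = 0; y < c->extent[1]; y++) {
        const double *bcol = b->data + y * b->stride[1];
        double *ccol = c->data + y * c->stride[1];
        for (ptrdiff_t x = 0; x < c->extent[0]; x++) {
            double sum = 0.0;
            for (ptrdiff_t k = 0; k < a->extent[1]; k++)
                sum += a->data[x + k * a->stride[1]] * bcol[k];
            ccol[x] = sum;
        }
    }
}

/* Returns 1 if the unit-stride path was taken, 0 for the generic path. */
int matmul_dispatch(const desc2 *a, const desc2 *b, const desc2 *c)
{
    assert(a->extent[1] == b->extent[0]);
    assert(c->extent[0] == a->extent[0] && c->extent[1] == b->extent[1]);
    if (a->stride[0] == 1 && b->stride[0] == 1 && c->stride[0] == 1) {
        matmul_unit(a, b, c);
        return 1;
    }
    matmul_generic(a, b, c);
    return 0;
}
```

An argument such as an array section with a step of 2 in its first dimension has `stride[0] == 2` and so falls through to the generic path, which is why the check has to happen at run time.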

> 3. Combining transpose and matmul is not some benchmark-dependent hack that
>    joins two unrelated functions.  It is a mathematically sound idiom,
>    which asks for compiler support.

Maybe. I'm wondering why this isn't something we get "free" once we have 
decent matmul and transpose implementations.

I'm assuming arrays large enough that the overhead of the function call is 
small.
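For reference, the descriptor-based transpose mentioned earlier in the thread can be sketched as follows. This is a minimal illustration in C under stated assumptions: `desc2`, `desc2_get` and `transpose_view` are hypothetical names, and the struct is a simplified stand-in for a real array descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified rank-2 descriptor; not libgfortran's type.
   Element (i,j) lives at data[i*stride[0] + j*stride[1]]. */
typedef struct {
    double *data;
    ptrdiff_t stride[2];
    ptrdiff_t extent[2];
} desc2;

/* Read element (i,j) through the descriptor. */
double desc2_get(const desc2 *d, ptrdiff_t i, ptrdiff_t j)
{
    assert(i >= 0 && i < d->extent[0] && j >= 0 && j < d->extent[1]);
    return d->data[i * d->stride[0] + j * d->stride[1]];
}

/* TRANSPOSE as a view: same data pointer, the two dimensions swapped.
   No element is copied. */
desc2 transpose_view(const desc2 *a)
{
    desc2 t;
    t.data = a->data;
    t.stride[0] = a->stride[1];
    t.stride[1] = a->stride[0];
    t.extent[0] = a->extent[1];
    t.extent[1] = a->extent[0];
    return t;
}
```

Any routine that honours strides can consume the view directly, so no separate transpose call or temporary is needed. Note that the view of a contiguous array has the original second stride as its first stride, so it is generally not unit-stride in its first dimension.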

Paul

Follow-Ups:
- Re: RFC: optimizing matmul-transpose combinations
  - From: Victor Leikehman

References:
- Re: RFC: optimizing matmul-transpose combinations
  - From: Victor Leikehman
