This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Is it a bug: data aligment of matrix returned by gemm


On Thu, Jun 25, 2009 at 04:07:18PM +0200, Tobias Burnus wrote:
[...]   
> Thus when passing
>    call dgemm( ...., a, ...)
> the following happens: If "a" is contiguous (i.e. it has no strides),
> it(s data component) is passed as argument; if it is not contiguous, a
> temporary array is created and the data of "a" is copied to the
> temporary array. After the call, the data is copied back (copy in, copy
> out).
> 
> That also works. The problem arises for your:
>    c(1,:,:)=matmul_func(a,b)
> 
> Here, c(1,:,:) has strides and as later in the call "dgemm" the "c"
> argument needs to be contigious, at some point a "temp_c(2,2)" has to be
> created and [copy in]/copy out of the data needs to happen, which - this
> is the bug - does not happen in this case. For details, see below:
 
I am only guessing what these strides are, but I think I figured out 
the gist of it alread (my second post). :)

[...] 
> The solution is that at some point a copying of the data needs to happen
> - a natural choice would be for the call of matmul_func; actually, a
> copy in is not needed for a function result thus only a copy out should
> be implemented.
 
Yup, doing it explicitly inside the function actually helps.

> Thanks for the bug report!
> 
> Side note: If you just want to do a matrix multiplication, consider
> using the MATMUL intrinsic directly. If you think that for larger
> matrices, the BLAS function is better and you have a highly optimized
> library, you can use with gfortran the options -fexternal-blas and
> -fblas-matmul-limit=<n> (see manpage).
 
Thanks to you and Tim for the hint. Neat, although it lowers 
portability a little bit.

[...] 
> MZ: In general, it helps if you provde a complete test case, which can
> be compiled; i.e. the "..." of the interface filled and there should be
> no rank/paramater problem. 

I wanted to write rank=2 instead of range=2 but you have probably
figured it out.

> Though the omissions where small enough to be
> filled in.

I will remember this for the future.

Thanks for clarification!
MZ


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]