This is the mail archive of the
fortran@gcc.gnu.org
mailing list for the GNU Fortran project.
Re: Is it a bug: data aligment of matrix returned by gemm
- From: Maciej Zwierzycki <Maciej dot Zwierzycki at ifmpan dot poznan dot pl>
- To: Tobias Burnus <burnus at net-b dot de>
- Cc: Maciej dot Zwierzycki at ifmpan dot poznan dot pl, fortran at gcc dot gnu dot org
- Date: Thu, 25 Jun 2009 16:34:27 +0200
- Subject: Re: Is it a bug: data aligment of matrix returned by gemm
- References: <20090625123831.GD19665@ifmpan.poznan.pl> <4A438496.1060103@net-b.de>
- Reply-to: Maciej dot Zwierzycki at ifmpan dot poznan dot pl
On Thu, Jun 25, 2009 at 04:07:18PM +0200, Tobias Burnus wrote:
[...]
> Thus when passing
> call dgemm( ...., a, ...)
> the following happens: If "a" is contiguous (i.e. it has no strides),
> it(s data component) is passed as argument; if it is not contiguous, a
> temporary array is created and the data of "a" is copied to the
> temporary array. After the call, the data is copied back (copy in, copy
> out).
>
> That also works. The problem arises for your:
> c(1,:,:)=matmul_func(a,b)
>
> Here, c(1,:,:) has strides and as later in the call "dgemm" the "c"
> argument needs to be contigious, at some point a "temp_c(2,2)" has to be
> created and [copy in]/copy out of the data needs to happen, which - this
> is the bug - does not happen in this case. For details, see below:
I am only guessing what these strides are, but I think I figured out
the gist of it alread (my second post). :)
[...]
> The solution is that at some point a copying of the data needs to happen
> - a natural choice would be for the call of matmul_func; actually, a
> copy in is not needed for a function result thus only a copy out should
> be implemented.
Yup, doing it explicitly inside the function actually helps.
> Thanks for the bug report!
>
> Side note: If you just want to do a matrix multiplication, consider
> using the MATMUL intrinsic directly. If you think that for larger
> matrices, the BLAS function is better and you have a highly optimized
> library, you can use with gfortran the options -fexternal-blas and
> -fblas-matmul-limit=<n> (see manpage).
Thanks to you and Tim for the hint. Neat, although it lowers
portability a little bit.
[...]
> MZ: In general, it helps if you provde a complete test case, which can
> be compiled; i.e. the "..." of the interface filled and there should be
> no rank/paramater problem.
I wanted to write rank=2 instead of range=2 but you have probably
figured it out.
> Though the omissions where small enough to be
> filled in.
I will remember this for the future.
Thanks for clarification!
MZ