This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.
Re: [Patch, gfortran]: Enable loop unrolling for the matmul intrinsic.
- From: Steve Kargl <sgk at troutmask dot apl dot washington dot edu>
- To: GNU GFortran <fortran at gcc dot gnu dot org>, GCC patches <gcc-patches at gcc dot gnu dot org>
- Date: Sat, 10 Dec 2005 12:55:33 -0800
- Subject: Re: [Patch, gfortran]: Enable loop unrolling for the matmul intrinsic.
- References: <20051129200114.GD16405@vipunen.hut.fi> <20051210155455.GD15073@vipunen.hut.fi>
On Sat, Dec 10, 2005 at 05:54:55PM +0200, Janne Blomqvist wrote:
> PING!
>
> Since I posted the patch, another benchmark result was reported, for
> P4 and P2:
>
> http://gcc.gnu.org/ml/fortran/2005-12/msg00083.html
>
> Those results show that unrolling is a big win for P4, and a slightly
> smaller improvement for P2. To summarize, performance improvement at
> the size where maximum performance is achieved for double precision:
>
> K8 (size 64): 30 %
>
> P4 (size 32): 55 %
>
> P2 (size 32): 35 %
>
> ppc970 (size ?): 14 %
>
> I think that covers the majority of hardware where gfortran is used.
>
OK for mainline; wait a few days for 4.1.

For the record, here are results on i386-*-freebsd with a 1.2 GHz Athlon.

Without Janne's patch:
Single precision matrix multiplication test
Matrix side size   Matmul (Gflops/s)   sgemm (Gflops/s)    Loops
================================================================
               2               0.066              0.026   100000
               4               0.246              0.131   100000
               8               0.474              0.324   100000
              16               0.444              0.544   100000
              32               0.543              0.736    15500
              64               0.608              0.851     1922
             128               0.634              0.834      239
             256               0.557              0.694       29
             512               0.151              0.165        3
            1024               0.149              0.164        1
            2048               0.147              0.161        1

Double precision matrix multiplication test
Matrix side size   Matmul (Gflops/s)   dgemm (Gflops/s)    Loops
================================================================
               2               0.062              0.025   100000
               4               0.239              0.128   100000
               8               0.480              0.318   100000
              16               0.471              0.535   100000
              32               0.583              0.700    15500
              64               0.644              0.790     1922
             128               0.473              0.483      239
             256               0.093              0.094       29
             512               0.094              0.094        3
            1024               0.094              0.094        1
            2048               0.094              0.093        1
With the patch:
Single precision matrix multiplication test
Matrix side size   Matmul (Gflops/s)   sgemm (Gflops/s)    Loops
================================================================
               2               0.068              0.026   100000
               4               0.249              0.127   100000
               8               0.479              0.326   100000
              16               0.648              0.544   100000
              32               0.776              0.737    15500
              64               0.846              0.845     1922
             128               0.816              0.826      239
             256               0.679              0.675       29
             512               0.170              0.164        3
            1024               0.169              0.164        1
            2048               0.165              0.160        1

Double precision matrix multiplication test
Matrix side size   Matmul (Gflops/s)   dgemm (Gflops/s)    Loops
================================================================
               2               0.066              0.026   100000
               4               0.243              0.129   100000
               8               0.448              0.321   100000
              16               0.612              0.535   100000
              32               0.737              0.698    15500
              64               0.788              0.786     1922
             128               0.469              0.468      239
             256               0.094              0.094       29
             512               0.094              0.094        3
            1024               0.094              0.094        1
            2048               0.094              0.094        1
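
To compare these numbers with the percentages quoted above, the relative change in Matmul throughput can be worked out directly from the two double precision runs. The following is only a rough sketch in Python, not part of the benchmark: the values are copied by hand from the Matmul columns in this message, and the names `percent_gain` and `peak_size` are made up for illustration.

```python
# Double precision Matmul throughput (Gflops/s) on the 1.2 GHz Athlon,
# copied by hand from the tables in this message.
# Keys are the matrix side size.
without_patch = {
    2: 0.062, 4: 0.239, 8: 0.480, 16: 0.471, 32: 0.583, 64: 0.644,
    128: 0.473, 256: 0.093, 512: 0.094, 1024: 0.094, 2048: 0.094,
}
with_patch = {
    2: 0.066, 4: 0.243, 8: 0.448, 16: 0.612, 32: 0.737, 64: 0.788,
    128: 0.469, 256: 0.094, 512: 0.094, 1024: 0.094, 2048: 0.094,
}


def percent_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def peak_size(results):
    """Matrix side size at which the highest throughput was measured."""
    return max(results, key=results.get)


if __name__ == "__main__":
    for n in sorted(with_patch):
        gain = percent_gain(without_patch[n], with_patch[n])
        print(f"{n:5d}  {without_patch[n]:.3f} -> {with_patch[n]:.3f}  {gain:+6.1f} %")
    print("peak size with patch:", peak_size(with_patch))
```

With these figures the double precision peak is at size 64 in both runs, and the gain there comes out at roughly 22 %. At size 8 the patched run is slightly slower.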
--
Steve