This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [Patch, gfortran]: Enable loop unrolling for the matmul intrinsic.


On Sat, Dec 10, 2005 at 05:54:55PM +0200, Janne Blomqvist wrote:
> PING!
> 
> Since I posted the patch, another benchmark result was reported, for
> P4 and P2:
> 
> http://gcc.gnu.org/ml/fortran/2005-12/msg00083.html
> 
> Those results show that unrolling is a big win for P4, and a slightly
> smaller improvement for P2. To summarize, performance improvement at
> the size where maximum performance is achieved for double precision:
> 
> K8 (size 64): 30 %
> 
> P4 (size 32): 55 %
> 
> P2 (size 32): 35 %
> 
> ppc970 (size ?): 14 %
> 
> I think that covers the majority of hardware where gfortran is used.
> 

OK for mainline; wait a few days for 4.1.

For the record, here are results on i386-*-freebsd with a 1.2 GHz Athlon.

Without Janne's patch:

Single precision matrix multiplication test
 Matrix side size    Matmul (Gflops/s)    sgemm (Gflops/s)      Loops
 ====================================================================
    2                0.066                0.026                100000
    4                0.246                0.131                100000
    8                0.474                0.324                100000
   16                0.444                0.544                100000
   32                0.543                0.736                 15500
   64                0.608                0.851                  1922
  128                0.634                0.834                   239
  256                0.557                0.694                    29
  512                0.151                0.165                     3
 1024                0.149                0.164                     1
 2048                0.147                0.161                     1
 Double precision matrix multiplication test
 Matrix side size    Matmul (Gflops/s)    dgemm (Gflops/s)      Loops
 ====================================================================
    2                0.062                0.025                100000
    4                0.239                0.128                100000
    8                0.480                0.318                100000
   16                0.471                0.535                100000
   32                0.583                0.700                 15500
   64                0.644                0.790                  1922
  128                0.473                0.483                   239
  256                0.093                0.094                    29
  512                0.094                0.094                     3
 1024                0.094                0.094                     1
 2048                0.094                0.093                     1

With the patch:
 Single precision matrix multiplication test
 Matrix side size    Matmul (Gflops/s)    sgemm (Gflops/s)      Loops
 ====================================================================
    2                0.068                0.026                100000
    4                0.249                0.127                100000
    8                0.479                0.326                100000
   16                0.648                0.544                100000
   32                0.776                0.737                 15500
   64                0.846                0.845                  1922
  128                0.816                0.826                   239
  256                0.679                0.675                    29
  512                0.170                0.164                     3
 1024                0.169                0.164                     1
 2048                0.165                0.160                     1
 Double precision matrix multiplication test
 Matrix side size    Matmul (Gflops/s)    dgemm (Gflops/s)      Loops
 ====================================================================
    2                0.066                0.026                100000
    4                0.243                0.129                100000
    8                0.448                0.321                100000
   16                0.612                0.535                100000
   32                0.737                0.698                 15500
   64                0.788                0.786                  1922
  128                0.469                0.468                   239
  256                0.094                0.094                    29
  512                0.094                0.094                     3
 1024                0.094                0.094                     1
 2048                0.094                0.094                     1
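To compare against the percentages Janne quoted for other hardware, the relative improvement can be read off the double precision Matmul columns above. A minimal sketch in Python; the dictionary and function names are only illustrative and are not part of the benchmark program, and the figures are copied from the two double precision tables.

```python
# Double precision Matmul throughput (Gflops/s) on the 1.2 GHz Athlon,
# copied from the tables above, keyed by matrix side size.
matmul_without_patch = {
    2: 0.062, 4: 0.239, 8: 0.480, 16: 0.471, 32: 0.583, 64: 0.644,
    128: 0.473, 256: 0.093, 512: 0.094, 1024: 0.094, 2048: 0.094,
}
matmul_with_patch = {
    2: 0.066, 4: 0.243, 8: 0.448, 16: 0.612, 32: 0.737, 64: 0.788,
    128: 0.469, 256: 0.094, 512: 0.094, 1024: 0.094, 2048: 0.094,
}


def percent_improvement(before, after):
    """Relative change in throughput, in percent (positive means faster)."""
    return (after / before - 1.0) * 100.0


# Improvement per matrix size.
improvement = {
    size: percent_improvement(matmul_without_patch[size], matmul_with_patch[size])
    for size in matmul_without_patch
}

# Size at which the patched Matmul reaches its maximum throughput.
peak_size = max(matmul_with_patch, key=matmul_with_patch.get)

if __name__ == "__main__":
    for size, pct in improvement.items():
        print(f"{size:5d}  {pct:+6.1f} %")
    print(f"peak at size {peak_size}: {improvement[peak_size]:+.1f} %")
```

The peak is picked the same way as in Janne's summary, i.e. the size where maximum performance is achieved, so the number is at least roughly comparable to the K8, P4, P2 and ppc970 figures.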

-- 
Steve

