This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: g77 performance on ALPHA


Hi Martin

On Tue, 31 Aug 1999, Martin Kahlert wrote:
||> Excuse my ignorance but ....
||> ||(The handcoded asm-daxpy of Mr. Goto gets 378.55 MFlops)
||> 
||> Where can I get the handcoded BLAS?  Mr. Goto??  Is this the Compaq
||> CXML?  Is it better?
||It is better than CXML. Kazushige Goto from Japan spent some time and effort
||and coded some BLAS routines in asm, e.g. the ?axpy, ?gemm and ?dot routines.
||The dgemm routine, for example, gets about 950 MFlops on a 600MHz
||21164 (ev5!). I don't know how good these routines are for 21264 cpus,
||but they should be even faster there. The only problem is, I can't reach the
||URL any more. It used to be http://www.neuro.uni-oldenburg.de/~joe/math
||but the server now refuses all connections.

I found a more current URL:

ftp://www.netstat.ne.jp/pub/Linux/Linux-Alpha-JP/BLAS

I can now add to yesterday's results.

On my 500MHz Alpha/Linux (ev6) box I get:

% g77 -O3 blas1.f
% repeat 5 ./a.out
  288.39993 MFlops
  288.450704 MFlops
  288.045032 MFlops
  288.045032 MFlops
  287.640449 MFlops
% fort -O -fast -tune ev6 -arch ev6 -O3 blas1.f
% repeat 5 ./a.out
   393.846153846154       MFlops
   393.846153846154       MFlops
   393.846153846154       MFlops
   393.846153846154       MFlops
   393.846153846154       MFlops
% fort -O -fast -tune ev6 -arch ev6 -O3 blas1-cxml.f -lcxml
% repeat 5 ./a.out
   461.261261261261       MFlops
   460.095352622334       MFlops
   460.224655977789       MFlops
   460.224655977789       MFlops
   462.302419375394       MFlops
% fort -O -fast -tune ev6 -arch ev6 -O3 blas1-cxml.f -lcxml
% repeat 5 ./a.out
  461.261261 MFlops
  460.224782 MFlops
  458.165486 MFlops
  461.261261 MFlops
  461.261261 MFlops

||Impressive, I think this comes from the out-of-order
||capabilities of the 21164.

As has been pointed out, I think you mean the 21264.

% fort -O5 -fast -tune ev6 -arch ev6  blas1-cxml.f ~/AlphaBLAS/axpy-981128/libaxpy.a
% repeat 5 ./a.out
  504.433498 MFlops
  501.960784 MFlops
  504.433498 MFlops
  500.733571 MFlops
  501.960784 MFlops
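
For anyone wanting to reproduce this kind of figure: a daxpy (y := a*x + y)
does one multiply and one add per element, so one call on vectors of length n
costs 2n floating-point operations. The sketch below shows how an MFlops
number can be derived from that count and a wall-clock time. It is only an
illustration of the arithmetic, not the blas1.f used above (which is not shown
in this message); the names daxpy, mflops and bench_daxpy and the values of n
and reps are hypothetical, and pure Python will of course run far slower than
any of the numbers quoted here.

```python
import time


def daxpy(a, x, y):
    """y := a*x + y in place; one multiply and one add per element."""
    for i in range(len(x)):
        y[i] += a * x[i]


def mflops(flops, seconds):
    """Convert a flop count and an elapsed time to MFlops (10**6 flops/s)."""
    return flops / seconds / 1.0e6


def bench_daxpy(n=1000, reps=200):
    # n and reps are illustrative values, not those used by blas1.f.
    x = [1.0] * n
    y = [0.0] * n
    start = time.perf_counter()
    for _ in range(reps):
        daxpy(0.5, x, y)
    elapsed = time.perf_counter() - start
    # 2 flops per element, n elements per call, reps calls.
    return mflops(2.0 * n * reps, elapsed), y


if __name__ == "__main__":
    rate, _ = bench_daxpy()
    print("%12.6f MFlops" % rate)
```

The vector is kept short enough to stay in cache here; a longer vector would
measure memory bandwidth rather than the floating-point units.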

I also have a short F95 code which measures dgemm/dsymm/matmul speed
in MFlops.

% fort -O5 -fast -tune ev6 -arch ev6 tblas3-95.f90 -lcxml
% ./a.out
size                                 dsymm  dgemm matmul
200  1.0  5.6 1.5713 1.4297  2.5244 654.96 719.84 407.68 8 10000
300  3.0 19.7 3.7578 3.5859  6.4854 692.08 725.20 401.04 6 10000
400  6.0 43.8 6.2500 5.7812 11.7500 656.96 710.24 349.44 4 10000
500  9.0 65.9 6.1172 5.6406  9.9307 655.20 710.56 403.60 2 10000
600 11.0 85.6 5.2266 4.8672  8.6826 662.32 711.28 398.72 1 10000
% fort -O5 -fast -tune ev6 -arch ev6 tblas3-95.f90 ~/AlphaBLAS/gemm/libgemm.a -lcxml
% ./a.out
size                                 dsymm  dgemm matmul
200  1.0  5.4 1.4229 1.2734  2.6484 723.28 808.16 388.56 8 10000
300  3.0 18.8 3.5703 3.3203  6.2979 728.40 783.28 412.96 6 10000
400  5.0 40.1 5.6875 5.1406 10.2285 722.00 798.80 401.44 4 10000
500  8.0 61.1 5.5469 5.0791  9.8232 722.56 789.12 408.00 2 10000
600 10.0 79.8 4.7969 4.3516  8.6836 721.68 795.52 398.64 1 10000

The improvement in dsymm suggests, I suppose, that the CXML dsymm calls dgemm
internally and so picks up the Goto dgemm when libgemm.a is linked first.
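
As a rough cross-check on the tables above: an n x n matrix multiply costs
about 2*n**3 floating-point operations (n**3 multiplies and n**3 adds). The
sketch below works backwards from a reported time and rate to the number of
multiplies they would account for. It assumes the fourth to sixth columns are
elapsed times in seconds and the next three are MFlops; tblas3-95.f90 is not
shown, so that reading is a guess, and the function names are hypothetical.

```python
def dgemm_flops(n):
    """Approximate flop count for one n x n matrix multiply (assumes 2*n**3)."""
    return 2.0 * n ** 3


def implied_calls(n, seconds, mflops):
    """Number of multiplies that would account for the reported time and rate."""
    return mflops * 1.0e6 * seconds / dgemm_flops(n)


# n = 600 dgemm entry from the libgemm.a run: 4.3516 s at 795.52 MFlops.
print(round(implied_calls(600, 4.3516, 795.52), 2))
```

Under those assumptions the n = 600 row works out to roughly 8 multiplies per
timing, which is at least consistent with a 2*n**3 flop count being used.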

Thanks,
kevin
-- 
___________________________________________________
| Kevin Maguire                    DisCo Support  |
| CSE                  CLRC Daresbury Laboratory  |
| e-mail:                     K.Maguire@dl.ac.uk  |
| Tel: 01925 603221            Fax: 01925 603634  |
|     -     -     -     -     -     -     -       |
|__  http://www.cse.clrc.ac.uk/Activity/DISCO  ___|


