This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: g77 performance on ALPHA
- To: Martin Kahlert <martin.kahlert@provi.de>
- Subject: Re: g77 performance on ALPHA
- From: Kevin Maguire <K.Maguire@dl.ac.uk>
- Date: Tue, 31 Aug 1999 18:32:37 +0100 (BST)
- cc: Kevin Maguire <K.Maguire@dl.ac.uk>, egcs@egcs.cygnus.com
- Reply-To: Kevin Maguire <K.Maguire@dl.ac.uk>
Hi Martin
On Tue, 31 Aug 1999, Martin Kahlert wrote:
||> Excuse my ignorance, but...
||> ||(The handcoded asm-daxpy of Mr. Goto gets 378.55 MFlops)
||>
||> Where can I get the handcoded BLAS? Mr. Goto?? Is this the Compaq
||> CXML? Is it better?
||It is better than CXML. Kazushige Goto from Japan spent some time and effort
||coding some BLAS routines in asm, e.g. the ?axpy, ?gemm and ?dot routines.
||The dgemm routine, for example, gets about 950 MFlops on a 600MHz
||21164 (ev5!). I don't know how good these routines are on 21264 CPUs,
||but they should be even faster there. The only problem is that I can't reach
||the URL any more. It used to be http://www.neuro.uni-oldenburg.de/~joe/math
||but the server now refuses all connections.
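For readers unfamiliar with the names: ?axpy is the BLAS level-1 update y := a*x + y, where the ? stands for the precision prefix (d for double precision). Below is a minimal pure-Python reference sketch of daxpy, assuming positive strides only. It only shows what the routine computes and why it is counted as 2*n floating-point operations; it says nothing about how Goto's assembly versions are written.

```python
def daxpy(n, a, x, incx, y, incy):
    """Reference daxpy: y := a*x + y, updating y in place.

    Sketch only: assumes incx > 0 and incy > 0 (the reference BLAS also
    handles negative increments, which are omitted here).
    """
    if n <= 0 or a == 0.0:
        return y  # quick return, as in the reference BLAS
    ix = iy = 0
    for _ in range(n):
        y[iy] += a * x[ix]
        ix += incx
        iy += incy
    return y


def daxpy_flops(n):
    """One multiply and one add per element: 2*n flops."""
    return 2 * n
```

For example, daxpy(3, 2.0, [1.0, 1.0, 1.0], 1, [1.0, 2.0, 3.0], 1) returns [3.0, 4.0, 5.0].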
I found a more current URL:
ftp://www.netstat.ne.jp/pub/Linux/Linux-Alpha-JP/BLAS
I can now add to yesterday's results.
On my 500MHz Alpha/Linux (ev6) box I get:
% g77 -O3 blas1.f
% repeat 5 ./a.out
288.39993 MFlops
288.450704 MFlops
288.045032 MFlops
288.045032 MFlops
287.640449 MFlops
% fort -O -fast -tune ev6 -arch ev6 -O3 blas1.f
% repeat 5 ./a.out
393.846153846154 MFlops
393.846153846154 MFlops
393.846153846154 MFlops
393.846153846154 MFlops
393.846153846154 MFlops
% fort -O -fast -tune ev6 -arch ev6 -O3 blas1-cxml.f -lcxml
% repeat 5 ./a.out
461.261261261261 MFlops
460.095352622334 MFlops
460.224655977789 MFlops
460.224655977789 MFlops
462.302419375394 MFlops
% fort -O -fast -tune ev6 -arch ev6 -O3 blas1-cxml.f -lcxml
% repeat 5 ./a.out
461.261261 MFlops
460.224782 MFlops
458.165486 MFlops
461.261261 MFlops
461.261261 MFlops
||Impressive, I think this comes from the out-of-order
||capabilities of the 21164.
As has been pointed out, I think you mean the 21264.
% fort -O5 -fast -tune ev6 -arch ev6 blas1-cxml.f ~/AlphaBLAS/axpy-981128/libaxpy.a
% repeat 5 ./a.out
504.433498 MFlops
501.960784 MFlops
504.433498 MFlops
500.733571 MFlops
501.960784 MFlops
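The source of blas1.f is not included here, so the following is only a guess at the usual method behind MFlops figures like these: time many repetitions of the daxpy update and divide the flop count (2*n per call) by the elapsed time. The vector length n, the repeat count reps and the function names are hypothetical. Pure Python is of course far slower than any of the compiled results above; only the arithmetic carries over.

```python
import time


def mflops(flop_count, seconds):
    """Convert a flop count and an elapsed time in seconds to MFlops."""
    return flop_count / (seconds * 1.0e6)


def bench_daxpy(n=1000, reps=100):
    """Time reps daxpy-style updates of length n and return MFlops.

    n and reps are hypothetical; the values used in blas1.f are not known.
    """
    x = [1.0] * n
    y = [0.0] * n
    a = 1.0e-6
    start = time.perf_counter()
    for _ in range(reps):
        for i in range(n):
            y[i] += a * x[i]  # one multiply + one add
    elapsed = time.perf_counter() - start
    return mflops(2 * n * reps, elapsed)
```

For example, 2,000,000 flops in 0.5 s gives mflops(2_000_000, 0.5) == 4.0.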
I also have a short F95 program which measures dgemm/dsymm/matmul speed
in MFlops.
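The F95 program itself is not shown. As a hedged illustration of the bookkeeping such a program usually needs: a general product C := alpha*A*B + beta*C, with A of size m-by-k and B of size k-by-n, is conventionally counted as 2*m*n*k flops (ignoring the lower-order alpha/beta scaling), so an n-by-n product costs 2*n^3. The naive triple loop below is only a correctness reference, not how dgemm, dsymm or matmul are implemented in CXML or in Goto's library; the function names are made up for this sketch.

```python
import time


def dgemm_flops(m, n, k):
    """Conventional flop count for C := alpha*A*B + beta*C (A m-by-k, B k-by-n)."""
    return 2 * m * n * k


def naive_dgemm(alpha, a, b, beta, c):
    """Reference triple loop over lists of lists; updates c in place."""
    m, k, n = len(a), len(b), len(b[0])
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = alpha * s + beta * c[i][j]
    return c


def bench_dgemm(n, reps=1):
    """Time reps n-by-n products and return MFlops (sizes are hypothetical)."""
    a = [[1.0] * n for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    start = time.perf_counter()
    for _ in range(reps):
        naive_dgemm(1.0, a, b, 0.0, c)
    elapsed = time.perf_counter() - start
    return dgemm_flops(n, n, n) * reps / (elapsed * 1.0e6)
```

At size 200, for instance, one product is dgemm_flops(200, 200, 200) = 16,000,000 flops.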
% fort -O5 -fast -tune ev6 -arch ev6 tblas3-95.f90 -lcxml
% ./a.out
size dsymm dgemm matmul
200 1.0 5.6 1.5713 1.4297 2.5244 654.96 719.84 407.68 8 10000
300 3.0 19.7 3.7578 3.5859 6.4854 692.08 725.20 401.04 6 10000
400 6.0 43.8 6.2500 5.7812 11.7500 656.96 710.24 349.44 4 10000
500 9.0 65.9 6.1172 5.6406 9.9307 655.20 710.56 403.60 2 10000
600 11.0 85.6 5.2266 4.8672 8.6826 662.32 711.28 398.72 1 10000
% fort -O5 -fast -tune ev6 -arch ev6 tblas3-95.f90 ~/AlphaBLAS/gemm/libgemm.a -lcxml
% ./a.out
size dsymm dgemm matmul
200 1.0 5.4 1.4229 1.2734 2.6484 723.28 808.16 388.56 8 10000
300 3.0 18.8 3.5703 3.3203 6.2979 728.40 783.28 412.96 6 10000
400 5.0 40.1 5.6875 5.1406 10.2285 722.00 798.80 401.44 4 10000
500 8.0 61.1 5.5469 5.0791 9.8232 722.56 789.12 408.00 2 10000
600 10.0 79.8 4.7969 4.3516 8.6836 721.68 795.52 398.64 1 10000
The improvement in dsymm implies, I suppose, that the CXML dsymm calls the Goto dgemm.
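To make that guess concrete, here is a hypothetical sketch of one way a symmetric multiply could reuse a general one: expand the stored upper triangle of A into a full matrix and hand it to a gemm. This is an assumption for illustration only. Nothing in the results above shows how CXML's dsymm is actually implemented. The sketch covers only the SIDE='L', UPLO='U' case (C := alpha*A*B + beta*C with A symmetric), and the naive gemm stands in for a fast dgemm.

```python
def symmetrize_upper(a):
    """Full symmetric matrix built from the upper triangle of a (lower part ignored)."""
    n = len(a)
    return [[a[i][j] if i <= j else a[j][i] for j in range(n)] for i in range(n)]


def gemm(alpha, a, b, beta, c):
    """Naive C := alpha*A*B + beta*C, standing in for a fast dgemm."""
    m, k, n = len(a), len(b), len(b[0])
    for i in range(m):
        for j in range(n):
            s = sum(a[i][p] * b[p][j] for p in range(k))
            c[i][j] = alpha * s + beta * c[i][j]
    return c


def dsymm_left_upper_via_gemm(alpha, a_upper, b, beta, c):
    """Hypothetical dsymm (SIDE='L', UPLO='U') built on top of gemm."""
    return gemm(alpha, symmetrize_upper(a_upper), b, beta, c)
```

The expansion costs O(n^2) work against the O(n^3) product, so under this assumption a faster dgemm would pass most of its speedup on to dsymm.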
Thanks,
kevin
--
___________________________________________________
| Kevin Maguire DisCo Support |
| CSE CLRC Daresbury Laboratory |
| e-mail: K.Maguire@dl.ac.uk |
| Tel: 01925 603221 Fax: 01925 603634 |
| - - - - - - - |
|__ http://www.cse.clrc.ac.uk/Activity/DISCO ___|