This page provides examples of results from compiling and executing a variety of sample code:
Polyhedron 2005 Fortran Benchmark - Polyhedron 1st November 2006
(See Compiler Comparisons at http://www.polyhedron.com) The Intel/EM64T run was done on the same system as the old benchmark; the Opteron/x86_64 is a new system.
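Geometric mean times
System | Absoft | g95 | gfortran | Intel | Lahey | NAG | Pathscale | PGI |
_______________________________________________________________________________________
Intel/EM64T | 25.77 | 42.17 | 31.33 | 23.74 | 29.03 | 38.27 | 25.64 | 27.10 |
_______________________________________________________________________________________
Opteron/x86_64 | 17.72 | 29.20 | 22.09 | 18.92 | 21.48 | 24.13 | 17.72 | 18.88 |
_______________________________________________________________________________________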
Polyhedron 2005 Fortran Benchmarks - Paul Thomas 20th September 2006
(See Compiler Comparisons at http://www.polyhedron.com)
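Geometric mean times
System | Absoft | g95 | gfortran | Intel | Lahey | NAG | Pathscale | PGI |
_______________________________________________________________________________________
Intel/EM64T | 25.45 | 41.64 | 31.76 | 23.82 | 29.23 | 40.11 | 25.93 | 27.03 |
_______________________________________________________________________________________
Opteron/x86_64 | 21.19 | 35.28 | 27.47 | 22.77 | 25.89 | 31.37 | 21.38 | 24.11 |
_______________________________________________________________________________________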
So gfortran is getting in there with the "big boys". It is interesting to note that the overall performance of gfortran and Intel is almost identical on 32-bit machines and, as seen at the Polyhedron site, the differences on 64-bit machines are concentrated in five "red spots" (AERMOD, AIR, FATIGUE, GAS_DYN and RNFLOW).
The gfortran scores on diagnostic capability are disappointing. It is noticeable that array and character bounds checking is the area that would make the biggest difference.
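To put the geometric mean times above on a common footing, each compiler's time can be divided by the fastest time on the same platform. The following is a minimal Python sketch and not part of the Polyhedron harness; the dictionary copies the 20th September 2006 figures from the table above, and the helper name `relative_to_best` is hypothetical.

```python
# Geometric mean times copied from the 20th September 2006 table above.
TIMES = {
    "Intel/EM64T": {
        "Absoft": 25.45, "g95": 41.64, "gfortran": 31.76, "Intel": 23.82,
        "Lahey": 29.23, "NAG": 40.11, "Pathscale": 25.93, "PGI": 27.03,
    },
    "Opteron/x86_64": {
        "Absoft": 21.19, "g95": 35.28, "gfortran": 27.47, "Intel": 22.77,
        "Lahey": 25.89, "NAG": 31.37, "Pathscale": 21.38, "PGI": 24.11,
    },
}


def relative_to_best(times):
    """Return each compiler's time divided by the fastest time (1.0 = fastest)."""
    best = min(times.values())
    return {name: t / best for name, t in times.items()}


if __name__ == "__main__":
    for platform, times in TIMES.items():
        ratios = relative_to_best(times)
        print(platform)
        # List compilers from fastest to slowest.
        for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
            print(f"  {name:10s} {ratio:5.2f}")
```

On these figures gfortran comes out at roughly 1.3 times the fastest geometric mean on both platforms.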
LAPACK: Test execution results - Jerry DeLisle 19th October 2005
- The dgd, sgd, and zgd failures are known problems with the LAPACK testsuite. The remaining three failures are very near the acceptance criteria threshold. This was compiled using gcc version 4.1.0 20051019 (experimental) i686-pc-linux-gnu with -O1 -march=pentium4:
csep.out: CST drivers: 1 out of 11664 tests failed to pass the threshold
dgd.out: DXV drivers: 200 out of 5000 tests failed to pass the threshold
sgd.out: SXV drivers: 37 out of 5000 tests failed to pass the threshold
ssep.out: SST: 1 out of 4662 tests failed to pass the threshold
ssep.out: SST drivers: 1 out of 14256 tests failed to pass the threshold
zgd.out: ZXV drivers: 24 out of 5000 tests failed to pass the threshold
1000s: Modified version from netlib, write statements commented out to allow simple timing - Jerry DeLisle 26th May 2005
- This compares g77 and gfortran on single-precision Gaussian elimination of a 1000 x 1000 matrix:
g77 -O2 -march=pentium4 1000s.f -o 1000s
time ./1000s
real 0m3.013s
user 0m3.005s
sys 0m0.008s

gfortran -O2 -march=pentium4 1000s.f -o 1000s
time ./1000s
real 0m2.354s
user 0m2.324s
sys 0m0.030s
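The two runs above can be reduced to a single ratio. This is a small Python sketch, assuming the `0m3.013s` format printed by the shell's `time` built-in; the helper name `parse_time` is hypothetical and the values are the "user" times copied from the output above.

```python
import re


def parse_time(text):
    """Convert a shell `time` value such as '0m3.013s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return 60 * int(minutes) + float(seconds)


# "user" times copied from the two runs above.
g77_user = parse_time("0m3.005s")
gfortran_user = parse_time("0m2.324s")

if __name__ == "__main__":
    print(f"g77 / gfortran user-time ratio: {g77_user / gfortran_user:.2f}")
```

With these two measurements the g77 binary takes about 1.29 times as long as the gfortran one; a single run of each is not enough to say how stable that ratio is.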
NIST F77 Testsuite - Jerry DeLisle 19th October 2005
The NIST testsuite passes with no failures on all tests. The current FM923.DAT file posted on the NIST website is corrupted. A version of the NIST test suite with script can be obtained at the following location: http://mysite.verizon.net/serveall/NISTtest.tar.gz
HIRLAM - Toon Moene 16th April 2006
A paper discussing HIRLAM, a state-of-the-art limited-area weather forecasting model, compiled and run with gfortran, can be found at http://mysite.verizon.net/serveall/moene.pdf. Additional information on HIRLAM can be found at http://hirlam.knmi.nl .
Polyhedron 2004 F90 Benchmarks (www.polyhedron.com) - Paul Thomas 21st October 2005
First, the good news: all of them compile and run! Secondly, some of them even run quite fast. Disappointingly, however, two (fatigue.f90 and kepler.f90) are rather slow under Cygwin, and one (induct.f90) is very slow. I will come back to this after tabulating the results.
What I have done is to run the benchmarks from the console on a 2.5GHz Pentium, under Cygwin_NT and Windows 2000. I have not set up the harness program yet but will update the table when I have. gfortran is invoked with -march=i686 -pg -fmax-stack-var-size=1000000 -O2. The execution time was obtained from > time <program name>. The version used has incorporated the improvement to dependency.c that was discussed on the list 20051018.
For comparison, I used Digital Fortran 6.0, run from the Cygwin console; compilation was done with /FAST and linking with /STACK:20000000.
[Note added 8th November 2005: Following the initial investigations, I have written an inline version of dot_product that produces an astonishing improvement in induct.f90 and a less spectacular one for kepler.f90. This will be submitted for inclusion in gfortran in the coming days.
The results of this experimental version of gfortran, using options -fdump-tree-original -march=i686 -malign-double -funroll-loops -O3 under Cygwin, have been inserted in the table. We're getting there!]
Execution times in seconds
Test | Capacita | Channel | Fatigue | Gas_dyn | Induct | Kepler | NF |
_______________________________________________________________________________________
gfortran | | | | | | | |
20051019 | 140 | 22 | **83 | 68 | **407 | **170 | 77 |
_______________________________________________________________________________________
gfortran | | | | | | | |
20051108 | 142 | 22 | **61 | 40 | 55 | **125 | 80 |
_______________________________________________________________________________________
DF6.0 | | | | | | | |
| 194 | 32 | 28 | 40 | 86 | 83 | 52 |
_______________________________________________________________________________________
Test | Protein | Rnflow | Test_fpu |
___________________________________________
gfortran | | | |
20051019 | 104 | 80 | 36 |
___________________________________________
gfortran | | | |
20051108 | 94 | 74 | 30 |
___________________________________________
DF6.0 | | | |
| 90 | 63 | 30 |
____________________________________________
I have used gprof to profile the three marked with asterisks. kepler.f90 spends 53% of its time in dot_product and induct.f90 57% of its time there. This is surprising, given how sparse dot_product is; however, this is what I find! For fatigue.f90, this figure is 28%, but a lot of time (about 50s) is "missing" in this case (memory allocation from Windows?).
I find it encouraging that so many of the results stand up quite well to this, for its time, top-of-the-range commercial compiler.
I have now redone the tests on FC3/Athlon1700, using ifc7.0 and gfortran 20051019; both with -O2. Note that I have made no serious attempt at optimization of either; for the time being, a rough and ready comparison suffices.
Execution times in seconds
Test | Capacita | Channel | Fatigue | Gas_dyn | Induct | Kepler | NF |
_______________________________________________________________________________________
gfortran | | | | | | | |
20051019 | 234 | 64 | 42 | 71 | **390 | **134 | 165 |
_______________________________________________________________________________________
ifc 7.0 | | | | | | | |
| 239 | 96 | 39 | 64 | 141 | 66 | 128 |
_______________________________________________________________________________________
Test | Protein | Rnflow | Test_fpu |
___________________________________________
gfortran | | | |
20051019 | 163 | 116 | 87 |
___________________________________________
ifc 7.0 | | | |
| 109 | 121 | 78 |
____________________________________________
The f77 results are:
Test | Ac | Air | Dudoc | Drag | Linpk | Mdbx | Pix | Tfft |
__________________________________________________________________________________________________
gfortran | | | | | | | | |
20051019 | 42 | 51 | 103 | 86 | 118 | 67 | **102 | 31 |
__________________________________________________________________________________________________
ifc 7.0 | | | | | | | | |
| 29 | 51 | 94 | 72 | 127 | 66 | 52 | 31 |
__________________________________________________________________________________________________
These results are almost too encouraging! Notice that the anomalous result for fatigue.f90 under Cygwin has disappeared in the Linux run. This tends to support the hypothesis that memory allocations are to blame. The differences with induct.f90 and kepler.f90 were discussed on the list today (20051020), and I expect some progress there in the medium term at least (we know what to do now...).
I have returned to Cygwin and taken a cursory look at the f77 cases. The pattern is exactly the same, even to the differences in pix.f; gfc giving 98s and DF6.0, 61s. Once again, from an even more cursory look, the production of temporaries before invocation of an inline version of mod is to blame. Replacing mod (a,b), inline, with a - int (a/b) * b speeds up gfc to 68s.
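The rewrite of mod (a,b) as a - int (a/b) * b relies on the fact that Fortran's MOD truncates the quotient toward zero. The sketch below illustrates that identity in Python rather than Fortran, so it says nothing about the temporaries gfortran generates; `mod_inline` is a hypothetical helper, and `math.fmod` stands in for MOD on reals because it also truncates toward zero. For very large quotients, floating-point rounding in a/b can make the two forms differ.

```python
import math


def mod_inline(a, b):
    """Truncating remainder written out as a - int(a/b) * b."""
    # Python's int() truncates toward zero, like Fortran's INT.
    return a - int(a / b) * b


if __name__ == "__main__":
    for a, b in [(7.5, 2.0), (-7.5, 2.0), (7.5, -2.0), (10.0, 2.5)]:
        # Columns: a, b, inline form, math.fmod, Python's flooring % operator.
        print(a, b, mod_inline(a, b), math.fmod(a, b), a % b)
```

Note that Python's own `%` operator floors instead of truncating, so it disagrees with both forms whenever the operands have opposite signs.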
The Lawrence Livermore Fortran Kernels Test - Paul Thomas 21st October 2005
This comparison between gfortran and DF6.0 is marred by the coarseness of the timer (16 ms). In consequence, the timing variances are of order 100% for both compilers. Nonetheless, the results indicate that gfortran is holding its own.
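Why a 16 ms timer gives variances of this size can be seen from a rough estimate: an interval read from a clock that advances in fixed ticks is uncertain by about one tick. The following Python sketch assumes that simple model; the function name and the sample interval lengths are hypothetical and are not measurements from the runs below.

```python
def relative_uncertainty(elapsed_s, tick_s=0.016):
    """Approximate relative error of an interval measured with a timer
    that only advances in steps of tick_s seconds (about one tick of error)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return tick_s / elapsed_s


if __name__ == "__main__":
    # Hypothetical interval lengths, not taken from the reports below.
    for elapsed in (0.016, 0.16, 1.6):
        print(f"{elapsed:6.3f} s -> +/- {relative_uncertainty(elapsed):.0%}")
```

Under this model, any kernel timing that lasts only about one tick carries an uncertainty of order 100%, which is consistent with the variances described above.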
********************************************
THE LIVERMORE FORTRAN KERNELS: * SUMMARY *
********************************************
Computer : 2.5GHz Pentium / 1Gbyte
System : CYGWIN_NT-5.0 / Windows2000
Compiler : gfortran 20051018 -march=i686 -malign-double -funroll-loops -O3
Date : Late 1992
Testor : John K. Prentice, QCA
MFLOPS RANGE: REPORT ALL RANGE STATISTICS:
Mean DO Span = 154
Code Samples = 72
Maximum Rate = 2977.9683 Mega-Flops/Sec.
Quartile Q3 = 722.2454 Mega-Flops/Sec.
Average Rate = 571.7355 Mega-Flops/Sec.
Geometric Mean = 344.5320 Mega-Flops/Sec.
Median Q2 = 400.7000 Mega-Flops/Sec.
Harmonic Mean = 133.8293 Mega-Flops/Sec.
Quartile Q1 = 164.8394 Mega-Flops/Sec.
Minimum Rate = 10.8334 Mega-Flops/Sec.
Standard Dev. = 573.3039 Mega-Flops/Sec.
Avg Efficiency = 11.57% Program & Processor
Mean Precision = 6.24 Decimal Digits
1
Version: 22/DEC/86 mf523 6191
CHECK FOR CLOCK CALIBRATION ONLY:
Total Job Cpu Time = 2.07250E+02 Sec.
Total 24 Kernels Time = 3.18882E+01 Sec.
Total 24 Kernels Flops= 1.69491E+09 Flops
********************************************
THE LIVERMORE FORTRAN KERNELS: * SUMMARY *
********************************************
Computer : 2.5GHz Pentium / 1Gbyte
System : CYGWIN_NT-5.0 / Windows2000
Compiler : DEC DF6.0 /FAST
Date : Late 1992
Testor : John K. Prentice, QCA
MFLOPS RANGE: REPORT ALL RANGE STATISTICS:
Mean DO Span = 157
Code Samples = 72
Maximum Rate = 3210.2197 Mega-Flops/Sec.
Quartile Q3 = 692.6757 Mega-Flops/Sec.
Average Rate = 510.4755 Mega-Flops/Sec.
Geometric Mean = 339.0222 Mega-Flops/Sec.
Median Q2 = 313.3171 Mega-Flops/Sec.
Harmonic Mean = 231.3904 Mega-Flops/Sec.
Quartile Q1 = 169.4229 Mega-Flops/Sec.
Minimum Rate = 44.8465 Mega-Flops/Sec.
Standard Dev. = 526.7570 Mega-Flops/Sec.
Avg Efficiency = 10.56% Program & Processor
Mean Precision = 6.44 Decimal Digits
1
Version: 22/DEC/86 mf523 6170
CHECK FOR CLOCK CALIBRATION ONLY:
Total Job Cpu Time = 1.57906E+02 Sec.
Total 24 Kernels Time = 4.73348E+00 Sec.
Total 24 Kernels Flops= 1.69491E+09 Flops
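Both summaries report an average rate, a geometric mean and a harmonic mean. For positive rates these always satisfy harmonic <= geometric <= arithmetic, and the harmonic mean is the most sensitive to the slowest samples. The Python sketch below illustrates this with hypothetical rates; it is not the LFK program's own calculation, and `three_means` is an illustrative name.

```python
import statistics


def three_means(rates):
    """Arithmetic, geometric and harmonic means of positive rates."""
    return (
        statistics.fmean(rates),
        statistics.geometric_mean(rates),
        statistics.harmonic_mean(rates),
    )


if __name__ == "__main__":
    # Hypothetical MFLOPS rates, NOT taken from the reports above.
    rates = [10.0, 200.0, 400.0, 800.0]
    arithmetic, geometric, harmonic = three_means(rates)
    print(f"arithmetic {arithmetic:.1f}  geometric {geometric:.1f}  harmonic {harmonic:.1f}")
    # Dropping the slowest sample moves the harmonic mean the most.
    print(three_means(rates[1:]))
```

This is one way to read the two reports: the geometric means are close (344.5 against 339.0), while the harmonic means differ much more, in line with the lower minimum rate in the gfortran run.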