This page provides examples of results from compiling and executing a variety of sample code:

Polyhedron 2005 Fortran Benchmark - Polyhedron 1st November 2006

(See Compiler Comparisons at http://www.polyhedron.com) The Intel/EM64T run was done on the same system as the old benchmark; the Opteron/x86_64 run used a new system.

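Geometric mean times
Platform        |   Absoft  |    g95    |  gfortran |   Intel   |   Lahey   |    NAG    | Pathscale |    PGI    |
_________________________________________________________________________________________________________________
Intel/EM64T     |   25.77   |   42.17   |   31.33   |   23.74   |   29.03   |   38.27   |   25.64   |   27.10   |
_________________________________________________________________________________________________________________
Opteron/x86_64  |   17.72   |   29.20   |   22.09   |   18.92   |   21.48   |   24.13   |   17.72   |   18.88   |
_________________________________________________________________________________________________________________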

Polyhedron 2005 Fortran Benchmarks - Paul Thomas 20th September 2006

(See Compiler Comparisons at http://www.polyhedron.com)

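Geometric mean times
Platform        |   Absoft  |    g95    |  gfortran |   Intel   |   Lahey   |    NAG    | Pathscale |    PGI    |
_________________________________________________________________________________________________________________
Intel/EM64T     |   25.45   |   41.64   |   31.76   |   23.82   |   29.23   |   40.11   |   25.93   |   27.03   |
_________________________________________________________________________________________________________________
Opteron/x86_64  |   21.19   |   35.28   |   27.47   |   22.77   |   25.89   |   31.37   |   21.38   |   24.11   |
_________________________________________________________________________________________________________________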

So gfortran is getting in there with the "big boys". It is interesting to note that the overall performance of gfortran and Intel is almost identical on 32-bit machines and, as seen at the Polyhedron site, the differences on 64-bit machines are concentrated in five "red spots" (AERMOD, AIR, FATIGUE, GAS_DYN and RNFLOW).
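To make the comparison concrete, the geometric means from the 20th September 2006 table can be expressed as ratios to the fastest compiler on each platform. The following is a minimal sketch of that arithmetic; the program and variable names are illustrative only, and the numbers are copied from the table.

```fortran
program relative_times
  implicit none
  integer, parameter :: n = 8
  character(len=9), parameter :: names(n) = [character(len=9) :: &
       'Absoft', 'g95', 'gfortran', 'Intel', 'Lahey', 'NAG', 'Pathscale', 'PGI']
  ! Geometric mean times copied from the 20th September 2006 table.
  real, parameter :: em64t(n) = &
       [25.45, 41.64, 31.76, 23.82, 29.23, 40.11, 25.93, 27.03]
  real, parameter :: opteron(n) = &
       [21.19, 35.28, 27.47, 22.77, 25.89, 31.37, 21.38, 24.11]
  integer :: i

  print '(a9, 2a10)', 'Compiler ', 'EM64T', 'Opteron'
  do i = 1, n
     ! Ratio to the fastest compiler on each platform (1.00 = fastest).
     print '(a9, 2f10.2)', names(i), em64t(i) / minval(em64t), &
          opteron(i) / minval(opteron)
  end do
end program relative_times
```

On these figures gfortran comes out at about 1.33 times the fastest time on Intel/EM64T and about 1.30 times the fastest on Opteron/x86_64.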

The gfortran scores on diagnostic capability are disappointing. It is noticeable that array and character bounds checking is the area which would make the biggest difference.

LAPACK: Test execution results - Jerry DeLisle 19th October 2005

csep.out: CST drivers:      1 out of  11664 tests failed to pass the threshold
dgd.out: DXV drivers:    200 out of   5000 tests failed to pass the threshold
sgd.out: SXV drivers:     37 out of   5000 tests failed to pass the threshold
ssep.out: SST:    1 out of  4662 tests failed to pass the threshold
ssep.out: SST drivers:      1 out of  14256 tests failed to pass the threshold
zgd.out: ZXV drivers:     24 out of   5000 tests failed to pass the threshold

1000s: Modified version from netlib, write statements commented out to allow simple timing - Jerry DeLisle 26th May 2005

g77 -O2 -march=pentium4 1000s.f -o 1000s
time ./1000s
real    0m3.013s
user    0m3.005s
sys     0m0.008s
gfortran -O2 -march=pentium4 1000s.f -o 1000s
time ./1000s
real    0m2.354s
user    0m2.324s
sys     0m0.030s

NIST F77 Testsuite - Jerry DeLisle 19th October 2005

The NIST testsuite passes with no failures on all tests. The current FM923.DAT file posted on the NIST website is corrupted. A version of the NIST test suite with script can be obtained at the following location: http://mysite.verizon.net/serveall/NISTtest.tar.gz

HIRLAM - Toon Moene 16th April 2006

A paper discussing HIRLAM, a state-of-the-art limited-area weather forecasting model, compiled and run with gfortran, can be found at http://mysite.verizon.net/serveall/moene.pdf. Additional information on HIRLAM can be found at http://hirlam.knmi.nl.

Polyhedron 2004 F90 Benchmarks (www.polyhedron.com) - Paul Thomas 21st October 2005

First, the good news is that all of them compile and run! Secondly, some of them even run quite fast. However, disappointingly, two (fatigue.f90 and kepler.f90) are rather slow under Cygwin, and one (induct.f90) is very slow. I will come back to this after tabulating the results.

What I have done is to run the benchmarks from the console on a 2.5GHz Pentium, under Cygwin_NT and Windows 2000. I have not set up the harness program yet but will update the table when I have. gfortran is invoked with -march=i686 -pg -fmax-stack-var-size=1000000 -O2. The execution time was obtained from > time <program name>. The version used has incorporated the improvement to dependency.c that was discussed on the list 20051018.

For comparison, I used Digital Fortran 6.0 run from the Cygwin console; compilation was done with /FAST and linking with /STACK:20000000.

[Note added 8th November 2005: Following the initial investigations, I have written an inline version of dot_product that produces an astonishing improvement in induct.f90 and a less spectacular one for kepler.f90. This will be submitted for inclusion in gfortran in the coming days.

The results of this experimental version of gfortran, using options -fdump-tree-original -march=i686 -malign-double -funroll-loops -O3 under Cygwin, have been inserted in the table. We're getting there!]

Execution times in seconds
Test      | Capacita |  Channel | Fatigue  | Gas_dyn  | Induct   | Kepler   |    NF    |
_______________________________________________________________________________________
gfortran  |          |          |          |          |          |          |          |
20051019  |    140   |     22   |   **83   |     68   |  **407   |  **170   |    77    |
_______________________________________________________________________________________
gfortran  |          |          |          |          |          |          |          |
20051108  |    142   |     22   |   **61   |     40   |     55   |  **125   |    80    |
_______________________________________________________________________________________
DF6.0     |          |          |          |          |          |          |          |
          |    194   |     32   |     28   |     40   |     86   |     83   |    52    |
_______________________________________________________________________________________
Test      | Protein  |  Rnflow  | Test_fpu |
___________________________________________
gfortran  |          |          |          |
20051019  |    104   |     80   |     36   |
___________________________________________
gfortran  |          |          |          |
20051108  |     94   |     74   |     30   |
___________________________________________
DF6.0     |          |          |          |
          |     90   |     63   |     30   |
____________________________________________

I have used gprof to profile the three marked with asterisks. kepler.f90 spends 53% of its time in dot_product and induct.f90 57% of its time there. This is surprising, given how sparse dot_product is; however, this is what I find! For fatigue.f90, this figure is 28% but a lot of time (about 50s) is "missing" in this case (memory allocation from Windows?).
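One rough way to see how much a dot_product call costs on short vectors is to time the intrinsic against a hand-written loop. The sketch below is not taken from the benchmarks: the vector length, the repeat count and the helper name loop_dot are all assumptions chosen for illustration, and the timings will depend on the compiler version and options.

```fortran
program dot_product_timing
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 3            ! assumed short vector length
  integer, parameter :: reps = 10000000  ! assumed repeat count
  real(dp) :: a(n), b(n), s_intrinsic, s_loop
  real :: t0, t1, t2
  integer :: k

  a = [1.0_dp, 2.0_dp, 3.0_dp]
  b = [4.0_dp, 5.0_dp, 6.0_dp]

  ! Time the intrinsic; a(1) changes on each pass so every result differs.
  s_intrinsic = 0.0_dp
  call cpu_time(t0)
  do k = 1, reps
     a(1) = real(k, dp)
     s_intrinsic = s_intrinsic + dot_product(a, b)
  end do
  call cpu_time(t1)

  ! Time the explicit loop on the same data.
  s_loop = 0.0_dp
  do k = 1, reps
     a(1) = real(k, dp)
     s_loop = s_loop + loop_dot(a, b)
  end do
  call cpu_time(t2)

  print '(a, f8.3, a)', 'intrinsic dot_product: ', t1 - t0, ' s'
  print '(a, f8.3, a)', 'explicit loop:         ', t2 - t1, ' s'
  ! Both accumulations should agree; stop with a message if they do not.
  if (abs(s_intrinsic - s_loop) > 1.0e-9_dp * abs(s_loop)) stop 'sums differ'

contains

  ! Hypothetical hand-written equivalent of dot_product for real vectors.
  function loop_dot(x, y) result(s)
    real(dp), intent(in) :: x(:), y(:)
    real(dp) :: s
    integer :: i
    s = 0.0_dp
    do i = 1, size(x)
       s = s + x(i) * y(i)
    end do
  end function loop_dot

end program dot_product_timing
```

Compiling this with the same options as the benchmarks and comparing the two times gives a quick check that is independent of gprof.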

I find it encouraging that so many of the results stand up quite well to this, for its time, top-of-the-range commercial compiler.
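The Polyhedron tables summarise each compiler with a geometric mean of the execution times. The same summary can be computed for the Cygwin results tabulated above. The following is a minimal sketch; the function name geomean is illustrative, and the times are copied from the gfortran 20051108 and DF6.0 rows.

```fortran
program geometric_mean_times
  implicit none
  ! Times in seconds copied from the Cygwin table, in table order:
  ! Capacita, Channel, Fatigue, Gas_dyn, Induct, Kepler, NF,
  ! Protein, Rnflow, Test_fpu.
  real, parameter :: gfortran_20051108(10) = &
       [142.0, 22.0, 61.0, 40.0, 55.0, 125.0, 80.0, 94.0, 74.0, 30.0]
  real, parameter :: df60(10) = &
       [194.0, 32.0, 28.0, 40.0, 86.0, 83.0, 52.0, 90.0, 63.0, 30.0]

  print '(a, f8.2)', 'gfortran 20051108: ', geomean(gfortran_20051108)
  print '(a, f8.2)', 'DF6.0:             ', geomean(df60)

contains

  ! Geometric mean via logarithms, which avoids overflow in the product.
  pure function geomean(t) result(g)
    real, intent(in) :: t(:)
    real :: g
    g = exp(sum(log(t)) / real(size(t)))
  end function geomean

end program geometric_mean_times
```

Unlike an arithmetic mean, the geometric mean is not dominated by the longest-running tests, which is why it suits a mixed set of benchmarks like this one.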

I have now redone the tests on FC3/Athlon1700, using ifc7.0 and gfortran 20051019; both with -O2. Note that I have made no serious attempt at optimization of either; for the time being, a rough and ready comparison suffices.

Execution times in seconds
Test      | Capacita |  Channel | Fatigue  | Gas_dyn  | Induct   | Kepler   |    NF    |
_______________________________________________________________________________________
gfortran  |          |          |          |          |          |          |          |
20051019  |    234   |     64   |     42   |     71   |  **390   |  **134   |   165    |
_______________________________________________________________________________________
ifc 7.0   |          |          |          |          |          |          |          |
          |    239   |     96   |     39   |     64   |    141   |     66   |   128    |
_______________________________________________________________________________________
Test      | Protein  |  Rnflow  | Test_fpu |
___________________________________________
gfortran  |          |          |          |
20051019  |    163   |    116   |     87   |
___________________________________________
ifc 7.0   |          |          |          |
          |    109   |    121   |     78   |
____________________________________________
and the f77 results are
Test      |    Ac    |    Air   |   Dudoc  |    Drag  |   Linpk  |   Mdbx   |   Pix    |   Tfft   |
__________________________________________________________________________________________________
gfortran  |          |          |          |          |          |          |          |          |
20051019  |     42   |     51   |    103   |     86   |    118   |     67   |  **102   |    31    |
__________________________________________________________________________________________________
ifc 7.0   |          |          |          |          |          |          |          |          |
          |     29   |     51   |     94   |     72   |    127   |     66   |     52   |    31    |
__________________________________________________________________________________________________

These results are almost too encouraging! Notice that the anomalous result for fatigue.f90, under Cygwin, has disappeared for the Linux run. This tends to support the hypothesis that allocations of memory are to blame. The differences with induct.f90 and kepler.f90 were discussed on the list today (20051020) and I expect some progress there in the medium term, at least (we know what to do now...).

I have returned to Cygwin and taken a cursory look at the f77 cases. The pattern is exactly the same, even to the differences in pix.f; gfc giving 98s and DF6.0, 61s. Once again, from an even more cursory look, the production of temporaries before invocation of an inline version of mod is to blame. Replacing mod (a,b), inline, with a - int (a/b) * b speeds up gfc to 68s.
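The rewrite of mod works because int truncates toward zero, which matches the definition of the intrinsic. The sketch below checks the two forms side by side on a few sample values. The values are arbitrary, and real arguments are an assumption, since the argument types used in pix.f are not stated here.

```fortran
program mod_rewrite
  implicit none
  ! Arbitrary sample operands, including a negative dividend.
  real, parameter :: as(4) = [7.5, -7.5, 10.0, 3.0]
  real, parameter :: bs(4) = [2.0, 2.0, 4.0, 5.0]
  real :: a, b
  integer :: i

  print '(4a12)', 'a', 'b', 'mod(a,b)', 'rewritten'
  do i = 1, size(as)
     a = as(i)
     b = bs(i)
     ! int truncates toward zero, matching the definition of mod.
     print '(4f12.3)', a, b, mod(a, b), a - int(a / b) * b
  end do
end program mod_rewrite
```

The two columns agree for these values. The hand-written form is only safe while a/b fits in a default integer, so it is not a general replacement for mod with real arguments.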

The Lawrence Livermore Fortran Kernels Test - Paul Thomas 21st October 2005

This comparison between gfortran and DF6.0 is marred by the coarseness of the timer (16ms). In consequence, the timing variances are of order 100% for both compilers. Nonetheless, the results indicate that gfortran is holding its own.

 ********************************************
 THE LIVERMORE  FORTRAN KERNELS:  * SUMMARY *
 ********************************************
              Computer :  2.5GHz Pentium / 1Gbyte
              System   :  CYGWIN_NT-5.0 / Windows2000
              Compiler :  gfortran 20051018 -march=i686 -malign-double -funroll-loops -O3
              Date     :  Late 1992
              Testor   :  John K. Prentice, QCA
         MFLOPS    RANGE:             REPORT ALL RANGE STATISTICS:
         Mean DO Span   =    154
         Code Samples   =     72
         Maximum   Rate =     2977.9683 Mega-Flops/Sec.
         Quartile  Q3   =      722.2454 Mega-Flops/Sec.
         Average   Rate =      571.7355 Mega-Flops/Sec.
         Geometric Mean =      344.5320 Mega-Flops/Sec.
         Median    Q2   =      400.7000 Mega-Flops/Sec.
         Harmonic  Mean =      133.8293 Mega-Flops/Sec.
         Quartile  Q1   =      164.8394 Mega-Flops/Sec.
         Minimum   Rate =       10.8334 Mega-Flops/Sec.
         Standard  Dev. =      573.3039 Mega-Flops/Sec.
         Avg Efficiency =       11.57%  Program & Processor
         Mean Precision =        6.24   Decimal Digits
1
 Version: 22/DEC/86  mf523           6191
 CHECK FOR CLOCK CALIBRATION ONLY:
 Total Job    Cpu Time =     2.07250E+02 Sec.
 Total 24 Kernels Time =     3.18882E+01 Sec.
 Total 24 Kernels Flops=     1.69491E+09 Flops
 ********************************************
 THE LIVERMORE  FORTRAN KERNELS:  * SUMMARY *
 ********************************************
              Computer :  2.5GHz Pentium / 1Gbyte
              System   :  CYGWIN_NT-5.0 / Windows2000
              Compiler :  DEC DF6.0 /FAST
              Date     :  Late 1992
              Testor   :  John K. Prentice, QCA
         MFLOPS    RANGE:             REPORT ALL RANGE STATISTICS:
         Mean DO Span   =    157
         Code Samples   =     72
         Maximum   Rate =     3210.2197 Mega-Flops/Sec.
         Quartile  Q3   =      692.6757 Mega-Flops/Sec.
         Average   Rate =      510.4755 Mega-Flops/Sec.
         Geometric Mean =      339.0222 Mega-Flops/Sec.
         Median    Q2   =      313.3171 Mega-Flops/Sec.
         Harmonic  Mean =      231.3904 Mega-Flops/Sec.
         Quartile  Q1   =      169.4229 Mega-Flops/Sec.
         Minimum   Rate =       44.8465 Mega-Flops/Sec.
         Standard  Dev. =      526.7570 Mega-Flops/Sec.
         Avg Efficiency =       10.56%  Program & Processor
         Mean Precision =        6.44   Decimal Digits
1
 Version: 22/DEC/86  mf523           6170
 CHECK FOR CLOCK CALIBRATION ONLY:
 Total Job    Cpu Time =     1.57906E+02 Sec.
 Total 24 Kernels Time =     4.73348E+00 Sec.
 Total 24 Kernels Flops=     1.69491E+09 Flops
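
A common way to work around a coarse timer such as the 16ms one mentioned above is to repeat a short kernel until the elapsed time is much longer than one timer tick. The sketch below illustrates the idea with system_clock. It is not part of the Livermore kernels: the kernel, the array size and the 0.5 s target are assumptions chosen for illustration.

```fortran
program coarse_timer
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 1000                 ! assumed kernel size
  real(dp), parameter :: min_seconds = 0.5_dp    ! assumed target, far above a 16 ms tick
  integer(i8) :: c0, c1, rate
  real(dp) :: x(n), y(n), elapsed
  integer :: reps, k

  call random_number(x)
  y = 0.0_dp
  reps = 1
  do
     call system_clock(c0, rate)
     if (rate <= 0) stop 'no system clock available'
     do k = 1, reps
        y = y + 1.0e-3_dp * x      ! small stand-in kernel under test
     end do
     call system_clock(c1)
     elapsed = real(c1 - c0, dp) / real(rate, dp)
     if (elapsed >= min_seconds) exit
     reps = reps * 2               ! too short to measure: double the work
  end do

  print '(a, i0)', 'repetitions: ', reps
  print '(a, es12.4)', 'seconds per kernel call: ', elapsed / real(reps, dp)
  ! Printing a checksum keeps the compiler from discarding the kernel.
  print '(a, es12.4)', 'checksum: ', sum(y)
end program coarse_timer
```

With the measured interval this long, a single 16ms tick amounts to an error of a few percent rather than of order 100%.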

None: GFortranResults (last edited 2008-05-15 18:00:08 by JanneBlomqvist)