This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



egcs code performance


The egcs developers have a great deal to be proud of, but I was a little 
surprised last night to read what I think I read, a question from Craig 
Burley asking whether g77 could use any performance improvements.  So I am 
attaching a comparison of single precision Livermore Fortran Kernel results 
between Lahey lf90 and egcs/g77 on Pentium II.  FWIW, the performance 
comparison on PPro is closer to even, probably because Lahey has more 
problems with double alignment than g77.

Options used:

lf90 -winconsole

g77 -O2 -malign-double -march=pentiumpro -funroll-loops -pipe -c

The g77 objects were linked with egcs-1.1.2, because the current egcs library 
won't build on Windows 95.

My comments:

(1) In a few cases, g77 shows better accuracy than Lahey.  The precision 
ratings (PRECIS in the table) are the number of significant decimals in the 
result which agree with the check-sums.  These ratings include the influence 
of the accuracy of conversion of the data tables from ASCII to binary.

(2) Kernel 10 shows much earlier L1 cache saturation with g77, and the 
performance is 30% worse after saturation.  Much of the g77 performance loss 
may be recovered by turning off specific optimizations for this particular 
test.  The loss appears to be due to the use of multiple data pointers, 
which spill to memory, where only one pointer is needed.
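
The difference between the two addressing styles can be sketched in C.  This 
is a simplified toy loop, not Kernel 10 itself, and the names 
update_multi_ptr, update_single_ptr and NCOLS are made up for illustration.  
Both functions compute the same thing; the first keeps one moving pointer 
per column, the second keeps a single base pointer and reaches the other 
columns through constant offsets.

```c
#include <stddef.h>

#define NCOLS 3   /* columns per row in this toy layout (assumption) */

/* One induction pointer per column.  On a register-starved target some of
   these pointers may have to be spilled to the stack. */
void update_multi_ptr(float *table, size_t nrows)
{
    float *c0 = table;
    float *c1 = table + 1;
    float *c2 = table + 2;

    for (size_t i = 0; i < nrows; i++) {
        *c2 = *c0 - *c1;   /* difference of the first two columns */
        *c1 = *c0;         /* carry the first column forward */
        c0 += NCOLS;
        c1 += NCOLS;
        c2 += NCOLS;
    }
}

/* One base pointer; the other columns are constant offsets from it. */
void update_single_ptr(float *table, size_t nrows)
{
    float *row = table;

    for (size_t i = 0; i < nrows; i++, row += NCOLS) {
        row[2] = row[0] - row[1];
        row[1] = row[0];
    }
}
```

Whether a given compiler turns the first form into the second is exactly the 
consolidation in question; the sketch only shows what the two forms look 
like at source level.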

(3) Lahey obtains better performance on Kernel 22 by in-lining exp().  
Apparently, there are accuracy trade-offs involved.  Much of the performance 
difference could be recovered by architecture-specific optimizations in the 
math library (newlib in this case).

(4) I chose to present single precision because the differences in accuracy 
are more significant.  Double precision performance comparisons look similar, 
except that there are more L1 cache saturation effects on Pentium II, which 
swamp differences in code quality (aside from the Kernel 10 situation).  I 
present cygwin results because those are slightly more realistic than results 
from Linux, where the resolution of the standard cpu_time() is not so good.

(5) I have modified the LFK code to make it less dependent on compilers being 
tuned specifically for this benchmark, and to remove some archaisms.  There 
are both f95 and g77 versions at ftp://members.aol.com/n8tm/lloops.shar.gz.  
Those versions have been tested on cygwin, linux-gnulibc1, hppa1.1, and 
irix6.5, and I believe the proprietary compilers are treated fairly in 
comparison with g77.  Many of the speed-ups with the proprietary Unix 
compilers may be obtained only with specific tuning of compiler options and 
placement of directives in the source code.

(6) The performance of g77 on P6 architectures is outstanding in BLAS-style 
loops but suffers from failure to consolidate internally generated pointers 
in more complicated situations.  Much of my code, like LFK, presents many 
opportunities for such consolidation, as the data arrays are defined 
statically in COMMON blocks.  FWIW, the Lahey compilers also depend on the 
use of COMMON to control alignment and pointer bloat.

(7) The only obvious (to me) accuracy vs performance trade-off would be in 
whether spills are performed in single, double, or extended precision.  For 
g77, that issue arises only in Kernels 8 and 9.
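
A small C sketch of why the spill width can matter, with hypothetical 
function names.  The volatile qualifier is used here only to force the 
intermediate value through memory at a chosen width, which stands in for a 
compiler-generated spill.

```c
/* Intermediate stored as single precision: for a == 2^24 the value a + 1
   is not representable in a 24-bit significand, so the +1 is lost. */
float diff_via_float_spill(float a)
{
    volatile float t = a + 1.0f;    /* forced store at single precision */
    return t - a;
}

/* Same computation with the intermediate stored as double precision:
   2^24 + 1 is exact in a 53-bit significand, so the +1 survives. */
float diff_via_double_spill(float a)
{
    volatile double t = (double)a + 1.0;   /* forced store at double precision */
    return (float)(t - (double)a);
}
```

For a = 16777216.0f (2^24) the first function returns 0 and the second 
returns 1, so the same source expression can give different results 
depending only on the width at which the intermediate is stored.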

(8) Poor performance on Kernels 13 and 14 is an artifact of the x86 
architecture, which doesn't support INT() operations without switching 
rounding modes and copying through memory.  Kernel 14 may be brought up to 
reasonable performance (without changing the numerical results) by using 
NINT() in the IEEE rounding mode.  The most feasible way of doing this with 
current GNU compilers is with f2c.
 ********************************************
 THE LIVERMORE  FORTRAN KERNELS:  * SUMMARY *
 ********************************************

              Computer : Pentium II 232 MHz
              System   : cygwin
              Compiler : Lahey lf90 vs egcs-19990412
              Date     : 1999.4.17
              Testor   : tprince@computer.org

       Lahey lf90 results            egcs-19990412 results

KERNEL MFLOP/S SPAN WEIGHT  PRECIS   MFLOP/S PRECIS
------ ------- ---- ------ - -----   ------- ------
  1    139.581   27   1.00    4.90     93.811  4.07
  2     61.356   15   1.00    5.76     57.238  5.15
  3    108.862   27   1.00    4.58     91.538  4.93
  4     82.951   27   1.00    5.08     88.731  4.49
  5     54.454   27   1.00    4.48     50.967  4.58
  6     48.266    8   1.00    4.50     54.196  4.51
  7    139.844   21   1.00    5.08     85.874  5.30
  8    151.030   14   1.00    6.01     84.091  5.67
  9    153.981   15   1.00    4.53     90.675  6.20
 10     76.326   15   1.00    4.59     35.366  5.73
 11     63.823   27   1.00    5.97     57.003  5.53
 12     51.770   26   1.00 -  1.62     67.085  1.60
 13      6.154    8   1.00    4.85      5.640  4.80
 14     18.334   27   1.00    1.81     15.040  1.79
 15     24.155   15   1.00    3.37     25.521  2.78
 16     46.796   15   1.00    7.92     34.533  7.92
 17     73.655   15   1.00    7.24     46.632  5.58
 18     72.929   14   1.00    5.24     69.342  6.09
 19     56.053   15   1.00    4.75     54.799  5.08
 20     33.155   26   1.00    6.64     29.694  5.08
 21     87.379   20   1.00    5.18    139.525  5.77
 22     16.821   15   1.00    4.53     10.299  6.97
 23    112.582   14   1.00    4.61    118.833  5.66
 24     26.247   27   1.00    7.92     20.221  7.92
  1    146.830  101   2.00    4.89    100.811  4.07
  2     85.623  101   2.00    5.75     92.008  5.41
  3    125.906  101   2.00    4.99    118.061  5.30
  4    106.420  101   2.00    5.08    153.776  4.49
  5     55.844  101   2.00    5.19     54.832  5.47
  6     86.864   32   2.00    4.71    100.246  4.65
  7    144.963  101   2.00    5.66     84.960  5.32
  8    152.704  100   2.00    5.23     83.872  5.98
  9    161.016  101   2.00    7.19     91.827  6.67
 10     39.517  101   2.00    6.99     30.955  6.81
 11     71.550  101   2.00    7.92     68.660  6.00
 12     54.343  100   2.00    2.69     83.997  1.30
 13      6.168   32   2.00    5.90      5.751  5.50
 14     17.411  101   2.00    2.04     15.021  2.12
 15     23.549  101   2.00    2.11     25.514  2.10
 16     44.586   40   2.00    7.92     35.244  7.92
 17     76.655  101   2.00    5.40     45.691  5.70
 18     74.348  100   2.00    5.77     70.881  6.45
 19     60.045  101   2.00    5.84     55.432  6.49
 20     33.345  100   2.00    4.07     29.892  4.45
 21     86.926   50   2.00    5.47    137.527  5.78
 22     16.809  101   2.00    6.04     10.427  5.53
 23    117.907  100   2.00    5.76    117.857  6.05
 24     32.341  101   2.00    7.92     24.291  7.92
  1    150.344 1001   1.00    4.90    103.419  4.07
  2     85.247  101   1.00    5.75     90.444  5.41
  3    130.532 1001   1.00    6.40    127.096  5.80
  4    125.215 1001   1.00    5.08    184.977  4.49
  5     56.113 1001   1.00    5.54     55.542  7.40
  6     94.788   64   1.00    4.68    102.272  4.60
  7    142.007  995   1.00    5.35     86.446  5.64
  8    151.936  100   1.00    5.23     82.897  5.98
  9    160.154  101   1.00    7.19     91.956  6.67
 10     39.685  101   1.00    6.99     30.892  6.81
 11     72.551 1001   1.00    5.97     72.928  6.30
 12     55.262 1000   1.00 -  0.00     87.600  0.00
 13      6.361   64   1.00    6.03      5.992  6.03
 14     16.221 1001   1.00    1.72     14.808  2.22
 15     23.494  101   1.00    2.11     25.557  2.10
 16     47.271   75   1.00    7.92     35.983  7.92
 17     75.441  101   1.00    5.40     45.994  5.70
 18     74.258  100   1.00    5.77     70.613  6.45
 19     60.666  101   1.00    5.84     55.581  6.49
 20     33.294 1000   1.00    3.70     29.446  4.37
 21     86.698  101   1.00    5.56    135.572  5.89
 22     16.690  101   1.00    6.04     10.439  5.53
 23    118.099  100   1.00    5.76    119.593  6.05
 24     36.397 1001   1.00    7.92     24.765  7.92

