This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
egcs code performance
- To: egcs at egcs dot cygnus dot com
- Subject: egcs code performance
- From: N8TM at aol dot com
- Date: Sat, 17 Apr 1999 11:22:38 EDT
- Reply-To: N8TM at aol dot com
The egcs developers have a great deal to be proud of, but I was a little
surprised last night to read what I think I read: a question from Craig
Burley asking whether g77 could use any performance improvements. So I am
attaching a comparison of single-precision Livermore Fortran Kernels (LFK)
results for Lahey lf90 and egcs/g77 on a Pentium II. FWIW, the comparison on
the Pentium Pro is closer to even, probably because Lahey has more problems
with double alignment than g77 does.
Options used:
lf90 -winconsole
g77 -O2 -malign-double -march=pentiumpro -funroll-loops -pipe -c
(linked against the egcs-1.1.2 library, because the current egcs library
won't build on Windows 95)
My comments:
(1) In a few cases, g77 shows better accuracy than Lahey. The precision
ratings (PRECIS) give the number of significant decimal digits in each result
that agree with the check-sums. This rating also reflects how accurately the
data tables are converted from ASCII to binary.
(2) On Kernel 10, g77 reaches L1 cache saturation much earlier, and its
performance after saturation is 30% worse. Much of this loss may be recovered
by turning off specific optimizations for this particular test. The cause
appears to be g77's use of several data pointers, which spill to memory,
where only one is needed.
(3) Lahey obtains better performance on Kernel 22 by inlining exp(),
apparently with some accuracy trade-off. Much of the performance difference
could be recovered by architecture-specific optimizations in the math library
(newlib, in this case).
(4) I chose to present single precision because the differences in accuracy
are more significant. The double-precision performance comparisons look
similar, except that on the P II there are more L1 cache saturation effects,
which swamp differences in code quality (aside from the Kernel 10 situation).
I present cygwin results because they are slightly more realistic than
results from Linux, where the resolution of the standard cpu_time() is
coarser.
(5) I have modified the LFK code to make it less dependent on compilers being
tuned specifically for this benchmark, and to remove some archaisms. Both f95
and g77 versions are at ftp://members.aol.com/n8tm/lloops.shar.gz. They have
been tested on cygwin, linux-gnulibc1, hppa1.1, and irix6.5, and I believe
the proprietary compilers are treated fairly in comparison with g77. Many of
the speed-ups from the proprietary Unix compilers can be obtained only by
tuning compiler options specifically and placing directives in the source
code.
(6) On P6-family processors (Pentium Pro, Pentium II), g77 performance is
outstanding in BLAS-style loops but suffers in more complicated situations,
because g77 fails to consolidate internally generated pointers. Much of my
code, like LFK, offers many opportunities for such consolidation, because the
data arrays are defined statically in COMMON blocks, where each array lies at
a fixed offset from the start of the block. FWIW, the Lahey compilers also
depend on the use of COMMON to control alignment and pointer bloat.
(7) The only obvious (to me) accuracy-versus-performance trade-off is whether
register spills are stored in single, double, or extended precision. For
g77, that issue arises only in Kernels 8 and 9.
(8) Poor performance on Kernels 13 and 14 is an artifact of the x86
architecture, which cannot perform INT() (truncation) without switching the
rounding mode and copying the value through memory. Kernel 14 can be brought
up to reasonable performance (without changing the numerical results) by
using NINT() in the IEEE rounding mode. The most feasible way of doing this
with current GNU compilers is with f2c.
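Comment (1) defines the PRECIS rating as the number of significant decimal digits of a result that agree with the check-sum. A minimal Python sketch of one common way to estimate that number, -log10 of the relative error, follows; the formula and the function name are assumptions for illustration, not necessarily the exact computation the LFK code performs.

```python
import math


def significant_digits(result: float, checksum: float) -> float:
    """Estimate how many leading decimal digits of `result` agree with `checksum`.

    Assumed formula (illustrative only): -log10(relative error).
    """
    if result == checksum:
        return math.inf  # exact agreement; a real harness would cap this value
    scale = abs(checksum) if checksum != 0.0 else 1.0  # fall back to absolute error
    rel_err = abs(result - checksum) / scale
    return -math.log10(rel_err)


if __name__ == "__main__":
    # A result that differs from its check-sum in the fifth significant digit.
    print(round(significant_digits(1.2345, 1.2346), 2))  # prints 4.09
```

The rating is fractional because the relative error need not be a power of ten; a result off by one unit in the fifth digit scores a little above 4.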
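Comment (4) notes that cpu_time() on Linux has poorer resolution. One way to check the granularity of a CPU-time clock is to spin until its value changes and record the smallest step. The sketch below does this in Python with time.process_time(), used here as a stand-in (an assumption) for Fortran's CPU_TIME; it does not measure the g77 runtime itself, and observed_tick is a hypothetical helper name.

```python
import time
from typing import Callable


def observed_tick(clock: Callable[[], float] = time.process_time, samples: int = 5) -> float:
    """Return the smallest nonzero step observed in `clock` over `samples` trials.

    The spin loop burns CPU time, so a CPU-time clock eventually advances.
    """
    smallest = float("inf")
    for _ in range(samples):
        start = clock()
        now = clock()
        while now == start:  # spin until the reported value changes
            now = clock()
        smallest = min(smallest, now - start)
    return smallest


if __name__ == "__main__":
    tick = observed_tick()
    # A kernel timed for T seconds carries a quantization error of roughly tick / T.
    print(f"observed tick: {tick:.3g} s; error over a 1 s run: {tick / 1.0:.2%}")
```

With a 10 ms tick, for example, a kernel timed over 1 s carries about 1% timing uncertainty. Python also reports a nominal figure via time.get_clock_info("process_time").resolution, which may differ from the step actually observed.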
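Comment (7) concerns the precision in which spilled intermediates are stored. The Python sketch below shows why the choice can change results: the expression (a + b) - a gives a different answer depending on whether the intermediate a + b is rounded to IEEE single precision (as if spilled to a 4-byte slot) or kept wider. Python floats are IEEE doubles, so double stands in here for the wider x87 extended format; the helper names are hypothetical.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cancel(a: float, b: float, spill_single: bool) -> float:
    """Evaluate (a + b) - a, optionally rounding the intermediate sum to single."""
    t = a + b
    if spill_single:
        t = to_single(t)  # models storing the intermediate in a 4-byte memory slot
    return t - a


if __name__ == "__main__":
    a, b = 1.0e8, 1.0  # 1e8 is exactly representable in single precision
    print(cancel(a, b, spill_single=True))   # 0.0: 1e8 + 1 rounds back to 1e8 in single
    print(cancel(a, b, spill_single=False))  # 1.0: the sum is exact in double
```

Near 1e8, adjacent single-precision values are 8 apart, so the added 1.0 is lost entirely when the sum is spilled in single precision.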
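Comment (8) turns on the difference between truncation and round-to-nearest. Fortran INT() truncates toward zero, while the x87 FPU's default IEEE rounding mode rounds to nearest (ties to even), so a float-to-integer store matches INT() only after the rounding mode is switched. The Python sketch below simply contrasts the two conversions: math.trunc plays the role of INT(), and the built-in round(), which rounds halves to even, plays the role of a conversion in the default IEEE mode. Fortran's NINT() rounds halves away from zero, so it differs from the IEEE mode only for exact halves such as 2.5.

```python
import math

VALUES = [2.7, -2.7, 2.5, 3.5, -0.4]


def int_truncate(values):
    """Fortran INT(): discard the fraction, i.e. round toward zero."""
    return [math.trunc(v) for v in values]


def ieee_nearest(values):
    """Default IEEE mode: round to nearest, ties to even (Python's round())."""
    return [round(v) for v in values]


if __name__ == "__main__":
    print("INT  :", int_truncate(VALUES))  # [2, -2, 2, 3, 0]
    print("IEEE :", ieee_nearest(VALUES))  # [3, -3, 2, 4, 0]
```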
********************************************
THE LIVERMORE FORTRAN KERNELS: * SUMMARY *
********************************************
Computer : Pentium II 232 MHz
System : cygwin
Compiler : Lahey lf90 vs egcs-19990412
Date : 1999.4.17
Testor : tprince@computer.org
        Lahey lf90 results                 egcs-19990412 results
KERNEL  MFLOP/S  SPAN  WEIGHT    PRECIS    MFLOP/S  PRECIS
------  -------  ----  ------  - ------    -------  ------
     1  139.581    27    1.00      4.90     93.811    4.07
     2   61.356    15    1.00      5.76     57.238    5.15
     3  108.862    27    1.00      4.58     91.538    4.93
     4   82.951    27    1.00      5.08     88.731    4.49
     5   54.454    27    1.00      4.48     50.967    4.58
     6   48.266     8    1.00      4.50     54.196    4.51
     7  139.844    21    1.00      5.08     85.874    5.30
     8  151.030    14    1.00      6.01     84.091    5.67
     9  153.981    15    1.00      4.53     90.675    6.20
    10   76.326    15    1.00      4.59     35.366    5.73
    11   63.823    27    1.00      5.97     57.003    5.53
    12   51.770    26    1.00  -   1.62     67.085    1.60
    13    6.154     8    1.00      4.85      5.640    4.80
    14   18.334    27    1.00      1.81     15.040    1.79
    15   24.155    15    1.00      3.37     25.521    2.78
    16   46.796    15    1.00      7.92     34.533    7.92
    17   73.655    15    1.00      7.24     46.632    5.58
    18   72.929    14    1.00      5.24     69.342    6.09
    19   56.053    15    1.00      4.75     54.799    5.08
    20   33.155    26    1.00      6.64     29.694    5.08
    21   87.379    20    1.00      5.18    139.525    5.77
    22   16.821    15    1.00      4.53     10.299    6.97
    23  112.582    14    1.00      4.61    118.833    5.66
    24   26.247    27    1.00      7.92     20.221    7.92

     1  146.830   101    2.00      4.89    100.811    4.07
     2   85.623   101    2.00      5.75     92.008    5.41
     3  125.906   101    2.00      4.99    118.061    5.30
     4  106.420   101    2.00      5.08    153.776    4.49
     5   55.844   101    2.00      5.19     54.832    5.47
     6   86.864    32    2.00      4.71    100.246    4.65
     7  144.963   101    2.00      5.66     84.960    5.32
     8  152.704   100    2.00      5.23     83.872    5.98
     9  161.016   101    2.00      7.19     91.827    6.67
    10   39.517   101    2.00      6.99     30.955    6.81
    11   71.550   101    2.00      7.92     68.660    6.00
    12   54.343   100    2.00      2.69     83.997    1.30
    13    6.168    32    2.00      5.90      5.751    5.50
    14   17.411   101    2.00      2.04     15.021    2.12
    15   23.549   101    2.00      2.11     25.514    2.10
    16   44.586    40    2.00      7.92     35.244    7.92
    17   76.655   101    2.00      5.40     45.691    5.70
    18   74.348   100    2.00      5.77     70.881    6.45
    19   60.045   101    2.00      5.84     55.432    6.49
    20   33.345   100    2.00      4.07     29.892    4.45
    21   86.926    50    2.00      5.47    137.527    5.78
    22   16.809   101    2.00      6.04     10.427    5.53
    23  117.907   100    2.00      5.76    117.857    6.05
    24   32.341   101    2.00      7.92     24.291    7.92

     1  150.344  1001    1.00      4.90    103.419    4.07
     2   85.247   101    1.00      5.75     90.444    5.41
     3  130.532  1001    1.00      6.40    127.096    5.80
     4  125.215  1001    1.00      5.08    184.977    4.49
     5   56.113  1001    1.00      5.54     55.542    7.40
     6   94.788    64    1.00      4.68    102.272    4.60
     7  142.007   995    1.00      5.35     86.446    5.64
     8  151.936   100    1.00      5.23     82.897    5.98
     9  160.154   101    1.00      7.19     91.956    6.67
    10   39.685   101    1.00      6.99     30.892    6.81
    11   72.551  1001    1.00      5.97     72.928    6.30
    12   55.262  1000    1.00  -   0.00     87.600    0.00
    13    6.361    64    1.00      6.03      5.992    6.03
    14   16.221  1001    1.00      1.72     14.808    2.22
    15   23.494   101    1.00      2.11     25.557    2.10
    16   47.271    75    1.00      7.92     35.983    7.92
    17   75.441   101    1.00      5.40     45.994    5.70
    18   74.258   100    1.00      5.77     70.613    6.45
    19   60.666   101    1.00      5.84     55.581    6.49
    20   33.294  1000    1.00      3.70     29.446    4.37
    21   86.698   101    1.00      5.56    135.572    5.89
    22   16.690   101    1.00      6.04     10.439    5.53
    23  118.099   100    1.00      5.76    119.593    6.05
    24   36.397  1001    1.00      7.92     24.765    7.92