This is the mail archive of the
mailing list for the GCC project.
Optimal Optimizations: A Very Preliminary Report
- From: Scott Robert Ladd <coyote at coyotegulch dot com>
- To: gcc mailing list <gcc at gcc dot gnu dot org>
- Date: Mon, 11 Aug 2003 21:05:21 -0400
- Subject: Optimal Optimizations: A Very Preliminary Report
This e-mail may be a tad long; please bear with a bit of explanatory
text before I get to actual results of my research.
My job often entails producing the fastest possible code. Accomplishing
this requires testing various tools, via benchmarks, to understand code
generation and optimization choices. Sometimes, I publish benchmarks on
my web site...
...and thus unleashing a flood of "conventional wisdom." Ranging from
the polite to the insistent to the rude, these e-mails contain
contradictory suggestions for producing fast code.
In the vast majority of cases, anecdotal assertions lack any formal
proof of their validity, and, more often than not, the suggested
"improvement" is ineffective or detrimental. It has become increasingly
obvious that no one -- myself included -- knows precisely how all these
GCC options work together in generating executable code.
Given the large number of possible option combinations, it isn't
practical to test every single possibility. Yet using the -O1, -O2, and
-O3 options hasn't proven effective, either; in many cases, I (and
others) have found that -O1 produces faster code than does -O3.
Finding the optimal optimization options (!?!) is a matter of finding
order amid complexity -- and hence my choice to use a genetic algorithm
for evolving gcc command lines via natural selection.
My program, gccga, breeds new command lines from old ones, using a table
of available options (currently holding about 50 entries for ia32.) I'm
not ready to release the code for my "optimization optimizer" yet,
because I'm still futzing around trying to get the best results.
The options are evaluated against small (100-1000 code lines) C and C++
programs that perform very specific tasks, timing critical operations
internally and reporting results via pipe back to the gccga instance
that spawned them.
Here's a salient example of my early results:
The lpbench program is a modernization and extension of the classic
LINPACK benchmark, written by myself in C. Here are run-times with the
general-purpose optimization options; all tests were performed on a dual
600MHz Pentium III system that sits in a corner and chews on "stuff" for
me. All compiles included the -march=pentium3 option, and were executed
with gcc 3.4 20030806.
-O1 39.6 seconds
-O2 39.7 seconds
-O3 39.9 seconds
After testing 10,000 possibilities (selected by breeding 100
possibilities for 100 generations), gccga identified the following
command line as the "best":
-O1 -fstrength-reduce -fprefetch-loop-arrays \
-finline-functions -fgcse -freduce-all-givs
Compiled with the options above, lpbench ran in 24.5 seconds.
NOTE 1: I did not enable testing of the options entailed in -ffast-math
for this test. I'm still evaluating the real affects of "fast math" on
accuracy (with another set of benchmarks).
NOTE 2: It is the *combination* of those five "-f" options that produces
the fastest code. My algorithm provides statistical data that suggests
the importance of each option; the two most important options were
-fstrength-reduce and -fprefetch-loop-arrays, which (combined) account
for about 80% of the improvement in execution speed.
NOTE 3: I was surprised that none of the ia32-specific options
(-mfpmath, etc.) were effective, and am investigating further.
The lpbench program uses loops and floating-point math; I am running
gccga over several different kinds of programs, including largish C++
programs and integer algorithms.
I am starting testing on a SPARC system, just so I'm not rendered myopic
by Intel's universe.
I'm nowhere near finished in designing my ultimate optimization
optimizer, but I have reached some conclusions:
1) It may be time to revisit the switches enabled by -O2 and -O3.
Perhaps they enable optimizations that are not generally useful, or
which conflict with other options.
2) The selection of the "right" options is not intuitive or obvious, due
to the complexity of gcc. THIS IS NOT A FAULT WITH GCC! I appreciate the
complexity and flexibility gcc offers; however, a real problem exists:
finding the "right" set of options is beyond direct analysis in many cases.
3) Optimal optimization selection is very algorithm dependent (as, I'm
sure, most of us expected).
Given that these test runs can take days to complete, it will be a
couple of weeks before I have final results for four or five different
benchmarks. By posting my ideas and trials here, I hope to stimulate
some conversation that will benefit GCC and its users.
Thank you for tolerating this rather long message -- just be glad that I
cut it substantially from the original draft! ;)
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing