This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Optimal Optimizations: A Very Preliminary Report

This e-mail may be a tad long; please bear with a bit of explanatory text before I get to actual results of my research.

My job often entails producing the fastest possible code. Accomplishing this requires testing various tools, via benchmarks, to understand code generation and optimization choices. Sometimes, I publish benchmarks on my web site...

...and thus unleashing a flood of "conventional wisdom." Ranging from the polite to the insistent to the rude, these e-mails contain contradictory suggestions for producing fast code.

In the vast majority of cases, anecdotal assertions lack any formal proof of their validity, and, more often than not, the suggested "improvement" is ineffective or detrimental. It has become increasingly obvious that no one -- myself included -- knows precisely how all these GCC options work together in generating executable code.

Given the large number of possible option combinations, it isn't practical to test every single possibility. Yet using the -O1, -O2, and -O3 options hasn't proven effective, either; in many cases, I (and others) have found that -O1 produces faster code than does -O3.

Finding the optimal optimization options (!?!) is a matter of finding order amid complexity -- and hence my choice to use a genetic algorithm for evolving gcc command lines via natural selection.

My program, gccga, breeds new command lines from old ones, using a table of available options (currently holding about 50 entries for ia32.) I'm not ready to release the code for my "optimization optimizer" yet, because I'm still futzing around trying to get the best results.

The options are evaluated against small (100-1000 code lines) C and C++ programs that perform very specific tasks, timing critical operations internally and reporting results via pipe back to the gccga instance that spawned them.

Here's a salient example of my early results:

The lpbench program is a modernization and extension of the classic LINPACK benchmark, written by myself in C. Here are run-times with the general-purpose optimization options; all tests were performed on a dual 600MHz Pentium III system that sits in a corner and chews on "stuff" for me. All compiles included the -march=pentium3 option, and were executed with gcc 3.4 20030806.

    -O1    39.6 seconds
    -O2	   39.7 seconds
    -O3    39.9 seconds

After testing 10,000 possibilities (selected by breeding 100 possibilities for 100 generations), gccga identified the following command line as the "best":

    -O1 -fstrength-reduce -fprefetch-loop-arrays \
        -finline-functions -fgcse -freduce-all-givs

Compiled with the options above, lpbench ran in 24.5 seconds.

NOTE 1: I did not enable testing of the options entailed in -ffast-math for this test. I'm still evaluating the real affects of "fast math" on accuracy (with another set of benchmarks).

NOTE 2: It is the *combination* of those five "-f" options that produces the fastest code. My algorithm provides statistical data that suggests the importance of each option; the two most important options were -fstrength-reduce and -fprefetch-loop-arrays, which (combined) account for about 80% of the improvement in execution speed.

NOTE 3: I was surprised that none of the ia32-specific options (-mfpmath, etc.) were effective, and am investigating further.

The lpbench program uses loops and floating-point math; I am running gccga over several different kinds of programs, including largish C++ programs and integer algorithms.

I am starting testing on a SPARC system, just so I'm not rendered myopic by Intel's universe.

I'm nowhere near finished in designing my ultimate optimization optimizer, but I have reached some conclusions:

1) It may be time to revisit the switches enabled by -O2 and -O3. Perhaps they enable optimizations that are not generally useful, or which conflict with other options.

2) The selection of the "right" options is not intuitive or obvious, due to the complexity of gcc. THIS IS NOT A FAULT WITH GCC! I appreciate the complexity and flexibility gcc offers; however, a real problem exists: finding the "right" set of options is beyond direct analysis in many cases.

3) Optimal optimization selection is very algorithm dependent (as, I'm sure, most of us expected).

Given that these test runs can take days to complete, it will be a couple of weeks before I have final results for four or five different benchmarks. By posting my ideas and trials here, I hope to stimulate some conversation that will benefit GCC and its users.

Thank you for tolerating this rather long message -- just be glad that I cut it substantially from the original draft! ;)

Scott Robert Ladd
Coyote Gulch Productions (
Software Invention for High-Performance Computing

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]