This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Optimization: Conclusions from Evolutionary Analysis


> For those who haven't the time to read my full article, I thought I'd 
> pass along the conclusions section. One of my goals with Acovea was to 
> provide some empirical evidence of how optimizations behave, and to 
> dispel some common misconceptions about optimization.
> 
> The settings for -O1, -O2, and -O3 are valid. Every option implied by 
> -O3 may not be applicable to every program -- but enabling all those 
> options does not seem to be detrimental, except in the case of 
> huffbench. Several of the recently-added flags (-ftracer in particular) 
> may be candidates for inclusion in -O2 or -O3 for future versions of the 
> compiler.

I would like to see -ftracer enabled for -O3; in fact, I am quite
surprised it is not.  We have done that for a while in the SuSE
compilers and it works fine.
It is not a good -O2 candidate, as it increases code size too much.  It
may, however, be nice to enable it at -O2 only when
-fbranch-probabilities is present.
> 
> Processor-specific options do not appear to be a major factor in 
> performance on these benchmarks. I don't know if this is due to the 
> nature of the processor, or if GCC can't take advantage of 
> processor-specific instructions. I have double-checked my results; 
> adding -mfpmath=sse (or any of its variants, or -msse) to a compile of 
> almabench does not make the code run any faster. The only ia32-specific 

This is quite surprising, as I can measure pretty clear benefits on the
SPEC benchmarks running on Opteron, for instance.  You need to use
-mfpmath=sse -msse2 (or -march=sse2_enabled_CPU) to get double-precision
arithmetic in SSE.  I also reproduced similar results for Pentium 4 in
the past.  The only CPU that does not seem to prefer SSE operations
appears to be the Pentium M in my notebook (the hardware implementation
is probably quite poor).
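
To make the comparison concrete, here is a minimal sketch of the kind of
double-precision kernel where the flag choice matters.  The function name
dot and the file name dot.c are only illustrative; they are not taken
from almabench or SPEC.

```c
#include <assert.h>
#include <stddef.h>

/* Possible builds to compare on 32-bit x86 (dot.c is a hypothetical name):
 *   gcc -O2 -c dot.c                       default x87 floating point
 *   gcc -O2 -msse2 -mfpmath=sse -c dot.c   double-precision math in SSE
 * With only -msse, SSE arithmetic covers single precision; doubles
 * need -msse2.
 */
double dot(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];   /* one multiply and one add per element */
    return sum;
}
```

Timing a loop like this under both builds, or comparing the -S output,
should show whether SSE is actually being used for the doubles.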
> option that showed consistent value was -momit-leaf-frame-pointer.

How does this compare to -fomit-frame-pointer?
I would like to see it enabled by default now that we can emit proper
unwind info and debug without a frame pointer.
The drawback is again the code size, so perhaps it can be -O3 only.
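
As a rough illustration of the difference between the two flags, a sketch
with made-up function names (square, sum_of_squares) and a hypothetical
file name fp.c:

```c
/* Leaf function: it calls nothing, so -momit-leaf-frame-pointer is
 * enough to let the compiler drop its frame pointer.  The noinline
 * attribute is only here to keep the example from being flattened. */
__attribute__((noinline))
int square(int x)
{
    return x * x;
}

/* Non-leaf function: it calls square(), so the leaf-only flag does not
 * cover it; -fomit-frame-pointer applies to functions like this too. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* Builds to compare by reading the generated assembly (x86):
 *   gcc -O1 -S -fno-omit-frame-pointer -momit-leaf-frame-pointer fp.c
 *   gcc -O1 -S -fomit-frame-pointer fp.c
 */
```

What the compiler really emits depends on the target and version, so the
assembly is worth checking rather than assuming.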
I guess it is time to look into your paper :)

Honza
> 
> The genetic algorithm was able to find sets of flags that produced 
> faster code than that emitted by the default -O1, -O2, and -O3 options 
> (with the exception of almabench, which is largely unoptimizable by 
> nature). In many cases, this was due to the inclusion of "new" flags 
> introduced with more recent versions of GCC. Acovea discovers additional 
> options that improve performance over that provided by the default 
> settings; in essence, the genetic algorithm determines which of the 
> non-standard and processor-specific options are most effective for a 
> given algorithm.
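
For readers unfamiliar with the approach described in the quoted
paragraph, here is a minimal sketch of a genetic search over a set of
on/off flags.  It is not Acovea's implementation: toy_cost is an invented
stand-in for "compile with these flags, run, and time it", the flag
count and per-flag effects are made up, and selection is a simple elitist
scheme that always breeds from the current best.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define N_FLAGS   8                       /* hypothetical number of flags */
#define ALL_FLAGS ((1u << N_FLAGS) - 1u)
#define POP_SIZE  16

/* Toy cost of a flag set (lower is better).  The effects are invented. */
double toy_cost(uint32_t mask)
{
    double t = 10.0;
    if (mask & (1u << 0)) t -= 1.0;
    if (mask & (1u << 3)) t -= 2.0;
    if (mask & (1u << 5)) t -= 0.5;
    if (mask & (1u << 6)) t += 1.5;       /* a flag that hurts */
    return t;
}

/* Uniform crossover: bits from a where pick is 1, from b elsewhere. */
uint32_t crossover(uint32_t a, uint32_t b, uint32_t pick)
{
    return (a & pick) | (b & ~pick);
}

/* Return the cheapest flag set in pop[0..n-1]. */
static uint32_t fittest(const uint32_t *pop, int n)
{
    uint32_t best = pop[0];
    for (int i = 1; i < n; i++)
        if (toy_cost(pop[i]) < toy_cost(best))
            best = pop[i];
    return best;
}

/* Evolve a random population and return the best flag set found. */
uint32_t evolve(unsigned seed, int generations)
{
    uint32_t pop[POP_SIZE], next[POP_SIZE];

    srand(seed);
    for (int i = 0; i < POP_SIZE; i++)
        pop[i] = (uint32_t)rand() & ALL_FLAGS;

    for (int g = 0; g < generations; g++) {
        uint32_t best = fittest(pop, POP_SIZE);
        next[0] = best;                   /* elitism: the best survives */
        for (int i = 1; i < POP_SIZE; i++) {
            uint32_t mate  = pop[rand() % POP_SIZE];
            uint32_t child = crossover(best, mate, (uint32_t)rand());
            if (rand() % 4 == 0)          /* occasional one-bit mutation */
                child ^= 1u << (rand() % N_FLAGS);
            next[i] = child;
        }
        memcpy(pop, next, sizeof pop);
    }
    return fittest(pop, POP_SIZE);
}
```

Because the best individual is always carried over, the result after any
number of generations is never worse than the best member of the initial
random population.  A real tool would replace toy_cost with an actual
compile-and-benchmark step.
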
> 
> A well-designed program will encapsulate algorithms. For most programs, 
> performance is predicated on specific algorithms that can be identified 
> via profiling; Acovea can then be applied to those algorithms for 
> finding the set of optimizations that produce the fastest code. Only 
> rarely does an entire program need to be fully optimized; instead, 
> optimization should be applied to specific, critical routines that 
> perform concise tasks.
> 
> The full article can be found at:
> 
> http://www.coyotegulch.com/acovea/index.html
> 
> 
> -- 
> Scott Robert Ladd
> Coyote Gulch Productions (http://www.coyotegulch.com)
> Software Invention for High-Performance Computing
> In development: Alex, a database for common folk

