Gcc 3.1 performance regressions with respect to 2.95.3

Tim Prince tprince@computer.org
Sun Mar 17 21:04:00 GMT 2002

On Sunday 17 March 2002 16:38, Daniel Berlin wrote:
> On Mon, 18 Mar 2002, Peter Schmid wrote:
> > > It is more or less a special case. Overall, 3.1 groks code with lots
> > > of abstraction better, but in this case 2.95 got particularly lucky.
> > > It runs into a similar slowdown only on a slightly modified Stepanov,
> > > as far as I can remember.
> >
> > That is only partly true. When I benchmark gcc 3.1 at the -O versus
> > -O2 optimization level with the help of the bench++ test suite, the
> > performance is better in 45 cases at -O. Interestingly, the checks
> > measuring loop overhead (L*) all run faster at the -O level. L00004 runs
> > two times faster. Even more interesting, o000007[a,b].cpp,
> > which checks the strength reduction capabilities of the compiler,
> > o000007a.cpp, which checks dead code elimination, o000010b.cpp, which
> > checks redundant code, and o000011a.cpp, which checks the unreachable
> > code optimization facility, run slower at the -O2 optimization level,
> > although this level provides specific optimizations for these problems.
> > Furthermore, function calls are slower at the -O2 level:
> >   p000005.cpp  Static Class Method Call: 1-int Arg: Catches Exceptions
> >   p000006.cpp  Static Class Method Call: 1-int *Arg: Catches Exceptions
> >   p000008.cpp  Procedure Call: No Parameters: Called thru pointer, Catches Exceptions
> >   p000012.cpp  Procedure Call: 10-(3-int) Args: Catches Exceptions
> >   p000023.cpp  Same as p000022: called in loop to see if lookup is optimized
> >
> > And in addition to the decrease in the performance of the Stepanov tests
> > there is a substantial decrease in the performance for processing
> > complex numbers (S000004a).
> >
> > It looks like there are some flaws in the -O2 optimizer passes. Is
> > there a chance that this is fixed for the upcoming gcc 3.1 release?
> There are so many possible reasons for these problems it's not
> funny.
> Especially on x86, where register pressure matters a lot.
> If you had smaller test cases, it would be helpful.
> In the meantime, I'm going to run these on powerpc and see if I come up
> with the same relative times.

Many cases which show best gcc-3.1 performance with options '-Os
-march=pentium3 -mpreferred-stack-boundary=4' are quite small already.  For
example, only 2 or 3 of the Livermore kernels can be speeded up much with
other options, such as -O2 or -funroll-loops.  I wouldn't suggest that this
"problem" be solved by reducing the performance of -Os to the level of

I imagine it would be quite difficult for the compiler to identify the cases
where unrolling a loop containing branches produces a benefit by increasing
the number of branching patterns which can be "predicted."
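
To make the point about unrolling loops that contain branches concrete,
here is a minimal sketch.  It is not taken from bench++ or the Livermore
kernels; the function names and the unroll factor of 4 are hypothetical,
chosen only for illustration.  The rolled loop has a single branch site that
sees every element.  The hand-unrolled loop has four branch sites, so on a
predictor that tracks branches by address, input whose pattern repeats with
period 4 would present each site with a constant outcome.  Whether that pays
for the larger code is exactly what the compiler cannot easily know.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel: a loop whose body contains a data-dependent branch.
// One branch site in the source handles every element.
long sum_above_rolled(const std::vector<int>& v, int threshold) {
    long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

// The same computation unrolled by hand by a factor of 4 (an assumed factor).
// Each of the four branch sites only ever sees elements i, i+1, i+2 or i+3
// of a group, so each can build up its own prediction history.
long sum_above_unrolled4(const std::vector<int>& v, int threshold) {
    long sum = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        if (v[i]     > threshold) sum += v[i];
        if (v[i + 1] > threshold) sum += v[i + 1];
        if (v[i + 2] > threshold) sum += v[i + 2];
        if (v[i + 3] > threshold) sum += v[i + 3];
    }
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}
```

Both functions return the same result for any input; only the code shape
differs.  A compiler may also turn such branches into conditional moves, in
which case the distinction disappears, so any timing comparison between the
two would need to be checked against the generated assembly.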

x86 has always had a tendency to favor the small code size optimizations, and
3.1 has increased this.  Perhaps you see it disguised by the adoption of
non-alignment as the default for -Os.

Others have been complaining about compilers which require longer compile
time than MSVC6 to produce code which runs faster than MSVC6 code.  gcc-3.1
is in a good position, if only -O or -Os is needed to produce better code
than MSVC6.

Tim Prince
