This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Gcc 3.1 performance regressions with respect to 2.95.3
- From: Tim Prince <tprince at computer dot org>
- To: Daniel Berlin <dan at dberlin dot org>,Peter Schmid <schmid at snake dot iap dot physik dot tu-darmstadt dot de>
- Cc: Jan Hubicka <jh at suse dot cz>, Jason Merrill <jason at redhat dot com>,<gcc at gcc dot gnu dot org>, <libstdc++ at gcc dot gnu dot org>
- Date: Sun, 17 Mar 2002 21:04:45 -0800
- Subject: Re: Gcc 3.1 performance regressions with respect to 2.95.3
- References: <Pine.LNX.4.44.0203171937080.20662-100000@dberlin.org>
- Reply-to: tprince at computer dot org
On Sunday 17 March 2002 16:38, Daniel Berlin wrote:
> On Mon, 18 Mar 2002, Peter Schmid wrote:
> > > It is more or less special case. Overall 3.1 groks better the code
> > > with lots of abstraction, but in this case 2.95 got particularly lucky.
> > > It runs into similar slowdown only on slightly modified Stepanov as
> > > long as I can remember.
> >
> > That is only partly true. When I benchmark gcc 3.1 at the -O versus
> > -O2 optimization level with the help of the bench++ test suite, the
> > performance is better in 45 cases at -O. Interestingly, the checks
> > measuring loop overhead (L*) run all faster at the -O level. L00004 runs
> > two times faster. And even more interesting is that o000007[a,b].cpp
> > which checks the strength reduction capabilities of the compiler,
> > o000007a.cpp which checks dead code elimination, o000010b.cpp which
> > checks redundant code and o000011a.cpp checking the unreachable code
> > optimization facility run slower at the -O2 optimization level although
> > this level provides specific optimizations for these problems.
> > Furthermore, function calls are slower at the -O2 level:
> > p000005.cpp Static Class Method Call: 1-int Arg: Catches Exceptions
> > p000006.cpp Static Class Method Call: 1-int *Arg: Catches Exceptions
> > p000008.cpp Procedure Call: No Parameters: Called thru pointer, Catches Exceptions
> > p000012.cpp Procedure Call: 10-(3-int) Args: Catches Exceptions
> > p000023.cpp Same as p000022: called in loop to see if lookup is optimized
> >
> > And in addition to the decrease in the performance of the stepanov tests
> > there is a substantial decrease in the performance for processing
> > complex numbers (S000004a).
> >
> > It looks as if there are some flaws in the -O2 optimizer passes. Is
> > there a chance that this is fixed for the upcoming gcc 3.1 release?
>
> There are so many possible reasons for these problems it's not
> funny.
> Especially on x86, where register pressure matters a lot.
> If you had smaller test cases, it would be helpful.
> In the meanwhile, i'm going to run these on powerpc and see if i come up
> with the same relative times.

Many cases which show best gcc-3.1 performance with options '-Os
-march=pentium3 -mpreferred-stack-boundary=4' are quite small already. For
example, only 2 or 3 of the Livermore kernels can be sped up much with
other options, such as -O2 or -funroll-loops. I wouldn't suggest that this
"problem" be solved by reducing the performance of -Os to the level of
gcc-3.02.

I imagine it would be quite difficult for the compiler to identify the cases
where unrolling a loop containing branches produces a benefit by increasing
the number of branching patterns which can be "predicted."
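
To make the unrolling point concrete, here is a minimal hand-written sketch.
It is hypothetical: it is not taken from bench++ or the Livermore kernels,
and the function names are made up for illustration. The rolled loop has a
single data-dependent branch. Unrolling by two gives two separate tests, so
for input whose sign alternates, each test sees the same outcome on every
trip. This assumes the compiler actually emits two conditional branches
rather than, say, conditional moves.

```c
#include <stddef.h>

/* Rolled form: one data-dependent test per iteration. */
long sum_positive_rolled(const int *x, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (x[i] > 0)
            sum += x[i];
    }
    return sum;
}

/* Unrolled by 2 by hand: the tests on even and odd elements are now
 * two distinct tests in the loop body, each of which can show its own
 * pattern (e.g. constant outcomes for alternating-sign input). */
long sum_positive_unrolled2(const int *x, size_t n)
{
    long sum = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        if (x[i] > 0)          /* test on even-indexed elements */
            sum += x[i];
        if (x[i + 1] > 0)      /* test on odd-indexed elements */
            sum += x[i + 1];
    }
    if (i < n && x[i] > 0)     /* leftover element when n is odd */
        sum += x[i];
    return sum;
}
```

Both functions return the same result for any input. Whether the unrolled
form runs faster depends on the data and on the target's branch predictor,
which is the information the compiler does not have at compile time.
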
x86 has always had a tendency to favor the small code size optimizations, and
3.1 has increased this. Perhaps you see it disguised by the adoption of
non-alignment as the default for -Os.

Others have been complaining about compilers which require longer compile
time than MSVC6 to produce code which runs faster than MSVC6 code. gcc-3.1
is in a good position, if only -O or -Os is needed to produce better code
than MSVC6.
--
Tim Prince