This is the mail archive of the
mailing list for the GCC project.
Re: Loop unrolling-related SPEC regressions?
- From: Tim Prince <tprince at computer dot org>
- To: Andreas Jaeger <aj at suse dot de>,Paolo Carlini <pcarlini at unitus dot it>
- Cc: Jan Hubicka <jh at suse dot cz>,gcc at gcc dot gnu dot org
- Date: Mon, 4 Feb 2002 15:47:48 -0800
- Subject: Re: Loop unrolling-related SPEC regressions?
- References: <3C5ADF19.6A6865D9@unitus.it> <3C5EDB04.220770DE@unitus.it> <email@example.com>
- Reply-to: tprince at computer dot org
On Monday 04 February 2002 11:48, Andreas Jaeger wrote:
> Paolo Carlini <firstname.lastname@example.org> writes:
> > Jan Hubicka wrote:
> >> The base/peak flags are not supposed to bring the best performance,
> >> but to be good for testing the majority of gcc features.
> > That's really enlightening Honza! Thanks for the clarification.
> > We should also remember this when someone compares the SPEC numbers made
> > available by other compiler producers with those of GCC: my guess is that
> > this kind of rationale for choosing the PEAK flags is unfortunately not
> > so widespread...
> Didn't I mention it that way? Feel free to send a patch for my SPEC
> page to clarify what we're doing...
Of course, compilers which are sold on the basis of SPEC base performance
take a different approach to default options than gcc does. One expects the
base option set to be the best single setting that conforms to
the limit on the number of options, so as to obtain the highest rating. Thus, a
compiler such as Intel's provides a simple option package such as
'icc -xW -Oi-'
roughly equivalent to
'gcc -msse2 -march=pentium4 -Os -funroll-loops -mpreferred-stack-boundary=4'
with even the base rating depending on Profile Guided Optimization.
Of course, one expects the peak rating to be found with a set of options
which produces the fastest acceptable result for each test, not necessarily
the most aggressive group of optimizations. In that light, the SPEC
disclosures allow one to speculate as to how much trial and error work was
needed to obtain the results submitted, and how much more might be needed to
achieve equivalent performance on a typical application.
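
To make the base/peak distinction concrete, here is a rough sketch of the two selection rules described above: base picks one option set for every test, while peak picks the fastest option set per test. It assumes every candidate already produced an acceptable (correct) result. The function names, the benchmark names, and the timings are all hypothetical placeholders, not measurements. Minimizing the geometric mean of run times stands in for maximizing a SPEC-style rating, since the reference times are constants.

```python
import math


def geomean(values):
    """Geometric mean of a collection of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pick_base(times):
    """Choose ONE option set for all benchmarks (lowest geometric-mean time).

    times maps option set -> {benchmark name -> run time in seconds}.
    """
    return min(times, key=lambda flags: geomean(times[flags].values()))


def pick_peak(times):
    """Choose the fastest option set for each benchmark independently."""
    benchmarks = next(iter(times.values())).keys()
    return {b: min(times, key=lambda flags: times[flags][b]) for b in benchmarks}


# Placeholder timings, invented for illustration only (NOT real results).
timings = {
    "-O2": {"bench_a": 100.0, "bench_b": 80.0},
    "-O2 -funroll-loops": {"bench_a": 90.0, "bench_b": 85.0},
    "-O3": {"bench_a": 95.0, "bench_b": 70.0},
}

if __name__ == "__main__":
    print("base:", pick_base(timings))
    print("peak:", pick_peak(timings))
```

With these made-up numbers the base choice is a single option set, while the peak choice differs per benchmark, and the most aggressive set does not win everywhere. The size of the search over option sets is the "trial and error work" that the disclosures hint at.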
I thank Andreas and Honza for explaining the difference between what they
have done and what some of us may have expected.