This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Performance of Integer Multiplication on PIII (Results for gcc-2.95 & Athlon)

To: Jan Hubicka <jh at suse dot cz>
Subject: Re: Performance of Integer Multiplication on PIII (Results for gcc-2.95 & Athlon)
From: pete at ltoi dot iap dot physik dot tu-darmstadt dot de
Date: Tue, 6 Nov 2001 18:09:14 +0200 (MEST)
Cc: gcc at gcc dot gnu dot org


> > > Note that with -march=i686 the new gcc often performs worse on Athlon, but that
> > > is mainly because it does more Athlon-specific work.
> >
> > Well, actually I compare gcc-3.0.x with gcc-2.95.x (it doesn't matter much
> > which, since both schedulers are switched off for i686) on my floating-point
> > heavy code. And gcc-3.0.x performs worse than gcc-2.95.x (with
> > -march=i686) regardless of whether I use -march=athlon or -march=i686. Other
> > "oddities" may make any improvements from this flag nearly invisible ...
> If you have examples simple enough to analyze by hand, I can take a look
> at them.
Well, initially I thought of a full radix-8 example ... but that's
too large. Is a (full) radix-2 FFT {C or Fortran}, together with a
testbed, simple enough?

But please note that at the moment I don't know whether this case exhibits
patterns similar to those I could extract from C. Whaley's analysis.
I only know {or think I know ;-)} that the gcc-3.0.x version is
slower than the gcc-2.95 version. ... But since this FFT code does not
"ride the bull" nearly as hard as Clint's ATLAS code, the differences are
also not _that_ drastic *), but I could send you a list of how ~50 different
FFT implementations {mostly taken from the net, some written by me} perform
with gcc-2.95 & gcc-3.0.x.

*) But you can see the reason for the slowdown in the asm output for the FFT
kernels, in their length & "how they look".

> Speaking of the ATLAS problems, I made some patches that should address the
> problem, but I am not quite sure they got reviewed.
>
> The problem gcc is running into is a quite nasty interaction between the
> Athlon's on-chip scheduler and gcc's scheduler: both do a good job given the
> information they have, but because each has incomplete information, together
> they cause problems.
>
> On FP-specific code, there appear to be issues with the loop optimizer and
> strength reduction (so nothing related to the i386 backend itself). Basically,
> gcc is now a bit more aggressive at strength reduction, since bugs that
> disabled its loop optimizer have been fixed. Fortran uses the
> -freduce-all-givs switch, which often makes the strength-reduction pass
> create too many temporaries (and in other testcases reduces the temporaries),
> which in turn results in unstable performance.
>
> For Fortran I recommend experimenting with -fno-reduce-all-givs, which helps
> in the testcases I have.
>
> There are also other problems I am tracking down, but the issue is that gcc's
> loop optimizer is outdated and interferes badly with GCSE and other passes.
> I believe the proper solution is to rewrite it; I have taken some steps in
> this direction, but it is more probably a 3.2.x issue.

Thank you for your explanation, which improves the understanding of
someone who can only judge from the outside {of the black box gcc}!

Cheers,
  Peter

References:
- Re: Performance of Integer Multiplication on PIII (Results for gcc-2.95 & Athlon)
  - From: Jan Hubicka
