This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Haney's real matrix test regression


Hi Paolo!

> > yesterday i was not here in the office / at my E-mail account;
> > and i haven't read the gcc-mailing list yet. Thus i currently don't
> > know, if this is "urgent" anymore:
>
> Well, in the meanwhile I could establish beyond any reasonable doubt that it is not
> "simply" a question of partial register stalls, therefore an AMD test (vs any other
> x86 check) is not strictly necessary anymore :-)

Fine, that means I only had to install the new gcc-3.0.2 this weekend ;-)

> Have a look at:
>
>     http://gcc.gnu.org/ml/gcc/2001-10/msg01314.html
>
> It is obvious that the innermost loop, that over i, the most important one, is
> *much* smaller and faster for gcc3.0.2 (or 3.1 of just a week ago... :-(
>
> > > My i686 is in fact a PII: perhaps someone may run "Haney Speed" built with
> > > today's 3.1 on an AMD core to exclude quickly the possibility of another nasty
> > > partial register stall???
> >
> > Hm, up to my knowledge, the AMD Athlon didn't suffer (at least that much)
> > from partial register stalls (??) Athlon suffers from the higher latencies
> > of the VIA (or VIA alike AMD) chipsets (Why is VIA-KT266A faster
> > than KT266 ?) and his small TLB's.
>
> If you are curious about PRS, I learned all I know about those (prompted by Richard
> Henderson) from the docs available from:
>
>     http://www.agner.org/assem/
>

Ah, that's nice. Thank you, Paolo. After seeing the performance improvement
of the SIMD/3dnow fftwgel over plain FPU fftw, I tried to learn some (more)
C and see what I can do for my own incarnation of a fast FFT, since my plain
FPU (and old-fashioned, loop-based) FFT version is faster than fftw. But
gcc inline asm still looks too complicated for me (except for very simple
cases, which you can guess without proper knowledge), so I use the
gcc-compiled asm listings as a starting point for the implementation of a
SIMD FFT. Without proper knowledge of x86 asm, anything more than a simple
SIMD/3dnow radix-2 was too complicated (though this one still outperforms
all higher-radix FPU versions), i.e., not tried yet.

> > Anyway, currently there is no gcc-3.1 installed on my system(s). Any
> > suggestions for a recommended version, that builds clean on a i686 (binutils
> > 2.9.1.0.xx in /usr/bin/ & 2.11.2 in /usr/local/bin) and a i586-system
> > (binutils 2.9.1.0.xx only at the moment)?
>
> Recently, I have started keeping my trees updated using CVS, but I'm seeing that a
> snapshot sufficiently recent (which should display clearly the problem) is
> available from gcc.gnu.org:
>
>     gcc.gnu.org:/pub/gcc/snapshots/2001-10-23
>
> Thank you very much Peter!
> Such sudden performance regression make me crazy!

That's another reason why they call it a "snapshot". More important is that
those issues are detected and removed, i.e., do not remain in the releases
(like the performance regressions, at least in floating point, that I see in
gcc-3.0 compared to gcc-2.95.x; C. Whaley has a similar problem with
gcc-3.0 for his ATLAS stuff).

Greetings,

Peter Schorsch

