This is the mail archive of the
mailing list for the GCC project.
Re: Haney's real matrix test regression
- To: Paolo Carlini <pcarlini at unitus dot it>
- Subject: Re: Haney's real matrix test regression
- From: pete at ltoi dot iap dot physik dot tu-darmstadt dot de
- Date: Sat, 27 Oct 2001 14:44:37 +0200 (MEST)
- Cc: gcc at gcc dot gnu dot org
> > yesterday i was not here in the office / at my E-mail account;
> > and i haven't read the gcc-mailing list yet. Thus i currently don't
> > know, if this is "urgent" anymore:
> Well, in the meantime I could establish beyond any reasonable doubt that it is not
> "simply" a question of partial register stalls, therefore an AMD test (vs any other
> x86 check) is not strictly necessary anymore :-)
Fine, that means i only have to install the new gcc-3.0.2 this weekend ;-)
> Have a look at:
> It is obvious that the innermost loop, that over i, the most important one, is
> *much* smaller and faster for gcc3.0.2 (or 3.1 of just a week ago... :-(
> > > My i686 is in fact a PII: perhaps someone may run "Haney Speed" built with
> > > today's 3.1 on an AMD core to exclude quickly the possibility of another nasty
> > > partial register stall???
> > Hm, to my knowledge, the AMD Athlon doesn't suffer (at least not that much)
> > from partial register stalls (??). The Athlon suffers from the higher latencies
> > of the VIA (or VIA-alike AMD) chipsets (why is the VIA KT266A faster
> > than the KT266?) and its small TLBs.
> If you are curious about PRS, I learned all I know about those (prompted by Richard
> Henderson) from the docs available from:
Ah, that's nice. Thank you Paolo. After seeing the performance improvement
of the SIMD/3dnow fftwgel over plain FPU fftw, i tried to learn some (more)
C & see what i can do for my own incarnation of a fast FFT, since my plain
FPU (and old-fashioned, loop-based) FFT version is faster than fftw. But
gcc inline asm still looks too complicated for me (except for very simple
cases, which you can guess without proper knowledge), so i use the
gcc-compiled asm listings as a starting point for the implementation of a
SIMD FFT. But without proper knowledge of x86 asm, anything more than a
simple SIMD/3dnow Radix-2 was too complicated (though this one still
outperforms all higher-radix FPU versions), i.e., not tried yet.
> > Anyway, currently there is no gcc-3.1 installed on my system(s). Any
> > suggestions for a recommended version that builds cleanly on an i686 (binutils
> > 220.127.116.11.xx in /usr/bin/ & 2.11.2 in /usr/local/bin) and an i586 system
> > (binutils 18.104.22.168.xx only at the moment)?
> Recently, I have started keeping my trees updated using CVS, but I'm seeing that a
> snapshot sufficiently recent (which should display clearly the problem) is
> available from gcc.gnu.org:
> Thank you very much Peter!
> Such sudden performance regressions make me crazy!
That's another reason why they call it a "snapshot". More important is that
those issues are detected & removed, i.e., do not remain in the releases
(like those (at least floating point) performance regressions in gcc-3.0
compared to gcc-2.95.x that i see. C. Whaley has a similar problem with
gcc-3.0 for his ATLAS stuff).