This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Performance measurements (thanks and conclusion)
- To: axp-list at redhat dot com
- Subject: Re: Performance measurements (thanks and conclusion)
- From: Martin Kahlert <martin dot kahlert at mchp dot siemens dot de>
- Date: Thu, 25 Jun 1998 09:46:18 +0200
- Cc: scr at iis dot fhg dot de, robert at physiol dot med dot tu-muenchen dot de, egcs at cygnus dot com
- References: <199806240851.KAA06049@keksy.mchp.siemens.de> <3590D5AE.167EB0E7@iis.fhg.de> <19980624124843.A15248@keksy.mchp.siemens.de> <3591031A.2781E494@iis.fhg.de> <19980624170051.21290@haegar.physiol.med.tu-muenchen.de>
- Reply-To: martin dot kahlert at mchp dot siemens dot de
Quoting Robert Wilhelm (robert@physiol.med.tu-muenchen.de):
> > [Robert: could you please compile my code on your Alpha using
> > egcs and report your results to me]
>
> I get about 275 MFLOPS for my 533MHz 21164a for both egcs 1.0 and
> egcs-current with haifa enabled.
>
> If I use different local variables lfA*, egcs seems to schedule a bit
> better and I get 290 MFLOPS.
>
> Robert
I was really overwhelmed by the response to this thread on
axp-list. Thanks a lot to everyone who tried my source
and even tried to get more out of the compilers.
I tried both versions on my PPro 200:
                           Stefan Schroepfer's   Robert Wilhelm's
Compiler                   version (MFLOPS)      version (MFLOPS)
pgcc                       85.98                 81.62
gcc-2.7.2.1                97.10                 98.81
gcc-without double align   95.46                 98.81
egcs-2.91.42               84.06                 83.44
tcc                        17.20                 16.44
It seems that tcc is not the fastest and the most reliable
under the sun...
Can I conclude that it's a good idea to insert as many local
vars as possible to get good results from compilers?
Now I have two questions:
-Why is it so difficult for gcc to transform the code
    for(i=0;i<n;i++)
        result[i]=a[i]+2*a[i+1]+3*a[i+2];
into something like
    _tmp0=a[0];_tmp1=a[1];
    for(i=0;i<n;i++)
    {
        _tmp2=a[i+2];    /* the only load from a[] per iteration */
        result[i]=_tmp0+2*_tmp1+3*_tmp2;
        _tmp0=_tmp1;
        _tmp1=_tmp2;
    }
by itself? I think that, especially in Fortran, such things are a
common task.
-What's the reason for the performance loss between gcc-2.7.2.1
and egcs-2.91.42? gcc-2.7.2.1 is nearly 20% faster.
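For anyone who wants to try the loop transformation from the first question, here is a minimal self-contained sketch of both forms. The function names stencil_plain and stencil_rotated are made up for illustration and are not from the benchmark source. The sketch assumes that a holds at least n+2 elements and that result and a do not overlap.

```c
#include <stddef.h>

/* Straightforward form: three loads from a[] per iteration.
   Assumes a has at least n + 2 elements. */
void stencil_plain(const double *a, double *result, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        result[i] = a[i] + 2 * a[i + 1] + 3 * a[i + 2];
}

/* Hand-rotated form: the two values reused by the next iteration
   are carried in locals, so each iteration loads only a[i + 2].
   Assumes result and a do not overlap (hypothetical precondition). */
void stencil_rotated(const double *a, double *result, size_t n)
{
    double t0, t1, t2;
    size_t i;

    if (n == 0)
        return;                 /* do not touch a[] for an empty range */

    t0 = a[0];
    t1 = a[1];
    for (i = 0; i < n; i++) {
        t2 = a[i + 2];          /* the single new load */
        result[i] = t0 + 2 * t1 + 3 * t2;
        t0 = t1;                /* rotate the window by one element */
        t1 = t2;
    }
}
```

The two forms are only interchangeable when result does not overlap a. If a store to result[i] could change a later a[i+1] or a[i+2], the plain loop would see the new value while the rotated loop would still use the old one held in a local.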
Thanks a lot,
Martin.