optimization/4487: -ffast-math fails to disable gradual underflow on Ultrasparc
Peter van Hoof
vanhoof@cita.utoronto.ca
Fri Oct 5 15:46:00 GMT 2001
>Number: 4487
>Category: optimization
>Synopsis: -ffast-math fails to disable gradual underflow for Ultrasparc
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: unassigned
>State: open
>Class: pessimizes-code
>Submitter-Id: net
>Arrival-Date: Fri Oct 05 15:46:01 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: Peter van Hoof
>Release: 3.0.1
>Organization:
Canadian Institute for Theoretical Astrophysics
>Environment:
System: SunOS scooby 5.8 Generic_108528-10 sun4u sparc SUNW,Sun-Blade-100
Architecture: sun4
host: sparc-sun-solaris2.8
build: sparc-sun-solaris2.8
target: sparc-sun-solaris2.8
configured with: ../gcc-3.0.1/configure --prefix=/opt/local --enable-threads --enable-gcj
>Description:
Ultrasparc chips do not support gradual underflow in hardware,
so operations on denormalized numbers have to be emulated in software.
Since -ffast-math allows deviations from the IEEE-754 standard
for the sake of increasing performance, it is my opinion that
-ffast-math should flush denormalized numbers to zero (or at least
there should be some option for enabling this; to the best of my
knowledge no such flag exists for Sparc hardware). The software
emulation can lead to substantial performance degradation for
certain programs. My machine has a 500MHz Ultrasparc IIe processor,
but I think the problem is the same for all v9 hardware.
>How-To-Repeat:
To illustrate the degradation, I wrote a little program (test.c,
listed further down) that generates oodles of underflows:
scooby> gcc -O3 -ffast-math test.c -lm
scooby> time a.out
16.02u 135.95s 2:35.33 97.8%
The -fast option on the SunWorks compiler does flush denormalized
numbers to zero. I do not have a SunWorks compiler myself, so I
used somebody else's (running on a Sun Ultra 1) and timed the
resulting binary on my own machine:
chinook> cc -fast test.c -lm
scooby> time a.out
0.23u 0.01s 0:00.21 114.2%
This is a dramatic improvement in performance: about 0.2 seconds of
elapsed time instead of more than 2.5 minutes, most of which was
system time.
This is test.c:
double pow(double,double);
int main()
{
    long i,j;
    double x[5000],y[5000],fac;
    fac = 1.e-305;
    for( i=0; i < 5000; i++ ) {
        x[i] = pow((double)(i+1),5.);
        y[i] = 0.;
    }
    for( j=0; j < 1000; j++ ) {
        for( i=0; i < 5000; i++ ) {
            y[i] += fac/x[i];
        }
    }
}
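To give a rough idea of how many of the divisions in the inner loop
underflow, the quotients can be counted separately. The following is
only a sketch and not part of test.c: the helper name
count_denormal_quotients is hypothetical, and repeated multiplication
stands in for the pow() call so that no math library is needed.

```c
#include <float.h>

/* Hypothetical helper, not part of test.c: counts how many of the
 * quotients fac/(i+1)^5, for i = 0..n-1, are nonzero but smaller
 * than DBL_MIN, i.e. denormalized. */
int count_denormal_quotients(double fac, int n)
{
    int i, count = 0;
    for (i = 0; i < n; i++) {
        double b = (double)(i + 1);
        double x = b * b * b * b * b;  /* stands in for pow(b, 5.) */
        volatile double q = fac / x;   /* volatile: force rounding to double */
        if (q != 0.0 && q < DBL_MIN)
            count++;
    }
    return count;
}
```

With IEEE-754 doubles and gradual underflow enabled,
count_denormal_quotients(1.e-305, 5000) returns 4997: only the first
three quotients are still normalized, so nearly every division in the
loop produces a denormalized result.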
>Fix:
The SunWorks compiler evidently avoids the problem, so some way of
flushing denormalized numbers to zero must exist. However, I haven't
found it yet. If somebody knows how to do it, I would be happy to
hear about it!
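One candidate, offered as an untested sketch and not a confirmed fix:
the SPARC floating-point state register (FSR) has an NS
("nonstandard") bit, bit 22, whose effect the architecture leaves to
the implementation. The code below assumes that setting it makes
Ultrasparc flush denormalized numbers to zero, and that this is roughly
what -fast arranges. The helper name enable_nonstandard_fp is
hypothetical, and on non-SPARC targets the function does nothing.

```c
#define FSR_NS (1u << 22)  /* NS (nonstandard) bit of the SPARC FSR */

/* Hypothetical helper: sets the NS bit in the FSR.
 * Returns 1 if the bit was set, 0 if not compiled with GCC for SPARC.
 * Assumption: the hardware flushes denormals to zero in this mode. */
int enable_nonstandard_fp(void)
{
#if defined(__GNUC__) && defined(__sparc__)
    unsigned int fsr;
    __asm__ __volatile__("st %%fsr, %0" : "=m" (fsr));  /* read FSR  */
    fsr |= FSR_NS;
    __asm__ __volatile__("ld %0, %%fsr" : : "m" (fsr)); /* write FSR */
    return 1;
#else
    return 0;
#endif
}
```

Calling enable_nonstandard_fp() once at the top of main() in test.c,
before the loops, would be the way to try this out. Results computed in
this mode no longer follow IEEE-754 for denormalized values.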
>Release-Note:
>Audit-Trail:
>Unformatted: