optimization/4487: -ffast-math fails to disable gradual underflow on Ultrasparc

Peter van Hoof vanhoof@cita.utoronto.ca
Fri Oct 5 15:46:00 GMT 2001


>Number:         4487
>Category:       optimization
>Synopsis:       -ffast-math fails to disable gradual underflow for Ultrasparc
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    unassigned
>State:          open
>Class:          pessimizes-code
>Submitter-Id:   net
>Arrival-Date:   Fri Oct 05 15:46:01 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Peter van Hoof
>Release:        3.0.1
>Organization:
Canadian Institute for Theoretical Astrophysics
>Environment:
System: SunOS scooby 5.8 Generic_108528-10 sun4u sparc SUNW,Sun-Blade-100
Architecture: sun4

	
host: sparc-sun-solaris2.8
build: sparc-sun-solaris2.8
target: sparc-sun-solaris2.8
configured with: ../gcc-3.0.1/configure --prefix=/opt/local --enable-threads --enable-gcj
>Description:
	Ultrasparc chips do not support gradual underflow in hardware,
	so operations on denormalized numbers have to be emulated in
	software. Since -ffast-math allows deviations from the IEEE-754
	standard for the sake of performance, it is my opinion that
	-ffast-math should flush denormalized numbers to zero (or at
	least there should be some option for enabling this; to the best
	of my knowledge no such flag exists for Sparc hardware). The
	software emulation can lead to substantial performance
	degradation for programs that underflow frequently. My machine
	has a 500MHz Ultrasparc IIe processor, but I think the problem
	is the same for all v9 hardware.
>How-To-Repeat:
	To illustrate the degradation, here is a little program that
	generates oodles of underflows:
	
	scooby> gcc -O3 -ffast-math test.c -lm
	scooby> time a.out
	16.02u 135.95s 2:35.33 97.8%
	
	The -fast option on the SunWorks compiler does flush denormalized
	numbers to zero. I do not have a SunWorks compiler myself, so I
	used somebody else's (running on a Sun Ultra 1):
	
	chinook> cc -fast test.c -lm
	scooby> time a.out
	0.23u 0.01s 0:00.21 114.2%
	
	The improvement is dramatic: about 152 seconds of CPU time
	(most of it system time) with gcc versus 0.24 seconds with cc.
	This is test.c:
	
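#include <math.h>	/* pow() */

int main(void)
{
	long i,j;
	double x[5000],y[5000],fac;

	fac = 1.e-305;
	/* x[i] = (i+1)^5, so fac/x[i] is denormalized for all i >= 3 */
	for( i=0; i < 5000; i++ ) {
		x[i] = pow((double)(i+1),5.);
		y[i] = 0.;
	}
	/* each pass produces and accumulates the denormalized quotients */
	for( j=0; j < 1000; j++ ) {
		for( i=0; i < 5000; i++ ) {
			y[i] += fac/x[i];
		}
	}
	return 0;
}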
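
	To see why test.c underflows so heavily, the following sketch
	counts how many of the quotients fac/x[i] fall below DBL_MIN
	without being zero. The names is_subnormal and
	count_subnormal_quotients are hypothetical helpers, not part of
	test.c, and x[i] is computed by repeated multiplication instead
	of pow() so that the sketch does not need -lm.

```c
#include <float.h>

/* Hypothetical helper: nonzero when v is denormalized, i.e. nonzero
   but smaller in magnitude than the smallest normalized double. */
int is_subnormal(double v)
{
	double a = v < 0.0 ? -v : v;
	return a != 0.0 && a < DBL_MIN;
}

/* Hypothetical helper: count the i in [0,n) for which fac/(i+1)^5
   is denormalized. Mirrors the quotient computed in test.c. */
int count_subnormal_quotients(int n, double fac)
{
	int i, count = 0;

	for( i=0; i < n; i++ ) {
		double d = (double)(i+1);
		double x = d*d*d*d*d;	/* (i+1)^5 */
		if( is_subnormal(fac/x) )
			count++;
	}
	return count;
}
```

	With n = 5000 and fac = 1.e-305, as in test.c, this should
	report 4997 on a system that has gradual underflow enabled:
	only the first three quotients are normalized. A build that
	flushes denormalized numbers to zero would report 0 instead.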

	
>Fix:
	The SunWorks compiler evidently avoids the problem, so a
	workaround must exist. However, I haven't found it yet.
	If somebody knows how to do it, I would be happy to hear about it!
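
	One possible direction, offered as an untested sketch and not
	as a confirmed fix: the SPARC floating-point state register
	(%fsr) has a nonstandard-mode bit, NS (bit 22), and the effect
	of setting it is implementation-dependent. The sketch below
	assumes that NS=1 makes the Ultrasparc flush denormalized
	numbers to zero, and that gcc inline assembly is available.
	The names SPARC_FSR_NS and enable_sparc_nonstandard_fp are
	hypothetical. On non-Sparc targets the function does nothing.

```c
/* NS (nonstandard mode) bit of the SPARC %fsr register */
#define SPARC_FSR_NS (1u << 22)

/* Hypothetical helper: request nonstandard FP mode.
   Returns 1 if the NS bit was set, 0 on targets where this
   sketch does not apply. Whether NS=1 actually flushes
   denormalized numbers to zero is an assumption here. */
int enable_sparc_nonstandard_fp(void)
{
#if defined(__sparc__) && defined(__GNUC__)
	unsigned int fsr;

	__asm__ __volatile__("st %%fsr, %0" : "=m" (fsr));	/* read %fsr */
	fsr |= SPARC_FSR_NS;
	__asm__ __volatile__("ld %0, %%fsr" : : "m" (fsr));	/* write it back */
	return 1;
#else
	return 0;
#endif
}
```

	Calling enable_sparc_nonstandard_fp() at the start of main() in
	test.c would be the way to try this. It changes the FP mode for
	the calling thread only and gives up IEEE-754 conformance, which
	is the same trade-off -ffast-math already accepts.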
>Release-Note:
>Audit-Trail:
>Unformatted:


