This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Info], Add support for PowerPC IEEE 128-bit floating point
- From: Michael Meissner <meissner at linux dot vnet dot ibm dot com>
- To: gcc-patches at gcc dot gnu dot org, dje dot gcc at gmail dot com
- Date: Tue, 15 Jul 2014 17:20:31 -0400
- Subject: Re: [Info], Add support for PowerPC IEEE 128-bit floating point
- References: <20140715182354 dot GA21906 at ibm-tiger dot the-meissners dot org>
I did some timing tests to compare the new PowerPC IEEE 128-bit results to the
current implementation of long double using the IBM extended format.
Each test consisted of a short loop that performed one operation over arrays of
1,024 elements: reading in two values, doing the operation, and then storing
the result back. This loop in turn was run multiple times, with the idea that
most of the values would stay in the cache, so we didn't have to worry about
pre-fetching, etc.
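
For illustration only, the add loop described above could be sketched roughly as
follows. This is not the actual test program: the names elem_t, init_arrays,
add_pass and time_add are hypothetical, the separate result array c, the input
values, and the use of clock() for timing are assumptions, and the empty asm
statement is just a GCC-style compiler barrier to keep the repeated passes from
being optimized away.

```c
#include <stddef.h>
#include <time.h>

#define N 1024                 /* array length from the description above */

/* Assumption: change this typedef to float, double, or __float128
   to time the other types. */
typedef long double elem_t;

elem_t a[N], b[N], c[N];

/* Fill the inputs with small nonzero values (safe for a divide loop too). */
void init_arrays(void)
{
    for (size_t i = 0; i < N; i++) {
        a[i] = (elem_t)(i + 1);
        b[i] = (elem_t)(2 * i + 3);
    }
}

/* One pass: read two values, do the operation, store the result. */
void add_pass(void)
{
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* Repeat the pass so the arrays stay cache-resident; return CPU seconds. */
double time_add(unsigned long reps)
{
    clock_t start = clock();

    for (unsigned long r = 0; r < reps; r++) {
        add_pass();
        /* Compiler barrier (GCC extension): treat c as read and memory as
           clobbered, so each pass is actually performed. */
        __asm__ volatile("" : : "r"(c) : "memory");
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling init_arrays() once and then time_add() with a large repetition count
for two different definitions of elem_t gives two times whose ratio is the kind
of number listed below.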
The float and double tests were done with vectorization disabled, while for the
vector float and vector double tests the compiler was allowed to do its normal
auto-vectorization.
Generally, __float128 is about 2x slower than the current IBM extended double
format (1.39x to 3.55x in the loops below), except for divide, where it is
nearly 6x slower (5.86x). I must say, the software floating point emulation
routines worked well, and once the proper macros were set up, I only needed to
override the type used for IEEE 128-bit.
The number reported was how much longer the second column took over the first:
Add loop
========
float vs double: 2.00x
float vs vector float: 4.97x
double vs vector double: 2.63x
long double vs double: 16.85x
__float128 vs double: 23.34x
__float128 vs long double: 1.39x
Subtract loop
=============
float vs double: 1.99x
float vs vector float: 4.66x
double vs vector double: 2.63x
long double vs double: 14.47x
__float128 vs double: 27.65x
__float128 vs long double: 1.91x
Multiply loop
=============
float vs double: 2.05x
float vs vector float: 5.18x
double vs vector double: 2.59x
long double vs double: 11.58x
__float128 vs double: 27.44x
__float128 vs long double: 2.37x
Divide loop
===========
float vs double: 0.82x
float vs vector float: 2.11x
double vs vector double: 2.00x
long double vs double: 5.90x
__float128 vs double: 34.57x
__float128 vs long double: 5.86x
Maximum via comparison and ?:
=============================
float vs double: 1.74x
float vs vector float: 4.62x
double vs vector double: 2.62x
long double vs double: 5.07x
__float128 vs double: 18.02x
__float128 vs long double: 3.55x
Minimum via comparison and ?:
=============================
float vs double: 1.74x
float vs vector float: 4.52x
double vs vector double: 2.62x
long double vs double: 5.38x
__float128 vs double: 15.14x
__float128 vs long double: 2.82x
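
As a rough sketch of what "via comparison and ?:" means for the last two
loops, the loop bodies could look like the following. The function names
max_pass and min_pass, the real_t typedef, and the pointer-based signatures are
hypothetical and chosen only for this example; the actual test source is not
shown in this message.

```c
#include <stddef.h>

/* Assumption: change this typedef to float, double, or __float128
   to cover the other types. */
typedef long double real_t;

/* Element-wise maximum using a comparison and the conditional operator. */
void max_pass(real_t *c, const real_t *a, const real_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = (a[i] > b[i]) ? a[i] : b[i];
}

/* Element-wise minimum using a comparison and the conditional operator. */
void min_pass(real_t *c, const real_t *a, const real_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = (a[i] < b[i]) ? a[i] : b[i];
}
```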
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797