Re: New insns for the s390 backend (2)
- From: Bradley Lucier <lucier at math dot purdue dot edu>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: Bradley Lucier <lucier at math dot purdue dot edu>, gcc-patches at gcc dot gnu dot org
- Date: Wed, 27 Aug 2003 13:51:02 -0500
- Subject: Re: New insns for the s390 backend (2)
On Wednesday, August 27, 2003, at 01:19 PM, Roger Sayle wrote:
On Wed, 27 Aug 2003, Bradley Lucier wrote:
On rs6000 fmadd is enabled by default. I have no opinion about this.
Enabling fmadd is compatible with IEEE 754 arithmetic, since IEEE 754
does not say what happens when you have operations involving more than two
operands.
If fmadd were being added as a target intrinsic such as __builtin_fmadd,
I would agree. However, as generic RTL patterns in the .md files
these will be used whenever CSE, GCSE or combine tries mushing two
expressions into one, i.e. if the programmer writes:
x = a * b;
y = x + c;
IEEE 754 has its opinions on rounding, but GCC will still manage to
use fmadd. Perhaps a safer approach is a __builtin_fmadd (perhaps
even on all platforms, using a default multiply-then-add implementation
when a special insn isn't available), and then to guard the implicit
use of fmadd with flag_unsafe_math_optimizations.
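To make the rounding difference concrete, here is a small Scheme sketch
(an illustration, not from the original message) that simulates fmadd's
single rounding with exact rationals, since standard Scheme has no fma:

(define (round-twice a b c)  ;; what separate * and + compute: two roundings
  (exact->inexact (+ (inexact->exact (exact->inexact (* a b))) c)))
(define (round-once a b c)   ;; what fmadd computes: a*b + c rounded once
  (exact->inexact (+ (* a b) c)))
(let ((a (+ 1 (expt 2 -27)))  ;; a, b and c are all exactly representable
      (b (- 1 (expt 2 -27)))  ;; as doubles, and a*b = 1 - 2^-54 exactly
      (c -1))
  (list (round-twice a b c)   ;; => 0.  (2^-54 lost when a*b rounds to 1.0)
        (round-once a b c)))  ;; => -5.551115123125783e-17 = -2^-54 exactly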
I'm sorry, I don't understand this. My understanding is that the
rs6000 port will *not* use fmadd if you give the -mno-fused-madd flag.
So there is now a way to "guard the implicit use of fmadd". Yes, it
would be nice to have an intrinsic __builtin_fmadd for some purposes.
In other parts of the code, using FFT for fast bignum arithmetic for
example, I cannot tolerate flag_unsafe_math_optimizations (since that
may introduce too large errors in the precomputed sine and cosine
tables), but fmadd is just fine, since it actually reduces the errors
when used. There is a beautiful paper by Colin Percival that gives
relatively sharp global error bounds for FFTs, and the hypotheses are
satisfied with or without fmadd, but the hypotheses are not satisfied
if the sine/cosine tables are not accurate enough.
Ok, I'll read up some more on FFTs for bignum arithmetic. It was my
understanding that this was one area where a fair amount of inaccuracy
was tolerated, but I guess the problem then becomes how much you can
get away with?
Well, you would like to use the floating-point fft code for integers
as large as possible, so you need to have good error bounds for the
fft; here are comments from my bignum fft code that refer to the paper
by Percival:
#|
;; see the wonderful paper
;; "Rapid multiplication modulo the sum and difference of highly
;; composite numbers" by Colin Percival, electronically published
;; by Mathematics of Computation, number S 0025-5718(02)01419-9, URL
;; http://www.ams.org/journal-getitem?pii=S0025-5718-02-01419-9
;; that gives these very nice error bounds.  This should be published
;; in the paper journal sometime after March 2002.

(define epsilon (expt 2. -53))
(define bigepsilon (* epsilon (sqrt 5)))
(define n 27)
(define beta (expt 2. -53))   ;; accuracy of trigonometric inputs
(define l 8)
(define norm-x                ;; overestimate by 0.5%
  (sqrt (* (expt 2 n) (* (expt 2 l) (expt 2 l)))))
(define norm-y norm-x)
(define error
  (* norm-x
     norm-y
     ;; the following lines use the slight overestimate that
     ;; ln(1+epsilon) = epsilon, etc.; there are more accurate
     ;; ways to calculate this, but we don't really need them.
     (- (* (exp (* 3 n epsilon))
           (exp (* (+ (* 3 n) 1) bigepsilon))
           (exp (* beta 3 n)))
        1)))
(pp error)

;; error is .33593750000000006 < 1/2
;; so if we wanted to, we could have a result of 2^27 8-bit digits output,
;; i.e., 2^26 8-bit digits input or 536870912 bits input.
;; But we don't have great control over the accuracy of the trigonometric
;; inputs, so let's see what happens if we quadruple the bound on beta
;; and reduce n to 26

(define n 26)
(define norm-x
  (sqrt (* (expt 2 n) (* (expt 2 l) (expt 2 l)))))
(define norm-y norm-x)
(define beta (expt 2. -51))
(define error
  (* norm-x
     norm-y
     (- (* (exp (* 3 n epsilon))
           (exp (* (+ (* 3 n) 1) bigepsilon))
           (exp (* beta 3 n)))
        1)))
(pp error)

;; error bound is .2763671875 < 1/2
;; so it's no problem to have 2^26 8-bit digits output, or 2^25 eight-bit
;; digits input, or 268435456 bits input.  This requires 2*2^26 64-bit
;; doubles, or 2^30 bytes or 1 Gbyte to do the multiplication.
;; Because the fft algorithm as written requires temporary storage up to
;; sixteen times the size of the final result, people working with large
;; integers but barely enough memory on 64-bit machines may wish to
;; set! ##bignum.fft-mul-max-width to a slightly smaller value so that
;; karatsuba will kick in earlier for the largest integers.
|#
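The same bound packaged as a function of n, l and beta (a sketch of mine,
reusing the definitions in the comments above; the name fft-error-bound is
not from the original code) makes it easy to check other digit sizes:

(define (fft-error-bound n l beta)
  ;; the same product of exp overestimates as in the comments above
  (let* ((epsilon (expt 2. -53))
         (bigepsilon (* epsilon (sqrt 5)))
         (norm (sqrt (* (expt 2 n) (expt 2 l) (expt 2 l)))))
    (* norm
       norm
       (- (* (exp (* 3 n epsilon))
             (exp (* (+ (* 3 n) 1) bigepsilon))
             (exp (* beta 3 n)))
          1))))

(fft-error-bound 27 8 (expt 2. -53))  ;; => ~.336, the first run above
(fft-error-bound 26 8 (expt 2. -51))  ;; => ~.276, the second run above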
Would -fno-builtin-cos and -fno-builtin-sin or the
use of -mno-fancy-math-387 help with your table construction?
Arggh, this is stretching my brain.  In general, it would be nice to
have -funsafe-math-optimizations split into smaller flags, but where to
draw the lines?  I don't know.