New insns for the s390 backend (2)
Bradley Lucier
lucier@math.purdue.edu
Wed Aug 27 18:51:00 GMT 2003
On Wednesday, August 27, 2003, at 01:19 PM, Roger Sayle wrote:
> On Wed, 27 Aug 2003, Bradley Lucier wrote:
>> On rs6000 fmadd is enabled by default. I have no opinion about this.
>> Enabling fmadd is compatible with IEEE 754 arithmetic, since IEEE 754
>> does not say what happens when you have operations involving more
>> than two operands.
>
> If fmadd were being added as a target intrinsic such as __builtin_fmadd
> I could agree. However, as generic RTL patterns in the .md files
> these will be used whenever CSE, GCSE or combine tries mushing two
> expressions into one. i.e. if the programmer writes:
>
> x = a * b;
> y = x + c;
>
> IEEE 754 has its opinions on rounding, but GCC will still manage to
> use fmadd. Perhaps a safer approach is a __builtin_fmadd (perhaps
> even on all platforms, using a default mult-then-add implementation
> when a special insn isn't available), and then guard the implicit
> use of fmadd with flag_unsafe_math_optimizations.
I'm sorry, I don't understand this. My understanding is that the
rs6000 port will *not* use fmadd if you give the -mno-fused-madd flag.
So there is now a way to "guard the implicit use of fmadd". Yes, it
would be nice to have an intrinsic __builtin_fmadd for some purposes.
>> In other parts of the code, using FFT for fast bignum arithmetic for
>> example, I cannot tolerate flag_unsafe_math_optimizations (since that
>> may introduce too large errors in the precomputed sine and cosine
>> tables), but fmadd is just fine, since it actually reduces the errors
>> when used. There is a beautiful paper by Colin Percival that gives
>> relatively sharp global error bounds for FFTs, and the hypotheses are
>> satisfied with or without fmadd, but the hypotheses are not satisfied
>> if the sine/cosine tables are not accurate enough.
>
> Ok, I'll read up some more on FFTs for bignum arithmetic. It was my
> understanding that this was one area where a fair amount of inaccuracy
> was tolerated, but I guess the problem then becomes how much you can
> get away with?
Well, you would like to use the floating-point FFT code for integers as
large as possible, so you need good error bounds for the FFT; here are
comments from my bignum FFT code that refer to the paper by Percival:
#|
;; see the wonderful paper
;; "Rapid multiplication modulo the sum and difference of highly
;; composite numbers" by Colin Percival, electronically published
;; by Mathematics of Computation, number S 0025-5718(02)01419-9, URL
;; http://www.ams.org/journal-getitem?pii=S0025-5718-02-01419-9
;; that gives these very nice error bounds. This should be published
;; in the paper journal sometime after March 2002.
(define epsilon (expt 2. -53))
(define bigepsilon (* epsilon (sqrt 5)))
(define n 27)
(define beta (expt 2. -53)) ;; accuracy of trigonometric inputs
(define l 8)
(define norm-x (sqrt (* (expt 2 n)
                        (* (expt 2 l) (expt 2 l))))) ;; overestimate by 0.5%
(define norm-y norm-x)
(define error (* norm-x
                 norm-y
                 ;; the following three lines use the slight overestimate that
                 ;; ln(1+epsilon) = epsilon, etc.
                 ;; there are more accurate ways to calculate this, but we
                 ;; don't really need them.
                 (- (* (exp (* 3 n epsilon))
                       (exp (* (+ (* 3 n) 1) bigepsilon))
                       (exp (* beta 3 n)))
                    1)))
(pp error)
;; error is .33593750000000006 < 1/2
;; so if we wanted to, we could have a result of 2^27 8-bit digits output,
;; i.e., 2^26 8-bit digits input or 536870912 bits input.
;; But we don't have great control over the accuracy of the trigonometric
;; inputs, so let's see what happens if we quadruple the bound on beta
;; and reduce n to 26
(define n 26)
(define norm-x (sqrt (* (expt 2 n)
                        (* (expt 2 l) (expt 2 l)))))
(define norm-y norm-x)
(define beta (expt 2. -51))
(define error (* norm-x
                 norm-y
                 (- (* (exp (* 3 n epsilon))
                       (exp (* (+ (* 3 n) 1) bigepsilon))
                       (exp (* beta 3 n)))
                    1)))
(pp error)
;; error bound is .2763671875 < 1/2
;; so it's no problem to have 2^26 8-bit digits output, or 2^25 8-bit digits
;; input, or 268435456 bits input. This requires 2*2^26 64-bit doubles,
;; or 2^30 bytes or 1 Gbyte to do the multiplication.
;; Because the fft algorithm as written requires temporary storage up to
;; sixteen times the size of the final result, people working with large
;; integers but barely enough memory on 64-bit machines may wish to
;; set! ##bignum.fft-mul-max-width to a slightly smaller value so that
;; karatsuba will kick in earlier for the largest integers.
|#
> Would -fno-builtin-cos and -fno-builtin-sin or the
> use of -mno-fancy-math-387 help with your table construction?
Arggh, this is stretching my brain. In general, it would be nice to
have -funsafe-math-optimizations split into smaller flags, but where to
draw the lines? I don't know.
More information about the Gcc-patches mailing list