New insns for the s390 backend (2)
Bradley Lucier
lucier@math.purdue.edu
Wed Aug 27 18:51:00 GMT 2003
On Wednesday, August 27, 2003, at 01:19 PM, Roger Sayle wrote:
> On Wed, 27 Aug 2003, Bradley Lucier wrote:
>> On rs6000 fmadd is enabled by default. I have no opinion about this.
>> Enabling fmadd is compatible with IEEE 754 arithmetic, since IEEE 754
>> does not say what happens when you have operations involving more
>> than two operands.
>
> If fmadd were being added as a target intrinsic such as __builtin_fmadd
> I could agree. However, as generic RTL patterns in the .md files
> these will be used whenever CSE, GCSE or combine tries mushing two
> expressions into one. i.e. if the programmer writes:
>
> x = a * b;
> y = x + c;
>
> IEEE 754 has its opinions on rounding, but GCC will still manage to
> use fmadd. Perhaps a safer approach is a __builtin_fmadd (perhaps
> even on all platforms, using a default mult-then-add implementation
> when a special insn isn't available), and then guard the implicit
> use of fmadd with flag_unsafe_math_optimizations.
I'm sorry, I don't understand this. My understanding is that the
rs6000 port will *not* use fmadd if you give the -mno-fused-madd flag.
So there is now a way to "guard the implicit use of fmadd". Yes, it
would be nice to have an intrinsic __builtin_fmadd for some purposes.
>> In other parts of the code, using FFT for fast bignum arithmetic for
>> example, I cannot tolerate flag_unsafe_math_optimizations (since that
>> may introduce too large errors in the precomputed sine and cosine
>> tables), but fmadd is just fine, since it actually reduces the errors
>> when used. There is a beautiful paper by Colin Percival that gives
>> relatively sharp global error bounds for FFTs, and the hypotheses are
>> satisfied with or without fmadd, but the hypotheses are not satisfied
>> if the sine/cosine tables are not accurate enough.
>
> Ok, I'll read up some more on FFTs for bignum arithmetic. It was my
> understanding that this was one area where a fair amount of inaccuracy
> was tolerated, but I guess the problem then becomes how much you can
> get away with?
Well, you would like to use the floating-point FFT code for integers as
large as possible, so you need good error bounds for the FFT; here are
comments from my bignum FFT code that refer to the paper by Percival:
#|
;; see the wonderful paper
;; "Rapid multiplication modulo the sum and difference of highly
;; composite numbers" by Colin Percival, electronically published
;; by Mathematics of Computation, number S 0025-5718(02)01419-9, URL
;; http://www.ams.org/journal-getitem?pii=S0025-5718-02-01419-9
;; that gives these very nice error bounds. This should be published
;; in the paper journal sometime after March 2002.
(define epsilon (expt 2. -53))
(define bigepsilon (* epsilon (sqrt 5)))
(define n 27)
(define beta (expt 2. -53)) ;; accuracy of trigonometric inputs
(define l 8)
(define norm-x (sqrt (* (expt 2 n)
                        (* (expt 2 l) (expt 2 l))))) ;; overestimate by 0.5%
(define norm-y norm-x)
(define error (* norm-x
                 norm-y
                 ;; the following three lines use the slight overestimate that
                 ;; ln(1+epsilon) = epsilon, etc.
                 ;; there are more accurate ways to calculate this, but we
                 ;; don't really need them.
                 (- (* (exp (* 3 n epsilon))
                       (exp (* (+ (* 3 n) 1) bigepsilon))
                       (exp (* beta 3 n)))
                    1)))
(pp error)
;; error is .33593750000000006 < 1/2
;; so if we wanted to, we could have a result of 2^27 8-bit digits output,
;; i.e., 2^26 8-bit digits input or 536870912 bits input.
;; But we don't have great control over the accuracy of the trigonometric
;; inputs, so let's see what happens if we quadruple the bound on beta
;; and reduce n to 26
(define n 26)
(define norm-x (sqrt (* (expt 2 n)
                        (* (expt 2 l) (expt 2 l)))))
(define norm-y norm-x)
(define beta (expt 2. -51))
(define error (* norm-x
                 norm-y
                 (- (* (exp (* 3 n epsilon))
                       (exp (* (+ (* 3 n) 1) bigepsilon))
                       (exp (* beta 3 n)))
                    1)))
(pp error)
;; error bound is .2763671875 < 1/2
;; so it's no problem to have 2^26 8-bit digits output, or 2^25 8-bit digits
;; input, or 268435456 bits input. This requires 2*2^26 64-bit doubles,
;; or 2^30 bytes or 1 Gbyte to do the multiplication.
;; Because the fft algorithm as written requires temporary storage up to
;; sixteen times the size of the final result, people working with large
;; integers but barely enough memory on 64-bit machines may wish to
;; set! ##bignum.fft-mul-max-width to a slightly smaller value so that
;; karatsuba will kick in earlier for the largest integers.
|#
> Would -fno-builtin-cos and -fno-builtin-sin or the
> use of -mno-fancy-math-387 help with your table construction?
Arggh, this is stretching my brain. In general, it would be nice to
have -funsafe-math-optimizations split into smaller flags, but where to
draw the lines? I don't know.
More information about the Gcc-patches mailing list