[PATCH] Practical Improvement to libgcc Complex Divide

Wed Sep 9 19:49:30 GMT 2020

On 9/9/2020 2:13 AM, Richard Biener wrote:
> Thanks for working on this.  Speaking about performance and
> accuracy I spot a few opportunities to use FMAs [and eventually
> vectorization] - do FMAs change anything on the accuracy analysis
> (is there the chance they'd make it worse?).  We might want to use
> IFUNCs in libgcc to version for ISA variants (with/without FMA)?
>
> Thanks,
> Richard.

Richard, thank you for bringing up the issue of fused multiply-add
(fma).  All the results I presented in the latest patch were measured
with fma active.  That's because in my early testing I ran experiments
and found that fma was consistently more accurate than no fma.

In response to your query, I repeated that set of tests on my
final submission and present them in the following table.

Number of results out of 10 million with greater than
or equal to the listed number of bits in error.

              full range       limited exponents
           no fma  with fma    no fma  with fma
  1 bits=   20088   16664       34479   24707
  2 bits=    1110     900        2359    1762
  3 bits=     518     440        1163     882
  4 bits=     197     143         612     445
  5 bits=     102      72         313     232
  6 bits=      49      43         170     119
  7 bits=      25      21          82      49
  8 bits=      16      11          33      26
  9 bits=       9       5          14      14
 10 bits=       3       3           8       4
 11 bits=       2       2           3       2
 12 bits=       1       1           0       2
No differences for 13 or greater bits.
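
The harness that produced these counts is not part of this message. As a
rough sketch of one way results could be bucketed like this, the C below
assumes that "bits in error" means the number of bits needed to hold the
ULP distance between a computed double and a more precise reference
rounded to double. That definition is an assumption made here for
illustration, and all names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a double to an integer whose ordering matches the ordering of
   the doubles, so adjacent doubles map to adjacent integers. */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between x and y (0 if equal). */
static uint64_t ulp_distance(double x, double y)
{
    int64_t a = ordered_bits(x);
    int64_t b = ordered_bits(y);
    return a > b ? (uint64_t)a - (uint64_t)b
                 : (uint64_t)b - (uint64_t)a;
}

/* Assumed definition: bits needed to represent the ULP distance.
   0 ULPs -> 0 bits, 1 ULP -> 1 bit, 2..3 ULPs -> 2 bits, and so on. */
static int bits_in_error(double computed, double reference)
{
    assert(!isnan(computed) && !isnan(reference));

    uint64_t d = ulp_distance(computed, reference);
    int bits = 0;
    while (d != 0) {
        bits++;
        d >>= 1;
    }
    return bits;
}
```

With a helper like this, a result would count toward the "N bits" row
whenever bits_in_error() for its real or imaginary part is at least N.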

Errors for both cases drop off rapidly as we increase the
number of bits required for a result to be considered
an error.

While using fma shows a consistent advantage in fewer errors,
there are cases where no fma gives a more accurate answer.
A detailed examination of the full range/7 bit case illustrates this.
That case is listed as having 25 errors of 7 or more bits for "no fma"
and 21 errors of 7 or more bits for "with fma".

In that test,
  1 case had the same size error for both
  8 cases had a larger error with no fma
 21 cases had a larger error with fma.

Further examination showed those differences were generally less than
two bits. That summary makes clear that while using fma does not always
give a better answer, it occasionally provides a slight improvement.
Even in the limited exponent case, where a 1 bit
difference is counted as an error, using fma or not using
fma only shows a different result about 1 time in 1000.
While interesting, that size and frequency of difference
is not enough, in my mind, to support having two versions
of the library routine.

I'd rather put further effort into improving the accuracy or performance
of other libgcc/glibc math functions. Just as one example, Paul
Zimmermann's work shows opportunities. For more details, see:
https://members.loria.fr/PZimmermann/papers/accuracy.pdf

- patrick