11706 vs ([lno] Canonical iv creation)

Roger Sayle roger@eyesopen.com
Sat Mar 13 14:02:00 GMT 2004


On Sat, 13 Mar 2004, Richard Guenther wrote:
> > Really interesting issue...
> >
> > Anyway, from a practical point of view I'm not really worried, since:
> >
> > 1- Glibc uses internally the very same algorithm, at least on x86 (just
> >    checked).
> >
> > 2- Which requires O(log n) multiplications and is discussed by Knuth in
> >    Sec. 4.6.3 of the second volume. You can find it almost *everywhere*,
> >    for instance also in SGI's power.
>
> So there is no point in not optimizing ::pow(x, 4) without -ffast-math?


There is a potential accuracy problem, which is why GCC doesn't inline
::pow(x,4) without -ffast-math.

To demonstrate via an example, consider a decimal floating point format
whose mantissa is only a single digit.  The correct, perfectly rounded
result for pow(4,4) is 3e2, as 256 correctly rounded to nearest is 300.
However, x*x produces the result 2e1 (as 16 rounds to 20), and squaring
this value gives 4e2.  [Interestingly, (((x*x)*x)*x), with three
multiplications/roundings, produces a better result for this particular
example, but the general-case error analysis is beyond me.]
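
To make the arithmetic above concrete, here is a minimal C++ sketch that
simulates the toy one-significant-digit decimal format with a hypothetical
round_1digit() helper (my own illustration of the rounding argument, not
how any real floating-point format or GCC's optimization works):

#include <cmath>
#include <cstdio>

// Round v to one significant decimal digit (toy format from the example).
static double round_1digit(double v)
{
    if (v == 0.0)
        return 0.0;
    double exp10 = std::floor(std::log10(std::fabs(v)));
    double scale = std::pow(10.0, exp10);
    return std::round(v / scale) * scale;
}

int main()
{
    double x = 4.0;

    double xx = round_1digit(x * x);       // 16  -> 2e1
    double sq = round_1digit(xx * xx);     // 400 -> 4e2  (two roundings)

    double x3 = round_1digit(xx * x);      // 80  -> 8e1
    double x4 = round_1digit(x3 * x);      // 320 -> 3e2  (three roundings)

    double exact = round_1digit(256.0);    // perfectly rounded: 3e2

    std::printf("(x*x)*(x*x)   = %g\n", sq);    // 400
    std::printf("(((x*x)*x)*x) = %g\n", x4);    // 300
    std::printf("correct       = %g\n", exact); // 300
    return 0;
}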

Basically, "pow" can only be guaranteed to be rounded perfectly when
implemented using multiplications, if only a single multiplication
(i.e. rounding step) is required.  Hence exponents -2, -1, 0, 1 and 2
of ::pow are the only ones optimized without -ffast-math.


The great philosophical issue is GCC's decision that if it can't produce
a perfectly rounded 0.5 ulp result itself, it should call the system
library, which theoretically may implement a more accurate algorithm.
In practice, however, the system library often isn't perfect and uses
the same flawed method that would be used by -ffast-math.  A good example
is "sin" and "cos" on x86: GCC knows that the x87's fsin and fcos
instructions produce significant errors for some arguments, so it calls
the libm versions rather than using these opcodes inline.  Yet most
system libraries [glibc, libc, newlib, msvcrt etc...] favour speed over
accuracy and therefore use the same instructions that GCC was concerned
about!


As for implementing pow(3) using the binary method for constructing
addition chains *everywhere*, that actually requires more multiplications
than the method GCC uses to inline exponentiation by a constant; see:
http://gcc.gnu.org/ml/gcc-patches/2003-06/msg02447.html.

Compare std::pow(x,15), which uses 6 multiplications, with ::pow(x,15),
which uses only 5, as in the sketch below.
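
Here is a small sketch of both variants for x^15 (the function names
pow15_binary and pow15_chain are mine, purely for illustration; this is
not the actual libstdc++ or GCC code):

#include <cstdio>

static double pow15_binary(double x)   // square-and-multiply, 15 = 1111b
{
    double x2  = x * x;        // 1
    double x3  = x2 * x;       // 2
    double x6  = x3 * x3;      // 3
    double x7  = x6 * x;       // 4
    double x14 = x7 * x7;      // 5
    return x14 * x;            // 6 multiplications
}

static double pow15_chain(double x)    // addition chain 1-2-3-6-12-15
{
    double x2  = x * x;        // 1
    double x3  = x2 * x;       // 2
    double x6  = x3 * x3;      // 3
    double x12 = x6 * x6;      // 4
    return x12 * x3;           // 5 multiplications
}

int main()
{
    // Both compute 2^15 = 32768; the chain saves one multiplication.
    std::printf("%g %g\n", pow15_binary(2.0), pow15_chain(2.0));
    return 0;
}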

The binary method is a reasonable choice where the exponent isn't known
ahead of time (for small calculations), but for constant powers GCC has
the compile-time luxury of choosing a more efficient addition chain.
A quick peek at OpenSSL and similar cryptographic software, for example,
shows that raising very large numbers to large (almost) prime exponents
almost never uses the binary method.  A 1024-bit RSA encryption/decryption
exponentiation can typically be done in far fewer than the 1536
multiplications required on average by the binary method.  Indeed, the
section of Knuth quoted above describes the NP-complete addition-chain
problem and the "obvious" non-optimality of the binary method.
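
For a rough feel for where that 1536 figure comes from, here is a small
back-of-the-envelope sketch (my own illustration, not OpenSSL's code)
counting what the left-to-right binary method spends on a random
1024-bit exponent: about 1023 squarings plus one multiplication per set
bit below the leading one, roughly 512 on average.

#include <bitset>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937_64 rng(42);
    std::bitset<1024> e;
    for (std::size_t i = 0; i < e.size(); ++i)
        e[i] = rng() & 1;
    e[1023] = 1;                           // force a full 1024-bit exponent

    std::size_t squarings  = 1023;         // one per bit below the leading 1
    std::size_t multiplies = e.count() - 1; // one per remaining set bit
    std::printf("binary method: %zu squarings + %zu multiplies = %zu\n",
                squarings, multiplies, squarings + multiplies);
    return 0;
}

Windowed and addition-chain methods, as used by real RSA implementations,
need noticeably fewer operations than this.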



Whether libstdc++-v3 can be more pragmatic than GCC about system library
accuracy, and even whether -ffast-math is a reasonable default for the
compiler (as it is with several commercial compilers), are larger
arguments.  I doubt anyone would, in practice, use a method more accurate
than two multiplications to implement pow(x,4).  So I agree with Paolo's
disclaimer "from a practical point of view I'm not really worried.." :>

I hope this helps,

Roger
--


