This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch] Change swdiv evaluation order (x86), remove pass_convert_to_rsqrt


On Thu, Oct 29, 2009 at 3:22 PM, Dominique Dhumieres <dominiq@lps.ens.fr> wrote:
> Richard,
>
>> But if 2 - b*rcp(b) is rounded to 1 then b * rcp (b) is already
>> within 1 ulp of 1 and thus rcp (b) is within 2 ulp of b which means
>> that the initial approximation of rcp (b) was good enough.
>
> As I said, I am trying to remember what I did almost 2 years ago.
> For sure at that time my interest was about inverses of powers of 2
> (including 1.0) for which it's really too bad to introduce some inaccuracy
> on exact results. What I am sure of is that using (2-b*rcp(b)) prevents
> the NR iteration to converge in some cases (more precisely as soon as
> 2 - b*rcp(b) is rounded to 1.0). As a consequence 1.0/1.0 will never
> give you 1.0, even with an infinite number of NR steps, if you use
> (2-b*rcp(b))*rcp(b) (a=rcp(b) in my dirty formulae) and this has nothing
> to do with the initial approximation of rcp (b).

It should give you a result within the square of the error of the
reciprocal approximation which should be 1 bits less than
the single-precision mantissa or 2 ulps (well, for certain exponents
the error is worse, and the roundings may add up as well - which
was the motivation of our change).

>> ... you essentially run into issues with large exponents of b when ...
>
> I think the right (intel) way to do the NR is:
> (rcp(b)+rcp(b))-rcp(b)*(b*rcp(b)) for which you should not have
> problems with large exponents (from an accuracy point of view
> rcp(b)+rcp(b)*(1-b*rcp(b)) may be better, but increases the latency).

The intel document AP-803 lists both 2 * rcp(b) - (b * rcp(b)) * rcp (b)
and rcp (b) * (2 - b * rcp (b)) as valid implementations, only differing
in the load of the CPU instruction scheduler.

Note that we were talking about the a / b case, not just the
1.0 / b case.  Though I'm not convinced there is a theoretical
difference here.

Richard.

> Dominique
>
> PS. Just to be sure we use the same notations x0=rcp(b)~1/b=1/x,
> NR is x_{n+1}=x_n*(2-b*x_n) for n>=0. x_n=(1+e_n)/b then
> x_{n+1}=(1-e_n^2)/b for infinite accuracy.
>
> If b*rcp(b)<1, it is shifted 2 places to the right before being
> subtracted to 2, so the loss of accuracy may be 1 ulp rather than 1(?).

You get 0.5 ulp maximum error for each intermediate rounding step.
But that wasn't what you were asking?


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]