This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations


On 31 July 2015 at 10:34, Ramana Radhakrishnan
<ramana.radhakrishnan@foss.arm.com> wrote:
> I've tried this in the past and never been convinced that 2 iterations are enough to get to stability with this given that the results are only precise for 8 bits / iteration. Thus I've always believed you need 3 iterations rather than 2 at which point I've never been sure that it's worth it. So the testing that you've done with this currently is not enough for this to go into the tree.

My understanding is that 2 iterations are sufficient for single
precision floating point (although not for double precision), because
each iteration of Newton-Raphson roughly doubles the number of bits of
accuracy.

I haven't worked through the maths myself, but
    https://en.wikipedia.org/wiki/Division_algorithm#Newton.E2.80.93Raphson_division
says
    "This squaring of the error at each iteration step – the so-called
    quadratic convergence of Newton–Raphson's method – has the
    effect that the number of correct digits in the result roughly
    doubles for every iteration, a property that becomes extremely
    valuable when the numbers involved have many digits"

Therefore:
vrecpe -> 8 bits of accuracy
+1 iteration -> 16 bits of accuracy
+2 iterations -> 32 bits of accuracy (but in reality limited to the
precision of a 32-bit float)

Since 32 bits is much more accuracy than the 24 bits of precision in a
single precision FP value, 2 iterations should be sufficient.
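
A minimal C sketch of the argument above, for illustration only. It
does not use the real instructions: estimate_recip_8bit() is a
hypothetical stand-in for vrecpe that truncates the exact reciprocal
to 8 significant bits, and refine_recip() applies the step
x' = x * (2 - d*x), where (2 - d*x) is the factor vrecps computes.
With e = 1 - d*x, the error after one step is 1 - d*x' = e^2, which is
where the doubling of correct bits comes from. All function names here
are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for vrecpe (NOT the real instruction): the
   exact reciprocal with its mantissa truncated to 8 significant bits. */
static float estimate_recip_8bit(float d)
{
    assert(d != 0.0f);
    float r = 1.0f / d;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF0000u;  /* keep sign, exponent, top 7 stored mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step: x' = x * (2 - d*x). */
static float refine_recip(float d, float x)
{
    return x * (2.0f - d * x);
}

/* Relative error |d*x - 1| of x as an approximation of 1/d,
   evaluated in double so the measurement itself adds no float error. */
static double recip_rel_error(float d, float x)
{
    double e = (double)d * (double)x - 1.0;
    return e < 0.0 ? -e : e;
}
```

Calling refine_recip() twice on the estimate for d = 3.0f and checking
recip_rel_error() after each step should show the error roughly
squaring each time, until it bottoms out near float rounding error.
This only models the typical case; it says nothing about how the real
vrecpe estimate behaves on a given core.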

> I'd like this to be tested on a couple of different AArch32 implementations with a wider range of inputs to verify that the results are acceptable as well as running something like SPEC2k(6) with atleast one iteration to ensure correctness.

I can't argue with confirming theory matches practice :)

Some corner cases (e.g. numbers around FLT_MAX, FLT_MIN, etc.) may
produce denormals or out-of-range values during the reciprocal
calculation, which could make the answers less accurate than in the
typical case, but I think that is acceptable with -ffast-math.
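
To make the FLT_MAX corner case concrete, here is a small C sketch.
It assumes the reciprocal is computed on a unit that flushes denormals
to zero; flush_denormal() and div_via_recip_ftz() are hypothetical
helpers that model this, not GCC or NEON code, and the Newton-Raphson
steps are left out. Since 1/FLT_MAX is smaller than FLT_MIN, the
reciprocal is a denormal, so under this model FLT_MAX / FLT_MAX comes
out as 0 instead of 1.

```c
#include <float.h>

/* Hypothetical model of flush-to-zero: any denormal becomes zero. */
static float flush_denormal(float x)
{
    float mag = x < 0.0f ? -x : x;
    return (mag != 0.0f && mag < FLT_MIN) ? 0.0f : x;
}

/* a / b computed as a * (1/b), with the reciprocal flushed to zero
   when it is denormal (an assumption about the target, see above). */
static float div_via_recip_ftz(float a, float b)
{
    float r = flush_denormal(1.0f / b);
    return a * r;
}
```

For ordinary inputs such as 6.0f / 3.0f the helper should agree with
true division to within rounding; only divisors whose reciprocal falls
below FLT_MIN are affected.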

Charles

