This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [AArch64] Emit square root using the Newton series
- From: Evandro Menezes <e dot menezes at samsung dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Thu, 10 Mar 2016 16:15:23 -0600
- Subject: Re: [AArch64] Emit square root using the Newton series
- References: <AM3PR08MB00886499882773F3C8B9F71D83B30 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <011d01d17a26$31b3ade0$951b09a0$ at samsung dot com> <AM3PR08MB0088D558E387C1B736785AA883B40 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <56E1A7AD dot 90408 at samsung dot com> <AM3PR08MB00886A32AE872304290F0B1483B40 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com>
On 03/10/16 13:10, Wilco Dijkstra wrote:
>     frsqrte s1, s0
>     fmul    s2, s1, s1
>     frsqrts s2, s0, s2
>     fcmp    s0, 0.0
>     fmul    s1, s1, s2
>     fmul    s2, s1, s1
>     fmul    s1, s0, s1
>     frsqrts s2, s0, s2
>     fcsel   s1, s0, s1, eq
>     fmul    s0, s1, s2

That's what I had in mind too, but built around the approximation for
x^(-1/2) and using masks for the vector cases, thusly:

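    fcmne   v3.4s, v0.4s, #0.0      // v3 = lane mask, all ones where x != 0
    frsqrte v1.4s, v0.4s            // v1 = y0, initial estimate of x^(-1/2)
    fmul    v2.4s, v1.4s, v1.4s     // v2 = y0 * y0
    frsqrts v2.4s, v0.4s, v2.4s     // v2 = (3 - x * y0^2) / 2
    fmul    v1.4s, v1.4s, v2.4s     // v1 = y1, first Newton step
    fmul    v2.4s, v1.4s, v1.4s     // v2 = y1 * y1
    frsqrts v2.4s, v0.4s, v2.4s     // v2 = (3 - x * y1^2) / 2
    fmul    v1.4s, v1.4s, v2.4s     // v1 = y2, second Newton step
    and     v1.4s, v3.4s            // zero y2 in lanes where x == 0
    fmul    v0.4s, v1.4s, v0.4s     // v0 = x * y2, i.e. sqrt(x)
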
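
For anyone following along without the hardware at hand, here is a rough scalar C sketch of what one lane of the masked sequence computes: two Newton steps y' = y * (3 - x * y^2) / 2 on an estimate of x^(-1/2), a mask that zeroes the estimate when x is zero, and a final multiply by x. It is only an illustration, not the code the patch emits. The names `rsqrt_estimate`, `rsqrt_step` and `newton_sqrt` are made up for this example, and the initial estimate uses the well-known integer bit trick as a stand-in for FRSQRTE, so its accuracy differs from the real instruction. It assumes normal, non-negative inputs or zero.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for FRSQRTE: a crude estimate of 1/sqrt(x).
   Uses the integer bit trick rather than the hardware lookup table.
   Returns +inf for zero, as FRSQRTE does for +0. */
static float rsqrt_estimate(float x)
{
    if (x == 0.0f)
        return INFINITY;

    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);

    float y;
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Mirrors the (3 - a * b) / 2 form of FRSQRTS, without its fused rounding. */
static float rsqrt_step(float a, float b)
{
    return (3.0f - a * b) / 2.0f;
}

/* One lane of the masked sequence: sqrt(x) computed as x * rsqrt(x). */
static float newton_sqrt(float x)
{
    /* Lane mask: all ones where x != 0, all zeros where x == 0. */
    uint32_t mask = (x != 0.0f) ? 0xffffffffu : 0u;

    float y = rsqrt_estimate(x);

    /* Two Newton steps: y = y * (3 - x * y * y) / 2. */
    for (int i = 0; i < 2; i++)
        y = y * rsqrt_step(x, y * y);

    /* For x == 0 the estimate is not finite here, so clear its bits
       before the final multiply; otherwise 0 * y would yield NaN. */
    uint32_t ybits;
    memcpy(&ybits, &y, sizeof ybits);
    ybits &= mask;
    memcpy(&y, &ybits, sizeof y);

    return x * y;
}
```

Calling `newton_sqrt(4.0f)` should land close to 2, and `newton_sqrt(0.0f)` gives exactly 0 rather than NaN, which is the whole point of the mask. As far as I know A64 has no `fcmne` mnemonic and the vector `and` takes three byte-arranged operands, so a real sequence would presumably use `fcmeq` and clear the flagged lanes with `bic`; treat the listing above as shorthand.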
Thanks,
--
Evandro Menezes