This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [AArch64] Emit square root using the Newton series
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Evandro Menezes <e dot menezes at samsung dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Fri, 11 Mar 2016 01:06:33 +0000
- References: <AM3PR08MB00886499882773F3C8B9F71D83B30 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <011d01d17a26$31b3ade0$951b09a0$ at samsung dot com> <AM3PR08MB0088D558E387C1B736785AA883B40 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <56E1A7AD dot 90408 at samsung dot com> <AM3PR08MB00886A32AE872304290F0B1483B40 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com>,<56E1F1FB dot 6070405 at samsung dot com>
Evandro Menezes <e.menezes@samsung.com> wrote:
>
> That's what I had in mind too, but built around the approximation for
> x^(-1/2), using masks for the vector cases, thusly:
>
> fcmne v3.4s, v0.4s, #0.0
> frsqrte v1.4s, v0.4s
> fmul v2.4s, v1.4s, v1.4s
> frsqrts v2.4s, v0.4s, v2.4s
> fmul v1.4s, v1.4s, v2.4s
> fmul v2.4s, v1.4s, v1.4s
> frsqrts v2.4s, v0.4s, v2.4s
> fmul v1.4s, v1.4s, v2.4s
> and v1.4s, v3.4s
> fmul v0.4s, v1.4s, v0.4s
That's possible, but the overall latency is higher: according to exynos-1.md,
the above takes 44 cycles, while my version would take 37.
Wilco
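
For readers following the quoted sequence, here is a minimal scalar C sketch of
what it appears to compute: an initial estimate of x^(-1/2), two Newton steps,
a mask that zeroes the lane when x is 0, and a final multiply by x. The function
names are hypothetical, and rsqrt_estimate is only a portable stand-in for
FRSQRTE (an assumption so the code runs on any host), so its precision will not
match the hardware estimate. It is not taken from either patch.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret float <-> 32-bit integer without aliasing problems. */
static uint32_t to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical stand-in for FRSQRTE: a rough first guess at 1/sqrt(x).
   The real instruction uses a table lookup; this bit trick is an assumption. */
static float rsqrt_estimate(float x)
{
    return from_bits(0x5f3759dfu - (to_bits(x) >> 1));
}

/* Models FRSQRTS, which computes (3 - a * b) / 2. */
static float rsqrt_step(float a, float b)
{
    return (3.0f - a * b) * 0.5f;
}

/* One lane of the quoted sequence: sqrt(x) ~= x * rsqrt(x). */
float approx_sqrt(float x)
{
    uint32_t mask = (x != 0.0f) ? 0xffffffffu : 0u; /* fcmne: all ones if x != 0 */
    float e = rsqrt_estimate(x);                    /* frsqrte */
    e = e * rsqrt_step(x, e * e);                   /* fmul, frsqrts, fmul */
    e = e * rsqrt_step(x, e * e);                   /* fmul, frsqrts, fmul */
    e = from_bits(to_bits(e) & mask);               /* and: force 0 when x == 0 */
    return e * x;                                   /* final fmul */
}
```

The mask matters because the reciprocal square root estimate of 0 is huge or
infinite, so without it the final multiply by x = 0 could yield NaN rather
than 0.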