This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [AArch64] Add more precision choices for the reciprocal square root approximation

From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
To: Evandro Menezes <e dot menezes at samsung dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
Cc: James Greenhalgh <James dot Greenhalgh at arm dot com>, Andrew Pinski <pinskia at gmail dot com>, nd <nd at arm dot com>
Date: Fri, 1 Apr 2016 23:08:37 +0000
Subject: Re: [AArch64] Add more precision choices for the reciprocal square root approximation
References: <56EB2BDC dot 30209 at samsung dot com> <AM3PR08MB00883C48B491A1BA92CD0783838C0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <56EC2A91 dot 2030604 at samsung dot com> <AM3PR08MB0088D90F31B84E852FF3100C838C0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <56EC8870 dot 1030108 at samsung dot com> <56FDA338 dot 4050108 at samsung dot com> <AM3PR08MB00889651F672A4F0157BDE17839A0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <56FE8B0B dot 1060303 at samsung dot com>,<56FECE90 dot 9 at samsung dot com>

Evandro Menezes wrote:
>
> I hope that this gets in the ballpark of what's been discussed previously.

Yes, that's very close to what I had in mind. A minor issue is that the vector
modes cannot work: their bits start at MAX_MODE_FLOAT + 1, and MAX_MODE_FLOAT is
itself > 32, so the shift count is too large for the int being shifted:

+/* Control approximate alternatives to certain FP operators.  */
+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+     ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \
+     : (0))

That should be: 

+     ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \

It would be worth testing all the obvious cases to be sure they work.
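
As a rough illustration of such a test, here is a minimal standalone sketch. The
four enum values are hypothetical stand-ins chosen only so that MAX_MODE_FLOAT
is > 32; the real values come from GCC's generated machine mode tables and will
differ. The helper names shift_posted, shift_fixed and check_all_modes are also
made up for this example. It checks that, with the corrected expression, every
float and vector float mode gets its own bit and every other mode maps to 0.

```c
#include <assert.h>

/* Hypothetical stand-ins for the machine mode bounds; NOT GCC's real values.  */
enum
{
  MIN_MODE_FLOAT = 30,
  MAX_MODE_FLOAT = 35,
  MIN_MODE_VECTOR_FLOAT = 70,
  MAX_MODE_VECTOR_FLOAT = 80
};

/* The macro with the corrected vector-mode shift applied.  */
#define AARCH64_APPROX_MODE(MODE) \
  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
     ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
              + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
     : (0))

/* Shift count the posted patch would use for a vector float mode.  */
int
shift_posted (int mode)
{
  return mode - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1;
}

/* Shift count with the correction: the first vector mode lands right
   after the last scalar float mode.  */
int
shift_fixed (int mode)
{
  return mode - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1;
}

/* Return 1 if every float and vector float mode maps to a distinct single
   bit and every other mode in the scanned range maps to 0, else 0.  */
int
check_all_modes (void)
{
  unsigned int seen = 0;

  /* With these stand-in values all bits must fit below the sign bit.  */
  assert (shift_fixed (MAX_MODE_VECTOR_FLOAT) < 31);

  for (int m = MIN_MODE_FLOAT - 1; m <= MAX_MODE_VECTOR_FLOAT + 1; m++)
    {
      int is_fp = (MIN_MODE_FLOAT <= m && m <= MAX_MODE_FLOAT)
                  || (MIN_MODE_VECTOR_FLOAT <= m && m <= MAX_MODE_VECTOR_FLOAT);
      unsigned int bit = (unsigned int) AARCH64_APPROX_MODE (m);

      if (!is_fp)
        {
          if (bit != 0)
            return 0;
          continue;
        }
      /* Must be exactly one bit, and one not used by an earlier mode.  */
      if (bit == 0 || (bit & (bit - 1)) != 0 || (seen & bit) != 0)
        return 0;
      seen |= bit;
    }
  return 1;
}
```

With these stand-in values the posted expression gives a shift count of 36 for
the first vector mode, while the corrected one gives 6, directly after the six
scalar float bits.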

Also, I don't think it is a good idea to enable all modes on Exynos-M1 and XGene-1.
I haven't seen any evidence that it gives a speedup on real code for all modes
(or at least on a good microbenchmark like the unit vector test I suggested; a simple
throughput test does not count!).

The issue is that it hides performance gains from an improved divider/sqrt on new
revisions or microarchitectures. That means you should only enable cases where there
is evidence of a major speedup that cannot be matched by a future improved divider/sqrt.

Wilco

Follow-Ups:
- Re: [AArch64] Add more precision choices for the reciprocal square root approximation
  - From: Evandro Menezes

References:
- Re: [AArch64] Add precision choices for the reciprocal square root approximation
  - From: Wilco Dijkstra
- Re: [AArch64] Add precision choices for the reciprocal square root approximation
  - From: Evandro Menezes
- Re: [AArch64] Add more precision choices for the reciprocal square root approximation
  - From: Evandro Menezes
