[AArch64] Add more precision choices for the reciprocal square root approximation
Wilco Dijkstra
Wilco.Dijkstra@arm.com
Fri Apr 1 23:08:00 GMT 2016
Evandro Menezes wrote:
>
> I hope that this gets in the ballpark of what's been discussed previously.
Yes, that's very close to what I had in mind. A minor issue is that the vector
modes cannot work, as their bits start at MAX_MODE_FLOAT + 1 (which is > 32):
+/* Control approximate alternatives to certain FP operators. */
+#define AARCH64_APPROX_MODE(MODE) \
+ ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+ : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \
+ : (0))
That should be:
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
It would be worth testing all the obvious cases to be sure they work.
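
As a rough sketch of such a test (not part of the patch), the two shift
amounts can be compared outside of GCC. The enum values below are made up
for illustration; the real ones come from the generated insn-modes.h. The
helper names approx_bit_posted, approx_bit_fixed and check_approx_bits are
hypothetical as well. The functions return the bit index rather than the
shifted mask, so an out-of-range shift is never actually evaluated.

```c
#include <assert.h>

/* Hypothetical mode numbering, for illustration only.  The real values
   are generated into GCC's insn-modes.h.  */
enum
{
  MIN_MODE_FLOAT = 34,
  MAX_MODE_FLOAT = 38,
  MIN_MODE_VECTOR_FLOAT = 70,
  MAX_MODE_VECTOR_FLOAT = 80
};

/* Bit index used by the macro as posted, or -1 if MODE gets no bit.  */
int
approx_bit_posted (int mode)
{
  if (MIN_MODE_FLOAT <= mode && mode <= MAX_MODE_FLOAT)
    return mode - MIN_MODE_FLOAT;
  if (MIN_MODE_VECTOR_FLOAT <= mode && mode <= MAX_MODE_VECTOR_FLOAT)
    return mode - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1;
  return -1;
}

/* Bit index with the suggested correction: vector modes follow
   directly after the last scalar float bit.  */
int
approx_bit_fixed (int mode)
{
  if (MIN_MODE_FLOAT <= mode && mode <= MAX_MODE_FLOAT)
    return mode - MIN_MODE_FLOAT;
  if (MIN_MODE_VECTOR_FLOAT <= mode && mode <= MAX_MODE_VECTOR_FLOAT)
    return mode - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1;
  return -1;
}

/* Check that every float and vector float mode gets a distinct bit
   that fits in a 32-bit mask when the corrected index is used.  */
void
check_approx_bits (void)
{
  unsigned int seen = 0;
  for (int mode = MIN_MODE_FLOAT; mode <= MAX_MODE_VECTOR_FLOAT; mode++)
    {
      int bit = approx_bit_fixed (mode);
      if (bit < 0)
        continue;               /* Not a float or vector float mode.  */
      assert (bit < 32);        /* Must fit in the mask.  */
      assert ((seen & (1u << bit)) == 0);  /* Must not collide.  */
      seen |= 1u << bit;
    }
}
```

With this assumed numbering the posted macro would put the first vector mode
at a bit index well above 31, while the corrected one continues right after
the scalar modes. Whether the real mode counts fit in the mask type still
needs checking against the actual insn-modes.h.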
Also, I don't think it is a good idea to enable all modes on Exynos-M1 and XGene-1 -
I haven't seen any evidence that it gives a speedup on real code for all modes
(or at least on a good microbenchmark like the unit vector test I suggested - a simple
throughput test does not count!).
The issue is that it hides performance gains from an improved divider/sqrt on new revisions
or microarchitectures. That means you should only enable cases where there is evidence
of a major speedup that cannot be matched by a future improved divider/sqrt.
Wilco