This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: [AArch64] Add more precision choices for the reciprocal square root approximation
- From: Evandro Menezes <e dot menezes at samsung dot com>
- To: 'Wilco Dijkstra' <Wilco dot Dijkstra at arm dot com>, 'GCC Patches' <gcc-patches at gcc dot gnu dot org>
- Cc: 'James Greenhalgh' <James dot Greenhalgh at arm dot com>, 'Andrew Pinski' <pinskia at gmail dot com>, 'nd' <nd at arm dot com>
- Date: Thu, 21 Apr 2016 13:39:49 -0500
- Subject: RE: [AArch64] Add more precision choices for the reciprocal square root approximation
- References: <56EB2BDC dot 30209 at samsung dot com> <AM3PR08MB00883C48B491A1BA92CD0783838C0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <56EC2A91 dot 2030604 at samsung dot com> <AM3PR08MB0088D90F31B84E852FF3100C838C0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <56EC8870 dot 1030108 at samsung dot com> <56FDA338 dot 4050108 at samsung dot com> <AM3PR08MB00889651F672A4F0157BDE17839A0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <56FE8B0B dot 1060303 at samsung dot com> <56FECE90 dot 9 at samsung dot com> <AM3PR08MB008867649DBF969AADABAD03839A0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <57029297 dot 2050908 at samsung dot com> <570D3B6E dot 5090600 at samsung dot com>
> On 04/04/16 11:13, Evandro Menezes wrote:
> > On 04/01/16 18:08, Wilco Dijkstra wrote:
> >> Evandro Menezes wrote:
> >>> I hope that this gets in the ballpark of what's been discussed
> >>> previously.
> >> Yes that's very close to what I had in mind. A minor issue is that
> >> the vector modes cannot work as they start at MAX_MODE_FLOAT (which
> >> is > 32):
> >>
> >> +/* Control approximate alternatives to certain FP operators. */
> >> +#define AARCH64_APPROX_MODE(MODE) \
> >> + ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
> >> + ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
> >> + : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
> >> + ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \
> >> + : (0))
> >>
> >> That should be:
> >>
> >> + ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
> >>
> >> It would be worth testing all the obvious cases to be sure they work.
> >>
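To make the off-by-MIN_MODE_FLOAT issue concrete, here is a minimal sketch that computes the bit index each variant of the macro would shift by. The enum values are hypothetical stand-ins chosen only so that MAX_MODE_FLOAT is above 32, as noted above; the real values come from the generated insn-modes.h and differ. The helper names (approx_bit_posted, approx_bit_corrected, bit_fits_mask) are made up for this illustration and are not part of the patch.

```c
#include <limits.h>

/* Hypothetical stand-ins for the machine-mode enum bounds.  The real
   values are generated by GCC (insn-modes.h) and are not these.  */
enum
{
  MIN_MODE_FLOAT = 33,
  MAX_MODE_FLOAT = 36,
  MIN_MODE_VECTOR_FLOAT = 70,
  MAX_MODE_VECTOR_FLOAT = 75
};

/* Number of bits available in the 'int'-sized mask the macro builds.  */
#define MASK_BITS ((int) (sizeof (unsigned int) * CHAR_BIT))

/* Bit index used by the macro as posted, or -1 for non-FP modes.  */
static int
approx_bit_posted (int mode)
{
  if (MIN_MODE_FLOAT <= mode && mode <= MAX_MODE_FLOAT)
    return mode - MIN_MODE_FLOAT;
  if (MIN_MODE_VECTOR_FLOAT <= mode && mode <= MAX_MODE_VECTOR_FLOAT)
    return mode - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1;
  return -1;
}

/* Bit index with the suggested correction: vector modes start right
   after the (MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1) scalar FP bits.  */
static int
approx_bit_corrected (int mode)
{
  if (MIN_MODE_FLOAT <= mode && mode <= MAX_MODE_FLOAT)
    return mode - MIN_MODE_FLOAT;
  if (MIN_MODE_VECTOR_FLOAT <= mode && mode <= MAX_MODE_VECTOR_FLOAT)
    return mode - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1;
  return -1;
}

/* Nonzero if shifting 1 by BIT stays inside the mask.  */
static int
bit_fits_mask (int bit)
{
  return 0 <= bit && bit < MASK_BITS;
}
```

With these made-up bounds, the posted form places the first vector mode past the end of a 32-bit mask, while the corrected form packs the vector modes immediately after the scalar ones. Whether all scalar and vector FP modes together fit in one mask on the real target would still need checking against the actual mode counts.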
> >> Also I don't think it is a good idea to enable all modes on Exynos-M1
> >> and XGene-1 - I haven't seen any evidence that shows it gives a
> >> speedup on real code for all modes (or at least on a good micro
> >> benchmark like the unit vector test I suggested - a simple throughput
> >> test does not count!).
> >
> > This approximation does benefit M1 in general across several
> > benchmarks. As for my choice for XGene-1, it preserves the original
> > setting. I believe that, with this more granular option, developers
> > can fine-tune their targets.
> >
> >> The issue is it hides performance gains from an improved divider/sqrt
> >> on new revisions or microarchitectures. That means you should only
> >> enable cases where there is evidence of a major speedup that cannot
> >> be matched by a future improved divider/sqrt.
> >
> > I did notice that, in some benchmarks with heavy use of multiplication
> > or multiply-accumulation, the series may be detrimental, since it
> > increases the competition for the unit(s) that perform such operations.
> >
> > But those micro-architectures that get a better unit for division or
> > sqrt() are free to add their own tuning parameters. Granted, I assume
> > that running legacy code is a non-issue in only a few markets.
>
> Ping^1
Ping^2
--
Evandro Menezes Austin, TX