This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [AArch64] Add precision choices for the reciprocal square root approximation


On 03/18/16 10:21, Wilco Dijkstra wrote:
Hi Evandro,

For example, though this approximation improves the performance
noticeably for DF on A57, for SF, not so much, if at all.
I'm still skeptical that you can ever get any gain on scalars. I bet the only gain is on
4x vectorized floats.

I created a simple test that loops around an inline asm version of the Newton series using scalar insns and got these results on A57:

   1/sqrt(x):    18290898/s
   Fast:         45896823/s

   1/sqrtf(x):   69618490/s
   Fast:         61865874/s
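
For readers unfamiliar with the technique being timed, here is a minimal C sketch of a Newton-Raphson refinement for 1/sqrt(x). It is not the inline asm used for the numbers above. The function names and the bit-trick seed are assumptions made for illustration; on AArch64 the initial estimate would typically come from FRSQRTE and each step from FRSQRTS. The seed is only meaningful for positive, finite, normal inputs.

```c
#include <stdint.h>
#include <string.h>

/* Crude software seed for 1/sqrt(x). This stands in for a hardware
   estimate instruction and is an assumption of this sketch.
   Valid only for positive, finite, normal doubles. */
static double rsqrt_seed(double x)
{
    uint64_t bits;
    double y;

    memcpy(&bits, &x, sizeof bits);
    /* Roughly halve and negate the exponent via integer arithmetic. */
    bits = UINT64_C(0x5fe6eb50c7b537a9) - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step for y ~= 1/sqrt(x):
   y' = y * (3 - x*y*y) / 2. */
static double rsqrt_step(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

/* Refine the seed with a fixed number of steps (hypothetical helper).
   Each step roughly squares the relative error, so fewer steps give a
   faster but less precise result. */
double approx_rsqrt(double x, int steps)
{
    double y = rsqrt_seed(x);

    for (int i = 0; i < steps; i++)
        y = rsqrt_step(x, y);
    return y;
}
```

The step count is the precision knob: each extra step is a dependent chain of multiplies, which is why the sequence can lose to a hardware sqrt and divide on scalars while still paying off when several lanes are processed at once.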


So what I would like to see is this implemented in a more general way. We should
be able to choose whether to expand depending on the mode - including whether it is
vectorized. For example enable on V4SFmode and maybe V2DFmode, but not
on any scalars.

Then we'd add new CPU tuning settings for division, sqrt and rsqrt (rather than adding lots
of extra tune flags).

If I understood you correctly, would something like coarse tuning flags along with target-specific cost or parameter tables be what you have in mind?

Note the md file should call a function in aarch64.c to decide whether to
expand or not (your division approximation patch makes the decision in the md file which
does not seem a good idea).
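
One way to picture the suggestion is a per-CPU tuning record holding a mode mask for each operation, plus a single C function that the md expanders would consult. The sketch below is standalone C, not GCC code: the enum, struct, and function names are all hypothetical, and the example values only mirror the "V4SF and maybe V2DF, but no scalars" idea from this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the machine modes discussed above. */
enum fp_mode { MODE_SF, MODE_DF, MODE_V4SF, MODE_V2DF };

/* Operations that could be expanded into an approximation sequence. */
enum approx_op { OP_DIV, OP_SQRT, OP_RSQRT };

#define MODE_BIT(m) (1u << (m))

/* Hypothetical per-CPU tuning record: for each operation, a bitmask of
   the modes in which the approximation is considered profitable. */
struct approx_tuning {
    unsigned division;
    unsigned sqrt;
    unsigned rsqrt;
};

/* Example values only: rsqrt approximation on vector modes, never on
   scalars, and no approximation for division or sqrt. */
static const struct approx_tuning example_tuning = {
    .division = 0,
    .sqrt     = 0,
    .rsqrt    = MODE_BIT(MODE_V4SF) | MODE_BIT(MODE_V2DF),
};

/* Single decision point: an expander would ask this function instead of
   encoding the policy in the md file itself. */
bool use_approx_expansion(const struct approx_tuning *tune,
                          enum approx_op op, enum fp_mode mode)
{
    unsigned mask = 0;

    switch (op) {
    case OP_DIV:   mask = tune->division; break;
    case OP_SQRT:  mask = tune->sqrt;     break;
    case OP_RSQRT: mask = tune->rsqrt;    break;
    }
    return (mask & MODE_BIT(mode)) != 0;
}
```

With this shape, changing the policy for a CPU means editing one table entry rather than adding another tune flag or touching the patterns.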

I agree.  Will modify it.

Thank you,

--
Evandro Menezes

