This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
- From: Michael Matz <matz at suse dot de>
- To: Benedikt Huber <benedikt dot huber at theobroma-systems dot com>
- Cc: pinskia at gmail dot com, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "philipp dot tomsich at theobroma-systems dot com" <philipp dot tomsich at theobroma-systems dot com>
- Date: Thu, 25 Jun 2015 15:27:39 +0200 (CEST)
- Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
- References: <1434629045-24650-1-git-send-email-benedikt dot huber at theobroma-systems dot com> <8B73CF78-11D4-4963-A60A-E1C2A3B219E2 at gmail dot com> <F2FF9755-1DF9-4000-8602-77AB12077240 at theobroma-systems dot com>
Hi,
On Thu, 25 Jun 2015, Benedikt Huber wrote:
> > This is NOT a win on thunderX at least for single precision because
> > you have to do the divide and sqrt in the same time as it takes 5
> > multiplies (estimate and step are multiplies in the thunderX pipeline).
> > Doubles is 10 multiplies which is just the same as what the patch does
> > (but it is really slightly less than 10, I rounded up). So in the end
> > this is NOT a win at all for thunderX unless we do one less step for
> > both single and double.
>
> Yes, the expected benefit from rsqrt estimation is implementation
> specific. If one has a better initial rsqrte or an application that can
> trade precision for execution time, we could offer a command line option
> to do only 2 steps for double and 1 step for float; similar to
> -mrecip-precision for PowerPC. What are your thoughts on that?
On x86-64, we do only one NR step under -ffast-math. The general rule of
thumb for fast-math is that common benchmarks should still validate with
that option in effect.
(And yes, I also never found a speedup from approximated reciprocals for
which benchmarks would still generally validate: you always had to do two
NR steps, and then it became as slow as a general divide.) See also
http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00099.html and the follow-up
thread.
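
For readers following the step-count discussion, here is a minimal sketch
of what "an estimate plus N NR steps" means for rsqrt. It is not the code
from the patch. All function names are hypothetical, and the initial
estimate is the well-known bit-level approximation, used only as a
portable stand-in for a hardware estimate instruction such as frsqrte,
whose accuracy is implementation specific.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware estimate instruction: the
   well-known bit-level approximation of 1/sqrt(a). Assumes a is a
   positive, normal IEEE-754 single-precision value. */
static float rsqrt_estimate(float a)
{
    uint32_t bits;
    float x;

    memcpy(&bits, &a, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One Newton-Raphson step for f(x) = 1/x^2 - a:
   x' = x * (1.5 - 0.5 * a * x * x).
   This costs a handful of multiplies and no divide. */
static float rsqrt_nr_step(float a, float x)
{
    return x * (1.5f - 0.5f * a * x * x);
}

/* Estimate followed by a caller-chosen number of refinement steps. */
static float rsqrt_approx(float a, int steps)
{
    float x;

    assert(a > 0.0f && steps >= 0);
    x = rsqrt_estimate(a);
    for (int i = 0; i < steps; i++)
        x = rsqrt_nr_step(a, x);
    return x;
}

/* Residual |a * x^2 - 1|, which is zero for an exact result. It is
   roughly twice the relative error of x when that error is small. */
static float rsqrt_residual(float a, int steps)
{
    float x = rsqrt_approx(a, steps);
    float r = a * x * x - 1.0f;

    return r < 0.0f ? -r : r;
}
```

Each step roughly doubles the number of correct bits, so the number of
steps needed depends on how good the initial estimate is. That is why
the trade-off differs between implementations, and why a step count
that validates on one target may not be a win on another.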