This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

From: Michael Matz <matz at suse dot de>
To: Benedikt Huber <benedikt dot huber at theobroma-systems dot com>
Cc: pinskia at gmail dot com, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "philipp dot tomsich at theobroma-systems dot com" <philipp dot tomsich at theobroma-systems dot com>
Date: Thu, 25 Jun 2015 15:27:39 +0200 (CEST)
Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Authentication-results: sourceware.org; auth=none
References: <1434629045-24650-1-git-send-email-benedikt dot huber at theobroma-systems dot com> <8B73CF78-11D4-4963-A60A-E1C2A3B219E2 at gmail dot com> <F2FF9755-1DF9-4000-8602-77AB12077240 at theobroma-systems dot com>

Hi,

On Thu, 25 Jun 2015, Benedikt Huber wrote:

> > This is NOT a win on thunderX at least for single precision because 
> > you have to do the divide and sqrt in the same time as it takes 5 
> > multiplies (estimate and step are multiplies in the thunderX pipeline).  
> > Doubles is 10 multiplies which is just the same as what the patch does 
> > (but it is really slightly less than 10, I rounded up). So in the end 
> > this is NOT a win at all for thunderX unless we do one less step for 
> > both single and double.
> 
> Yes, the expected benefit from rsqrt estimation is implementation 
> specific. If one has a better initial rsqrte or an application that can 
> trade precision for execution time, we could offer a command line option 
> to do only 2 steps for double and 1 step for float; similar to 
> -mrecip-precision for PowerPC. What are your thoughts on that?

On x86-64, under -ffast-math we only do one NR (Newton-Raphson) step.  The 
general rule of thumb for fast-math is that common benchmarks should still 
validate with that option in effect.

(And yes, I also never found a speedup from approximated reciprocals under 
which benchmarks would still generally validate: you always had to do two 
NR steps, and then it became as slow as a general divide.)  See also 
http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00099.html and the followup 
thread.
