This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Performance analysis of Polyhedron/gas_dyn

From: Geert Bosch <bosch at adacore dot com>
To: Janne Blomqvist <blomqvist dot janne at gmail dot com>
Cc: Richard Guenther <richard dot guenther at gmail dot com>, gfortran <fortran at gcc dot gnu dot org>, gcc at gcc dot gnu dot org
Date: Fri, 27 Apr 2007 10:38:05 -0400
Subject: Re: Performance analysis of Polyhedron/gas_dyn
References: <4631A891.5020002@gmail.com> <84fc9c000704270127s4eb0030p9d27fd5fdff73cf8@mail.gmail.com> <4631CCAA.3010403@gmail.com>

On Apr 27, 2007, at 06:12, Janne Blomqvist wrote:

> I agree it can be an issue, but OTOH people who care about precision probably 1. avoid -ffast-math 2. use double precision (where these reciprocal instrs are not available). Intel calls it -no-prec-div, but it's enabled for the "-fast" catch-all option.
>
> On a related note, our beloved competitors generally have some high level flag for combining all these fancy and potentially unsafe optimizations (e.g. -O4, -fast, -fastsse, -Ofast, etc.). For gcc, at least FP benchmarks seem to do generally well with something like "-O3 -funroll-loops -ftree-vectorize -ffast-math -march=native -mfpmath=sse", but it's quite a mouthful.


No, using only 12 bits of precision is just ridiculous and should
not be included in -ffast-math. You should always use a Newton-Raphson
step after getting the 12-bit approximation. When done correctly
this doubles the precision and gets you just about the 24 bits of
precision needed for float. Reciprocal approximations are meant
to be used that way, and it's no accident the lookup provides
exactly half the bits needed. For double precision you just do
two more iterations, which is why there is no need for double
precision variants of these instructions.
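
To make the refinement concrete, here is a minimal sketch in C. It does
not call any real reciprocal instruction: crude_recip() is a hypothetical
stand-in that truncates 1/x to 12 significant bits, and the other
function names are made up for illustration. The Newton-Raphson step
itself is the standard y' = y * (2 - x*y), which roughly squares the
relative error of the estimate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: computes
   1/x exactly, then keeps only 12 significant bits (the implicit
   leading 1 plus the top 11 stored mantissa bits). */
float crude_recip(float x)
{
    assert(x != 0.0f);
    float r = 1.0f / x;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;            /* clear the low 12 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for y ~ 1/x in single precision. */
float nr_step_f(float x, float y)
{
    return y * (2.0f - x * y);
}

/* The same step in double precision. */
double nr_step_d(double x, double y)
{
    return y * (2.0 - x * y);
}

/* Relative error of y as an approximation of 1/x, i.e. |x*y - 1|. */
double rel_err(double x, double y)
{
    double d = x * y - 1.0;
    return d < 0.0 ? -d : d;
}
```

Starting from crude_recip(x), one call to nr_step_f() should bring the
result close to float precision, and three calls to nr_step_d() (the
first step plus two more) should be enough for double, since the number
of good bits goes roughly 12, 24, 48, 96.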

The cost for the extra step is small, and you get good results.
There are many variations possible, and using fused multiply-add
it's even possible to get correctly rounded results at low cost.
I truly doubt that any of the compilers you mention use these
instructions without NR iteration to get required precision.

-Geert

Follow-Ups:
- Re: Performance analysis of Polyhedron/gas_dyn
  - From: Robert Dewar
- Re: Performance analysis of Polyhedron/gas_dyn
  - From: Janne Blomqvist

References:
- Re: Performance analysis of Polyhedron/gas_dyn
  - From: Richard Guenther
- Re: Performance analysis of Polyhedron/gas_dyn
  - From: Janne Blomqvist
