This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Performance analysis of Polyhedron/gas_dyn



On Apr 27, 2007, at 06:12, Janne Blomqvist wrote:
I agree it can be an issue, but OTOH people who care about precision probably (1) avoid -ffast-math and (2) use double precision (where these reciprocal instrs are not available). Intel calls it -no-prec-div, but it's enabled for the "-fast" catch-all option.

On a related note, our beloved competitors generally have some high level flag for combining all these fancy and potentially unsafe optimizations (e.g. -O4, -fast, -fastsse, -Ofast, etc.). For gcc, at least FP benchmarks seem to do generally well with something like "-O3 -funroll-loops -ftree-vectorize -ffast-math -march=native -mfpmath=sse", but it's quite a mouthful.

No, using only 12 bits of precision is just ridiculous and should not be included in -ffast-math. You should always use a Newton-Raphson step after getting the 12-bit approximation. When done correctly this doubles the precision and gets you just about the 24 bits of precision needed for float. Reciprocal approximations are meant to be used that way, and it's no accident the lookup provides exactly half the bits needed. For double precision you just do two more iterations, which is why there is no need for double precision variants of these instructions.
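
As a rough illustration of the refinement described above, here is a minimal C sketch. It is not what GCC or any other compiler emits. The names approx_recip, recip_float and recip_double are hypothetical, and approx_recip only simulates a 12-bit hardware estimate by truncating the mantissa of a full division, so that the sketch stays portable. Each Newton-Raphson step computes x' = x * (2 - a*x), which roughly squares the relative error of the estimate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate.
   It computes 1/a and then clears the low 12 mantissa bits,
   leaving about 12 significant bits (11 stored + 1 implicit). */
float approx_recip(float a)
{
    float r = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;          /* keep sign, exponent, top 11 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step: about 12 bits in, close to 24 bits out. */
float recip_float(float a)
{
    assert(a != 0.0f);
    float x = approx_recip(a);
    return x * (2.0f - a * x);
}

/* Three steps in total (one as for float, plus two more) starting from
   the same 12-bit estimate. Assumes a is within the range of float. */
double recip_double(double a)
{
    assert(a != 0.0);
    double x = (double)approx_recip((float)a);
    for (int i = 0; i < 3; i++)
        x = x * (2.0 - a * x);
    return x;
}
```

With a = 3, the simulated estimate is off by a relative error of about 2.4e-4, while recip_float(3.0f) is within about 6e-8 of 1/3. On real hardware the estimate would come from the reciprocal instruction itself rather than from a division.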

The cost for the extra step is small, and you get good results.
There are many variations possible, and using fused multiply-add
it's even possible to get correctly rounded results at low cost.
I truly doubt that any of the compilers you mention use these
instructions without an NR iteration to get the required precision.

-Geert

