Floating point performance issue

Segher Boessenkool segher@kernel.crashing.org
Tue Dec 20 12:43:00 GMT 2011

I tested this on a PowerPC 970 so I could get lovely charts from
Shark.  The problem is much less severe there, but it is totally
obvious what is going on: with the default rounding mode (round to
nearest, ties to even) the denormal sticks around for multipliers
> 0.5.

>> Therefore the flags needed are -msse2 -mfpmath=sse -ffast-math
> I would discourage the use of -ffast-math, which can affect generic
> code very badly (due to -funsafe-math-optimizations). Isn't there
> an option to enable FTZ?

Dunno about that(*), but you can portably do


and that prevents the problem from occurring as well.


(*) So I looked it up, gcc/config/i386/crtfastmath.c, the code is
(for x86-64):

#define MXCSR_DAZ (1 << 6)      /* Enable denormals are zero mode */
#define MXCSR_FTZ (1 << 15)     /* Enable flush to zero mode */

   unsigned int mxcsr = __builtin_ia32_stmxcsr ();
   mxcsr |= MXCSR_DAZ | MXCSR_FTZ;
   __builtin_ia32_ldmxcsr (mxcsr);
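
If you want the same effect without -ffast-math, something along these
lines should work from user code.  This is only a sketch, assuming an
x86 target with SSE and a compiler that provides <xmmintrin.h>; the
function name enable_ftz_daz is made up here:

```c
#include <xmmintrin.h>          /* _mm_getcsr, _mm_setcsr */

#define MXCSR_DAZ (1u << 6)     /* Enable denormals are zero mode */
#define MXCSR_FTZ (1u << 15)    /* Enable flush to zero mode */

/* Hypothetical helper: set FTZ and DAZ for the calling thread and
   return the previous MXCSR value so it can be restored later. */
static unsigned int enable_ftz_daz(void)
{
    unsigned int old = _mm_getcsr();
    _mm_setcsr(old | MXCSR_DAZ | MXCSR_FTZ);
    return old;
}
```

MXCSR is per-thread state, so this has to be done in every thread that
runs the hot loop; passing the returned value to _mm_setcsr() restores
the previous behaviour.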

