We currently miscompare 482.sphinx3 with -Ofast -mrecip because for

  float foo (float x, float y)
  {
    return ((int)(x/y + 0.5)) * y;
  }

we use rcpss for the division by y. This results in a possible error of +-1 for the integer intermediate result and an error of +-y for the overall result.
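To make the +-1 flip concrete, here is a minimal sketch in portable C. It does not execute rcpss; foo_recip and its rel_err parameter are hypothetical and only model a reciprocal that comes out slightly low, which is an assumption about the shape of the error rather than the exact value the hardware returns.

```c
/* Reference version: exact IEEE division, as in the snippet from the report. */
float foo_exact(float x, float y)
{
    return ((int)(x / y + 0.5)) * y;
}

/* Hypothetical stand-in for the reciprocal expansion: x/y becomes x * r,
   where r is 1/y scaled down by a small relative error rel_err.
   This is an assumed error model, not the actual rcpss result. */
float foo_recip(float x, float y, float rel_err)
{
    float r = (1.0f / y) * (1.0f - rel_err);
    return ((int)(x * r + 0.5)) * y;
}
```

With x = 1.5, y = 1.0 and rel_err = 0x1p-20f, the exact quotient plus 0.5 is exactly 2.0 and truncates to 2, while the approximated one lands just below 2.0 and truncates to 1, so the two results differ by y.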
Similarly, 464.h264ref miscompares because of

  fprintf(stdout, "Freq. for encoded bitstream: %1.0f\n", img->framerate/(float)(input->jumpd+1));

where both img->framerate and input->jumpd are input parameters (15.0 and 1). Here the rounding to integer happens inside fprintf. Feeding rcps sequences into call stmts is probably never a very good idea.

Mine. I'm going to move rcps expansion up into tree-ssa-math-opts, the same place where we apply LCM for CSE-ing 1/x. Probably replace the division by a builtin.
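The h264ref case can be reproduced in miniature. In this sketch, format_freq is a hypothetical helper and rel_err is an assumed error model for the reciprocal (not the real rcpss sequence); it only illustrates why a quotient sitting exactly on a .5 boundary is fragile once the rounding happens inside printf.

```c
#include <stdio.h>

/* Hypothetical helper: formats framerate / (jumpd + 1) with "%1.0f", as the
   h264ref statement does. When rel_err is nonzero, the division is replaced
   by a multiplication with a reciprocal that is low by that relative error
   (an assumed model of the approximation, not the actual instruction). */
void format_freq(char *buf, size_t n, float framerate, int jumpd, float rel_err)
{
    float d = (float)(jumpd + 1);
    float q = (rel_err == 0.0f)
                  ? framerate / d
                  : framerate * ((1.0f / d) * (1.0f - rel_err));
    snprintf(buf, n, "%1.0f", q);
}
```

For the inputs from the report (15.0 and 1) the exact quotient is 7.5, which "%1.0f" prints as 8, whereas a quotient that is even slightly below 7.5 prints as 7.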
Same for 481.wrf. Any hope of dealing with this by taking the context of the division into account vanishes here: the code is obfuscated with several levels of array lookup.

In all cases the Intel compiler simply uses rcp instructions only for vectorized loops.
A similar problem occurs with the polyhedron test aermod.f90 (see pr34702).

> Feeding rcps sequences into call stmts is probably never a very good idea.

Probably the same thing for tests.

> In all cases the Intel compiler simply only uses rcp instructions for
> vectorized loops.

I think this would be a good idea; however, the last time I looked at it (some time ago!-), gcc was not as good as intel at vectorizing rcp.
Author: uros
Date: Thu Oct 20 15:13:30 2011
New Revision: 180256

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180256
Log:
    PR target/47989
    * config/i386/i386.h (RECIP_MASK_DEFAULT): New define.
    * config/i386/i386.opt (recip_mask): Initialize with
    RECIP_MASK_DEFAULT.
    * doc/invoke.texi (ix86 Options, -mrecip): Document that GCC
    implements vectorized single float division and vectorized sqrtf(x)
    with reciprocal sequence with additional Newton-Raphson step with
    -ffast-math.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/i386/i386.opt
    trunk/gcc/doc/invoke.texi
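For readers unfamiliar with the "additional Newton-Raphson step" mentioned in the log, the following is a scalar sketch of the textbook refinement r1 = r0 * (2 - y * r0). The function name refine_recip is hypothetical, and this is not claimed to be the exact instruction sequence GCC emits; it only shows how one step improves a rough estimate of 1/y.

```c
/* One Newton-Raphson step for f(r) = 1/r - y: given an estimate r0 of 1/y,
   return the refined estimate r0 * (2 - y * r0). The relative error of the
   result is roughly the square of the relative error of r0, plus rounding.
   Hypothetical helper for illustration only. */
float refine_recip(float y, float r0)
{
    return r0 * (2.0f - y * r0);
}
```

Starting from an estimate of 1/3 that is low by a relative error of about 2^-12, one step brings the result to within a few float ulps of 1/3. That is close, but still not guaranteed to be the correctly rounded quotient, which is why scalar results that are later truncated or printed can still flip.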
We now use reciprocals for vectorized operators by default, see threads at [1], [2] and [3] for the discussion.

[1] http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02550.html
[2] http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00212.html
[3] http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01825.html

So, fixed.