47989 – -mrecip causes 482.sphinx3, 464.h264ref and 481.wrf to miscompare

Bug 47989 - -mrecip causes 482.sphinx3, 464.h264ref and 481.wrf to miscompare

Summary: -mrecip causes 482.sphinx3, 464.h264ref and 481.wrf to miscompare

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.6.0

Importance:	P3 enhancement
Target Milestone:	4.7.0
Assignee:	Richard Biener

URL:	http://gcc.gnu.org/ml/gcc-patches/201...
Keywords:

Depends on:
Blocks:

Reported:	2011-03-04 14:37 UTC by Richard Biener
Modified:	2021-08-07 22:59 UTC (History)
CC List:	0 users

See Also:	55760
Host:
Target:	x86_64--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2011-03-04 16:06:42

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Richard Biener 2011-03-04 14:37:44 UTC

We currently miscompare 482.sphinx3 with -Ofast -mrecip because for

float foo (float x, float y)
{    
  return ((int)(x/y + 0.5)) * y;
} 

we use rcpss for the division by y.  This results in a possible error
of +-1 for the integer intermediate result and an error of +-y for
the overall result.

Comment 1 Richard Biener 2011-03-04 16:06:42 UTC

Similarly 464.h264ref miscompares because of

  fprintf(stdout, "Freq. for encoded bitstream: %1.0f\n",
          img->framerate/(float)(input->jumpd+1));

where both img->framerate and input->jumpd are input parameters
(15.0 and 1).  Here the rounding to integer happens inside fprintf.

Feeding rcps sequences into call stmts is probably never a very good idea.



Mine.  I'm going to move rcps expansion up into tree-ssa-math-opts, the
same place where we apply LCM for CSE-ing 1/x.  Probably replace the
division by a builtin.

Comment 2 Richard Biener 2011-03-10 13:27:22 UTC

Same for 481.wrf, hope for dealing with this with taking into account context
of the division vanishes here.  The code is obfuscated with several
levels of array lookup.

In all cases the Intel compiler simply only uses rcp instructions for
vectorized loops.

Comment 3 Dominique d'Humieres 2011-03-10 14:01:25 UTC

A similar problem occurs with the polyhedron test aermod.f90 (see pr34702).

> Feeding rcps sequences into call stmts is probably never a very good idea.

Probably the same thing for tests.

> In all cases the Intel compiler simply only uses rcp instructions for
> vectorized loops.

I think this would be a good idea, however the last time I have looked at it (some time ago!-), gcc was not as good as intel to vectorize rcp.

Comment 4 uros 2011-10-20 15:13:39 UTC

Author: uros
Date: Thu Oct 20 15:13:30 2011
New Revision: 180256

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180256
Log:
	PR target/47989
	* config/i386/i386.h (RECIP_MASK_DEFAULT): New define.
	* config/i386/i386.op (recip_mask): Initialize with RECIP_MASK_DEFAULT.
	* doc/invoke.texi (ix86 Options, -mrecip): Document that GCC
	implements vectorized single float division and vectorized sqrtf(x)
	with reciprocal sequence with additional Newton-Raphson step with
	-ffast-math.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/i386/i386.opt
    trunk/gcc/doc/invoke.texi

Comment 5 Uroš Bizjak 2011-10-20 15:29:51 UTC

We now use reciprocals for vectorized operators by default, see threads at [1], [2] and [3] for the discussion.

[1] http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02550.html
[2] http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00212.html
[3] http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01825.html

So, fixed.