17603 – [4.0 Regression] cpowf and cpowl give wrong results

Bug 17603 - [4.0 Regression] cpowf and cpowl give wrong results

Summary: [4.0 Regression] cpowf and cpowl give wrong results

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.0.0

Importance:	P1 critical
Target Milestone:	4.0.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	ABI, patch, wrong-code

Depends on:
Blocks:

Reported:	2004-09-22 09:38 UTC by Paolo Carlini
Modified:	2005-07-23 22:49 UTC (History)
CC List:	5 users (show)

See Also:
Host:	x86_64-linux-gnu
Target:	x86_64-linux-gnu
Build:	x86_64-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Paolo Carlini 2004-09-22 09:38:54 UTC

Hi,

on some platforms, 26_numerics/complex/13450.cc has been failing for some time.
I tracked down the problem to the complex pow builtins. Just run (-O0 suffices)
the below:

#include <cassert>

// libstdc++/13450
int main()
{
  __complex__ long double x;
  __complex__ long double y;
  __complex__ long double z;
  __real__ x = -1.0l;
  __imag__ x = 0.0l;
  __real__ y = 0.5l;
  __imag__ y = 0.0l;
  z = __builtin_cpowl(x, y);
  assert( __imag__ z > 0.99l && __imag__ z < 1.01l);
}

Indeed, if I print z inside gdb:

$1 = <invalid float value> + 3.6451995318824746025284059336194198e-4951 * I

instead of the expected (3.4 is ok): ~0 + 1 * I.

Similar problems affect __builtin_cpowf, just let me know and I can prepare
a suited testcase. In any case, in order to double check that everything is
finally ok, just run the libstdc++ testsuite ;) ...

Comment 1 Andrew Pinski 2004-09-22 12:21:49 UTC

I don't see anything on powerpc-darwin but I think this might be target related, it might also be related 
to changes by RTH or zack with respect to XFmode and TFmode.

I really doubt that the builtin is broken at all.

Comment 2 Paolo Carlini 2004-09-22 12:30:49 UTC

Indeed is target related: x86, s390 and ia64 are ok, for instance; x86_64 and
powerpc64 (linux) are broken.

About powerpc-darwin, according to testresults, 26_numerics/complex/13450.cc
doesn't even build! Why?

Comment 3 Andrew Pinski 2004-09-22 12:39:51 UTC

Earlier versions of darwin (pre Tiger, aka 8.0) don't have the C99 long double math functions (like 
cpowl).  Looks like a TFmode problem.

Comment 4 Paolo Carlini 2004-09-22 12:44:40 UTC

Ok, thanks for the info.

Please note, however, that 13450 includes tests for float, double and long double
and, on x86_64, at -O2 at least, also float fails.

This is a *critical* blocker for x86_64 and powerpc64, IMO: among other things,
the C++ runtime library cannot be used for serious mathematical work.

Comment 5 Andrew Pinski 2004-09-22 12:57:03 UTC

Hmm this works for me even with the float version, I really suspect that this glibc bug.

The tree level looks right:
  REALPART_EXPR <x> = -1.0e+0;
  IMAGPART_EXPR <x> = 0.0;
  REALPART_EXPR <y> = 5.0e-1;
  IMAGPART_EXPR <y> = 0.0;
  z = __builtin_cpow (x, y);
  D.1119 = IMAGPART_EXPR <z>;

So does the Assembly on powerpc-darwin:
        li r5,0
        mflr r31
        li r6,0
        stw r0,8(r1)
        lis r3,0xbff0
        stwu r1,-96(r1)
        li r4,0
        lis r7,0x3fe0
        li r8,0
        mr r9,r5
        mr r10,r6

I wonder if someone broke the ABI.

Comment 6 Andrew Pinski 2004-09-28 03:06:00 UTC

Looking at the result here: <http://gcc.gnu.org/ml/gcc-testresults/2004-09/msg01231.html>, 
libstdc++ passes all its tests.

Comment 7 Paolo Carlini 2004-09-28 08:07:03 UTC

Andrew, we all know that ia64 is OK (see Comment #2): x86_64 and powerpc64 are
*broken*, no news :(

Comment 8 Paolo Carlini 2004-09-28 15:44:26 UTC

In fact, builtins are not directly responsible.
This trivial, pure C, testcase:

#include <complex.h>
#include <stdio.h>

int main()
{
  long double complex b, a, r;

  b = -1;
  a = 0.5;
  r = cpowl(b, a);

  printf("%Lg, %Lg\n", creall(r), cimagl(r));
  
  return 0;
}

Suffices to show the problem on x86_64:
3.69315e-4664, 0

instead of the expected (3.3 and 3.4 are ok -> *not* a glibc problem!):
-2.71051e-20, 1

Comment 9 Scott Robert Ladd 2004-11-02 03:07:52 UTC

The problem is broader.

For example, the Fortran 95 abs() intrinsic is also broken for x86_64 when
compiled with -O1, as illustrated by the following Fortran 95 program (testing
on my Opteron box with cvs-20041101 mainline):

program cabs
    implicit none

    ! Program works if K = 8, breaks if K = 4
    integer, parameter :: K = 4
    complex(kind=K)    :: z, c
    real(kind=K)       :: a1, a2

    z = (2.0_K, 3.0_K) ** 2.0_K
    c = z - (-5.0_K, 12.0_K)
    a1 = abs(c)
    a2 = sqrt(aimag(c) ** 2.0_K + real(c) ** 2.0_K)

    write (*,*) 'z = ', z
    write (*,*) 'c = ', c
    write (*,*) 'a1 = ', a1
    write (*,*) 'a2 = ', a2

end program cabs

Without optimization, I get the expected results:

$ gfortran -o runme cabs.f90
$ ./runme
 z =  ( -5.000000    ,  12.00000    )
 c =  (  0.000000    ,-9.5367432E-07)
 a1 =   9.5367432E-07
 a2 =   9.5367432E-07

With -O1, the program breaks:

$ gfortran -O1 -o runme cabs.f90
$ ./runme
 z =  ( -5.000000    ,  12.00000    )
 c =  (  0.000000    ,-9.5367432E-07)
 a1 =    13.00000
 a2 =   9.5367432E-07

With -O1 *AND* the unfairly maligned -ffast-math, the result is:

$ gfortran -O1 -ffast-math -o runme cabs.f90
$ ./runme
 z =  ( -5.000000    ,  12.00000    )
 c =  (  0.000000    ,-9.5367432E-07)
 a1 =   9.5367432E-07
 a2 =   9.5367432E-07

The ** operation works with all three compiles; the error appears to be in the
call to cabsf(). I don't believe the problem is cabsf() itself, because *both*
the unoptimized and -O1 versions call cabsf(). Using -ffast-math changes the
code substantially, avoiding the optimizer bug.

Note that my Pentium 4 system produces correct code, using -march=pentium4
-mfpmath=sse (to correspond with the x86_64's automatic use of SSE). The code is
very similar on the two architectures, save that the x86_64 uses the extended
registers (and gets abs() wrong, of course).

I *think* the wrong arguments are being passed to cabsf() on the x86_64 --  as
evidenced by the fact that the "wrong" output is abs(c), as if the embedded
subtraction did not exist. This *may* be the same problem as with the cpowl() in
your C example -- incorrect arguments as opposed to bas C lib functions.

I'm rather burned out tonight, but I thought I'd pass this along. If no one else
runs with this, I'll keep going. I'm learning a lot about GCC internals... ;)

Comment 10 Andrew Pinski 2004-12-05 20:42:55 UTC

Looking at:
<http://gcc.gnu.org/ml/gcc-testresults/2004-12/msg00119.html>
it looks likes it is only for -m32 case.

I think this is a target problem now.
As ppc64 is fixed at least according to <http://gcc.gnu.org/ml/gcc-testresults/2004-12/
msg00183.html>.

Comment 11 Paolo Carlini 2004-12-05 21:02:55 UTC

> Looking at:
> <http://gcc.gnu.org/ml/gcc-testresults/2004-12/msg00119.html>
> it looks likes it is only for -m32 case.

Look better: is the other way 'round: -m32 is ok, 64-bit is broken (I'm 100% sure
because often I build --disable-multilib and the failure is always there :(

Comment 12 roger 2004-12-05 21:24:29 UTC

There does appear to be some form of ABI change on x86_64.  When compiling with
gcc-3.4.3 the results of cpowl are assumed to be returned on the x87 stack, but
when compiling with 4.0.0 (20041205 experimental) the generated code assumes that
the floating point results are returned in memory!!  I suspect that an internal
change in the representation of complex types, has broken platforms with special
parameter passing/returning conventions for complex types.

Comment 13 Steven Bosscher 2004-12-15 22:07:31 UTC

The bug appears between "2004-07-10 00:00" and "2004-07-10 12:00".  I hope 
to reduce it a bit further to pinpoint the offending patch.

Comment 14 Paolo Carlini 2004-12-15 23:51:25 UTC

Thanks Steven. This bus is really a show stopper, in my opinion.

Comment 15 Steven Bosscher 2004-12-16 01:05:28 UTC

Narrowed it down to "2004-07-10 00:00" and "2004-07-10 01:00".

Comment 16 Steven Bosscher 2004-12-16 01:09:15 UTC

This patch caused it: 
 
+2004-07-09  Jan Beulich  <jbeulich@novell.com> 
+ 
+       * config/i386/i386.c (classify_argument): Treat V1xx modes the same as 
+       their base modes. CTImode, TCmode, and XCmode must be passed in 
memory. 
+       TFmode (__float128) must be is an SSE/SSEUP pair. V2SImode, V4HImode, 
+       and V8QI are class SSE. All sufficiently small remaining vector modes 
+       must be passed in one or two integer registers. 
+       (ix86_libcall_value): TFmode must be returned in xmm0, XCmode must be 
+       returned in memory. 
+       (bdesc_2arg, ix86_init_mmx_sse_builtins): __builtin_ia32_pmuludq and 
+       __builtin_ia32_pmuludq128 have non-uniform argument and return types 
+       and must thus be handled explicitly. 
+       * config/i386/i386.md (*movdi_1_rex64): Add cases for moving between 
+       MMX and XMM regs. 
+       (movv8qi_internal, movv4hi_internal, movv2si_internal, 
+       movv2sf_internal): Permit moving between MMX and XMM registers (since 
+       MMX areguments and return values are passed in XMM registers). 
+       (sse2_umulsidi3): Correct type and mode.

Comment 17 Andrew Pinski 2004-12-16 01:13:59 UTC

The patch which caused it was <http://gcc.gnu.org/ml/gcc-patches/2004-07/msg00468.html>.

But if I read the patch, this sounds like GCC before 4.0.0 was not implementing the ABI right.

Comment 18 jbeulich 2004-12-16 12:59:22 UTC

This happens when the ABI is (was) imprecise. Earlier versions did not special
case long double complex, but the current version does. Hence I have a patch
ready that partly reverses the original change (actually, implements the
original behavior for this type in a form slightly closer matching the ABI
definition); expecting to submit this later today or tomorrow (currently
building/testing).

As to the same(?) problem observed with float complex: I doubt the mentioned
patch causes this, since behavior for that type wasn't modified by it.

Comment 19 jbeulich 2004-12-16 15:46:05 UTC

Patch for the x86-64 long double complex part submitted.

Comment 20 Andrew Pinski 2004-12-16 16:33:05 UTC

Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01223.html>.

Comment 21 GCC Commits 2004-12-17 08:54:21 UTC

Subject: Bug 17603

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	jbeulich@gcc.gnu.org	2004-12-17 08:54:02

Modified files:
	gcc            : ChangeLog 
	gcc/config/i386: i386.c 

Log message:
	2004-12-17  Jan Beulich  <jbeulich@novell.com>
	
	PR target/17603
	* config/i386/i386.c (enum x86_64_reg_class): Define
	X86_64_COMPLEX_X87_CLASS.
	(x86_64_reg_class_names): Add name for X86_64_COMPLEX_X87_CLASS.
	(merge_classes): Handle X86_64_COMPLEX_X87_CLASS.
	(classify_argument): XCmode is X86_64_COMPLEX_X87_CLASS.
	(examine_argument): X86_64_COMPLEX_X87_CLASS requires two
	registers when dealing with a return value.
	(construct_container): Handle X86_64_COMPLEX_X87_CLASS.
	Eliminate impossible case of two X87/X87UP pairs (this now is
	being expressed by a single COMPLEX_X87).
	(x86_libcall_value): XCmode gets returned in st0/st1.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6866&r2=2.6867
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&r1=1.754&r2=1.755

Comment 22 Andrew Pinski 2004-12-17 12:23:45 UTC

Fixed.