Bug 19974 - incorrect complex division on ia-64 with flag_complex_method = 2
Summary: incorrect complex division on ia-64 with flag_complex_method = 2
Status: RESOLVED WORKSFORME
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: unknown
: P2 normal
Target Milestone: ---
Assignee: Richard Henderson
URL:
Keywords: wrong-code
Depends on:
Blocks: 5900 18902
  Show dependency treegraph
 
Reported: 2005-02-15 16:18 UTC by Thomas Koenig
Modified: 2005-03-31 21:38 UTC (History)
2 users (show)

See Also:
Host:
Target: ia64-unknown-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-02-16 23:17:12


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Koenig 2005-02-15 16:18:18 UTC
Checking Lapack with flag_complex_method = 2 on
ia64-unknown-linux-gnu with -O0, I got a hang in xeigtstc
when processing sep.in, in the routine slarrb.f.
The compiler was the 20050213 snapshot.

This may not have anything to do with the setting of
flag_complex_method, because the routine in question
does not have any complex variables.

The code where it hangs looks like this:

            DO 70 J = 1, N - 1
               TMP = D( J ) + S
               S = S*( LD( J ) / TMP )*L( J ) - RIGHT
               IF( TMP.LT.ZERO )
     $            CNT = CNT + 1
   70       CONTINUE

Here's a sample gdb session:

$ gdb xeigtstc
GNU gdb Red Hat Linux (6.1post-1.20040607.52rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "ia64-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".

(gdb) r
Starting program: /home/zfkts/zfkts/LAPACK-gfortran/TESTING/xeigtstc

Program received signal SIGINT, Interrupt.
0x200000000037bbc1 in read () from /lib/tls/libc.so.6.1
(gdb) r < sep.in
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/zfkts/zfkts/LAPACK-gfortran/TESTING/xeigtstc < sep.in
 Tests of the Hermitian Eigenvalue Problem routines

 LAPACK VERSION 3.0, released June 30, 1999

 The following parameter values will be used:
    M:         0     1     2     3     5    20
    N:         0     1     2     3     5    20
    NB:        1     3     3     3    10
    NBMIN:     2     2     2     2     2
    NX:        1     0     5     9     1

 Relative machine underflow is taken to be    0.117549E-37
 Relative machine overflow  is taken to be    0.340282E+39
 Relative machine precision is taken to be    0.596046E-07

 Routines pass computational tests if test ratio is less than   50.00


 CST routines passed the tests of the error exits (114 tests done)


 SEP:  NB =   1, NBMIN =   2, NX =   1

 All tests for CST passed the threshold ( 4662 tests run)

 All tests for CST drivers  passed the threshold ( 11664 tests run)


 SEP:  NB =   3, NBMIN =   2, NX =   0

 CST -- Complex Hermitian eigenvalue problem
 Matrix types (see CCHKST for details):

 Special Matrices:
  1=Zero matrix.                          5=Diagonal: clustered entries.
  2=Identity matrix.                      6=Diagonal: large, evenly spaced.
  3=Diagonal: evenly spaced entries.      7=Diagonal: small, evenly spaced.
  4=Diagonal: geometr. spaced entries.
 Dense Hermitian Matrices:
  8=Evenly spaced eigenvals.             12=Small, evenly spaced eigenvals.
  9=Geometrically spaced eigenvals.      13=Matrix with random O(1) entries.
 10=Clustered eigenvalues.               14=Matrix with large random entries.
 11=Large, evenly spaced eigenvals.      15=Matrix with small random entries.
 16=Positive definite, evenly spaced eigenvalues
 17=Positive definite, geometrically spaced eigenvlaues
 18=Positive definite, clustered eigenvalues
 19=Positive definite, small evenly spaced eigenvalues
 20=Positive definite, large evenly spaced eigenvalues
 21=Diagonally dominant tridiagonal, geometrically spaced eigenvalues

Test performed:  see CCHKST for details.

 Matrix order=    5, type= 9, seed= 894,3587, 994, 217, result  36 is   62.23
 CST:    1 out of  4662 tests failed to pass the threshold

 All tests for CST drivers  passed the threshold ( 11664 tests run)


 SEP:  NB =   3, NBMIN =   2, NX =   5

 All tests for CST passed the threshold ( 4662 tests run)

 All tests for CST drivers  passed the threshold ( 11664 tests run)


 SEP:  NB =   3, NBMIN =   2, NX =   9

 CST -- Complex Hermitian eigenvalue problem
 Matrix types (see CCHKST for details):

 Special Matrices:
  1=Zero matrix.                          5=Diagonal: clustered entries.
  2=Identity matrix.                      6=Diagonal: large, evenly spaced.
  3=Diagonal: evenly spaced entries.      7=Diagonal: small, evenly spaced.
  4=Diagonal: geometr. spaced entries.
 Dense Hermitian Matrices:
  8=Evenly spaced eigenvals.             12=Small, evenly spaced eigenvals.
  9=Geometrically spaced eigenvals.      13=Matrix with random O(1) entries.
 10=Clustered eigenvalues.               14=Matrix with large random entries.
 11=Large, evenly spaced eigenvals.      15=Matrix with small random entries.
 16=Positive definite, evenly spaced eigenvalues
 17=Positive definite, geometrically spaced eigenvlaues
 18=Positive definite, clustered eigenvalues
 19=Positive definite, small evenly spaced eigenvalues
 20=Positive definite, large evenly spaced eigenvalues
 21=Diagonally dominant tridiagonal, geometrically spaced eigenvalues

Test performed:  see CCHKST for details.

 Matrix order=   20, type= 9, seed=1052,3651,3662,3633, result  35 is   69.27
 Matrix order=   20, type= 9, seed=1052,3651,3662,3633, result  36 is 2182.68
 CST:    2 out of  4662 tests failed to pass the threshold

 CST -- Complex Hermitian eigenvalue problem
 Matrix types (see xDRVST for details):

 Special Matrices:
  1=Zero matrix.                          5=Diagonal: clustered entries.
  2=Identity matrix.                      6=Diagonal: large, evenly spaced.
  3=Diagonal: evenly spaced entries.      7=Diagonal: small, evenly spaced.
  4=Diagonal: geometr. spaced entries.
 Dense Hermitian Matrices:
  8=Evenly spaced eigenvals.             12=Small, evenly spaced eigenvals.
  9=Geometrically spaced eigenvals.      13=Matrix with random O(1) entries.
 10=Clustered eigenvalues.               14=Matrix with large random entries.
 11=Large, evenly spaced eigenvals.      15=Matrix with small random entries.

 Tests performed:  See cdrvst.f
 Matrix order=   20, type= 9, seed=1494,3156,1807,2209, result 101 is  353.18

(wait for a long time...)

Program received signal SIGINT, Interrupt.
slarrb_ (n=@0x4, d=0x4, l=0x4, ld=0x4, lld=0x4, ifirst=@0x4, ilast=@0x4,
    sigma=@0x4, reltol=Cannot access memory at address 0x1
) at slarrb.f:191
191                 TMP = D( N ) + S
Current language:  auto; currently fortran
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
slarrb_ (n=@0x7fc000007f800000, d=0x7fc000007f800000, l=0x7fc000007f800000,
    ld=0x7fc000007f800000, lld=0x7fc000007f800000, ifirst=@0x7fc000007f800000,
    ilast=@0x7fc000007f800000, sigma=@0x7fc000007f800000, reltol=Cannot access
memory at address 0x0
)
    at slarrb.f:195
195                    DELTA = TWO*DELTA
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
slarrb_ (n=Cannot access memory at address 0x1
) at slarrb.f:185
185                 DO 70 J = 1, N - 1
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
slarrb_ (n=@0x47fc00000, d=0x47fc00000, l=0x47fc00000, ld=0x47fc00000,
    lld=0x47fc00000, ifirst=@0x47fc00000, ilast=@0x47fc00000,
    sigma=@0x47fc00000, reltol=Cannot access memory at address 0x0
) at slarrb.f:183
183                 S = -RIGHT
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x400000000047fd71 in slarrb_ (n=@0x47fc00000, d=0x47fc00000, l=0x47fc00000,
    ld=0x47fc00000, lld=0x47fc00000, ifirst=@0x47fc00000, ilast=@0x47fc00000,
    sigma=@0x47fc00000, reltol=Cannot access memory at address 0x0
) at slarrb.f:183
183                 S = -RIGHT
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x400000000047fe91 in slarrb_ (n=Cannot access memory at address 0x3
) at slarrb.f:187
187                    S = S*( LD( J ) / TMP )*L( J ) - RIGHT
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x4000000000480111 in slarrb_ (n=@0x7fc000007fc00000, d=0x7fc000007fc00000,
    l=0x7fc000007fc00000, ld=0x7fc000007fc00000, lld=0x7fc000007fc00000,
    ifirst=@0x7fc000007fc00000, ilast=@0x7fc000007fc00000,
    sigma=@0x7fc000007fc00000, reltol=Cannot access memory at address 0x3
) at slarrb.f:191
191                 TMP = D( N ) + S
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x400000000047fe70 in slarrb_ (n=@0x7fc000007fc00000, d=0x7fc000007fc00000,
    l=0x7fc000007fc00000, ld=0x7fc000007fc00000, lld=0x7fc000007fc00000,
    ifirst=@0x7fc000007fc00000, ilast=@0x7fc000007fc00000,
    sigma=@0x7fc000007fc00000, reltol=Cannot access memory at address 0x1
) at slarrb.f:186
186                    TMP = D( J ) + S
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
slarrb_ (n=@0x60000000000d5cf8, d=0x60000000000d5cf8, l=0x60000000000d5cf8,
    ld=0x60000000000d5cf8, lld=0x60000000000d5cf8, ifirst=@0x60000000000d5cf8,
    ilast=@0x60000000000d5cf8, sigma=@0x60000000000d5cf8, reltol=Cannot access
memory at address 0x1
)
    at slarrb.f:187
187                    S = S*( LD( J ) / TMP )*L( J ) - RIGHT
Comment 1 Thomas Koenig 2005-02-15 20:14:32 UTC
This does not happen on an Athlon-xp with -march=athlon-xp
-mfmath=sse.  Might be target or 64-bit specific.
Comment 2 Thomas Koenig 2005-02-16 12:41:39 UTC
I think I have identified the problem.

The hang itself is probably caused by a Lapack bug, because slarrb is
only fed 0. and NaN as arguments.

The reason why this is so is probably due to a problem in complex
division with flag_complex_method = 2.  Here's a test case:

$ cat c-tst.c
#include <stdio.h>
#include <math.h>
#include <complex.h>

int main()
{
    float complex a,b,c;
    a = 1.e20-I*1.e12;
    b = 1. - I*1.e-8;
    c = a/b;
    printf("%.5g %.5g\n",creal(c),cimag(c));
    return 0;
}

This has flag_complex_medhod = 2:

$ gcc -B ~/gcc-C2/gcc c-tst.c
$ ./a.out
1e+20 18059
      ^^^^^^
       wrong

This has flag_complex_method = 1:

$ gcc -B ~/gcc-C1/gcc c-tst.c
$ ./a.out
1e+20 0
      ^^
      correct

Probably a wrong sign when recovering from overflow in the
new complex division routines.

Adding rth to the cc list.
Comment 3 Richard Henderson 2005-02-16 19:11:12 UTC
I don't reproduce this on amd64.  It was raining in the machine room yesterday,
so I don't have access to my ia64 machine to see if it's something special there.
Comment 4 Thomas Koenig 2005-02-16 20:51:09 UTC
Checking this on i686-unknown-linux-gnu, I find the same
result with flag_complex_method=2 as on ia-64.  I am also
seeing the same result with logarithmic scaling (using frexp and
ldexp). Maybe I'm being bitten by the decimal fractions not being exactly
representable in binary, and the Lapack failure has some other reason.

I'll keep looking.
Comment 5 Richard Henderson 2005-02-16 23:17:11 UTC
Confirmed.  It's an ia64-specific miscompilation.  Doesn't happen if libgcc 
is built with -O0 instead of -O2.
Comment 6 Richard Henderson 2005-02-17 02:47:55 UTC
The "problem" is the use of the fused multiply-and-add instructions.  

1672          y = (b - (a * ratio)) / denom;
  d6:   70 70 24 0c 54 00                   fms.s.s0 f7=f9,f6,f14

This computes the intermediate results of the entire numerator with 82 bits of
precision.  One can argue that this is inappropriate, since it's not what the
source wrote.

HOWEVER, if I rewrite __divsc3 to explicitly use "long double" to compute
intermediate results -- which as far as I can see should be legitimate -- I 
also get 18059 for the imaginary part of the result.  As best I can figure,
that result is more mathematically correct than 0.
Comment 7 Andrew Pinski 2005-02-17 03:30:28 UTC
Using Mathematica I get for
(10^20 + 10^12 I)/(1 - 10^-8) = 10^20 + 2 * 10^12 I

so really neither of them are mathematically correct.
Comment 8 Thomas Koenig 2005-02-17 09:42:02 UTC
(In reply to comment #7)
> Using Mathematica I get for
> (10^20 + 10^12 I)/(1 - 10^-8) = 10^20 + 2 * 10^12 I
> 
> so really neither of them are mathematically correct.

The test case was (10^20-10^12*I)/(1-10^(-8)*I), which is 10^20,
obviously.

For what it is worth, ifort also gets 18059 as the result.  I'll
do some checking to see what this should really be.
Comment 9 Thomas Koenig 2005-02-17 13:52:40 UTC
Another datapoint - the fact that slarrb "has problems"
has been confirmed by a Lapack developer.  A new version is
slated to appear as a patch soon.  Hopefully, this will reduce
the potential hang in the test program to a mere failure.
Comment 10 Richard Henderson 2005-03-31 21:38:16 UTC
I'm going to assume there's no bug here.  Re-open if you disagree.