The following program:

    FUNCTION F(r,a,e)
      REAL*8 :: a(2:15)
      REAL*8 :: r,f,e
      f=0
      DO i = 2, 15
         f = f + a(i)/(r**(i-1)*REAL(i-1,8))
      END DO
      f=f-e
    END FUNCTION F

    PROGRAM TEST
      REAL*8 :: a(2:15)=(/-195.771601327700D0, 15343.7861339500D0, &
                          -530864.458651600D0, 10707934.3905800D0, &
                          -140099704.789000D0, 1250943273.78500D0, &
                          -7795458330.67600D0, 33955897217.3100D0, &
                          -101135640744.000D0, 193107995718.700D0, &
                          -193440560940.000D0, -4224406093.91800D0, &
                          217192386506.500D0, -157581228915.500D0/)
      REAL*8 :: r=4.51556282882533D0
      REAL*8 :: e=-2.21199635966809471000D0
      REAL*8 :: f
      write(6,*) f(r,a,e)
    END PROGRAM TEST

generates different results with gfortran 4.3.0 at -O0 and -O3:

    > gfortran -O0 -march=k8 -msse test.f90
    > ./a.out
      0.00000000000000
    > gfortran -O3 -march=k8 -msse test.f90
    > ./a.out
     -1.143973804573761E-012

Notice that this is without -ffast-math and with SSE. gfortran 4.1 gives 0.0 in all cases. I suspect that the reason is that __powidf2 is replaced by inline code at -O3 that gives slightly different results. If this is expected behavior, maybe this substitution should be guarded by -ffast-math?

Confirmed. It changes for me between -O2 and -O2 -fpeel-loops, which causes the loop in f to be completely unrolled (which at first sight doesn't look wrong). And we expand __powidf2 inline because the exponent is then a constant for all calls. This indeed might explain the difference, as __powidf2 is implemented as

    TYPE
    NAME (TYPE x, int m)
    {
      unsigned int n = m < 0 ? -m : m;
      TYPE y = n % 2 ? x : 1;
      while (n >>= 1)
        {
          x = x * x;
          if (n % 2)
            y = y * x;
        }
      return m < 0 ? 1/y : y;
    }

while we expand a constant integral power to an optimal (in terms of the number of multiplications / additions) series of multiplications and additions. As Fortran allows open-coding of integral powers, this is not a bug. But I'll leave the final closing as INVALID to more Fortran-knowing people.

(In reply to comment #1)
> As Fortran allows open-coding of integral powers this is not a bug. But
> I'll leave final closing as INVALID to more fortran-knowing people.

I agree that Fortran-wise this is not a bug. However, gcc tends to do some of the things that Fortran allows by default (x/y -> x*(1/y); x+y+z -> (x+z)+y; ...) only at -ffast-math. AFAIK, this seems to be the only optimisation in gcc that causes numerical results with CP2K to change going from -O0 to -O3.

> that causes numerical results with CP2K to change going from -O0 to -O3.

If you expect the optimizer to optimize your computation, you should expect some change in the numerical results, so put some tolerance in the comparisons. How small the tolerance can be depends on your problem. I have modified your code to compute the polynomial in three different ways (yours, plus two others, the third one being the recommended one for "numerical stability", if I did not make some errors):

    FUNCTION F(r,a,e,pf,qf)
      REAL*8 :: a(2:15)
      REAL*8 :: r,f,e, pr,pf,qf,rr
      f=0
      pf=0
      pr=r
      rr=1.0d0/r
      DO i = 2, 15
         f = f + a(i)/(r**(i-1)*REAL(i-1,8))
         pf = pf + a(i)/(pr*REAL(i-1,8))
         pr = r*pr
      END DO
      f=f-e
      pf=pf-e
      qf=a(15)/REAL(14,8)
      do i = 14, 2, -1
         qf=rr*qf+a(i)/REAL(i-1,8)
      end do
      qf=rr*qf-e
    END FUNCTION F

    PROGRAM TEST
      REAL*8 :: a(2:15)=(/-195.771601327700D0, 15343.7861339500D0, &
                          -530864.458651600D0, 10707934.3905800D0, &
                          -140099704.789000D0, 1250943273.78500D0, &
                          -7795458330.67600D0, 33955897217.3100D0, &
                          -101135640744.000D0, 193107995718.700D0, &
                          -193440560940.000D0, -4224406093.91800D0, &
                          217192386506.500D0, -157581228915.500D0/)
      REAL*8 :: r=4.51556282882533D0
      REAL*8 :: e=-2.21199635966809471000D0
      REAL*8 :: pf, qf
      REAL*8 :: f
      write(6,*) f(r,a,e,pf,qf)
      write(6,*) pf, qf
    END PROGRAM TEST

    [karma] f90/bug% gfc pr33780_red.f90
    [karma] f90/bug% a.out
       0.0000000000000000
       1.36957112317759311E-012   7.66053886991358013E-012
    [karma] f90/bug% gfc -O3 pr33780_red.f90
    [karma] f90/bug% a.out
      -1.14397380457376130E-012
       1.36957112317759311E-012   1.04861186749364478E-011

So for your problem the tolerance should be ~1.0E-11, and since pf and qf don't use powers, this is not a question of inlining, but of how the operations are shuffled to "optimize" your computation.

> while we expand a constant integral power to an optimal (as of the count of
> multiplications / additions) series of multiplications and additions.

It seems the difference is coming from something else. I'll bet that with -O3 the powers are computed at compile time with some extra precision. At least that is what I conclude from the following code:

    integer :: ia(16) = (/ 1, 1, 2, 2, 3, 3, 4, 4, 6, 5, 6, 6, 10, 7, 9, 8 /)
    real(8) :: a(16), b(16), c(16), d(16)
    real(16) :: e(16)
    real(8), parameter :: r=4.51556282882533D0
    a(1) = r
    a(2) = a(1)*a(1)
    a(3:4) = a(2)*a(1:2)
    a(5:8) = a(4)*a(1:4)
    a(9:16) = a(8)*a(1:8)
    b(1) = r
    do i = 2, 16
       b(i) = b(ia(i))*b(i-ia(i))
    end do
    e(1) = r
    e(2) = e(1)*e(1)
    e(3:4) = e(2)*e(1:2)
    e(5:8) = e(4)*e(1:4)
    e(9:16) = e(8)*e(1:8)
    d = real(e,8)
    do i=1, 16
       c(i) = r**i
    end do
    do i=1, 16
       print *, a(i)/d(i)-1.0d0, b(i)/d(i)-1.0d0, c(i)/d(i)-1.0d0
    end do
    print *
    print *, sum(abs(a/d-1.0d0)), sum(abs(b/d-1.0d0)), sum(abs(c/d-1.0d0)), epsilon(r)
    end

a stores the powers computed from the usual shift, b those computed from the "optimal" tree (from gcc/builtins.c), c those computed with the power operator, and d the values rounded from values computed with higher precision.
At -O2 the output is:

       0.0000000000000000        0.0000000000000000        0.0000000000000000
       0.0000000000000000        0.0000000000000000        0.0000000000000000
       0.0000000000000000        0.0000000000000000        0.0000000000000000
       2.22044604925031308E-016  2.22044604925031308E-016  2.22044604925031308E-016
       2.22044604925031308E-016  2.22044604925031308E-016  2.22044604925031308E-016
       2.22044604925031308E-016  0.0000000000000000        2.22044604925031308E-016
       2.22044604925031308E-016  2.22044604925031308E-016  2.22044604925031308E-016
       4.44089209850062616E-016  4.44089209850062616E-016  4.44089209850062616E-016
       2.22044604925031308E-016  0.0000000000000000        2.22044604925031308E-016
       4.44089209850062616E-016  2.22044604925031308E-016  4.44089209850062616E-016
       4.44089209850062616E-016  2.22044604925031308E-016  4.44089209850062616E-016
       4.44089209850062616E-016  0.0000000000000000        4.44089209850062616E-016
       4.44089209850062616E-016  4.44089209850062616E-016  4.44089209850062616E-016
       4.44089209850062616E-016  2.22044604925031308E-016  4.44089209850062616E-016
       6.66133814775093924E-016  0.0000000000000000        6.66133814775093924E-016
       6.66133814775093924E-016  6.66133814775093924E-016  6.66133814775093924E-016

       5.10702591327572009E-015  2.88657986402540701E-015  5.10702591327572009E-015  2.22044604925031308E-016

a and c are equal, while at -O3 the output is:

       0.0000000000000000        0.0000000000000000        0.0000000000000000
       0.0000000000000000        0.0000000000000000        0.0000000000000000
       0.0000000000000000        0.0000000000000000        0.0000000000000000
       2.22044604925031308E-016  2.22044604925031308E-016  0.0000000000000000
       2.22044604925031308E-016  2.22044604925031308E-016  0.0000000000000000
       2.22044604925031308E-016  0.0000000000000000        0.0000000000000000
       2.22044604925031308E-016  2.22044604925031308E-016  0.0000000000000000
       4.44089209850062616E-016  4.44089209850062616E-016  0.0000000000000000
       2.22044604925031308E-016  0.0000000000000000        0.0000000000000000
       4.44089209850062616E-016  2.22044604925031308E-016  0.0000000000000000
       4.44089209850062616E-016  2.22044604925031308E-016  0.0000000000000000
       4.44089209850062616E-016  0.0000000000000000        0.0000000000000000
       4.44089209850062616E-016  4.44089209850062616E-016  0.0000000000000000
       4.44089209850062616E-016  2.22044604925031308E-016  0.0000000000000000
       6.66133814775093924E-016  0.0000000000000000        0.0000000000000000
       6.66133814775093924E-016  6.66133814775093924E-016  0.0000000000000000

       5.10702591327572009E-015  2.88657986402540701E-015  0.0000000000000000        2.22044604925031308E-016

c and d are equal. If the difference were coming only from the "optimal" tree, c should be equal to b.

The difference with your last testcase is indeed that we unroll the loop setting c[] and constant-fold the result, computing the integral powers with 0.5 ulp precision (and a single rounding!). But all your other computations involve intermediate roundings, so this is still invalid IMHO. Or are you complaining that we constant-fold integer powers as if they were non-integer powers (not using the fact that Fortran would allow expansion to multiple FLOPs)?

(In reply to comment #5)
> Or are you complaining that we constant fold integer powers as if they were
> non-integer powers

Richard, I think I'm suggesting an enhancement rather than a bug fix. Some good-quality compilers, like IBM's xlf90, guarantee bit-wise identical results at all optimisation levels, unless one specifies -qno-strict (which is included by default). My idea was/is that optimisation-independent results are (on the right hardware, i.e. not x87) also provided by gcc, unless one specifies -ffast-math. The above testcase is the only one I found in our 500kLOC simulation package that changed results as I changed optimisation levels, so I think it would be valuable to make the underlying optimisation depend on a compile-time flag (which is included in -ffast-math). From a Fortran language point of view the compiler is free to use any implementation to compute a**i, including one that varies with optimisation level or context.