Bug 32057 - Random failure on gfortran.dg/secnds.f
Summary: Random failure on gfortran.dg/secnds.f
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: testsuite (show other bugs)
Version: 4.3.0
: P3 normal
Target Milestone: ---
Assignee: Jerry DeLisle
URL:
Keywords:
: 32196 32202 32206 (view as bug list)
Depends on:
Blocks:
 
Reported: 2007-05-23 14:52 UTC by H.J. Lu
Modified: 2007-06-25 03:02 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-05-30 02:09:54


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2007-05-23 14:52:18 UTC
I got random failure on gfortran.dg/secnds.f, which has

      t1 = secnds (0.0)
      call date_and_time (dum1, dum2, dum3, values)
      t1a = secnds (0.0)
      dat1 = 0.001*real (values(8)) + real (values(7)) +
     &        60.0*real (values(6)) + 3600.0* real (values(5))
      if (((dat1 - t1) < 0.) .or. ((dat1 - t1) > (t1a - t1))) call abort ()
      do j=1,10000
        do i=1,10000
        end do
      end do
      t2a = secnds (t1a)
      call date_and_time (dum1, dum2, dum3, values)
      t2 = secnds (t1)
      dat2 = 0.001*real (values(8)) + real (values(7)) +
     &        60.0*real (values(6)) + 3600.0* real (values(5))
      if (((dat2 - dat1) < t2a) .or. ((dat2 - dat1) > t2)) call abort ()

I assume the dummy loop is used for delay. I don't think it is that reliable.
Can't we use sleep here?
Comment 1 Dominique d'Humieres 2007-05-23 15:34:48 UTC
> I assume the dummy loop is used for delay. I don't think it is that reliable.

I think you are right about the delay, but not about the real problem: you have the same with secnds-1.f which does not cantain the loop. The goal of the tests is not to measure some time, but to check that intervals are properly ordered, i.e., t1<=dat1<=t1a and t2a<=dat2-dat1<= t2.

I have started to investigate the problem, see the thread:

http://gcc.gnu.org/ml/fortran/2007-05/msg00216.html

I have started to prepare a report of my findings, but I got distracted because gcc is no longer building on Darwin. 

The main cause of failures is because the tests do not account for round off errors.  This can easily handled for first test by comparing to the nearest values from below and above: the failure rate seems below a couple ppb (part per billion) measured on secnds-1.f, using a naive way to measure it. I think this is due to the fact that t1 and t1a are absolute times.  For the second test I had to use wider tolerances (such as spacing(24.0*3600.0)) and even so I get some errors on Pentium machines ~1/10000.

After having removed this main source of failure I have found three more:
(1) the tests do not handle timing around midnight when the counts go from 86400.0 to 0.0,
(2) there is a rounding problem in secnds such that secnds(86400.0) returns 86400.0 during the 3ms before midnight,
(3) on some machine the clock synchronization put the tests in jeopardy when the clock has a negative "lag": the value returned "after" is before the value returned "before"! I think this is a detectable but unrecoverable error and, when detected, the tests should be skipped and repeated.  I have a fix for the two other cases, although (2) should probably handled at the secnds level.

In top of the report, I have started to prepare a program to do some testing on platforms I have no access to (I have used PPC Darwin, and AMD64 and Pentium Linux). I was planning to join it to the report (Encouragements are welcome !-)
Comment 2 kargls 2007-05-23 16:05:38 UTC
>
> Can't we use sleep here?
> 

No.
Comment 3 Dominique d'Humieres 2007-05-28 19:52:47 UTC
A partial fix can be found at

http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01873.html

Some discussions can be found by following the thread.
Comment 4 Dominique d'Humieres 2007-05-29 20:25:09 UTC
Following the Steve Kargl's suggestion in

http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01945.html

I have done the following test:

[archimede] test/fortran> cat > sec_prec_1.f90
implicit none
integer j, k, l, m, n
integer i, ifa, ifb, ifc
real a, b, c
i = 0 ; ifa = 0 ; ifb = 0 ; ifc = 0
do m = 0, 23
   do l = 0, 59
      do k = 0, 59
         do j = 0, 999
             i = i + 1
             a = 3600.0*real(m)+ 60.0*real(l) + real(k) + 0.001*real(j)
             b = 0.001*real(j) + real(k) + 60.0*real(l) + 3600.0*real(m)
             c = (3600000*m + 60000*l + 1000*k + j)/1000.0
             if (b /= c) ifa = ifa + 1
             if (a /= c) ifb = ifb + 1
             if (a /= b) ifc = ifc + 1
         end do
      end do
   end do
end do
print *, i, ifa, ifb, ifc
end

On PPC OSX and AMD64 Linux (with and without -O), I get:

    86400000    19312016    17523130     2683738

It seems hard to decide what is the right solution, expect that the same has to be used in the test cases. On a Pentium Linux, I get the same result if I use -ffloat-store. Without this flag I get:

    86400000        1934        1934           0

without -O and

    86400000    86313600    86313600           0

with -O.  Since I understand that in the later cases, the right hand side is computed as real(10), hence a and b are the same after rounding to real(4). However I don't see why the optimization has such a dramatic effect.

I have also run the following test:

[archimede] test/fortran> cat > sec_prec.f90
integer i, j, k, l, m, n
real a, b, c
k = 59
l = 59
m = 23
do i = 0, 9
   a = 86399.0 + 0.001*real(990+i)
   b = 0.001*real(990+i) + real(k) + 60.0*real(l) + 3600.0*real(m)
   c = (86399990 + i)/1000.0
   print '(I3,3(1PG20.10))', i, a, b, c
end do
end

which gives:

  0     86399.99219         86399.99219         86399.99219    
  1     86399.99219         86399.99219         86399.99219    
  2     86399.99219         86399.99219         86399.99219    
  3     86399.99219         86399.99219         86399.99219    
  4     86399.99219         86399.99219         86399.99219    
  5     86399.99219         86399.99219         86399.99219    
  6     86399.99219         86400.00000         86400.00000    
  7     86400.00000         86400.00000         86400.00000    
  8     86400.00000         86400.00000         86400.00000    
  9     86400.00000         86400.00000         86400.00000    

i.e., the last 3/4 ms gives 86400.0, hence the result of secnds(0.0) is inside the interval [0.0,86400.0] and not inside [0.0,86400.0[ as I naively expected. This is the cryptic point (2) in my comment #1.

A side effect is that if your are within this time frame and do

t1=secnds(0.0); t2=secnds(t1)

you get 86400.0 instead of 0.0. A possible solution is to add a t1=min(86399.996,t1) to force the output in the semiopen interval.  With the reversal of the time computation and putting everything in float, it leads to the following patch:

--- gcc-4.3-20070525/libgfortran/intrinsics/date_and_time.c     Fri Apr  6 18:47:23 2007
+++ gcc-4.3-20070526/libgfortran/intrinsics/date_and_time.c     Mon May 28 21:40:46 2007
@@ -341,12 +341,15 @@
 
   free_mem (avalues);
 
-  temp1 = 3600.0 * (GFC_REAL_4)values[4] +
-           60.0 * (GFC_REAL_4)values[5] +
-                  (GFC_REAL_4)values[6] +
-          0.001 * (GFC_REAL_4)values[7];
-  temp2 = fmod (*x, 86400.0);
-  temp2 = (temp1 - temp2 >= 0.0) ? temp2 : (temp2 - 86400.0);
+  temp1 =  0.001f * (GFC_REAL_4)values[7] +
+                   (GFC_REAL_4)values[6] +
+           60.0f * (GFC_REAL_4)values[5] +
+         3600.0f * (GFC_REAL_4)values[4];
+          
+  /* Fix the round-off errors for the 3ms before midnight. */
+  temp1= (86399.996f>=temp1)? temp1 : 86399.996f;
+  temp2 = fmod (*x, 86400.0f);
+  temp2 = (temp1 - temp2 >= 0.0f) ? temp2 : (temp2 - 86400.0f);
   return temp1 - temp2;
 }

I have implemented part of it (without the reversal for temp1) on my last build on OSX and it seems to work. For the other platforms I rely on binaries, so I need some outside help to test the patch on them!-)
Comment 5 Jerry DeLisle 2007-05-30 02:09:54 UTC
I would like to do a couple of tests here,  Then I will "oversee" this patch.
Comment 6 Jerry DeLisle 2007-05-30 03:37:08 UTC
I modified the program given in comment #4 to display the exact values at the end of the cycle along with the three methods.  Method "a" and method "c" give identical results.  Method "b" has a different phase and its the end point sooner.  The exact true value is shown in the fourth column of the output below.

-- snippet --

             a = 3600.0*real(m)+ 60.0*real(l) + real(k) + 0.001*real(j)
             b = 0.001*real(j) + real(k) + 60.0*real(l) + 3600.0*real(m)
             c = real(3600*m + 60*l + k)
	     c = c + real(j)/1000.0
	     
	     exact1 = 3600*m + 60*l + k
	     exact2 = j
	     write(6,100)a, b, c, exact1, exact2
         100 format (3(f11.5,2x),i5,".",i3)

Result:

86399.98438  86399.98438  86399.98438  86399.988
86399.99219  86399.99219  86399.99219  86399.989
86399.99219  86399.99219  86399.99219  86399.990
86399.99219  86399.99219  86399.99219  86399.991
86399.99219  86399.99219  86399.99219  86399.992
86399.99219  86399.99219  86399.99219  86399.993
86399.99219  86399.99219  86399.99219  86399.994
86399.99219  86399.99219  86399.99219  86399.995
86399.99219  86400.00000  86399.99219  86399.996
86400.00000  86400.00000  86400.00000  86399.997
86400.00000  86400.00000  86400.00000  86399.998
86400.00000  86400.00000  86400.00000  86399.999


a and c are actually a little more accurate at the end.  Its also interesting that this deviation between the methods appears to happen only near the end of the sequence.  I have visually scanned in quite a few places and all three are identical until this point.
Comment 7 Dominique d'Humieres 2007-05-30 21:35:48 UTC
> a and c are actually a little more accurate at the end. 
> Its also interesting that this deviation between the methods 
> appears to happen only near the end of the sequence.  
> I have visually scanned in quite a few places and all three are
> identical until this point.

What I expect is:
below 16384s ~4.5 hours, all the results are equal almost all the time,
below 32768s ~9 hours, all the results are equal +/- ~1 ms,
below 65536s ~18 hours, all the results are equal +/- ~2ms,
and for the following ~6 hours, all the results are equal +/- ~4ms (~spacing(65536.0)/2.0).

Does it fit with what you saw?
Comment 8 Andrew Pinski 2007-06-03 17:07:14 UTC
*** Bug 32196 has been marked as a duplicate of this bug. ***
Comment 9 Andrew Pinski 2007-06-04 08:37:46 UTC
*** Bug 32202 has been marked as a duplicate of this bug. ***
Comment 10 Andrew Pinski 2007-06-04 16:11:30 UTC
*** Bug 32206 has been marked as a duplicate of this bug. ***
Comment 11 Jerry DeLisle 2007-06-06 00:54:43 UTC
Subject: Bug 32057

Author: jvdelisle
Date: Wed Jun  6 00:54:30 2007
New Revision: 125352

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=125352
Log:
2007-06-05  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

	PR testsuite/32057
	* gfortran.dg/secnds.f: Update test to avoid roundoff errors and
	midnight issues.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gfortran.dg/secnds.f

Comment 12 Jerry DeLisle 2007-06-06 01:26:38 UTC
Lets give this some time to test on other platforms and see.  If it works, we can port to 4.2 to get rid of the irritant factor. 
Comment 13 Rob 2007-06-19 17:11:34 UTC
>>The goal of the tests is not to measure some time, but to check that
>>intervals are properly ordered, i.e., t1<=dat1<=t1a and t2a<=dat2-dat1<= t2.

If that is the "goal" then could we eliminate all influence of time (midnight / leap seconds / end of year, and whatever random issues occur) by using a procedure similar to what is described in: gcc-4_3-trunk/libmudflap/testsuite/lib/mfdg.exp ?


# Indicate that this test case is to be rerun several times.  This
# is useful if it is nondeterministic.  This applies to rerunning the
# test program only, not rebuilding it.
# The embedded format is "{ dg-repetitions N }", where N is the number
# of repetitions.  It better be greater than zero.
#
proc dg-repetitions { line value } {
    upvar dg-repetitions repetitions
    set repetitions $value
}

#
# Indicate that this test case is to be run with a short timeout.
# The embedded format is "{ dg-timeout N }", where N is in seconds.
#
proc dg-timeout { line value } {
    global dg-timeout
    set dg-timeout $value
}

# dejagnu's config/unix.exp hard-codes 300 seconds as the timeout
# for any natively run executable.  That's too long for tests run
# multiple times and that may possibly hang.  So we override it here
# to provide some degree of control.


We could run the test 5 times and if we got at least one pass we could write "PASS:" and use "WARNING:" to write the number of failures. If we get 5 failures then we just write "FAIL:".

The last time I ran "make -i check" I got a pass on this test, the time before a fail; previously it's been hit and miss. The above technique will catch random failures better and avoid false positives.
Comment 14 Rob 2007-06-20 22:33:40 UTC
GCC version 4.3.0 20070620
http://gcc.gnu.org/ml/gcc-testresults/2007-06/msg00942.html
FAIL:  gfortran.dg/secnds-1.f  -O1  execution test
Comment 15 Jerry DeLisle 2007-06-21 01:06:25 UTC
I have not patched secnds-1.f yet
Comment 16 Jerry DeLisle 2007-06-25 03:02:25 UTC
I have now committed the patch to fix secnds-1.f.  If this clears it up on 4.3, I will backport to 4.2