This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Thread.sleep() and System.nanoTime() fixes


This patch is motivated by the intermittent Thread_Sleep failures on
darwin.

There were two interrelated problems.  Firstly, the system
pthread_cond_timedwait uses a different underlying clock than the
realtime one; the clock used appears to be uncorrected, so if you have
a system that runs fast, your timedwaits will be a little shorter than
you asked for (by as much as 200 parts per million or so).

Normally, no-one would care, because the overhead is greater than the
difference.  However, there was another bug in _Jv_CondWait that was
also shortening the time for a sleep.  _Jv_CondWait was taking the
requested sleep time in milliseconds, and adding it to the system time
in milliseconds to get the time to wake up.  The problem with this is
that if the system wakes a thread up at exactly the time computed, it
won't have slept for the full time requested; the sleep might be up to
a millisecond short.

The combination of these was that the following would happen:

1. Program asks for time.  Actual time is, say, 123.456700 seconds.
   Program gets time in milliseconds, 123.456 seconds.
2. Program asks to sleep for 50 milliseconds = 0.050 seconds.
3. _Jv_CondWait would turn this into a request to sleep until
   123.506000 seconds (instead of 123.506700 seconds).
4. Darwin would wake task up at time 123.505990 seconds.
5. Program asks for time, and gets time as 123.505 seconds, and says
   "hey, that was only a 0.049 second sleep!"

I fixed the _Jv_CondWait problem by (a) using a more accurate clock,
and (b) when only milliseconds are available (which might not actually
happen on any system using this routine) adding half a millisecond to
the current time to reduce the maximum size of the error.

Making this much more confusing was yet another bug, which is that on
Darwin System.nanoTime() was actually returning microseconds (10^-6
second) not nanoseconds (10^-9 second), and only doing it to
millisecond accuracy anyway.  On platforms where clock_gettime is
available, it was even worse: it was returning nanoseconds, plus
1/1000 of the number of seconds, so occasionally time would go
backwards.

So, the effect of this patch is that the Thread_Sleep failures become
much less common.  They now happen about 1/60 of the time on my
machine instead of 1/5 of the time.

The remaining problem on Darwin (that the system sometimes comes back
early) is being tracked as <rdar://problems/4728069>; I can confirm it
is a general platform problem because the system Java interpreter
fails the new testcase too.

The new testcase does fail every time for me on Darwin, with
output like:

failed, iteration 89, time 9981000ns

due to the Darwin bug.  If we feel this should be suppressed
because Java doesn't guarantee such exact timing, it's easy to tweak
the value in the testcase; it would probably pass every time if it was
comparing against, say, 9900000 instead of 10000000.

I'll commit after running a full testrun, unless someone has an objection.
-- 
- Geoffrey Keating <geoffk@apple.com>

===File ~/patches/libjava-microseconds.patch================
2006-09-13  Geoffrey Keating  <geoffk@apple.com>

	* posix.cc (_Jv_platform_nanotime): Return nanoseconds, not
	microseconds; use gettimeofday when available.
	* posix-threads.cc (_Jv_CondWait): Improve accuracy and range of
	timeout calculation.
	* testsuite/libjava.lang/Thread_Sleep_2.java: New.
	* testsuite/libjava.lang/Thread_Sleep_2.out: New.

Index: posix.cc
===================================================================
--- posix.cc	(revision 116922)
+++ posix.cc	(working copy)
@@ -87,12 +87,20 @@
   if (clock_gettime (id, &now) == 0)
     {
       jlong result = (jlong) now.tv_sec;
-      result = result * 1000 * 1000 + now.tv_nsec;
+      result = result * 1000000000LL + now.tv_nsec;
       return result;
     }
   // clock_gettime failed, but we can fall through.
 #endif // HAVE_CLOCK_GETTIME
-  return _Jv_platform_gettimeofday () * 1000LL;
+#if defined (HAVE_GETTIMEOFDAY)
+ {
+   timeval tv;
+   gettimeofday (&tv, NULL);
+   return (tv.tv_sec * 1000000000LL) + tv.tv_usec * 1000LL;
+ }
+#else
+  return _Jv_platform_gettimeofday () * 1000000LL;
+#endif
 }
 
 // Platform-specific VM initialization.
Index: testsuite/libjava.lang/Thread_Sleep_2.out
===================================================================
--- testsuite/libjava.lang/Thread_Sleep_2.out	(revision 0)
+++ testsuite/libjava.lang/Thread_Sleep_2.out	(revision 0)
@@ -0,0 +1 @@
+ok
Index: testsuite/libjava.lang/Thread_Sleep_2.java
===================================================================
--- testsuite/libjava.lang/Thread_Sleep_2.java	(revision 0)
+++ testsuite/libjava.lang/Thread_Sleep_2.java	(revision 0)
@@ -0,0 +1,34 @@
+// Test that Thread.sleep() is accurate
+// and that nanoTime actually measures in nanoseconds.
+
+public class Thread_Sleep_2
+{
+  public static void main(String args[])
+  {
+    try
+    {
+      boolean ok = true;
+      for (int i = 0; i < 100; i++)
+	{
+	  long start = System.nanoTime();
+	  Thread.sleep(10);
+	  long end = System.nanoTime();
+	  if ((end - start) < 10000000)
+	    {
+	      System.out.print ("failed, iteration ");
+	      System.out.print (i);
+	      System.out.print (", time ");
+	      System.out.print (end - start);
+	      System.out.println ("ns");
+	      ok = false;
+	    }
+	}
+      if (ok)
+	System.out.println ("ok");
+    }
+    catch (InterruptedException x)
+    {
+      System.out.println("error: Thread interrupted.");
+    }
+  }
+}
Index: posix-threads.cc
===================================================================
--- posix-threads.cc	(revision 116922)
+++ posix-threads.cc	(working copy)
@@ -96,17 +96,30 @@
   if (millis > 0 || nanos > 0)
     {
       // Calculate the abstime corresponding to the timeout.
-      // Everything is in milliseconds.
-      //
-      // We use `unsigned long long' rather than jlong because our
-      // caller may pass up to Long.MAX_VALUE millis.  This would
-      // overflow the range of a jlong when added to the current time.
-      
-      unsigned long long startTime 
-	= (unsigned long long)java::lang::System::currentTimeMillis();
-      unsigned long long m = (unsigned long long)millis + startTime;
-      unsigned long long seconds = m / 1000; 
+      unsigned long long seconds;
+      unsigned long usec;
 
+      // For better accuracy, should use pthread_condattr_setclock
+      // and clock_gettime.
+#ifdef HAVE_GETTIMEOFDAY
+      timeval tv;
+      gettimeofday (&tv, NULL);
+      usec = tv.tv_usec;
+      seconds = tv.tv_sec;
+#else
+      unsigned long long startTime = java::lang::System::currentTimeMillis();
+      seconds = startTime / 1000;
+      /* Assume we're about half-way through this millisecond.  */
+      usec = (startTime % 1000) * 1000 + 500;
+#endif
+      /* These next two statements cannot overflow.  */
+      usec += nanos / 1000;
+      usec += (millis % 1000) * 1000;
+      /* These two statements could overflow only if tv.tv_sec was
+	 insanely large.  */
+      seconds += millis / 1000;
+      seconds += usec / 1000000;
+
       ts.tv_sec = seconds;
       if (ts.tv_sec < 0 || (unsigned long long)ts.tv_sec != seconds)
         {
@@ -115,10 +128,8 @@
           millis = nanos = 0;
         }
       else
-        {
-          m %= 1000;
-          ts.tv_nsec = m * 1000000 + (unsigned long long)nanos;
-        }
+	/* This next statement also cannot overflow.  */
+	ts.tv_nsec = (usec % 1000000) * 1000 + (nanos % 1000);
     }
 
   _Jv_Thread_t *current = _Jv_ThreadCurrentData ();
============================================================


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]