Bug 48019 - Need to handle EINTR in libgo testsuite
Summary: Need to handle EINTR in libgo testsuite
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: go (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: ---
Assignee: Ian Lance Taylor
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-07 14:19 UTC by Rainer Orth
Modified: 2011-03-31 22:25 UTC (History)
1 user (show)

See Also:
Host: i386-pc-solaris2.11
Target: i386-pc-solaris2.11
Build: i386-pc-solaris2.11
Known to work:
Known to fail:
Last reconfirmed: 2011-03-17 02:58:56


Attachments
test (449 bytes, text/plain)
2011-03-16 20:33 UTC, Ian Lance Taylor
Details
accept testcase (636 bytes, text/plain)
2011-03-30 21:05 UTC, Ian Lance Taylor
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Rainer Orth 2011-03-07 14:19:59 UTC
Several libgo tests randomly fail with unhandled EINTR on Solaris 2:

--- FAIL: net.TestTCPServer
	Test tcp 0.0.0.0 127.0.0.1
	Test tcp  127.0.0.1
	Test tcp [::] [::ffff:127.0.0.1]
	Test tcp [::] 127.0.0.1
	net.Dial("tcp", "", "127.0.0.1:41117") = _, dial tcp 127.0.0.1:41117: Interrupted system call

--- FAIL: http.TestServeFile
	http://127.0.0.1:56313/ServeFile send: dial tcp 127.0.0.1:56313: Interrupted system call

/vol/gcc/src/hg/trunk/solaris/libgo/runtime/thread.c:36: libgo assertion failure

The last can be handled like this:

diff -r 350a6b184ae7 libgo/runtime/thread.c
--- a/libgo/runtime/thread.c	Mon Mar 07 00:16:29 2011 +0100
+++ b/libgo/runtime/thread.c	Mon Mar 07 08:37:04 2011 +0100
@@ -1,7 +1,8 @@
-// Copyright 2010 The Go Authors. All rights reserved.
+// Copyright 2010, 2011 The Go Authors. All rights reserved.
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
+#include <errno.h>
 #include "runtime.h"
 #include "go-assert.h"
 
@@ -32,8 +33,16 @@
 static void
 runtime_lock_full(Lock *l)
 {
-	if(sem_wait(&l->sem) != 0)
+	int i;
+
+        for(;;) {
+	  	i = sem_wait(&l->sem);
+	    	if (i >= 0)
+			break;
+		if (i < 0 && errno == EINTR)
+	      		continue;
 		runtime_throw("sem_wait failed");
+	}
 }
 
 void

but I've yet to find out where the other unhandled EINTRs occur.  Running the
affected tests under truss distorts the timing so much the failures don't occur
anymore.
Comment 1 ian@gcc.gnu.org 2011-03-09 06:31:46 UTC
Author: ian
Date: Wed Mar  9 06:31:37 2011
New Revision: 170810

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170810
Log:
	PR go/48019
Ignore EINTR in runtime_lock_full.

Modified:
    trunk/libgo/runtime/thread.c
Comment 2 ian@gcc.gnu.org 2011-03-09 06:57:08 UTC
Author: ian
Date: Wed Mar  9 06:57:04 2011
New Revision: 170811

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170811
Log:
	PR go/48019
Ignore EINTR in socket connect.

Modified:
    trunk/libgo/go/net/sock.go
Comment 3 Ian Lance Taylor 2011-03-09 06:57:44 UTC
This should be fixed now.

Thanks.
Comment 4 Ian Lance Taylor 2011-03-09 18:21:44 UTC
Come to think of it, the code sets SA_RESTART when it calls sigaction.  Is SA_RESTART defined on Solaris?  Do I have to #define any special macros to get it to be defined?
Comment 5 ro@CeBiTec.Uni-Bielefeld.DE 2011-03-09 18:30:12 UTC
> --- Comment #4 from Ian Lance Taylor <ian at airs dot com> 2011-03-09 18:21:44 UTC ---
> Come to think of it, the code sets SA_RESTART when it calls sigaction.  Is
> SA_RESTART defined on Solaris?  Do I have to #define any special macros to get
> it to be defined?

It is defined, given the following conditions:

#if defined(__EXTENSIONS__) || defined(_KERNEL) || \
        (!defined(_STRICT_STDC) && !defined(_POSIX_C_SOURCE)) || \
        defined(_XPG4_2)

I.e. it is missing in a strict ANSI C/non-XPG 4.2+ environment.

	Rainer
Comment 6 Ian Lance Taylor 2011-03-09 19:30:54 UTC
Thanks, so it sounds like SA_RESTART should be defined when compiling libgo.  The relevant file is libgo/runtime/go-go.c.

The Solaris man page that I found on the web suggests that even when SA_RESTART is set, the connect and accept calls are not restarted.  That is, it lists the set of calls which are restarted, and that list does not include connect and accept.  Can you confirm or deny this behaviour?
Comment 7 Ian Lance Taylor 2011-03-16 20:32:38 UTC
I don't have access to a Solaris system.  I would be very interested in hearing whether connect and accept calls are restarted after a signal handler installed with SA_RESTART, or whether they return EINTR.  The current code in libgo seems likely to be wrong, and I'd like to fix it.  I will attach a program; can you run it on a Solaris system?
Comment 8 Ian Lance Taylor 2011-03-16 20:33:31 UTC
Created attachment 23683 [details]
test

On GNU/Linux this program prints

connect: Connection timed out
Comment 9 Sean McGovern 2011-03-16 23:05:56 UTC
Added sys/types.h and sys/socket.h to get AF_INET and SOCK_STREAM.

$ gcc -o foo2 foo2.c -lsocket
$ ./foo2
connect: Interrupted system call
Comment 10 Ian Lance Taylor 2011-03-17 02:58:56 UTC
Thanks.  It looks like the Solaris connect call does not honor SA_RESTART for some reason.
Comment 11 ro@CeBiTec.Uni-Bielefeld.DE 2011-03-18 17:21:42 UTC
> --- Comment #10 from Ian Lance Taylor <ian at airs dot com> 2011-03-17 02:58:56 UTC ---
> Thanks.  It looks like the Solaris connect call does not honor SA_RESTART for
> some reason.

It's not only Solaris, though:

Tru64 UNIX V5.1B:

connect: Interrupted system call

but:

IRIX 6.5:

connect: Connection timed out

	Rainer
Comment 12 Ian Lance Taylor 2011-03-30 21:05:39 UTC
Created attachment 23825 [details]
accept testcase

OK, so now we know that connect ignores SA_RESTART, but I forgot to ask about accept.  Can you try this program as well?

On GNU/Linux it prints "accept succeeded".

Thanks.
Comment 13 ro@CeBiTec.Uni-Bielefeld.DE 2011-03-31 16:00:53 UTC
> OK, so now we know that connect ignores SA_RESTART, but I forgot to ask about
> accept.  Can you try this program as well?
>
> On GNU/Linux it prints "accept succeeded".

Sure: on Solaris 8/x86 and Solaris 11/SPARC, I get

accept: Interrupted system call

On IRIX 6.5 and Tru64 UNIX V5.1, I get

accept succeeded

	Rainer
Comment 14 Ian Lance Taylor 2011-03-31 22:25:09 UTC
Thanks for running the tests.  Now that I look at the current library, the connect and accept system calls are only ever run on a nonblocking socket, so they will never wait (this was not true before).  So I think this problem should be all fixed with the current libgo sources.  Let me know if you discover otherwise.