Several libgo tests randomly fail with unhandled EINTR on Solaris 2: --- FAIL: net.TestTCPServer Test tcp 0.0.0.0 127.0.0.1 Test tcp 127.0.0.1 Test tcp [::] [::ffff:127.0.0.1] Test tcp [::] 127.0.0.1 net.Dial("tcp", "", "127.0.0.1:41117") = _, dial tcp 127.0.0.1:41117: Interrupted system call --- FAIL: http.TestServeFile http://127.0.0.1:56313/ServeFile send: dial tcp 127.0.0.1:56313: Interrupted system call /vol/gcc/src/hg/trunk/solaris/libgo/runtime/thread.c:36: libgo assertion failure The last can be handled like this: diff -r 350a6b184ae7 libgo/runtime/thread.c --- a/libgo/runtime/thread.c Mon Mar 07 00:16:29 2011 +0100 +++ b/libgo/runtime/thread.c Mon Mar 07 08:37:04 2011 +0100 @@ -1,7 +1,8 @@ -// Copyright 2010 The Go Authors. All rights reserved. +// Copyright 2010, 2011 The Go Authors. All rights reserved. // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. +#include <errno.h> #include "runtime.h" #include "go-assert.h" @@ -32,8 +33,16 @@ static void runtime_lock_full(Lock *l) { - if(sem_wait(&l->sem) != 0) + int i; + + for(;;) { + i = sem_wait(&l->sem); + if (i >= 0) + break; + if (i < 0 && errno == EINTR) + continue; runtime_throw("sem_wait failed"); + } } void but I've yet to find out where the other unhandled EINTRs occur. Running the affected tests under truss distorts the timing so much the failures don't occur anymore.
Author: ian Date: Wed Mar 9 06:31:37 2011 New Revision: 170810 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170810 Log: PR go/48019 Ignore EINTR in runtime_lock_full. Modified: trunk/libgo/runtime/thread.c
Author: ian Date: Wed Mar 9 06:57:04 2011 New Revision: 170811 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170811 Log: PR go/48019 Ignore EINTR in socket connect. Modified: trunk/libgo/go/net/sock.go
This should be fixed now. Thanks.
Come to think of it, the code sets SA_RESTART when it calls sigaction. Is SA_RESTART defined on Solaris? Do I have to #define any special macros to get it to be defined?
> --- Comment #4 from Ian Lance Taylor <ian at airs dot com> 2011-03-09 18:21:44 UTC --- > Come to think of it, the code sets SA_RESTART when it calls sigaction. Is > SA_RESTART defined on Solaris? Do I have to #define any special macros to get > it to be defined? It is defined, given the following conditions: #if defined(__EXTENSIONS__) || defined(_KERNEL) || \ (!defined(_STRICT_STDC) && !defined(_POSIX_C_SOURCE)) || \ defined(_XPG4_2) I.e. it is missing in a strict ANSI C/non-XPG 4.2+ environment. Rainer
Thanks, so it sounds like SA_RESTART should be defined when compiling libgo. The relevant file is libgo/runtime/go-go.c. The Solaris man page that I found on the web suggests that even when SA_RESTART is set, the connect and accept calls are not restarted. That is, it lists the set of calls which are restarted, and that list does not include connect and accept. Can you confirm or deny this behaviour?
I don't have access to a Solaris system. I would be very interested in hearing whether connect and accept calls are restarted after a signal handler installed with SA_RESTART, or whether they return EINTR. The current code in libgo seems likely to be wrong, and I'd like to fix it. I will attach a program; can you run it on a Solaris system?
Created attachment 23683 [details] test On GNU/Linux this program prints connect: Connection timed out
Added sys/types.h and sys/socket.h to get AF_INET and SOCK_STREAM. $ gcc -o foo2 foo2.c -lsocket $ ./foo2 connect: Interrupted system call
Thanks. It looks like the Solaris connect call does not honor SA_RESTART for some reason.
> --- Comment #10 from Ian Lance Taylor <ian at airs dot com> 2011-03-17 02:58:56 UTC --- > Thanks. It looks like the Solaris connect call does not honor SA_RESTART for > some reason. It's not only Solaris, though: Tru64 UNIX V5.1B: connect: Interrupted system call but: IRIX 6.5: connect: Connection timed out Rainer
Created attachment 23825 [details] accept testcase OK, so now we know that connect ignores SA_RESTART, but I forgot to ask about accept. Can you try this program as well? On GNU/Linux it prints "accept succeeded". Thanks.
> OK, so now we know that connect ignores SA_RESTART, but I forgot to ask about > accept. Can you try this program as well? > > On GNU/Linux it prints "accept succeeded". Sure: on Solaris 8/x86 and Solaris 11/SPARC, I get accept: Interrupted system call On IRIX 6.5 and Tru64 UNIX V5.1, I get accept succeeded Rainer
Thanks for running the tests. Now that I look at the current library, the connect and accept system calls are only ever run on a nonblocking socket, so they will never wait (this was not true before). So I think this problem should be all fixed with the current libgo sources. Let me know if you discover otherwise.