Many (most) 64-bit Go tests now FAIL with a SEGV on Solaris, both SPARC and x86, shown here using the bufio test as an example:

* i386-pc-solaris2.10:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2 (LWP 2)]
runtime_netpoll (block=block@entry=0 '\000')
    at /vol/gcc/src/hg/trunk/solaris/libgo/runtime/netpoll_select.c:143
143	  __builtin_memcpy(&rfds, &fds, sizeof fds);
(gdb) where
#0  runtime_netpoll (block=block@entry=0 '\000')
    at /vol/gcc/src/hg/trunk/solaris/libgo/runtime/netpoll_select.c:143
#1  0xfffffd7ffec0e95a in sysmon ()
    at /vol/gcc/src/hg/trunk/solaris/libgo/runtime/proc.c:2707
#2  0xfffffd7ffec0d378 in runtime_mstart (mp=0xc210212000)
    at /vol/gcc/src/hg/trunk/solaris/libgo/runtime/proc.c:1016
#3  0xfffffd7ffe4dd9db in _thr_setup () from /lib/64/libc.so.1
#4  0xfffffd7ffe4ddc10 in ?? () from /lib/64/libc.so.1
#5  0x0000000000000000 in ?? ()
(gdb) p rfds
Cannot access memory at address 0xfffffd7ffe0f9f00
(gdb) p fds
$1 = {fds_bits = {0 <repeats 1024 times>}}

* sparc-sun-solaris2.11:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2 (LWP 2)]
runtime_netpoll (block=block@entry=0 '\000')
    at /vol/gcc/src/hg/trunk/solaris/libgo/runtime/netpoll_select.c:153
153	  __builtin_memset(&timeout, 0, sizeof timeout);
(gdb) where
#0  runtime_netpoll (block=block@entry=0 '\000')
    at /vol/gcc/src/hg/trunk/solaris/libgo/runtime/netpoll_select.c:153
#1  0xfffffffd591bcd6c in sysmon ()
    at /vol/gcc/src/hg/trunk/solaris/libgo/runtime/proc.c:2707
#2  0xfffffffd591bb3a8 in runtime_mstart (mp=0xc210212000)
    at /vol/gcc/src/hg/trunk/solaris/libgo/runtime/proc.c:1016
#3  0xffffffff7ede276c in _lwp_start () from /lib/64/libc.so.1
#4  0xffffffff7ede276c in _lwp_start () from /lib/64/libc.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p timeout
Cannot access memory at address 0xffffffff71cfbd50

I have no idea what might be wrong; the same tests work perfectly fine for 32-bit.

	Rainer
I've found what's going on: when I look at the failing bufio test, gdb prints

(gdb) p rfds
Cannot access memory at address 0xfffffd7ffe0f9f00

With pmap, I see the following mappings:

FFFFFD7FFDE00000       2048K rw---    [ anon ]
FFFFFD7FFE101000          4K rw--R    [ stack tid=2 ]
FFFFFD7FFE110000         64K rw---    [ anon ]

I.e. the thread stack starts off with just 4 kB mapped, but rfds lies 0x7100 bytes below the start of that mapping, way beyond the initial allocation and thus unmapped. Each fd_set is 8 kB for 64-bit, so the stack consumption of runtime_netpoll in netpoll_select.c is far more than this stack provides.

As a quick hack, I've increased the initial stack size to StackMin:

diff --git a/libgo/runtime/proc.c b/libgo/runtime/proc.c
--- a/libgo/runtime/proc.c
+++ b/libgo/runtime/proc.c
@@ -185,7 +185,7 @@ runtime_newosproc(M *mp)
 	if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
 		runtime_throw("pthread_attr_setdetachstate");
 
-	stacksize = PTHREAD_STACK_MIN;
+	stacksize = StackMin /* PTHREAD_STACK_MIN */;
 
 	// With glibc before version 2.16 the static TLS size is taken
 	// out of the stack size, and we get an error or a crash if

which lets all but os/user PASS on i386-pc-solaris2.10 and sparc-sun-solaris2.11.

	Rainer
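The arithmetic behind this diagnosis can be checked with a short C sketch. The addresses are copied from the gdb and pmap output above. The helper names bytes_below_mapping and fdsets_fit are hypothetical and exist only for illustration, and the 128-byte size used for a 32-bit fd_set in the usage note is an assumption, not a figure from this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses copied from the gdb and pmap output in this comment. */
#define RFDS_ADDR       UINT64_C(0xfffffd7ffe0f9f00)  /* &rfds per gdb */
#define STACK_MAP_BASE  UINT64_C(0xfffffd7ffe101000)  /* [ stack tid=2 ] */
#define STACK_MAP_BYTES ((size_t)4096)                /* 4K per pmap */

/* How far addr lies below the lowest mapped stack byte
 * (0 if it is at or above the start of the mapping). */
static uint64_t bytes_below_mapping(uint64_t addr, uint64_t map_base)
{
	return addr < map_base ? map_base - addr : 0;
}

/* Would nsets fd_set locals of fdset_bytes each fit into
 * mapped_bytes of stack? Ignores every other local and frame. */
static int fdsets_fit(size_t mapped_bytes, size_t fdset_bytes, size_t nsets)
{
	assert(fdset_bytes > 0);
	return nsets <= mapped_bytes / fdset_bytes;
}
```

For example, bytes_below_mapping(RFDS_ADDR, STACK_MAP_BASE) gives 0x7100, and fdsets_fit(STACK_MAP_BYTES, 8192, 1) is false: not even one 8 kB fd_set fits into the 4 kB that is initially mapped. With an assumed 128-byte fd_set, fdsets_fit(STACK_MAP_BYTES, 128, 4) is true, which would be consistent with the 32-bit tests passing.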
Author: ian
Date: Wed Jan  8 00:42:45 2014
New Revision: 206411

URL: http://gcc.gnu.org/viewcvs?rev=206411&root=gcc&view=rev
Log:
PR go/59433
net: Don't use stack space for fd_sets when using select.

Modified:
    trunk/libgo/runtime/netpoll_select.c
Should be fixed now. Thanks for the analysis.
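The general idea named in the commit log, keeping fd_sets off the thread stack when calling select, might look roughly like the following C sketch. This is not the code from netpoll_select.c: poll_readable and demo_pipe are hypothetical names, and plain mmap/munmap stand in for whatever allocator the runtime actually uses.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical helper: 1 if fd is readable, 0 if not, -1 on error.
 * The fd_set lives in an anonymous mapping, not in a stack local. */
static int poll_readable(int fd)
{
	fd_set *prfds;
	struct timeval timeout = { 0, 0 };  /* zero timeout: do not block */
	int n;

	if (fd < 0 || fd >= FD_SETSIZE)
		return -1;

	prfds = mmap(NULL, sizeof *prfds, PROT_READ | PROT_WRITE,
	             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (prfds == MAP_FAILED)  /* mmap signals failure with MAP_FAILED */
		return -1;

	FD_ZERO(prfds);
	FD_SET(fd, prfds);
	n = select(fd + 1, prfds, NULL, NULL, &timeout);
	if (n > 0)
		n = FD_ISSET(fd, prfds) ? 1 : 0;

	munmap(prfds, sizeof *prfds);  /* release the mapping on every path */
	return n < 0 ? -1 : n;
}

/* Demo: an empty pipe is not readable; after one write it is.
 * Returns 1 if both observations hold, 0 if not, -1 on setup error. */
static int demo_pipe(void)
{
	int p[2], before, after;

	if (pipe(p) != 0)
		return -1;
	before = poll_readable(p[0]);
	if (write(p[1], "x", 1) != 1) {
		close(p[0]);
		close(p[1]);
		return -1;
	}
	after = poll_readable(p[0]);
	close(p[0]);
	close(p[1]);
	return (before == 0 && after == 1) ? 1 : 0;
}
```

With this shape the stack frame holds only a pointer and a timeval, so its size no longer depends on FD_SETSIZE. The cost is that every allocation needs a failure check and a matching release.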
> --- Comment #3 from Ian Lance Taylor <ian at airs dot com> ---
> Should be fixed now.

I'm seeing a massive improvement, but now some 32-bit tests that used to work are failing:

 Running target unix
+FAIL: net
 FAIL: runtime
-FAIL: os/user
+FAIL: log/syslog
+FAIL: net/http
 FAIL: sync/atomic

The net failure is another instance of PR go/59431, which I still need to analyse, but the log/syslog and net/http failures are different. They both SEGV like this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 3 (LWP 3)]
0xfe0f8a9f in runtime_netpoll (block=block@entry=1 '\001')
    at /vol/gcc/src/hg/trunk/local/libgo/runtime/netpoll_select.c:163
163	  __builtin_memcpy(prfds, &fds, sizeof fds);
(gdb) where
#0  0xfe0f8a9f in runtime_netpoll (block=block@entry=1 '\001')
    at /vol/gcc/src/hg/trunk/local/libgo/runtime/netpoll_select.c:163
#1  0xfe0fd0ef in findrunnable ()
    at /vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:1653
#2  schedule () at /vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:1751
#3  0xfe0fd38a in runtime_mstart (mp=0x18511800)
    at /vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:1000
#4  0xfdd462fc in _thrp_setup () from /lib/libc.so.1
#5  0xfdd465a0 in ?? () from /lib/libc.so.1
#6  0x00000000 in ?? ()
(gdb) p prfds
$1 = (fd_set *) 0x0
(gdb) p fds
$2 = {fds_bits = {352, 0 <repeats 31 times>}}

I suspect they are related to PR go/59431, too: this should only happen if runtime_SysAlloc returned NULL, which only happens for an unhandled mmap return value, although I don't see that in truss. I need to investigate in more detail.

	Rainer
It seems this is a 32-bit issue. The failure is fragile to reproduce: I easily get it when running the test manually or under gdb, but it vanishes when run under truss. Adding assertions in runtime_netpoll to check how prfds turns NULL, I find that runtime_SysAlloc indeed returns NULL, but similar assertions inside runtime_SysAlloc don't trigger.

Investigating the SEGV with pmap, I find this:

14198:	/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libgo/log-s
08050000      48K r-x--  /var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libgo/log-syslog-check/test/a.out
0806B000      12K rwx--  /var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libgo/log-syslog-check/test/a.out
0806E000       8K rwx--    [ heap ]
08080000       4K rw---    [ anon ]
08090000       4K rw---    [ anon ]

and many, many more anon mappings; too many for the 32-bit address space, it seems. Perhaps a missing munmap somewhere?

	Rainer
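One way to test the missing-munmap hypothesis is to count mappings that are created but never released. The C sketch below only illustrates that idea. Nothing in this thread establishes where, or whether, the runtime omits a munmap. tracked_map, tracked_unmap and simulate_polls are hypothetical names, and the 4096-byte scratch size is an assumption taken from the 4K anon entries in the pmap listing.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Mappings created and not yet released (not thread-safe). */
static long live_maps;

/* Map n anonymous bytes; returns NULL on failure. */
static void *tracked_map(size_t n)
{
	void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	live_maps++;
	return p;
}

/* Release a mapping obtained from tracked_map. */
static void tracked_unmap(void *p, size_t n)
{
	if (p != NULL && munmap(p, n) == 0)
		live_maps--;
}

/* Simulate `rounds` poll calls that each map a scratch buffer and
 * release it only if `release` is nonzero. Returns how many of the
 * mappings created here are still live afterwards. */
static long simulate_polls(int rounds, int release)
{
	long start = live_maps;
	int i;

	assert(rounds >= 0);
	for (i = 0; i < rounds; i++) {
		void *p = tracked_map(4096);  /* assumed scratch size */
		if (p == NULL)
			break;
		if (release)
			tracked_unmap(p, 4096);
	}
	return live_maps - start;
}
```

With release set, simulate_polls(100, 1) leaves 0 live mappings. Without it, simulate_polls(100, 0) leaves 100, and a long-running 32-bit process would eventually run out of address space in the same way the pmap output suggests.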