The intermittent failures on Darwin are due to a kernel bug tripped by
java.lang.Process.waitFor().
The bug appears to be that if:
- the program is multithreaded
- it is blocking SIGCHLD
- it receives a SIGCHLD due to a process terminating
- later it calls sigsuspend (but not sigwait)
then the SIGCHLD may never be delivered, and so the process will wait
for it forever.
It's intermittent because it works fine if the sigsuspend starts before
the SIGCHLD is sent. This also explains why it happens more often with
gij.
I've filed this as <rdar://problem/4736203>. We could work around it
by using a timeout of some kind; for example, creating a new thread
which sends a SIGCHLD manually after some period of time. (Obviously,
only on Darwin, and maybe only on versions with the bug.) Do we think
this is a good idea?