There is a race condition in ServerSocket.accept() as noted in comment #11 of PR15430: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15430#c11 As well as here: http://developer.classpath.org/pipermail/classpath/2006-October/001583.html The same problem probably exists in classpath, but there are seperate low level implementations in classpath and libgcj so it would have to be fixed in both places. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15430#c11
As noted in Comment #11 of PR15430, I think that all accesses to native_fd need to be synchronized. That is my plan of attack for this.
This is trickier than I initially thought. You cannot remained blocked in accept() while holding a lock that protects closing the native_fd. Otherwise you will be blocked until a connection arrives (perhaps forever). If we can shutdown the socket without holding the lock, but not close it, there may be a path that will not deadlock. Not closing the native_fd is key as doing that allows the OS to open a new resource with the same fd. Which is the problem to be solved.
Instead of directly calling accept we could select or poll on the fd. This would let us have a timeout or an interrupt or something. For all I know poll would react more gracefully to another thread closing the fd -- but I haven't tried that.
Subject: Re: Race condition in ServerSocket.accept() tromey at gcc dot gnu dot org wrote: > ------- Comment #3 from tromey at gcc dot gnu dot org 2006-10-26 17:58 ------- > Instead of directly calling accept we could select or poll on the fd. > This would let us have a timeout or an interrupt or something. > For all I know poll would react more gracefully to another thread > closing the fd -- but I haven't tried that. > The essence of the problem is that you can not temporarily drop a lock (as Object.wait() does) in a race free manner while entering the accept() system call. Sticking a poll before the accept may in some cases narrow the window of the race, but it does not close it. I was going to fix the problem in 15 Minutes, but now I think I need a day to mull it over. This is the type of problem that keeps us sharp... David Daney
UNIX/Linux sucks. There are likely races with read/write and close all over libgcj: ================================================================= Thread 1 Thread 2 Thread 3 load fd from memory (a field). close fd Open new fd with same # read from fd ================================================================= The main problem being that file descriptors are reused as soon as possible. This makes multi-thread safe file operations very difficult. One solution: At libgcj initialization time do: ===================================== int pe[2]; pipe(pe); close(pe[0]); int global_always_error_fd = pe[1]; ===================================== All Classes that use descriptors have: ====================== int fileDesInUse; int fileDes; int cleanupFD = -1; ====================== A read operation is like this: =============================== synchronized { fileDesInUse++; } // use the descriptor read(fileDes, ....); synchronized { fileDesInUse--; if (cleanupFD != -1 && fileDesInUse == 0) { close(cleanupFD); cleanupFD = -1; } } ======================== And close is like this: ======================== cleanupFD = dup(fileDes) dup2(global_always_error_fd, fileDes); synchronized { if (fileDesInUse == 0) { close(cleanupFD); cleanupFD = -1; } } ======================== dup2 is an atomic operation that assures that the file descriptor can not be reused by another thread while we might be using it. The real close is done on a descriptor that is not in use by any other object. The extra descriptors created with all the duping are closed when it is certian that nothing is using them. Yuck.
That didn't come out quite right :(. The close part would be: ======================== int t = dup(fileDes) dup2(global_always_error_fd, fileDes); synchronized { cleanupFD = fileDes; } close (t); synchronized { if (fileDesInUse == 0) { close(cleanupFD); cleanupFD = -1; } } ======================== Yeah, that should work.
Unassigning as I am not working on this.
Closing as won't fix as libgcj (and the java front-end) has been removed from the trunk.