Bug 29604 - Race condition in ServerSocket.accept()
Summary: Race condition in ServerSocket.accept()
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: libgcj (show other bugs)
Version: 4.2.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-26 16:45 UTC by David Daney
Modified: 2016-09-30 22:53 UTC (History)
2 users (show)

See Also:
Host: *-*-*
Target: *-*-*
Build: *-*-*
Known to work:
Known to fail:
Last reconfirmed: 2006-10-26 17:21:03


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description David Daney 2006-10-26 16:45:38 UTC
There is a race condition in ServerSocket.accept() as noted in comment #11 of PR15430:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15430#c11

As well as here:
http://developer.classpath.org/pipermail/classpath/2006-October/001583.html

The same problem probably exists in classpath, but there are seperate low level implementations in classpath and libgcj so it would have to be fixed in both places.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15430#c11
Comment 1 David Daney 2006-10-26 17:21:03 UTC
As noted in Comment #11 of PR15430, I think that all accesses to native_fd need to be synchronized.  That is my plan of attack for this.
Comment 2 David Daney 2006-10-26 17:56:10 UTC
This is trickier than I initially thought.

You cannot remained blocked in accept() while holding a lock that protects closing the native_fd.  Otherwise you will be blocked until a connection arrives (perhaps forever).  If we can shutdown the socket without holding the lock, but not close it, there may be a path that will not deadlock.  Not closing the native_fd is key as doing that allows the OS to open a new resource with the same fd.  Which is the problem to be solved. 
Comment 3 Tom Tromey 2006-10-26 17:58:34 UTC
Instead of directly calling accept we could select or poll on the fd.
This would let us have a timeout or an interrupt or something.
For all I know poll would react more gracefully to another thread
closing the fd -- but I haven't tried that.
Comment 4 ddaney 2006-10-26 18:03:29 UTC
Subject: Re:  Race condition in ServerSocket.accept()

tromey at gcc dot gnu dot org wrote:
> ------- Comment #3 from tromey at gcc dot gnu dot org  2006-10-26 17:58 -------
> Instead of directly calling accept we could select or poll on the fd.
> This would let us have a timeout or an interrupt or something.
> For all I know poll would react more gracefully to another thread
> closing the fd -- but I haven't tried that.
> 

The essence of the problem is that you can not temporarily drop a lock 
(as Object.wait() does) in a race free manner while entering the 
accept() system call.  Sticking a poll before the accept may in some 
cases narrow the window of the race, but it does not close it.

I was going to fix the problem in 15 Minutes, but now I think I need a 
day to mull it over.

This is the type of problem that keeps us sharp...

David Daney
Comment 5 David Daney 2007-05-19 18:12:05 UTC
UNIX/Linux sucks.

There are likely races with read/write and close all over libgcj:
=================================================================
Thread 1                      Thread 2             Thread 3

load fd from memory (a field).
                              close fd
                                                   Open new fd with same #
read from fd
=================================================================

The main problem being that file descriptors are reused as soon as possible.  This makes multi-thread safe file operations very difficult.

One solution:

At libgcj initialization time do:
=====================================
int pe[2];
pipe(pe);
close(pe[0]);
int global_always_error_fd = pe[1];
=====================================

All Classes that use descriptors have:
======================
int fileDesInUse;
int fileDes;
int cleanupFD = -1;
======================

A read operation is like this:
===============================
synchronized {
   fileDesInUse++;
}
// use the descriptor
read(fileDes, ....);
synchronized {
   fileDesInUse--;
   if (cleanupFD != -1 && fileDesInUse == 0) {
      close(cleanupFD);
      cleanupFD = -1;
   }
}
========================

And close is like this:
========================
cleanupFD = dup(fileDes)
dup2(global_always_error_fd, fileDes);
synchronized {
   if (fileDesInUse == 0) {
      close(cleanupFD);
      cleanupFD = -1;
   }
}
========================

dup2 is an atomic operation that assures that the file descriptor can not be reused by another thread while we might be using it.

The real close is done on a descriptor that is not in use by any other object.

The extra descriptors created with all the duping are closed when it is certian that nothing is using them.

Yuck.
Comment 6 David Daney 2007-05-19 18:20:33 UTC
That didn't come out quite right :(.

The close part would be:

========================
int t = dup(fileDes)
dup2(global_always_error_fd, fileDes);
synchronized {
   cleanupFD = fileDes;
}
close (t);
synchronized {
   if (fileDesInUse == 0) {
      close(cleanupFD);
      cleanupFD = -1;
   }
}
========================

Yeah, that should work.
Comment 7 David Daney 2008-02-27 18:11:00 UTC
Unassigning as I am not working on this.
Comment 8 Andrew Pinski 2016-09-30 22:53:00 UTC
Closing as won't fix as libgcj (and the java front-end) has been removed from the trunk.