This is the mail archive of the java-patches@gcc.gnu.org mailing list for the Java project.



Re: Patch: fix non-blocking socket connect


>>>>> "Anthony" == Anthony Green <green@redhat.com> writes:

Anthony> For non-blocking socket connects, we throw a timeout
Anthony> exception before we have a chance to record the target host
Anthony> address and port in the implementation socket.  Currently,
Anthony> when select finally notifies us that the connection was
Anthony> successful, our implementation socket has no idea who it
Anthony> connected with.  We need to record this information while we
Anthony> still have it, before the failed connect.  SocketImpl is
Anthony> still smart enough to return null and 0 for these values when
Anthony> the socket is not connected (after our timed-out connect()
Anthony> and prior to the actual connection).

To recap our IRC conversation here:

First, the nonblocking path in natPlainSocketImplPosix.cc:connect() is
just one way of implementing connect()-with-timeout.  And, as it turns
out, it is a fairly buggy way.  One bug is readily apparent: the code
as written can leave the fd non-blocking even on error.  Oops.

What I think is happening here is that we set non-blocking on the fd,
then we try to connect (which returns immediately), and then we enter
select() in order to wait for the timeout period.
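
For reference, here is a minimal sketch of that sequence in plain C,
with the fd put back into its original mode on every path.  This is
not the code in natPlainSocketImplPosix.cc; connect_with_timeout and
demo_loopback are made-up names for illustration, and the timeout is
assumed to be in milliseconds.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect FD to ADDR, waiting at most TIMEOUT_MS.
   Returns 0 on success, -1 with errno set on failure.  The original
   file status flags are restored on every path.  */
static int
connect_with_timeout (int fd, const struct sockaddr *addr, socklen_t len,
                      int timeout_ms)
{
  int flags = fcntl (fd, F_GETFL, 0);
  if (flags < 0 || fcntl (fd, F_SETFL, flags | O_NONBLOCK) < 0)
    return -1;

  int result = connect (fd, addr, len);
  int err = result < 0 ? errno : 0;

  if (result < 0 && err == EINPROGRESS)
    {
      fd_set wfds;
      struct timeval tv;
      FD_ZERO (&wfds);
      FD_SET (fd, &wfds);
      tv.tv_sec = timeout_ms / 1000;
      tv.tv_usec = (timeout_ms % 1000) * 1000;

      int n = select (fd + 1, NULL, &wfds, NULL, &tv);
      if (n < 0)
        err = errno;
      else if (n == 0)
        err = ETIMEDOUT;  /* The kernel may still be trying to connect.  */
      else
        {
          /* Writable: SO_ERROR says whether the connect succeeded.  */
          socklen_t elen = sizeof err;
          if (getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &elen) < 0)
            err = errno;
        }
      result = err == 0 ? 0 : -1;
    }

  /* Restore the old mode whether or not the connect worked.  */
  if (fcntl (fd, F_SETFL, flags) < 0 && result == 0)
    {
      err = errno;
      result = -1;
    }
  if (result < 0)
    errno = err;
  return result;
}

/* Loopback self-check: connect to a listener we create ourselves.  */
static int
demo_loopback (void)
{
  struct sockaddr_in sa;
  socklen_t len = sizeof sa;
  int ls = socket (AF_INET, SOCK_STREAM, 0);
  int fd = socket (AF_INET, SOCK_STREAM, 0);
  int r = -1;

  memset (&sa, 0, sizeof sa);
  sa.sin_family = AF_INET;
  sa.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  /* Port 0 asks the kernel for any free port.  */
  if (ls >= 0 && fd >= 0
      && bind (ls, (struct sockaddr *) &sa, sizeof sa) == 0
      && listen (ls, 1) == 0
      && getsockname (ls, (struct sockaddr *) &sa, &len) == 0)
    {
      r = connect_with_timeout (fd, (struct sockaddr *) &sa, len, 1000);
      /* The fd must be blocking again afterwards.  */
      assert ((fcntl (fd, F_GETFL, 0) & O_NONBLOCK) == 0);
    }
  if (fd >= 0)
    close (fd);
  if (ls >= 0)
    close (ls);
  return r;
}
```

Note that on timeout this only reports ETIMEDOUT and restores the
flags; it does nothing to stop the connection attempt itself.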

In Azureus what happens is that we don't connect within the timeout
period, so we throw an exception.  However, the OS continues to try to
connect, since we haven't told it to stop.  Then later some other code
sees that we've connected and gets confused since the socket's address
is not set... hence this patch, which sets the address.  But I think
this patch is most likely incorrect.

Now, as far as I know, there actually isn't a way to tell the OS to
stop a nonblocking connect that is in progress.  One idea is to put
the fd back into blocking mode, and hope the kernel realizes that this
means to abort the connect.  I'm doubtful that this will actually
work, but I suppose it is worth a try... though then we have to face
the question of how portable this is.  The existence of ECONNABORTED
suggests that there may be a way to request this, but so far Google
only shows this as being related to accept().  The documentation
situation is pretty bad here :-(

A second approach would be to set the SO_SNDTIMEO socket option first,
and then do a blocking connect (and then reset the timeout to 0).
(One nice side effect of this is that it would eliminate all the
nonblocking code and unify the two cases that are currently there.)
It isn't clear to me whether this option will affect a connect.  I
hope it will, but you'd have to test this.
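
A rough sketch of what this second approach might look like, as a
starting point for such a test.  The names connect_with_sndtimeo and
demo_sndtimeo_loopback are invented here, and the sketch restores the
previous timeout rather than hard-coding 0.  As far as I can tell the
Linux socket(7) page says the send timeout does apply to connect(2),
which then fails with EINPROGRESS; whether other systems behave the
same way is an open question.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: blocking connect bounded by SO_SNDTIMEO.
   Returns 0 on success, -1 with errno set on failure.  Assumes the
   platform honors SO_SNDTIMEO for connect, which is not guaranteed.  */
static int
connect_with_sndtimeo (int fd, const struct sockaddr *addr, socklen_t len,
                       int timeout_ms)
{
  struct timeval saved, tv;
  socklen_t slen = sizeof saved;

  if (getsockopt (fd, SOL_SOCKET, SO_SNDTIMEO, &saved, &slen) < 0)
    return -1;
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  if (setsockopt (fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) < 0)
    return -1;

  int result = connect (fd, addr, len);
  int err = errno;

  /* Put the old timeout back; a zero timeval means "no timeout".  */
  setsockopt (fd, SOL_SOCKET, SO_SNDTIMEO, &saved, sizeof saved);
  if (result < 0)
    errno = err;
  return result;
}

/* Loopback self-check: connect to a listener we create ourselves and
   confirm the send timeout is back to zero afterwards.  */
static int
demo_sndtimeo_loopback (void)
{
  struct sockaddr_in sa;
  socklen_t len = sizeof sa;
  int ls = socket (AF_INET, SOCK_STREAM, 0);
  int fd = socket (AF_INET, SOCK_STREAM, 0);
  int r = -1;

  memset (&sa, 0, sizeof sa);
  sa.sin_family = AF_INET;
  sa.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  /* Port 0 asks the kernel for any free port.  */
  if (ls >= 0 && fd >= 0
      && bind (ls, (struct sockaddr *) &sa, sizeof sa) == 0
      && listen (ls, 1) == 0
      && getsockname (ls, (struct sockaddr *) &sa, &len) == 0)
    {
      struct timeval now;
      socklen_t nlen = sizeof now;

      r = connect_with_sndtimeo (fd, (struct sockaddr *) &sa, len, 1000);
      if (getsockopt (fd, SOL_SOCKET, SO_SNDTIMEO, &now, &nlen) == 0)
        assert (now.tv_sec == 0 && now.tv_usec == 0);
    }
  if (fd >= 0)
    close (fd);
  if (ls >= 0)
    close (ls);
  return r;
}
```

The loopback check only exercises the success path; the interesting
case is an unresponsive host, where the timeout would actually fire.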

A third approach would be to use a blocking connect and have some
other thread interrupt us at the timeout.  However, the connect man
page on Linux only mentions EINTR as an SVR4 thing, so I suspect this
may not actually work.

One final option is that I'm misinterpreting the Java API and that we
really are supposed to set nonblocking mode and magically connect
after Socket.connect has returned.  This seems crazy, but I suppose it
isn't completely impossible.  A test is needed.


One page I found notes the EINTR problem and seems to suggest that on
Solaris at least the reset-to-blocking approach would work:

http://www.cl.cam.ac.uk/Research/DTG/attarchive/omniORB/doc/2.8/omnithread/node5.html

This API seems pretty unfortunate overall.

FWIW my copy of Stevens is pretty old and I didn't see anything
relevant.  If there is an updated edition perhaps someone could check
it.

Tom

