This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: HTTP URLConnection.getInputStream seems broken in cvs


Phil Shaw wrote:
On 4 Mar 2005, at 19:59, Fabio Roger wrote:

Yes. The problem happens when gnu.java.net.protocol.http.Headers
tries to parse the response.

In fact, the bug only happens when trying to download one specific
file. I put this file at http://fabio.k2infinity.com/test.hdr, and
this code triggers the bug:

I would guess it's a problem with the declared character set. This response has a charset=GB2312 parameter on the Content-Type:

HTTP/1.1 200 OK
Date: Fri, 04 Mar 2005 12:11:48 GMT
Server: Apache/2.0.52 (Fedora)
Last-Modified: Fri, 04 Mar 2005 11:27:49 GMT
ETag: "ccbed-b291-9463740"
Accept-Ranges: bytes
Content-Length: 45713
Connection: close
Content-Type: text/plain; charset=GB2312

I also put the file at http://dinavis.net/test.hdr, but the bug
does not occur when downloading from that site.

This does not have the same charset declaration:


HTTP/1.1 200 OK
Date: Fri, 04 Mar 2005 12:11:34 GMT
Server: Apache/2.0.50 (Debian GNU/Linux) mod_perl/1.99_14 Perl/v5.8.4 mod_ssl/2.0.50 OpenSSL/0.9.7d mod_auth_pgsql/2.0.1
Last-Modified: Fri, 04 Mar 2005 11:30:07 GMT
ETag: "3e8f3a-b291-117fedc0"
Accept-Ranges: bytes
Content-Length: 45713
Connection: close
Content-Type: text/plain
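
To make the difference between the two responses concrete, here is a minimal sketch that pulls the charset parameter out of a Content-Type value. The class and method names (CharsetParamSketch, charsetOf) are hypothetical and are not part of Classpath or of the code discussed in this thread. It splits naively on ';', so it would mishandle a quoted parameter value containing a semicolon, and it reports "absent" rather than guessing a default.

```java
import java.util.Optional;

public class CharsetParamSketch {

    // Return the charset parameter of a Content-Type value, if one is declared.
    // Simplification: parameters are split on ';' without honouring quoted strings.
    static Optional<String> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself (e.g. "text/plain"); parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip one pair of surrounding double quotes, if present.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The two Content-Type values shown in the headers above.
        System.out.println(charsetOf("text/plain; charset=GB2312"));
        System.out.println(charsetOf("text/plain"));
    }
}
```

With the two header values quoted above, the first call yields GB2312 and the second yields an empty result, which is the only difference between the responses that this sketch looks at.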

The charset should not make a difference, since at the point the error occurs we are still parsing the headers, which are always ASCII. The error occurs in LineInputStream.readLine, although without a more detailed stack trace I couldn't say why.
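
As an illustration of why the declared charset should not matter at this stage, the following is a rough sketch of header parsing that decodes every header line as US-ASCII and never touches the body bytes. It is not the Classpath LineInputStream or Headers implementation: the class HeaderParseSketch and its methods are hypothetical, the response is built in memory rather than fetched, and the line reader simply drops every CR, which a real parser would not do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderParseSketch {

    // Read bytes up to LF, drop CR, and decode the line as US-ASCII.
    // Returns null if the stream is already at its end.
    static String readAsciiLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawByte = false;
        int b;
        while ((b = in.read()) != -1) {
            sawByte = true;
            if (b == '\n') {
                break;
            }
            if (b != '\r') {
                buf.write(b);
            }
        }
        return sawByte ? new String(buf.toByteArray(), StandardCharsets.US_ASCII) : null;
    }

    // Skip the status line, then collect headers until the blank line.
    // On return the stream is positioned at the first byte of the body.
    static Map<String, String> readHeaders(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        readAsciiLine(in); // status line, e.g. "HTTP/1.1 200 OK"
        String line;
        while ((line = readAsciiLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(),
                            line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    // Build an in-memory response: ASCII head followed by raw body bytes.
    static byte[] sampleResponse(String contentType, byte[] body) throws IOException {
        String head = "HTTP/1.1 200 OK\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Two non-ASCII body bytes, standing in for GB2312-encoded content.
        byte[] body = {(byte) 0xD6, (byte) 0xD0};
        InputStream in = new ByteArrayInputStream(
                sampleResponse("text/plain; charset=GB2312", body));
        Map<String, String> headers = readHeaders(in);
        System.out.println(headers.get("Content-Type"));
        System.out.println("body bytes left: " + in.available());
    }
}
```

In this sketch the header loop stops at the blank line, so the non-ASCII body bytes are still unread when parsing finishes, whatever charset the Content-Type declares.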


I have failed to reproduce this bug using Sun's Java with Classpath, and gij and kaffe from HEAD. All of these implementations successfully return an InputStream from which I can read 45713 bytes.
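
For anyone wanting to repeat this kind of check, a minimal sketch along these lines could be used. It is not the code from the original report, which is not shown in this message; the class name CountBytesSketch and the helper countBytes are hypothetical, and the URL is taken from the command line rather than hard-coded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class CountBytesSketch {

    // Drain the stream and return how many bytes were read.
    static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CountBytesSketch <url>");
            return;
        }
        URLConnection conn = new URL(args[0]).openConnection();
        // getInputStream() is the call reported as failing in this thread.
        try (InputStream in = conn.getInputStream()) {
            System.out.println("Content-Type: " + conn.getContentType());
            System.out.println("bytes read: " + countBytes(in));
        }
    }
}
```

Run against each of the two URLs, the byte count can be compared with the Content-Length of 45713 shown in the headers.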

Please note that the dinavis.net URL returns an invalid Content-Type: the default charset for HTTP is ASCII, and the returned content contains non-ASCII characters.
--
Chris Burdess


