Absolute URL parsing bug

Per Bothner per@bothner.com
Tue Jul 5 21:54:00 GMT 2005


Andrew Haley wrote:
> The case that we get wrong is this one:
> 
> $ java TestURLs 'jar:file:/ejbjars/ws.jar!/META-INF/wsdl/ssbEndpoint.wsdl' 'jar:file:ejbjars/ws.jar!/META-INF/wsdl/ssbEndpoint.wsdl'
> jar:file:ejbjars/ws.jar!/META-INF/wsdl/ssbEndpoint.wsdl

None of these are valid URLs as documented in the Javadoc.  They do match 
the URI specification if you view everything after the scheme (i.e. 
"file:/ejbjars/ws.jar!/META-INF/wsdl/ssbEndpoint.wsdl") as an 
'opaque_part'.  There is no concept of a "nested URL", and "jar:file:" is 
not a valid scheme.
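
To illustrate the 'opaque_part' reading, here is a minimal sketch using only the standard java.net.URI class.  The class name OpaqueJarUri is made up for this example and is not part of Classpath or libgcj.  It parses the first string from the quoted test and reports how URI splits it into a scheme and a scheme-specific part:

```java
import java.net.URI;

// Hypothetical demo class; not part of Classpath or libgcj.
class OpaqueJarUri {
    static final String SPEC =
        "jar:file:/ejbjars/ws.jar!/META-INF/wsdl/ssbEndpoint.wsdl";

    // Parse the string as a generic URI (not as a java.net.URL).
    static URI parse() {
        return URI.create(SPEC);
    }

    public static void main(String[] args) {
        URI uri = parse();
        // The scheme is only "jar"; "file:" belongs to the rest.
        System.out.println(uri.getScheme());
        // Opaque: the scheme-specific part does not start with '/'.
        System.out.println(uri.isOpaque());
        System.out.println(uri.getSchemeSpecificPart());
    }
}
```

Seen this way, the URI class attaches no meaning to the embedded "file:" prefix; it is just the start of the opaque part.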

>  $ gij TestURLs 'jar:file:/ejbjars/ws.jar!/META-INF/wsdl/ssbEndpoint.wsdl' 'jar:file:ejbjars/ws.jar!/META-INF/wsdl/ssbEndpoint.wsdl'
> jar:file:/ejbjars/ws.jar!/META-INF/wsdl/file:ejbjars/ws.jar!/META-INF/wsdl/ssbEndpoint.wsdl
> 
> 
> So, in that case the context should be ignored.
> In fact, it seems like the context is ignored for all "jar:file" and "file:" URLs:

We should ignore the context for *all* absolute URLs - i.e. any URL that 
has a scheme - according to the URI spec.  But the Javadoc for URL(URL, 
String) says "If the scheme component is defined in the given spec and 
does not match the scheme of the context, then the new URL is created as 
an absolute URL based on the spec alone."  I assume the "and does not 
match" clause is documenting a hack (bug) in the implementation - one that 
doesn't match the URI spec.  (Hacks like this may be one reason why they 
decided to start from scratch with a new URI class.)

I.e. java TestURLs http:/a/b/c http:d/e
prints:
http:/a/b/d/e
but according to the RFC spec it should be:
http:d/e
The URI.resolve method gets this right.
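
A minimal sketch of that comparison, assuming only the standard java.net.URL and java.net.URI classes.  ResolveDemo and its method names are invented for this example; this is not the TestURLs program from the thread.  Note that the URL(URL, String) constructor is deprecated in recent JDKs but still available:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical demo class; not the TestURLs program discussed above.
class ResolveDemo {
    // Resolve spec against context with the URL(URL, String) constructor.
    static String viaUrl(String context, String spec)
            throws MalformedURLException {
        return new URL(new URL(context), spec).toString();
    }

    // Resolve spec against context with URI.resolve, which returns an
    // already-absolute argument unchanged.
    static String viaUri(String context, String spec) {
        return URI.create(context).resolve(spec).toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        // Same-scheme spec: the URL constructor merges it with the context.
        System.out.println(viaUrl("http:/a/b/c", "http:d/e"));
        // URI.resolve ignores the context here and prints http:d/e
        System.out.println(viaUri("http:/a/b/c", "http:d/e"));
    }
}
```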

A suggested (untested) fix might be something like:

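    // The spec has a scheme if a ':' appears before any '/'.
    // ('protocol' is assumed to hold the scheme of the context.)
    int colon = spec.indexOf(':');
    int slash = spec.indexOf('/');
    if (colon > 0
        && (colon < slash || slash < 0)
        && (protocol == null
            || protocol.length() <= colon
            || ! protocol.equalsIgnoreCase(spec.substring(0, colon))))
      context = null;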
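
The scheme test at the top of that condition can be tried in isolation.  The following is only a sketch: SchemeCheck and hasScheme are hypothetical names, not existing Classpath code, and the check does not validate that the characters before the colon are legal in a scheme.

```java
// Hypothetical helper; not the actual Classpath implementation.
class SchemeCheck {
    // True when spec contains a ':' after at least one character
    // and before any '/', i.e. it looks like "<scheme>:<rest>".
    static boolean hasScheme(String spec) {
        int colon = spec.indexOf(':');
        int slash = spec.indexOf('/');
        return colon > 0 && (slash < 0 || colon < slash);
    }

    public static void main(String[] args) {
        String[] specs = {
            "http:d/e",
            "jar:file:ejbjars/ws.jar!/META-INF/wsdl/ssbEndpoint.wsdl",
            "foo",
            "/a:b"
        };
        for (String spec : specs) {
            System.out.println(spec + " -> " + hasScheme(spec));
        }
    }
}
```

Under this check the first two specs count as absolute and the last two as relative, so only the last two would still need the context.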

> But in the case of a URL that is not qualified with any protocol at
> all we need the context:

Er, no.  See the 'TestURLs  http:/a/b/c http:d/e' example.

>  $ java TestURLs 'jar:file:/ejbjars/ws.jar!/META-INF/wsdl/ssbEndpoint.wsdl' foo
> jar:file:/ejbjars/ws.jar!/META-INF/wsdl/foo
> 
> Classpath gets that right too.  So, the only thing we seem to get
> wrong is the parsing of a 'jar:file:' URL when a 'jar:' context is
> supplied.

Do you have any reason to believe that "jar:" or "file:" URLs get any 
special treatment?  I see none.
-- 
	--Per Bothner
per@bothner.com   http://per.bothner.com/


