#12 Improper resolving of relative URLs


With the base URL http://mydomain.com/bla/, the relative
URL ../pics/fasel.gif is resolved to
http://mydomain.com/bla/pics/fasel.gif instead of the
correct http://mydomain.com/pics/fasel.gif. The cause is a
condition in
net.javacoding.jspider.core.util.html.URLFinder.findURLs(URLFinderCallback, String, String):
if there is no dot after the last slash in the path part of
the URL, an extra slash is appended ("to avoid buggy
relative refs"). This, however, breaks correct relative
refs like the one above. It also fails when content
negotiation is used, for example when I request
http://mydomain.com/blupp and let the server decide
whether I get an HTML 4 or an XHTML document depending on
what my client accepts (blupp has no dot, so it is treated
as a directory even though it names a document).
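
For comparison, here is a minimal sketch of how the two
cases above resolve when nothing is guessed about the base
URL. It uses the JDK's java.net.URI rather than JSpider's
own code; the class name RelativeUrlDemo and the relative
reference pics/fasel.gif in the blupp case are made up for
illustration and do not come from JSpider.

```java
import java.net.URI;

public class RelativeUrlDemo {

    // Resolve a relative reference against a base URL with the JDK's
    // resolver, without guessing whether the base is a "file" or a
    // "directory".
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Base ends in a slash: ".." steps out of /bla/.
        // prints http://mydomain.com/pics/fasel.gif
        System.out.println(resolve("http://mydomain.com/bla/", "../pics/fasel.gif"));

        // Base has no trailing slash and no dot: blupp is the document
        // itself, so the reference is relative to the root.
        // prints http://mydomain.com/pics/fasel.gif
        System.out.println(resolve("http://mydomain.com/blupp", "pics/fasel.gif"));

        // The same base with a slash forced onto it resolves elsewhere.
        // prints http://mydomain.com/blupp/pics/fasel.gif
        System.out.println(resolve("http://mydomain.com/blupp/", "pics/fasel.gif"));
    }
}
```

The last two calls differ only in the trailing slash of the
base URL, which is exactly what the condition adds.
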
The point is that we can't talk about files when
discussing URLs (or URIs). We talk about resources: the
resource http://mydomain.com/boring.html *may* have
something to do with a file called boring.html, but it
doesn't *have to*. It can be a document created on the
fly, or even a picture (GIF/JPEG/PNG) of a boring HTML
document, and the user agent decides *only through the
content type* what to do with the received data. In this
context it cannot be considered correct to decide whether
an extra slash should be added based only on whether
there is a dot after the last slash.
Solution: delete the condition, as suggested in the
attached patch.
Important: this fix causes one of the tests to break
(since the test checks for the improper behaviour). A
patch removing that test is attached, too.

