#41 ScaneRSS: Local (file) based XML uses incorrect path

open
nobody
None
5
2013-01-31
2007-05-15
No

When using the file: protocol to access an RSS feed stored locally on the machine running Azureus, the path to the XML file (the RSS feed) is interpreted incorrectly. Specifically, a file: URL begins its location with a double solidus, followed by a machine address (omitted for a local file), followed by a final solidus that begins the file path.

For example:

file:///path/to/file.xml

on Unix or MacOSX or:

file:///c:/path/to/file.xml

under MS Windows.

This should give an absolute path location to the file in question.

Currently, there are 2 errors occurring.

1) The three solidi (///) are being deleted and replaced with a single solidus (/). Normally, multiple solidi should be collapsed into a single solidus, /except/ when they follow a protocol:

file:///is/okay

http://is.also.okay/

2) After file:///path/to/file.xml is (incorrectly) replaced with file:/path/to/file.xml, the first 2 characters after the scheme ("/p") are stripped internally, and the remaining path is tacked onto Azureus's current working directory. Thus, if Azureus is running from the folder "/path/to/Azureus/" (that is, Azureus.app / Azureus.exe is located in "/path/to/Azureus") and "file:///path/to/file.xml" is specified for the RSS feed, then the path ScaneRSS uses is "/path/to/Azureus/ath/to/file.xml" and not "/path/to/file.xml", as one would expect.

AFAICT, there is no way in the file: protocol to specify a path starting from the Current Working Directory (CWD / PWD). file: is always given by an absolute path.

Also, I have never seen a machine address between the second and third solidus, so I would recommend always enforcing exactly three solidi following the file: protocol (file:///).
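The second error can be reproduced with plain strings. The following is only a sketch of the behaviour as described above, not the plugin's actual code: the class and method names are made up for illustration, and the collapsed form "file:/path/to/file.xml" is taken from the report rather than computed.

```java
import java.io.File;

final class StrippedPathDemo {
    // Hypothetical helper: drops the "file:" scheme plus two more characters,
    // which is the behaviour described in the report.
    static String stripAsReported(String externalForm) {
        return externalForm.substring("file:".length() + 2);
    }

    public static void main(String[] args) {
        // Form the URL takes after "///" has been collapsed, per the report.
        String collapsed = "file:/path/to/file.xml";
        String stripped = stripAsReported(collapsed);

        // The leading solidus is gone, so the result is a relative path.
        System.out.println(stripped);
        System.out.println(new File(stripped).isAbsolute());

        // A relative path is resolved against the working directory;
        // "/path/to/Azureus" stands in for it here.
        System.out.println(new File("/path/to/Azureus", stripped).getPath());
    }
}
```

Because the stripped string no longer starts with a solidus, anything that opens it as a file resolves it relative to wherever Azureus was started.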

Discussion

  • Logged In: YES
    user_id=836867
    Originator: YES

    Workaround:

    Specify something other than a solidus for the second (middle) solidus:

    file:/x/path/to/file.xml

    "x" will be dropped and the leading solidus will force an absolute path (/path/to/file.xml)

     
  • Logged In: YES
    user_id=836867
    Originator: YES

    Initial analysis indicates this may in fact be a bug in Java. The file scheme is not specifically addressed in RFC 2396, upon which Sun bases the Java classes for URL and URI, but that RFC does indicate that the file scheme generally conforms to a URL, since it provides addressing information (i.e. a path); it is definitely a URI as well, since all URLs are URIs. RFC 1738, which deals specifically with URLs, addresses files in section 3.10 as follows:

    The file URL scheme is used to designate files accessible on a
    particular host computer. This scheme, unlike most other URL schemes,
    does not designate a resource that is universally accessible over the
    Internet.

    A file URL takes the form:

    file://<host>/<path>

    where <host> is the fully qualified domain name of the system on
    which the <path> is accessible, and <path> is a hierarchical
    directory path of the form <directory>/<directory>/.../<name>.

    For example, a VMS file

    DISK$USER:[MY.NOTES]NOTE123456.TXT

    might become

    <URL:file://vms.host.edu/disk$user/my/notes/note12345.txt>

    As a special case, <host> can be the string "localhost" or the empty
    string; this is interpreted as `the machine from which the URL is
    being interpreted'.

    The file URL scheme is unusual in that it does not specify an
    Internet protocol or access method for such files; as such, its
    utility in network protocols between hosts is limited.

    I will continue to investigate to see if Java is truly at fault for this problem.
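As a rough check of the file://<host>/<path> form quoted above, the host and path can be split with java.net.URI. This is a sketch only; the class and method names are hypothetical, and it maps a missing host to the empty string to mirror the RFC's "empty string" special case.

```java
import java.net.URI;

final class FileUrlParts {
    // Returns the <host> part, or "" when the URL has an empty authority.
    static String hostOf(String spec) {
        String host = URI.create(spec).getHost();
        return host == null ? "" : host;
    }

    // Returns the <path> part, including its leading solidus.
    static String pathOf(String spec) {
        return URI.create(spec).getPath();
    }

    public static void main(String[] args) {
        String[] specs = {
            "file:///path/to/file.xml",
            "file://localhost/path/to/file.xml",
            "file:///c:/path/to/file.xml",
        };
        for (String spec : specs) {
            System.out.println(spec + " host=[" + hostOf(spec) + "] path=[" + pathOf(spec) + "]");
        }
    }
}
```

Under this reading, the empty-host and "localhost" spellings name the same path on the local machine.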

     
  • Logged In: YES
    user_id=836867
    Originator: YES

    Although there seems to be a bug in Java's implementation of the URL class (non-conformance to the URL specification in RFC 1738), the getFile() method would be a more effective way for RSSFeed::read to extract the path to the file than taking a substring of the result of URL::toExternalForm(). I have found one instance where this code change would fix the RSS loading problem for files. I will search for others and then attach a patch. This would also solve the problem of some users specifying the file path canonically as file://localhost/path/to/file.xml, since file:///path/to/file.xml, where the authority is an empty string, is equivalent to a host authority of localhost.
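A minimal sketch of the getFile() approach suggested above. It is not the promised patch: the class and method names are hypothetical, and nothing here is taken from the plugin's source. Note that URL::getFile() does not decode percent-escapes, so paths containing characters such as spaces would need extra handling.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class GetFileSketch {
    // Extracts the local path from a file: URL via URL.getFile()
    // instead of slicing the external form at a fixed offset.
    static String localPath(String spec) {
        try {
            URL url = URI.create(spec).toURL();
            if (!"file".equalsIgnoreCase(url.getProtocol())) {
                throw new IllegalArgumentException("not a file: URL: " + spec);
            }
            return url.getFile();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Both spellings should give the same absolute path.
        System.out.println(localPath("file:///path/to/file.xml"));
        System.out.println(localPath("file://localhost/path/to/file.xml"));
    }
}
```

Since the path is read from the parsed URL, the result does not depend on how many solidi the external form happens to contain.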

     
  • Logged In: YES
    user_id=966071
    Originator: NO

    Well, the file support was pretty much hacked in.
    It just tests

    isFile = sourceURL.getProtocol().equalsIgnoreCase("file");

    and then strips the first 7 chars:

    File f = new File(sourceURL.toExternalForm().substring(7));

    If it is really necessary, I will make it .substring(8), thus ignoring the third solidus.

    >AFAICT, there is no way in the file: protocol to specify a path starting
    >from the Current Working Directory (CWD / PWD). file: is always given by
    >an absolute path.

    Wrong, just use file://path. If you use file:///path, it is seen as realpath = /path.
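To compare the two offsets discussed here, the fixed-offset strip can be applied to each spelling as a plain string. This is a sketch with hypothetical names; it deliberately skips java.net.URL, so it says nothing about what toExternalForm() actually returns, which is the point in dispute in this thread.

```java
final class FixedOffsetStrip {
    // Mirrors the substring(n) call quoted above, applied to a raw string.
    static String strip(String externalForm, int offset) {
        return externalForm.substring(offset);
    }

    public static void main(String[] args) {
        String[] forms = {
            "file:///path",            // three solidi preserved
            "file://path",             // two solidi, as suggested for relative paths
            "file:/path/to/file.xml",  // collapsed form described in the report
        };
        for (String form : forms) {
            System.out.println(form
                + " -> substring(7)=[" + strip(form, 7) + "]"
                + " substring(8)=[" + strip(form, 8) + "]");
        }
    }
}
```

With three solidi preserved, offset 7 keeps the leading solidus while offset 8 drops it, so moving to substring(8) would turn file:///path into a relative path.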