#4 No replacement of non html links

weblech-0.0.3
open
Spider (4)
5
2002-10-06
2002-10-06
Anonymous
No

When you download a website that uses URL query parameters,
for instance one built on PHP, CFM, JSP, etc., WebLech
doesn't replace the original links inside the downloaded
files (e.g. http://slashdot.org/article.jsp?articleID=3&method=teaser)
with the newly created file names
(e.g. article.jsp%3FarticleID=3%26method=teaser).
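To make the request concrete, here is a minimal sketch of the mapping the report describes. It is not WebLech's code. The class and method names (LinkRewriter, localFileName, hrefFor, rewriteLinks) are hypothetical. It assumes the linking page and the target are saved in the same directory, and it only encodes "?" and "&" as in the example above. One subtlety: a browser percent-decodes an href before looking the file up on disk, so a link to a file literally named "article.jsp%3F..." has to escape each "%" as "%25".

```java
import java.net.URI;

public class LinkRewriter {

    // Hypothetical helper: derives the on-disk name for a dynamic URL as in the report,
    // e.g. "article.jsp?articleID=3&method=teaser" -> "article.jsp%3FarticleID=3%26method=teaser".
    static String localFileName(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        String query = uri.getRawQuery();
        if (query == null) {
            return name; // no query string: keep the plain file name
        }
        return name + "%3F" + query.replace("&", "%26");
    }

    // The browser decodes "%XX" in an href before opening the file,
    // so every literal '%' in the saved name must itself be escaped as "%25".
    static String hrefFor(String fileName) {
        return fileName.replace("%", "%25");
    }

    // Naive rewrite: replaces each occurrence of the original absolute URL in a
    // downloaded page with a relative link to the saved copy (same directory assumed).
    static String rewriteLinks(String html, String originalUrl) {
        return html.replace(originalUrl, hrefFor(localFileName(originalUrl)));
    }

    public static void main(String[] args) {
        String url = "http://slashdot.org/article.jsp?articleID=3&method=teaser";
        String page = "<a href=\"" + url + "\">teaser</a>";
        System.out.println(localFileName(url));
        System.out.println(rewriteLinks(page, url));
    }
}
```

Plain string replacement is only for illustration. It misses links written with "&amp;" and can corrupt a URL that has this one as a prefix, so a real spider would rewrite the parsed href attributes instead.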

Also, WebLech should add the extension .html to these
downloaded files so that a browser recognizes them as
web pages.
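A sketch of the suggested .html suffix, under stated assumptions: the list of server-side extensions (DYNAMIC_EXTENSIONS) and the method name withHtmlSuffix are hypothetical, and the saved name is assumed to encode the query string as "%3F..." like the example above. The extension is read from the part before the encoded query, so "article.jsp%3F..." is recognized as a JSP page.

```java
import java.util.Locale;
import java.util.Set;

public class HtmlSuffix {

    // Hypothetical set of server-side extensions whose output is assumed to be HTML.
    static final Set<String> DYNAMIC_EXTENSIONS = Set.of("php", "cfm", "jsp", "asp");

    // Appends ".html" when the file's extension (ignoring any encoded query string)
    // is one of the dynamic page types; other names are returned unchanged.
    static String withHtmlSuffix(String fileName) {
        String base = fileName;
        int query = base.indexOf("%3F");
        if (query >= 0) {
            base = base.substring(0, query);
        }
        int dot = base.lastIndexOf('.');
        if (dot < 0) {
            return fileName;
        }
        String ext = base.substring(dot + 1).toLowerCase(Locale.ROOT);
        return DYNAMIC_EXTENSIONS.contains(ext) ? fileName + ".html" : fileName;
    }

    public static void main(String[] args) {
        System.out.println(withHtmlSuffix("article.jsp%3FarticleID=3%26method=teaser"));
        System.out.println(withHtmlSuffix("logo.gif"));
    }
}
```

Whatever renames the saved file also has to be applied to the links that point at it, otherwise those links break.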

Discussion

  • Brian Pitcher

    Brian Pitcher - 2002-10-15


    I'm not sure I quite understand - sorry!

    Do you mean that when downloading a URL with special
    characters in it, weblech fails to replace ? with %3F in the
    saved file, or that it re-downloads the file even when
    there's one on disk, or something else?

    I get your point about making the eventual filename end with
    .html so a browser can view it - I hadn't thought of that!

  • scruf

    scruf - 2005-07-13


    A simple solution to this problem is to add php, cfm, jsp,
    etc. to the htmlExtensions property in the
    Spider.properties file. Furthermore, adding non-parsed
    file types to imageExtensions works great for sucking up
    doc, xls, ppt, css, etc. files.

    I realize this does not result in the renaming of files, but
    it does work well.
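The workaround above relies on the spider classifying each URL by the extension of its path, ignoring the query string. The sketch below illustrates that idea under loud assumptions: the property names htmlExtensions and imageExtensions come from the comment, but the comma-separated value format and all of the parsing and classification logic (ExtensionConfig, parseList, extensionOf, classify) are hypothetical, not WebLech's source.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

public class ExtensionConfig {

    // Hypothetical: splits an assumed comma-separated property value into lower-case extensions.
    static Set<String> parseList(Properties props, String key) {
        Set<String> result = new HashSet<>();
        for (String item : props.getProperty(key, "").split(",")) {
            String ext = item.trim().toLowerCase(Locale.ROOT);
            if (!ext.isEmpty()) {
                result.add(ext);
            }
        }
        return result;
    }

    // Extension of the URL's path only, so "article.jsp?articleID=3" yields "jsp".
    static String extensionOf(String url) {
        String path = URI.create(url).getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // "html" pages would be parsed for further links; "image" files are saved without parsing.
    static String classify(String url, Set<String> html, Set<String> images) {
        String ext = extensionOf(url);
        if (html.contains(ext)) {
            return "html";
        }
        if (images.contains(ext)) {
            return "image";
        }
        return "other";
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "htmlExtensions = htm,html,php,cfm,jsp\n"
                + "imageExtensions = gif,jpg,png,doc,xls,ppt,css\n"));
        Set<String> html = parseList(props, "htmlExtensions");
        Set<String> images = parseList(props, "imageExtensions");
        System.out.println(classify("http://slashdot.org/article.jsp?articleID=3&method=teaser", html, images));
        System.out.println(classify("http://example.com/report.doc", html, images));
    }
}
```

As the comment notes, this only changes which files get parsed or saved; the saved names and the links inside pages stay as they were.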

