When you download a website that uses URL variables
(for instance a site built on PHP, CFM, JSP, etc.), WebLech
doesn't replace the original links inside the downloaded
files, e.g.
http://slashdot.org/article.jsp?articleID=3&method=teaser
with the newly created filenames, e.g.
article.jsp%3FarticleID=3%26method=teaser
Also, WebLech should add the extension .html to these
downloaded files so that a browser recognizes them as
HTML.
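
To make the request concrete, here is a minimal sketch of the
behaviour being asked for. It is not WebLech code: the class
LinkRewriteSketch and its methods are hypothetical, and the
name mapping ("?" to %3F, "&" to %26) is only inferred from
the example filename above. Note that an href pointing at a
file whose name contains a literal "%" has to escape it as
%25, otherwise the browser decodes %3F back into "?".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of WebLech.
class LinkRewriteSketch {

    // Map the path-plus-query part of a URL to a local file name,
    // following the pattern in the report: "?" -> %3F, "&" -> %26,
    // and append .html unless the name already ends in .html/.htm.
    static String localFileName(String pathAndQuery) {
        String name = pathAndQuery.replace("?", "%3F").replace("&", "%26");
        String lower = name.toLowerCase();
        if (!lower.endsWith(".html") && !lower.endsWith(".htm")) {
            name = name + ".html";
        }
        return name;
    }

    // Build an href that resolves to that file: a literal "%" in the
    // file name must itself be written as %25 inside a link.
    static String hrefFor(String fileName) {
        return fileName.replace("%", "%25");
    }

    // Replace each quoted original URL in the page with a link to
    // the local file (assumption: links are double-quoted and appear
    // exactly as the key, without &amp; entities).
    static String rewriteLinks(String html, Map<String, String> urlToFile) {
        String result = html;
        for (Map.Entry<String, String> e : urlToFile.entrySet()) {
            result = result.replace(
                "\"" + e.getKey() + "\"",
                "\"" + hrefFor(e.getValue()) + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        String url = "http://slashdot.org/article.jsp?articleID=3&method=teaser";
        Map<String, String> urlToFile = new LinkedHashMap<>();
        urlToFile.put(url, localFileName("article.jsp?articleID=3&method=teaser"));
        String page = "<a href=\"" + url + "\">teaser</a>";
        System.out.println(rewriteLinks(page, urlToFile));
    }
}
```

A real implementation would also have to handle relative
links, single-quoted attributes and &amp; entities, which
this sketch ignores.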
Logged In: YES
user_id=353870
I'm not sure I quite understand - sorry!
Do you mean that when downloading a URL with special
characters in it, WebLech fails to replace ? with %3F in the
links inside the saved file, or that it re-downloads the
file even when there's already one on disk, or something
else?
I get your point about making the eventual filename end with
.html so a browser can view it - I hadn't thought of that!
Logged In: YES
user_id=1311727
A simple solution for this problem is to add php, cfm, jsp,
etc. to the htmlExtensions property in the
Spider.properties file. Furthermore, adding non-parsed
file types to the imageExtensions property works great for
sucking up doc, xls, ppt, css, etc. files.
I realize this does not rename the files, but it does work
well.
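
For illustration, the change might look roughly like the
fragment below. The property names come from the comment
above; the comma-separated value format and the omission of
any entries already present are assumptions, so check how
the existing lines in your Spider.properties are written
and append to them rather than replacing them.

```properties
# Assumed format: comma-separated extensions (verify against your file).
# Extensions WebLech should parse for links:
htmlExtensions=php,cfm,jsp
# Extensions to download without parsing:
imageExtensions=doc,xls,ppt,css
```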