List,
I've recently started using htmlparser as part of a web-spidering tool
that I have written, and I've run into a small problem.
My spider downloads files from webservers using HttpClient from the
Apache Commons project. These files are then stored locally in a
temporary location. If a file contains HTML, it is then parsed by htmlparser.
During parsing, the parser resolves relative links to other files by
prepending the location of the local temporary file to the relative link,
which of course breaks the links, since they should be resolved against
the original URL on the web server. Is there any way to turn this feature
off, or some way of telling the parser that the base location of the data
is not the location it reads the data from?
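
To illustrate what I mean, here is a minimal sketch that uses only
java.net.URI from the JDK, not htmlparser itself. The temporary path,
the example.com URL, and the class and method names are made up for
illustration. It shows the same relative link resolved against the local
copy and against the original URL.

```java
import java.net.URI;

public class LinkResolutionDemo {

    // Resolve a link found in a page against the location that the page
    // is considered to come from (its base).
    static String resolve(String base, String link) {
        return URI.create(base).resolve(link).toString();
    }

    public static void main(String[] args) {
        String link = "images/logo.png";

        // Hypothetical temporary location where the spider stored the file.
        String localCopy = "file:///tmp/spider/page123.html";

        // Hypothetical URL the file was originally downloaded from.
        String original = "http://www.example.com/docs/index.html";

        // What I get now: the link ends up pointing into the temp directory.
        System.out.println(resolve(localCopy, link));

        // What I want: http://www.example.com/docs/images/logo.png
        System.out.println(resolve(original, link));
    }
}
```

So what I am looking for is a way to hand the parser the second kind of
base while it still reads the data from the first location.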
Thanks,
Jurgen Voorneveld