Re: [Htmlparser-user] Link Location resolving
From: Derrick O. <der...@ro...> - 2007-11-23 17:33:23
You should be able to use the Page.setBaseUrl(String base) method to set the URL used as a prefix for relative links, i.e.

    parser.getLexer().getPage().setBaseUrl("http://yadda.yadda");

----- Original Message ----
From: Jurgen Voorneveld <j.e...@st...>
To: htm...@li...
Sent: Friday, November 23, 2007 11:13:33 AM
Subject: [Htmlparser-user] Link Location resolving

List,

I've recently started using htmlparser as part of a web-spidering tool that I have written, and I've run into a small problem. My spider downloads files from web servers using HttpClient from the Apache Commons project. These files are then stored locally in a temporary location. If a file contains HTML, it is then parsed by htmlparser. During parsing, the parser resolves relative links to other files by prepending the location of the local file to the relative link, which of course completely screws up the links.

Is there any way to turn this feature off, or some way of telling the parser that the location of the data is not where it gets the data from?

thanks
Jurgen Voorneveld
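To illustrate why the base URL matters here, the following is a minimal sketch that uses only java.net.URI from the standard library, not htmlparser itself. It resolves the same relative link once against the local temporary copy and once against the URL the page was originally fetched from. The class name BaseUrlDemo, the helper resolve, the file path, and the page URL are all hypothetical and chosen only for illustration; htmlparser's own resolution logic may differ in detail.

```java
import java.net.URI;

class BaseUrlDemo {
    // Resolve a (possibly relative) link against the given base URL.
    // Absolute links are returned unchanged by URI.resolve.
    static String resolve(String baseUrl, String link) {
        return URI.create(baseUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        // Hypothetical values: where the spider stored the page, and where it came from.
        String localCopy = "file:/tmp/spider/page123.html";
        String original = "http://yadda.yadda/docs/index.html";
        String link = "../img/logo.png";

        // Resolved against the temporary file: points into the local file system.
        System.out.println(resolve(localCopy, link));
        // Resolved against the original URL: points back at the web server.
        System.out.println(resolve(original, link));
    }
}
```

With htmlparser, the suggestion above amounts to handing the parser the original URL as the base before extracting links, so that relative links are resolved the second way rather than the first.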