Hi,
I want to extract the links from a specific URL and
save its HTML to a file.
I can't find a way to do that while downloading the HTML only once.
Can someone please tell me how to do that?
I want the program to download the page only once,
then save the HTML and extract the links
(with the prefix of the URL I downloaded it from).
Thanks
There is an example of what you want to do in org.htmlparser.parserapplications.SiteCapturer
That example uses custom tags, but the principle of printing out a list of nodes with toHtml() is the same in any case:
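// get a node list somehow, either iterating or with a filter
// (parse() can throw ParserException, which the caller has to handle)
NodeList list = parser.parse (null);
try
{
    // declare the writer here; the original snippet assumed 'out' already existed
    PrintWriter out = new PrintWriter (new FileOutputStream (file));
    for (int i = 0; i < list.size (); i++)
        out.print (list.elementAt (i).toHtml ());
    out.close ();
}
catch (FileNotFoundException fnfe)
{
    fnfe.printStackTrace ();
}
// the same node list can then be searched for links, so the page is fetched only once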
Can I parse a local file but give the parser, or a LinkTag,
the URL prefix I want?
Thanks
I believe the SiteCapturer example program does what you want, except the prefix is set for local storage. You should be able to modify this.
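To illustrate the idea of supplying your own URL prefix, here is a minimal sketch that does not use the HTML Parser API at all. It relies only on the standard Java library: the HTML is held in a string (downloaded once, or read from a saved file), written to disk, and each href is resolved against a base URL you choose. The class name LinkSketch, the method names, and the example.com URLs are all made up for illustration, and the regular expression is a crude stand-in for a real parser that only handles quoted href values.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of org.htmlparser.
class LinkSketch {
    // Crude pattern for quoted href values; a real HTML parser is more robust.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Resolves every quoted href found in html against baseUrl. */
    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // relative links get the prefix; absolute links come back unchanged
                links.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // skip href values that are not valid URI syntax
            }
        }
        return links;
    }

    /** Saves the already-fetched html to file, then extracts links from the same string. */
    static List<String> saveAndExtract(String html, String baseUrl, Path file)
            throws IOException {
        Files.write(file, html.getBytes(StandardCharsets.UTF_8));
        return extractLinks(html, baseUrl);
    }

    public static void main(String[] args) throws IOException {
        // In a real program this string would come from a single download.
        String html = "<a href=\"a/b.html\">one</a> <a href='/top.html'>two</a>";
        Path file = Files.createTempFile("page", ".html");
        for (String link : saveAndExtract(html, "http://example.com/dir/page.html", file))
            System.out.println(link);
    }
}
```

Because the links are resolved from the in-memory string, it makes no difference whether the HTML came from the network or from a file saved earlier; only the base URL passed in determines the prefix.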